30 January 2008

WebEval

As you might know, I am now teaching a course on operating systems. This is an optional course, so the School does not impose any constraints on the content. For fairly administrative reasons (meaning "I don't want to talk about it, at least not now") the title remained Operating Systems, as it was last year. I decided to structure the course around four important concepts: memory management, file systems, multitasking, and networking (no proper link for the last one: try this and this). Each of these gets four lectures. The course started with two introductory lectures that presented the Linux command line. The assessment system also changed a bit since my last post (again, don't ask). The final exam is worth 20 points; a written report is worth 10 points; finally, 70 points are earned throughout the term by solving assignments that are judged automatically. The harder assignments are labeled projects, and students are explicitly asked to collaborate on finding good ideas for the solution (but not on writing the program).

In this post I'll tell you a little about this automatic judge.

I want to be able to give two types of assignments: quizzes and programs. For quizzes it is obvious that judging can be automated: one just has to compare the student's choices with the correct answers. For programs it's less clear how to do the judging. People involved in programming competitions are probably already yelling: "just specify the input/output format (or some similar interface) and use a bunch of tests, comparing the actual output with some reference output." And that's what I did. But I want to point out that this is by no means the only way to judge a program. It just happens to be a good one. Other ways include using theorem provers to see if a program matches a specification, running style checkers, counting various metrics, making the programs compete with each other, and so on. The specific way one chooses to do the automatic judging influences the type of tasks one can pose. One big advantage of the I/O testing idea is that many tasks can be posed in a form amenable to such judging. The other big competitor is direct competition between programs. I'll let you bash the idea of assigning scores with style checkers and metrics such as cyclomatic complexity. Once this system was chosen I had to decide whether I want the IOI scoring or the TopCoder one. I decided to go for the former, which is more lenient: if you pass half of the tests you get half of the points. TopCoder gives you exactly zero points even if only one test fails.
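To make the difference concrete, here is a minimal sketch of the two scoring rules (my own illustration, not the actual WebEval code):

```java
/** Illustration of the two scoring policies mentioned above. */
public class Scoring {
  /** IOI-style: points are proportional to the number of passed tests. */
  public static double ioiScore(int passed, int total, double maxPoints) {
    if (total == 0) return 0.0;
    return maxPoints * passed / total;
  }

  /** TopCoder-style: all or nothing. */
  public static double topCoderScore(int passed, int total, double maxPoints) {
    return passed == total ? maxPoints : 0.0;
  }
}
```

With 100 points at stake, passing 5 of 10 tests yields 50 points under the first rule and 0 under the second.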

I want the automatic judge to keep track of the scores and I want it to be easily maintainable. That is, I want to do as little work as possible in order to add a quiz or a problem, I don't want to do any work to take a problem down after the deadline, and so on. Basically, I'm lazy. But I care for the students too. I don't want to ask them to install anything or follow some conventions they don't like unless absolutely necessary. So it's decided: it has to be a website running in a browser.

In short: require no software on the client side apart from a browser, judge quizzes automatically, judge programs automatically by running them on the server, keep the scores, make it easy to add/remove quizzes/problems.

Here's how it looks (click to enlarge, and see the available languages):

It's made out of

  • A client. It renders stuff in the browser area. It does as much as possible without bothering the server. For example, it updates the deadline counter every minute even if it asked the server for a time interval only once, immediately after login.
  • A server. This one distributes work to two modules (classes, actually).
    • Judge. Responsible for compiling programs, running them, and comparing their output with the reference results.
    • Database. Provides a way to fetch data about problem statements, quiz questions, accounts, and so on. It also allows storing and reading scores. This is an interface. The work is done by the class FileDatabase, whose name suggests what database technology I use :).
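The heart of the Judge is the comparison step. A minimal sketch, assuming a token-by-token comparison that ignores whitespace differences (an assumed policy; the real Judge may well be stricter or more lenient):

```java
import java.util.Arrays;

/** Sketch of the Judge's output comparison: split both outputs into
 *  whitespace-separated tokens and require the token sequences to match.
 *  (Illustrative only, not the actual WebEval code.) */
public class OutputCheck {
  public static boolean matches(String actual, String reference) {
    String[] a = actual.trim().split("\\s+");
    String[] b = reference.trim().split("\\s+");
    return Arrays.equals(a, b);
  }
}
```

This tolerates trailing newlines and extra spaces, which saves a lot of pointless "wrong answer" verdicts for beginners.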

My approach was: get it done quickly. So the following signatures will probably look messy to you. They are here for you to criticize. Here's the interface between client and server (only the client calls into the server).

package org.rgrig.client;

import com.google.gwt.user.client.rpc.*;

/**
 * The interface between the client and the server.
 *
 * The client can ask for quizzes/problems, can ask for the available
 * languages, can log in and out, can ask for a quiz/problem to 
 * be judged, and can ask for the current scores.
 */
public interface WebEvalSrv extends RemoteService {
  /* Ask for (active) problems and quizzes. */
  public Problem[] getProblems() throws ServerException;
  public Quiz[] getQuizzes() throws ServerException;

  /* Returns the list of available languages. */
  public String[] getLanguages() throws ServerException;

  /* Functions for logging in and out. */
  public String login(String pseudonym, String passwd) throws ServerException;
  public void logout() throws ServerException;

  /* Judging functions. */
  public PbEval judgeProblem(
    String problem,
    String language, 
    String solution) throws ServerException;
  public double judgeQuiz(
    String quiz,
    String solution)
    throws ServerException;

  /* Returns the user scores now. */
  public User[] getScores() throws ServerException;
}

And here's what the database can do.

package org.rgrig.server;

import java.util.*;
import org.rgrig.client.*;

/**
 * Knows how to parse the database info.
 */
public interface Database {
  public Quiz[] getQuizzes() throws ServerException;
  public Problem[] getProblems() throws ServerException;
  public String getQuizAnswer(String quizId) throws ServerException;
  public PbTest[] getProblemExamples(String pbId) throws ServerException;
  public PbTest[] getProblemTests(String pbId) throws ServerException;
  public PbLimit getProblemLimits(String pbId) throws ServerException;

  public String[] getLanguages() throws ServerException;
  public Language getLanguage(String id) throws ServerException;

  public double getScore(String task) 
    throws ServerException;
  public double getScore(String pseudo, String task)
    throws ServerException;
  public User[] getScores()
    throws ServerException;
  public void setScore(String pseudo, String task, double score)
    throws ServerException;

  /** Returns the student ID, or null if the check fails. */
  public String checkLogin(String pseudonym, String passwdHash) 
    throws ServerException;
}
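Since FileDatabase keeps everything in plain files, the score-related methods could be backed by something as simple as a properties file keyed by "pseudonym/task". This is a sketch under that assumption; the post doesn't describe the actual on-disk format:

```java
import java.io.*;
import java.util.Properties;

/** Sketch of a file-backed score store, in the spirit of FileDatabase.
 *  Keys are "pseudonym/task"; the whole file is rewritten on each update.
 *  (Illustrative only; the real FileDatabase format is not shown in the post.) */
public class ScoreStore {
  private final File file;
  private final Properties scores = new Properties();

  public ScoreStore(File file) throws IOException {
    this.file = file;
    if (file.exists()) {
      try (InputStream in = new FileInputStream(file)) { scores.load(in); }
    }
  }

  public void setScore(String pseudo, String task, double score) throws IOException {
    scores.setProperty(pseudo + "/" + task, Double.toString(score));
    try (OutputStream out = new FileOutputStream(file)) { scores.store(out, null); }
  }

  public double getScore(String pseudo, String task) {
    String v = scores.getProperty(pseudo + "/" + task);
    return v == null ? 0.0 : Double.parseDouble(v);
  }
}
```

Rewriting one small file per submission is obviously not scalable, but for a class-sized user base it keeps the "database" greppable and trivially backed up.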

And now the part I'm most excited about. If I want to create a quiz that looks like this

then I have to create a directory with two text files. One contains the questions.

How does one send the output of <tt>cmdA</tt> to the input of
<tt>cmdB</tt>?
 - <tt>cmdA | xargs cmdB</tt>
 - <tt>cmdA | cmdB</tt>
 - <tt>cmdA > cmdB</tt>
 - <tt>cmdA > xargs cmdB</tt>

(Except that the correct answer is marked with * instead of -. This quiz is still live, so I won't post the answer right here, right now.) The second file looks like this:

start 2008-01-28 11:00
deadline 2008-02-04 11:00
score 1
name Command line basics

And that's it! The quiz will appear on the website between the start date and the deadline, it will be worth 1 point, and the name that appears on the website will be Command line basics.
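The metadata file follows an obvious convention: each line is a key followed by the rest of the line as the value. Parsing it takes a dozen lines (a sketch of my own; the real parser may differ in details):

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch of parsing the task metadata file shown above: the first word
 *  on each line is the key, the rest of the line is the value
 *  (e.g. "name Command line basics"). Illustrative only. */
public class TaskMeta {
  public static Map<String, String> parse(String text) {
    Map<String, String> meta = new HashMap<>();
    for (String line : text.split("\n")) {
      line = line.trim();
      if (line.isEmpty()) continue;
      int sp = line.indexOf(' ');
      if (sp < 0) meta.put(line, "");
      else meta.put(line.substring(0, sp), line.substring(sp + 1).trim());
    }
    return meta;
  }
}
```

Keeping values free-form after the first space is what lets "name Command line basics" work without quoting.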

Creating a problem is just a little more difficult. One has to create two directories with tests (examples and tests). These contain pairs of files test_name.in and test_name.out. The in file is fed to the program and the output is compared with the out file. Examples are different in three ways: they are rendered in the problem statement, the student can see his/her program's output on them, and the answers on the examples must be correct in order to get any points. Once the program gets the examples right, it is run on the real tests and the score is proportional to the number of tests for which the answer is correct.
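Collecting the test names from such a directory is a matter of listing the *.in files and keeping only those that have a matching *.out file. A sketch (again my own illustration, not the actual WebEval code):

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Sketch of scanning a test directory for test_name.in / test_name.out
 *  pairs. Only names for which both files exist are kept.
 *  (Illustrative only, not the actual WebEval code.) */
public class TestDir {
  public static List<String> testNames(File dir) {
    List<String> names = new ArrayList<>();
    String[] files = dir.list();
    if (files == null) return names;
    for (String f : files) {
      if (f.endsWith(".in")) {
        String name = f.substring(0, f.length() - ".in".length());
        if (new File(dir, name + ".out").exists()) names.add(name);
      }
    }
    Collections.sort(names);  // deterministic order across filesystems
    return names;
  }
}
```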

To develop this stuff I used GWT. The ugly part was that the "hosted mode", which is supposed to help with debugging, didn't really work for me, so I had to restart the web server on my computer often. (I suppose I could have insisted on finding out why RootPanel.get() returns null (only) in hosted mode but, to my shame, I was not curious enough and just wanted to get it done quickly.) On the good side, the RPC is quite easy (and no, I don't mind too much having to write closures to process the returned values). Another good part is that I'm quite happy with how the site looks, and so far I have had no problems with browser incompatibilities.

While developing, I was often thinking about what should run on the server and what on the client, even for non-security-sensitive stuff. I vaguely remember that Links lets you ignore this issue if you so choose. In other words, in Links RPCs are really simple. Now, that's just a vague memory. I hope I'll have time at some point to experiment with it.

In the meantime, would you use such a website for your course?
