Yanxin Lu, Learning to Grade Student Programs in a Massive Open Online Course

Slides

We study the problem of automatically evaluating the quality of computer programs produced by students in a very large, online, interactive programming course (or “MOOC”). Automatically evaluating interactive programs (such as computer games) is not easy because such programs lack any sort of well-defined logical specification, so it is difficult to devise a testing platform that can “play” a student-coded game to determine whether it is correct.

As an alternative, we devise a simple statistical approach to assigning a score to a student-produced program. We define a set of distance metrics between pairs of program fragments that measure the similarity of the fragments. Lightweight program analysis is used to compute these distances in a way that is semantically meaningful, but also scales to large sets (hundreds of thousands) of programs. These metrics can be used to perform a distance-based classification or regression analysis, using a set of already-graded training data to learn to assign scores to ungraded assignments.
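As a rough illustration of the distance-based regression step (not the paper's actual metrics or learner), the sketch below assumes a simple AST node-count profile as the program representation, an L1 distance between profiles, and a k-nearest-neighbor average as the regressor; the names `ast_node_profile`, `profile_distance`, and `knn_score` are hypothetical.

```python
import ast
from collections import Counter

def ast_node_profile(source):
    """Count AST node types in a program; a crude stand-in for the
    lightweight program analysis mentioned above (assumed, not the paper's)."""
    tree = ast.parse(source)
    return Counter(type(node).__name__ for node in ast.walk(tree))

def profile_distance(p, q):
    """L1 distance between two node-count profiles."""
    keys = set(p) | set(q)
    return sum(abs(p[k] - q[k]) for k in keys)

def knn_score(ungraded_source, graded, k=5):
    """Score an ungraded program as the mean score of its k nearest
    graded neighbors; `graded` is a list of (source_code, score) pairs."""
    target = ast_node_profile(ungraded_source)
    neighbors = sorted(
        graded,
        key=lambda pair: profile_distance(target, ast_node_profile(pair[0])),
    )[:k]
    return sum(score for _, score in neighbors) / len(neighbors)
```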

We evaluate our approach on a large dataset of student-submitted programs from an introductory Python programming course offered on a well-known MOOC platform.