SimGrade: Using Code Similarity Measures for More Accurate Human Grading
Abstract: In computer science courses, grading exam problems can be a difficult and inconsistent process, especially when the work is divided among a large course staff. Ideally, different graders assessing the same student submission would assign the same score, but analysis of historical grading patterns shows that this happens less often than desired, an issue that is rarely acknowledged in the context of programming problems. These inconsistencies raise questions of fairness and can negatively affect students' experiences in the course, motivating methods that ensure more consistent grading. In this paper, we show how AI techniques for recognizing similar submissions have the potential to improve the human grading process. Through analysis of historical exam data, we demonstrate that graders assign a score to a student submission more accurately when they have previously seen another submission similar to it. We therefore hypothesize that exam grading accuracy can be improved by ensuring that each submission a grader sees is similar to at least one submission they have previously seen. We propose several algorithms for (1) assigning student submissions to graders and (2) ordering submissions to maximize the probability that a grader has previously seen a similar solution, leveraging distributed representations of student code to measure similarity between submissions. We demonstrate that these algorithms achieve higher grading accuracy than randomly assigning and ordering submissions.
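As a rough illustration of the ordering idea described above, the sketch below greedily sequences submissions so that each one is maximally similar, under cosine similarity over embedding vectors, to at least one submission that appears earlier in the order. The embedding source, the function names, and the greedy criterion are illustrative assumptions for this sketch, not the paper's exact algorithm.

```python
import numpy as np


def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def greedy_order(embeddings: np.ndarray) -> list[int]:
    """Order submissions so each is as similar as possible to one seen earlier.

    `embeddings` is an (n, d) array of distributed representations of
    student code; the encoder producing them is assumed here, since the
    abstract does not specify one.
    """
    order = [0]  # arbitrary starting submission
    remaining = set(range(1, len(embeddings)))
    while remaining:
        # Pick the unseen submission whose best match among the
        # already-ordered submissions is highest.
        best = max(
            remaining,
            key=lambda i: max(
                cosine_sim(embeddings[i], embeddings[j]) for j in order
            ),
        )
        remaining.remove(best)
        order.append(best)
    return order


# Toy usage: random vectors stand in for learned code embeddings.
rng = np.random.default_rng(0)
print(greedy_order(rng.normal(size=(5, 16))))
```

The greedy criterion mirrors the stated goal of maximizing the chance that a grader has already seen something similar; a full system would also need the assignment step that partitions submissions across graders.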