Automatic Assessment of the Design Quality of Python Programs with Personalized Feedback
Abstract: The assessment of program functionality can generally be accomplished with straight-forward unit tests. However, assessing the design quality of a program is a much more difficult and nuanced problem. Design quality is an important consideration since because it affects the readability and maintainability of programs. Assessing design quality and giving personalized feedback is very time consuming for instructors and teaching assistants. This limits the scale of giving personalized feedback to small class settings. Further, design quality is nuanced and is difficult to concisely express as a set of rules. For these reasons, we propose a neural network model to both automatically assess the design of a program and provide personalized feedback to guide students on how to make corrections. The model’s effectiveness is evaluated on a corpus of student programs written in Python. The model has an accuracy rate from 83.67% to 94.27%, depending on the dataset, when predicting design scores as compared to historical instructor assessment. Finally, we present a study where students tried to improve the design of their programs based on the personalized feedback produced by the model. Students who participated in the study improved their program design scores by 19.58%.