Learning student program embeddings using abstract execution traces
Guillaume Cleuziou, Frédéric Flouvat
Jun 30, 2021 19:50 UTC+2
—
Session C1
Keywords: Representation Learning, Program Embeddings, Feedback propagation, Neural Networks, Educational Data Mining, Computer Science Education, doc2vec
Abstract:
Improving the pedagogical effectiveness of programming training platforms is an active research topic that requires fine-grained, exploitable representations of learners' programs. This article presents a new approach for learning program embeddings. Starting from the hypothesis that both the functionality of a program and its "style" can be captured by analyzing its execution traces, the code2aes2vec method proceeds in two steps. The first step generates abstract execution sequences (AES) from predefined test cases and the abstract syntax trees (AST) of the submitted programs. The second step uses the doc2vec method to learn condensed vector representations (embeddings) of the programs from these AESs. Experiments performed on real data sets show that the embeddings generated by code2aes2vec efficiently capture both the semantics and the style of the programs. Finally, as a proof of concept, we show the relevance of these program embeddings on the task of automatic feedback propagation.
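To make the first step more concrete, here is a minimal Python sketch of how an execution trace might be abstracted into a sequence of AST statement types. It is not the authors' implementation: the function name `abstract_execution_sequence`, the `CALL` marker token, the choice of statement-level node names as tokens, and the two toy student programs are all assumptions made for illustration.

```python
import ast
import sys


def abstract_execution_sequence(source, func_name, test_inputs):
    """Run `func_name` from `source` on each test input and return the list
    of AST statement type names executed (a simplified, hypothetical AES)."""
    tree = ast.parse(source)

    # Map each line number to the type name of a statement starting there.
    # ast.walk visits outer nodes first, so the enclosing statement wins
    # when several statements share a line.
    line_to_stmt = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.stmt):
            line_to_stmt.setdefault(node.lineno, type(node).__name__)

    filename = "<student_program>"  # hypothetical label for the traced code
    namespace = {}
    exec(compile(tree, filename, "exec"), namespace)
    func = namespace[func_name]

    tokens = []

    def tracer(frame, event, arg):
        # Ignore frames that do not belong to the student's program.
        if frame.f_code.co_filename != filename:
            return None
        if event == "line":
            stmt = line_to_stmt.get(frame.f_lineno)
            if stmt is not None:
                tokens.append(stmt)
        return tracer

    previous = sys.gettrace()
    for args in test_inputs:
        tokens.append("CALL")  # assumed marker separating test cases
        sys.settrace(tracer)
        try:
            func(*args)
        finally:
            sys.settrace(previous)
    return tokens


# Two toy submissions with the same functionality but different "style".
LOOP_SUM = """def total(n):
    s = 0
    for i in range(n):
        s += i
    return s
"""

FORMULA_SUM = """def total(n):
    return n * (n - 1) // 2
"""

if __name__ == "__main__":
    tests = [(2,), (3,)]
    print(abstract_execution_sequence(LOOP_SUM, "total", tests))
    print(abstract_execution_sequence(FORMULA_SUM, "total", tests))
```

Both toy programs return the same values, yet their token sequences differ, which is the kind of stylistic signal the abstract refers to. For the second step, each token list could be treated as one "document" and passed to a doc2vec implementation (for example gensim's `Doc2Vec` with `TaggedDocument`); the exact tokenization and training settings used by code2aes2vec are not specified here.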