The Best Publicly Available Educational Data Set Prize

The Best Publicly Available Educational Data Set Prize is given annually to the data set that has led, or has the potential to lead, to the most significant progress in the scientific field and the community of practice. Nominees are considered individually, with consideration for both past and future impact. The prize-winning data set is selected by a committee of leaders in the field, who are chosen by the Board of Directors of the International Educational Data Mining Society. Current committee members are ineligible to receive the award; former committee members are eligible.

Award winners receive a prize of $2,000 and free registration to attend and potentially present an award talk at the International Conference on Educational Data Mining.

Nominations are open for the 2023 prize and should be sent directly by email. Announcements were sent on edm-announce and the Learning Engineering Google Group. Instructions to nominate are given below.

Award Winners

2023. OULAD – Open University Learning Analytics Dataset. Data Set provided by Knowledge Media Institute, The Open University.
The OULAD data set contains information from courses presented at the Open University (OU). What makes the dataset unique is that it combines demographic data with aggregated clickstream data of students’ interactions in the Virtual Learning Environment (VLE). This enables the analysis of student behaviour, represented by their actions. The dataset contains information about 22 courses, 32,593 students, their assessment results, and logs of their interactions with the VLE, represented by daily summaries of student clicks (10,655,280 entries). The dataset is freely available under a CC-BY 4.0 license. It is described in the paper Open University Learning Analytics Dataset (Kuzilek J., Hlosta M., Zdrahal Z., 2017), published in Nature Scientific Data.

2022. Exploring Common Trends in Online Educational Experiments Data. Data Set provided by ASSISTments.
Data on thousands of students, each participating in one of 88 different assignment-level randomized controlled experiments performed within ASSISTments. Students’ clickstream data is provided in its raw form and also aggregated into problem-level and assignment-level summaries of performance for each student who participated in an experiment. Additionally, information is provided on each student’s prior performance. So far, this data has been used to perform a meta-analysis of findings across similar experiments.

2021. The NeurIPS 2020 Education Challenge. Data Set provided by Eedi.
Data on millions of students’ answers to mathematics questions. The data was used in a scientific competition by dozens of researchers to predict student responses, determine question quality, and identify a personalized set of questions for each student.