The Best Publicly Available Educational Data Set Prize

The Best Publicly Available Educational Data Set Prize is awarded annually to the data set that has led, or has the potential to lead, to the most significant progress in the scientific field and the community of practice. Each nominee is considered individually, with attention to both its past impact and its potential future impact. The winning data set is chosen by a committee of leaders in the field, who are appointed by the Board of Directors of the International Educational Data Mining Society. Current committee members are ineligible to receive the award; former committee members are eligible.

Award winners receive a prize of $2,000 and free registration to attend and potentially present an award talk at the International Conference on Educational Data Mining.

Nominations are open for the 2023 prize and should be sent directly by email to

Announcements were sent on the edm-announce mailing list and the Learning Engineering Google Group. Instructions for nominating a data set are given below.

Award Winners

2022. Exploring Common Trends in Online Educational Experiments Data. Data Set provided by ASSISTments.
Data on thousands of students, each participating in one of 88 different assignment-level randomized controlled experiments run within ASSISTments. Students’ clickstream data is provided in raw form and is also aggregated into problem-level and assignment-level performance summaries for each student who participated in an experiment. Information on each student’s prior performance is also included. So far, the data set has been used to perform a meta-analysis of findings across similar experiments.
The award talk will be given at the 16th International Conference on Educational Data Mining, in Bengaluru, India.

2021. The NeurIPS 2020 Education Challenge. Data Set provided by Eedi.
Data on millions of students’ answers to mathematics questions.
Used in a scientific competition by dozens of researchers to predict student responses, determine question quality, and identify a personalized set of questions for each student.
The award is expected to be given at the 16th International Conference on Educational Data Mining, in Bengaluru, India.

Instructions for nomination

We welcome proposals for this prize and invite all members of the community to nominate data sets they consider valuable contributions to the field. Self-nominations are allowed, as are nominations of data sets posted by other individuals.

Please fill in the details of the data set below. A checklist of the information we are looking for follows. If this information is not available in the data set's existing description, please provide as much of the requested information as possible. We will follow up with data set owners if we have further questions.

Nominations and questions should be emailed directly to

DEADLINE for proposals: June 1, 2023

Data set name:
Data set URL(s) (description and download):

Checklist of information requested.

1) SIZE, COVERAGE, BREADTH. How many data records are included in this data set? What are the demographics and representativeness of the student records? Is there missing data? Please include any other relevant information describing the characteristics and nature of the data.

2) SOURCE. What is the original source of this data set?

3) PUBLICATIONS. What published papers describe and/or use this data set?

4) LEGAL AND ETHICS CONDITIONS. Is the data subject to ethical issues, and was its collection reviewed and approved by an organization (e.g., an institutional review board, in countries where applicable)? Who owns the data? What specific restrictions, if any, apply (e.g., a legal agreement)?

5) AVAILABILITY. Is this data set available to any member of the general public?

6) PERMANENCE. Is this data set intended to be publicly available in perpetuity? Is this data set currently archived in an external database with sustained funding?

7) PRIVACY. Describe measures that have been taken to safeguard student privacy.

8) IMPACT. If applicable, describe in a paragraph or so how this data set has already impacted the scientific field and/or the research community.

9) FUTURE WORK. In a paragraph or so, describe how you believe this data set might lead to future work in the field or the research community, such as the kinds of new work it might enable.

10) (Optional) Is there anything else about this data set and its uses that you think makes it a particularly valuable educational data set? If so, please tell us.