The Best Publicly Available Educational Data Set Prize, funded by Schmidt Futures

The Best Publicly Available Educational Data Set Prize is given annually to the data set that has most led to progress in the scientific field and the community of practice. The winning data set is chosen by a committee of leaders in the field, who are selected by the Board of Directors of the International Educational Data Mining Society. Current committee members are ineligible for the award; former committee members are eligible.

Award winners receive a prize of $2,000 and free registration to attend and potentially present an award talk at the International Conference on Educational Data Mining.

Nominations are open for the 2022 prize and should be sent directly by email. Announcements were sent on edm-announce and the Learning Engineering Google Group. Instructions for nomination are given below.

The award is made possible through the generous support of Schmidt Futures, a philanthropic initiative founded by Eric and Wendy Schmidt.

Award Winners

2021. The NeurIPS 2020 Education Challenge. Data set provided by Eedi.
Data on millions of students’ answers to mathematics questions.
Used in scientific competition by dozens of researchers to predict student responses, determine question quality, and identify a personalized set of questions for each student.
The award talk will be given at the 15th International Conference on Educational Data Mining, in Durham, UK.

Instructions for nomination

We welcome proposals for this prize and invite all members of the community to nominate data sets they consider valuable contributions to our community. Self-nominations are allowed, as are nominations of data sets posted by other individuals.

Please fill out the details of the data set below. We include a checklist of the information we are looking for. Should this information not be available in the description of the data set, please provide as much of the requested information as possible from the checklist. We will follow up with data set owners where we have further questions.

Nominations and questions should be emailed directly to

DEADLINE for proposals: June 20, 2022 (update: June 25, 2022)

Data set name:
Data set URL(s) (description and download):

Checklist of information requested.

1) SIZE, COVERAGE, BREADTH. How many data records are included in this data set? What are the demographics and representativeness of the student records? Is there missing data? Please include any other relevant information that describes the characteristics and nature of the data.

2) SOURCE. What is the original source of this data set?

3) PUBLICATIONS. List published papers describing and/or using this data set.

4) LEGAL AND ETHICS CONDITIONS. Is the data subject to ethical issues, and was its collection reviewed and approved by an organization (an institutional review board, in countries where applicable)? Who owns the data? What specific restrictions, if any, apply (e.g., a legal agreement)?

5) AVAILABILITY. Is this data set available to any member of the general public?

6) PERMANENCE. Is this data set intended to be publicly available in perpetuity? Is this data set currently archived in an external database with sustained funding?

7) PRIVACY. Describe measures that have been taken to safeguard student privacy.

8) (Optional) In a paragraph or so, tell us why you think this data set is a particularly valuable educational data set.