Introduction to the Proceedings

Preface

The Georgia Institute of Technology is proud to host the seventeenth International Conference on Educational Data Mining (EDM) in Atlanta, Georgia, July 14-July 17, 2024. EDM is the annual flagship conference of the International Educational Data Mining Society. This year’s theme is “New tools, new prospects, new risks – educational data mining in the age of generative AI.” The theme focuses on the movement from descriptive and predictive models to generative artificial intelligence (AI) and what that means for learning environments and processes. While the new methods unlock exciting new potentials for educational data mining, they also foreground many ethical considerations and risks that are associated with all types of machine learning and artificial intelligence. This year, we additionally welcomed research in the following areas: mitigating biases and harms that may result from model use, accounting for the stereotypes that are inherent to the large models that drive generative AI, separating the hype surrounding these new technologies from their potential in educational settings, and finding ways to use these models to better understand learning processes and support learning.

The scientific program for EDM 2024 includes:

Two keynote talks by outstanding researchers in the field, Tanja Käser (EPFL, Switzerland) and Carolyn Rosé (CMU, USA)
A plenary Prof. Ram Kumar Educational Data Mining Test of Time Award talk by Gautam Biswas (Vanderbilt University, USA)
A plenary Data Set award talk by Jakub Kužílek (Humboldt-Universität zu Berlin & German Research Center for Artificial Intelligence, Germany) and Martin Hlosta (Institute for Distance Learning and eLearning Research & Swiss Distance University of Applied Sciences, Switzerland)
Five tutorials (foundational as well as advanced)
Five workshops
Twelve paper presentation sessions on the topics of large language models in education (two sessions), affective computing, learning analytics and recommender systems, reinforcement learning and pedagogical agents, knowledge tracing and curricula, collaborative learning, prediction and supervised learning, research practices (two sessions), and computer science education (two sessions)
Two poster presentation sessions
An industry track and industry panel
A doctoral student consortium

The tutorials are: (1) Logistic Knowledge Tracing Tutorial, (2) Promoting Open Science in Educational Data Mining, (3) Thinking Causally in EDM, (4) Beyond Tutor Logs: Utilizing Sensor Data for Measuring Student Behavior, and (5) Tools for Planning and Analyzing Randomized Controlled Trials and A/B Tests. The workshops are: (1) 8th Educational Data Mining in Computer Science Education (CSEDM) Workshop, (2) Human-Centric eXplainable AI in Education (HEXED) Workshop, (3) Causal Inference in Educational Data Mining, (4) Leveraging Large Language Models for Next-Generation Educational Technologies, and (5) Educational Data Mining in Writing and Literacy Instruction.

The venue is the Global Learning Center of Georgia Tech, home of Georgia Tech’s Division of Lifetime Learning. The Global Learning Center is nestled in Tech Square, a burgeoning tech hub in Midtown Atlanta, attached to the Georgia Tech Hotel & Conference Center and across the street from the Tech Square Research Building and the Coda Building, home to many of the faculty of the institute’s College of Computing.

EDM 2024 received 81 submissions to the full papers track (10 pages), 68 to the short papers track (6 pages), and 29 to the poster and demo track (4 pages). The program committee accepted 21 full papers (26

The EDM 2024 industry track and the industry panel fostered exchange between industry application research and basic research. Four papers and two posters were included in the industry track. Panelists included Kristen DiCerbo (Khan Academy), Diego Zapata-Rivera (ETS Research Institute), Lewis Johnson (Alelo), and Bibi Groot (EeDI).

EDM 2024 also continued its tradition of providing opportunities for young researchers to present their work and receive feedback from their peers and senior researchers. The doctoral consortium this year features 12 such participants. EDM 2024 is especially proud to offer travel sponsorships to 15 students who will attend EDM thanks to this support.

We thank the sponsors of EDM 2024 for their generous support. We thank all the authors who submitted their work and the program committee members for their expert input. We thank the members of the organizing committee for their leadership, which made this conference possible. And a big thank-you to the local organizing committee, who made this event memorable.

Carrie Demmans Epp	University of Alberta, Canada	Program Chair
Benjamin Paaßen	Bielefeld University, Germany	Program Chair
David Joyner	Georgia Institute of Technology, USA	General Chair

July 14th, 2024
Atlanta, GA, USA

Organizing Committee

General Chairs

David Joyner – Georgia Institute of Technology, USA

Program Chairs

Benjamin Paaßen (they/them) – Bielefeld University, Germany
Carrie Demmans Epp (she/her) – University of Alberta, Canada

Equity, Diversity, and Inclusion Chairs

Anna Rafferty (she/her) – Carleton College, USA
Jie Tang (he/him) – Tsinghua University, China

Accessibility Chairs

Nigel Bosch (he/him) – University of Illinois Urbana-Champaign, USA

Industry Track Chairs

Carol M. Forsyth (she/her) – Educational Testing Service, USA
Avi Segal (he/him) – Ben-Gurion University of the Negev, Israel

Poster & Demo Track Chairs

Heeryung Choi (she/her) – University of Michigan, USA
Irena Koprinska (she/her) – The University of Sydney, Australia

Doctoral Consortium Chairs

Neil Heffernan – Worcester Polytechnic Institute, USA
Luc Paquette (he/him) – University of Illinois Urbana-Champaign, USA

JEDM Track Chairs

Maria Mercedes T. Rodrigo (she/her) – Ateneo de Manila University, Philippines
Agathe Merceron (she/her) – Berlin Hochschule für Technik, Germany
Jaclyn Ocumpaugh – University of Pennsylvania, USA

Workshop & Tutorials Chairs

Bita Akram (she/her) – North Carolina State University, USA
Sergey Sosnovsky – Utrecht University, Netherlands

Awards Chairs

Danielle McNamara – Arizona State University, USA
Cristóbal Romero – University of Córdoba, Spain
Marianne Winslett – University of Illinois Urbana-Champaign, USA

Scholarship Chairs

Luc Paquette (he/him) – University of Illinois Urbana-Champaign, USA
Ryan S. Baker – University of Pennsylvania, USA

Student Volunteer Chair

Sherry Sahebi (she/her) – University at Albany – SUNY, USA

Oleksandra Poquet – Technical University of Munich, Germany
Jill-Jênn Vie (he/him) – Inria Saclay, France

Web Chairs

Paul Salvador Inventado (he/him) – California State University Fullerton, USA
Ramkumar Rajendran (he/him) – Indian Institute of Technology Bombay, India
Rafael D. Araújo (he/him) – Federal University of Uberlândia, Brazil

Proceedings Chairs

Mirko Marras (he/him) – University of Cagliari, Italy
Maomi Ueno (he/him) – The University of Electro-Communications, Japan

Sponsorship Chairs

Hannah Moon (she/her) – Georgia Institute of Technology, USA

Online Experience Chair

Nick Lytle (he/him) – Georgia Institute of Technology, USA

Local Organizing Team

David Joyner – Georgia Institute of Technology, USA
Hannah Moon (she/her) – Georgia Institute of Technology, USA
Nick Lytle (he/him) – Georgia Institute of Technology, USA
Alex Duncan (he/him) – Georgia Institute of Technology, USA

IEDMS Officers

Tiffany Barnes,	President	North Carolina State University, USA
Anna Rafferty,	Treasurer	Carleton College, USA

IEDMS Board of Directors

Ryan Baker	University of Pennsylvania, USA
Neil Heffernan	Worcester Polytechnic Institute, USA
Sharon Hsiao	Santa Clara University, USA
Tanja Käser	EPFL, Switzerland
Kenneth Koedinger	Carnegie Mellon University, USA
Kalina Yacef	University of Sydney, Australia

Senior Program Committee Members


Bita Akram	North Carolina State University
Giora Alexandron	Weizmann Institute of Science
Roger Azevedo	University of Central Florida
Ryan Baker	University of Pennsylvania
Tiffany Barnes	North Carolina State University
Gautam Biswas	Vanderbilt University
Ig Ibert Bittencourt	Federal University of Alagoas
Nigel Bosch	University of Illinois Urbana-Champaign
Anthony F. Botelho	University of Florida
François Bouchet	Sorbonne Université - LIP6
Alex Bowers	Columbia University
Min Chi	North Carolina State University
Anat Cohen	Tel-Aviv University
Cristina Conati	The University of British Columbia
Linda Corrin	Deakin University
Alexandra Cristea	Durham University
Michel Desmarais	Ecole Polytechnique de Montreal
Fabiano Dorça	Universidade Federal de Uberlandia
Michael Eagle	George Mason University
Vanessa Echeverria	Monash University
Yo Ehara	Tokyo Gakugei University
Mingyu Feng	WestEd
Carol Forsyth	Educational Testing Service
Kobi Gal	The University of Edinburgh
Praveen Garimella	International Institute of Information Technology
Dragan Gasevic	Monash University
Neil Heffernan	Worcester Polytechnic Institute
Sharon Hsiao	Santa Clara University
Xiao Hu	The University of Hong Kong
Paul Salvador Inventado	California State University Fullerton
Johan Jeuring	Utrecht University
Srecko Joksimovic	Education Future, University of South Australia
Jelena Jovanovic	University of Belgrade
Tanja Käser	EPFL
Enkelejda Kasneci	Technical University of Munich
Hiroaki Kawashima	University of Hyogo
Kirsty Kitto	University of Technology, Sydney
Rene Kizilcec	Cornell University
Simon Knight	UTS
Kenneth Koedinger	Carnegie Mellon University
Irena Koprinska	The University of Sydney
Andrew Lan	University of Massachusetts Amherst
Mirko Marras	University of Cagliari
Roberto Angel Melendez	Instituto Tecnologico Superior de Misantla
Agathe Merceron	Berliner Hochschule für Technik - Berlin State University of Applied Sciences
Tanja Mitrovic	Intelligent Computer Tutoring Group, University of Canterbury, Christchurch
Roger Nkambou	Université du Québec à Montréal
Andrew Olney	University of Memphis
Ranilson Paiva	Universidade Federal de Alagoas
Luc Paquette	University of Illinois at Urbana-Champaign
Zach Pardos	University of California, Berkeley
Radek Pelánek	Masaryk University Brno
Thomas Price	North Carolina State University
Anna Rafferty	Carleton College
R Rajalakshmi	VIT University, Chennai Campus
Ramkumar Rajendran	IIT Bombay
Steven Ritter	Carnegie Learning, Inc.
Maria Mercedes T. Rodrigo	Department of Information Systems and Computer Science, Ateneo de Manila University
Cristóbal Romero	Department of Computer Sciences and Numerical Analysis, University of Córdoba
Sherry Sahebi	University at Albany - SUNY
Demetrios Sampson	Curtin University
Olga Santos	UNED
Avi Segal	Ben Gurion University
Niels Seidel	FernUniversität in Hagen
Atsushi Shimada	Kyushu University
Sergey Sosnovsky	Utrecht University
Tiffany Tang	Wenzhou-Kean University
Jill-Jênn Vie	Inria Lille
Lanqin Zheng	Beijing Normal University

Program Committee Members

Mark Abdelshiheed	University of Colorado Boulder
Faruk Ahmed	The University of Memphis
Nazia Alam	North Carolina State University
Laia Albó	Universitat Pompeu Fabra
Laura Allen	University of Minnesota
Isaac Alpizar Chacon	Utrecht University
Mohammad Alshehri	Durham University
Alvaro Alvares	Federal University of the Agreste of Pernambuco
Gisele Arevalo	University of Alberta
Simón Pedro Arguijo	Tecnológico Nacional de México campus Misantla
T.S. Ashwin	Vanderbilt University
Ayan Banerjee	Arizona State University
Denilson Barbosa	University of Alberta
Abhinava Barthakur	University of South Australia
Prateek Basavaraj	American Association of State Colleges and Universities
Marie Bexte	FernUniversität in Hagen
Anis Bey	La Rochelle University
Plaban Kumar Bhowmick	Indian Institute of Technology Kharagpur
Nathaniel Blanchard	Pontificia Universidad Católica de Esmeraldas
Maria Bolsinova	Tilburg University
Conrad Borchers	Carnegie Mellon University
Jesus G. Boticario	UNED
Marie-Luce Bourguet	Queen Mary London
Matthieu Brinkhuis	Utrecht University
Ted Briscoe	University of Cambridge
Julien Broisin	IRIT, Université Toulouse III - Paul Sabatier, Toulouse, France
Minghao Cai	University of Alberta
Renza Campagni	Università degli Studi di Firenze
Jie Cao	University of Colorado Boulder
Meng Cao	University of Memphis
Nicolás Cardozo	Universidad de los Andes
Paulo Carvalho	Carnegie Mellon University
Guanliang Chen	Monash University
Heeryung Choi	Massachusetts Institute of Technology
Wei Chu	The University of Memphis
Cheng-Yu Chung	National Yang Ming Chiao Tung University
Ruth Cobos	Universidad Autónoma de Madrid
Aubrey Condor	University of California Berkeley
Maria de Los Angeles Constantino González	Tecnológico de Monterrey Campus Laguna
Maria Cutumisu	McGill University
Jesper Dannath	Universität Bielefeld
Syaamantak Das	Indian Institute of Technology Bombay
Alina Deriyeva	University of Bielefeld
M Ali Akber Dewan	Athabasca University
Nicholas Diana	Colgate University
Fahima Djelil	IMT Atlantique
Mohsen Dorodchi	University of North Carolina Charlotte
Cristina Dumdumaya	University of Southeastern Philippines
Nghia Duong-Trung	German Research Centre for Artificial Intelligence
Luke Eglington	Amplify
Yo Ehara	Tokyo Gakugei University
Samira Elatia	University of Alberta
Fahmid Morshed Fahid	North Carolina State University
Yizhou Fan	Peking University
Stephen Fancsali	Carnegie Learning, Inc.
Effat Farhana	Vanderbilt University
Márcia Fernandes	Federal University of Uberlândia
Nigel Fernandez	University of Massachusetts Amherst
Jeremiah Folsom-Kovarik	Soar Technology, Inc.
Kazuma Fuchimoto	The University of Electro-Communications
Hagit Gabbay	School of Education, Tel Aviv University
Kobi Gal	The University of Edinburgh
Wenbin Gan	National Institute of Information and Communications Technology
Mark Gierl	University of Alberta
Aldo Gordillo	Universidad Politécnica de Madrid (UPM)
Guher Gorgun	University of Alberta
Art Graesser	University of Memphis
Sabine Graf	Athabasca University
Monique Grandbastien	LORIA, Universite de Lorraine
Julio Guerra	Universidad Austral de Chile
Ella Haig	School of Computing, University of Portsmouth
Ching Nam Hang	City University of Hong Kong
Jiangang Hao	Educational Testing Service
Ellie Hajarian	Athabasca University
Jason Harley	McGill University
Erik Harpstead	Carnegie Mellon University
Fatima Harrak	Sorbonne Université - LIP6
Carl Haynes-Magyar	Carnegie Mellon University
Surina He	University of Alberta
Sami Heikkinen	LAB University of Applied Sciences
Arto Hellas	Aalto University
Erik Hemberg	ALFA
Nicolas Hernandez	Nantes Université - LS2N CNRS UMR 6004
Martin Hlosta	The Swiss Distance University of Applied Sciences
Anett Hoppe	TIB Leibniz Information Centre for Science and Technology; L3S Research Centre, Leibniz Universität Hannover
Lingyun Huang	The Education University of Hong Kong
Yun Huang	Austral University of Chile
Paul Hur	University of Illinois at Urbana-Champaign
Sébastien Iksal	LIUM - Le Mans Université, France
Vladimir Ivančević	University of Novi Sad, Faculty of Technical Sciences
Hyeji Jang	Ewha Womans University
Emily Jensen	University of Colorado Boulder
Lan Jiang	University of Illinois Urbana-Champaign
David Joyner	Georgia Institute of Technology
Jina Kang	University of Illinois Urbana-Champaign
Tanja Käser	EPFL
Mohammad Khalil	University of Bergen
Ekaterina Kochmar	MBZUAI
Elizabeth Koh	National Institute of Education, Nanyang Technological University, Singapore
Sotiris Kotsiantis	University of Patras
Vitomir Kovanovic	The University of South Australia
Milos Kravcik	DFKI GmbH
Swathi Krishnaraja	University of Potsdam
Roland Kuhn	National Research Council of Canada
Amruth Kumar	Ramapo College of New Jersey
Vive Kumar	Athabasca University
Vishal Kuvar	University of Minnesota
Hollis Lai	University of Alberta
Sébastien Lallé	Sorbonne University
Andrew Lan	University of Massachusetts Amherst
Juan Alfonso Lara Torralbo	University of Córdoba
Mikel Larrañaga	University of the Basque Country
Elise Lavoué	iaelyon, Université Jean Moulin Lyon 3, LIRIS
Tai Le Quy	IU International University of Applied Sciences
Vwen Yen Alwyn Lee	Nanyang Technological University
Marie Lefevre	LIRIS - Université Lyon 1
Juho Leinonen	Aalto University
Arun Balajiee Lekshmi Narayanan	University of Pittsburgh
James Lester	North Carolina State University
Chenglu Li	University of Florida
Jiawei Li	Nanyang Technological University
Warren Li	University of Michigan
Jionghao Lin	Carnegie Mellon University
Qi Liu	University of Science and Technology of China
Zhexiong Liu	University of Pittsburgh
Sonsoles López-Pernas	University of Eastern Finland
Yu Lu	Beijing Normal University
Vanda Luengo	Sorbonne Université - LIP6
Ivan Luković	University of Belgrade, Faculty of Organizational Sciences
Collin Lynch	North Carolina State University
Nick Lytle	Georgia Institute of Technology
Boxuan Ma	Kyushu University
Qiang Ma	Kyoto Institute of Technology
Qianou Ma	Carnegie Mellon University Human-Computer Interaction Institute
Jeffrey Matayoshi	McGraw Hill ALEKS
Madeth May	University of Maine
Gord McCalla	University of Saskatchewan
Emma McDonald	University of Alberta
Guilherme Medeiros Machado	ECE Paris
Victor Menendez-Dominguez	Universidad Autónoma de Yucatán
Donatella Merlini	Università di Firenze
Caitlin Mills	University of Minnesota
Tsunenori Mine	Kyushu University
Tsubasa Minematsu	Kyushu University
Sein Minn	INRIA
Phaedra Mohammed	The University of the West Indies
Luis Alberto Morales Rosales	Conacyt-Universidad Michoacana de San Nicolás de Hidalgo
Matthew Moreno	McGill University
Pedro Manuel Moreno-Marcos	Universidad Carlos III de Madrid
Bradford Mott	North Carolina State University
Kousuke Mouri	Tokyo University of Agriculture and Technology
Calarina Muslimani	University of Alberta
Tanya Nazaretsky	EPFL
Huy Nguyen	University of Pittsburgh
Narges Norouzi	Berkeley
Ange Adrienne Nyamen Tato	École de Technologie Supérieure
Teresa Ober	Educational Testing Service
Püren Öncel	University of Minnesota Twin Cities
Tounwendyam Frédéric Ouedraogo	Université Norbert ZONGO
Maciej Pankiewicz	Warsaw University of Life Sciences
Yeonjeong Park	Honam University
Philip Pavlik	University of Memphis
Jorge Poco	Fundação Getulio Vargas
Paul Stefan Popescu	University of Craiova
Oleksandra Poquet	Technical University of Munich
Ethan Prihar	École polytechnique fédérale de Lausanne
David Pritchard	Massachusetts Institute of Technology
Miroslava Raspopović	Faculty of Information Technology
Narjes Rohani	University of Edinburgh
José Raúl Romero	University of Cordoba
Daniela Rotelli	University of Pisa
Mirka Saarela	University of Jyväskylä
Maria Ofelia San Pedro	Roblox
Sreecharan Sankaranarayanan	Carnegie Mellon University
Mohammed Saqr	University of Eastern Finland
Petra Sauer	Beuth University of Applied Sciences
Robin Matthias Schmucker	Carnegie Mellon University
Filippo Sciarrone	Universitas Mercatorum
Kazuhisa Seta	Osaka Metropolitan University
Lele Sha	Monash University
Lei Shi	Newcastle University
Yang Shi	Utah State University
Jinnie Shin	University of Florida
Aditi Singh	Cleveland State University
Daevesh Singh	Indian Institute of Technology
Stefan Slater	Teachers College
Juyeong Song	Ewha Womans University
Frank Stinar	University of Illinois - Urbana Champaign
Vinitra Swamy	EPFL
Anaïs Tack	KU Leuven
Ling Tan	Australian Council for Educational Research
Michelle Taub	University of Central Florida
Daniela Teodorescu	LMU Munich
Craig Thompson	The University of British Columbia
Emiko Tsutsumi	The University of Electro-Communications
Maomi Ueno	The University of Electro-Communications
Maya Usher	Technion
Masaki Uto	The University of Electro-Communications
Sowmya Vajjala	National Research Council, Canada
José Antonio Hiram Vázquez-López	Tecnológico Nacional de México, campus Instituto Tecnológico Superior de Misantla
Oswaldo Velez-Langs	Universidad de Cordoba
Rémi Venant	Le Mans Université - LIUM
Olga Viberg	KTH Royal Institute of Technology
Markel Vigo	The University of Manchester
Alessandro Vivas	UFVJM
Tuyet-Trinh Vu	SOICT-HUST
Deliang Wang	The University of Hong Kong
Zichao Wang	Rice University
Christabel Wayllace	New Mexico State University
Daniel Weitekamp	Carnegie Mellon University
Jacob Whitehill	Worcester Polytechnic Institute
Alistair Willis	The Open University
Aaron Wong	University of Minnesota
Chris Wong	University of Technology Sydney
Jacqueline Wong	Utrecht University
Beverly Park Woolf	University of Massachusetts
Yi-jung Wu	University of Wisconsin-Madison
Peter Wulff	Heidelberg University of Education
Jia Xu	Guangxi University
Elad Yacobson	Weizmann Institute of Science
Seyma Yildirim-Erbasli	Concordia University of Edmonton
Chengjiu Yin	Kyushu University
Andrew Zamecnik	University of South Australia
Diego Zapata-Rivera	Educational Testing Service
Jiayi Zhang	University of Pennsylvania
Lisa Zhang	University of Toronto
Wenbin Zhang	Florida International University
Yingbin Zhang	South China Normal University
Lanqin Zheng	Beijing Normal University
Stefano Zingaro	Università di Bologna

Keynotes

Generalizable and Interpretable Models of Learning

Tanja Käser, EPFL School of Computer and Communication Sciences

Modeling learners’ knowledge, behavior, and strategies is at the heart of educational technology. Learner models serve as a basis for adapting the learning experience to students’ needs and supporting teachers in classroom orchestration. Consequently, a large body of research has focused on creating accurate models of student knowledge and behaviors. However, current modeling approaches are still limited: they are either defined for specific and well-structured domains (e.g., algebra, vocabulary learning), requiring substantial work from experts and limiting generalizability, or they lack interpretability. Recent advances in generative AI, in particular large language models (LLMs), have the potential to address these constraints. However, LLMs lack alignment with educational goals and grounded knowledge.

In this talk, I will discuss the key challenges in developing generalizable and explainable models, and our solutions to address them, including models tracking learning in open-ended environments and generalizing between different environments and populations. I will present our work on explainable AI, including a rigorous evaluation of existing approaches, the development of inherently interpretable models, as well as studies on effectively communicating model explanations. Finally, I will show some of our recent results combining “traditional” modeling approaches and LLMs to provide interpretable feedback and explanations while not compromising on model trustworthiness.

Opportunities and Challenges for LLM Agent-Based Support for Collaborative Design

Carolyn Rosé, Professor of Language Technologies and Human-Computer Interaction, School of Computer Science, Carnegie Mellon, USA

Supporting collaborative design is an ideal context for exploring the capabilities and limitations of LLM-based conversational agents. The ability to extract information in context and produce coherent-sounding text can be used to generate reflection triggers. In two recent studies, we have employed LLM-based conversational agents with the goal of triggering human reflection and learning during collaborative software design. As humans engage in collaborative design, they employ their own abilities to reason abstractly, to decompose problems, and to apply principles productively. Reflection is a valuable activity for promoting human learning in these settings. However, what humans are able to do in terms of abstraction and reasoning as part of their creative problem solving is precisely what is most difficult for LLM agents to do. In contrast to claims of “super-human performance” in the media, in this talk we will explore the complementarity of human intelligence and artificial intelligence. We will begin with results of a classroom study where LLM-based conversational agent support for collaborative software development was successful in increasing student learning. From there we will move on to argue in favor of a research agenda for exploiting this complementarity, both in terms of applying AI capabilities to the betterment of human learning and in terms of inspiring further extension of technical capabilities from insights derived from observation of human reflection and learning in collaborative design.

Prof. Ram Kumar Educational Data Mining Test of Time Award Talk

Assessing Student Learning in Open Ended Learning Environments From Sequential to Multimodal Data Analysis

Gautam Biswas, Professor of Computer Science, Vanderbilt University, Nashville, TN, USA

From my early days as an AIED and EDM researcher, I have focused on understanding how students learn, especially in scenarios where they have to construct and apply their knowledge to problem-solving tasks. Collaborating with peers, we developed open-ended learning environments (OELEs) where K-12 students build scientific models and apply them to solve real-world problems. Challenges arise as students have to navigate multiple tools in these computer-based environments. Some students overcome these challenges to become effective learners, while others struggle to progress, often applying suboptimal learning strategies. John Kinnebrew and I began analyzing learners’ activity logs to study these differences, resulting in the Differential Sequence Mining algorithm, which earned us the best paper award at EDM 2012. Expanding on this work, we developed the Contextualized Difference Mining method for understanding students’ learning behaviors, for which we are receiving this Test of Time award.

In my talk, I will review our work on Differential Sequence Mining and explore its applications in understanding students’ cognitive and metacognitive learning behaviors. We have leveraged these insights to provide adaptive support, helping students progress in our OELEs. Beyond this, we have employed other sequential representations, such as Markov Chains and Hidden Markov Models, to analyze students’ activities and behaviors in the context of their learning and problem-solving tasks. Our OELEs have also advanced to facilitate students’ integrated learning of science, computing, and engineering problem-solving, including collaborative efforts in computer-based and embodied learning scenarios. From these richer learning environments, I will share insights into our latest efforts involving multimodal data analysis, incorporating video, speech, and activity logs. Using vision-based deep learning models and large language models (LLMs), we integrate analyses across modalities, offering a comprehensive understanding of students’ collaborative learning and problem-solving activities. In conclusion, I will discuss the potential implications of our work for shaping the future of learning in classrooms.

Update: Next year's winner of the Prof. Ram Kumar Educational Data Mining Test of Time Award

Ryan S.J.d. Baker, Sujith M. Gowda, Michael Wixon, Jessica Kalka, Angela Z. Wagner, Aatish Salvi, Vincent Aleven, Gail W. Kusbit, Jaclyn Ocumpaugh, and Lisa Rossi for their paper Towards Sensor-Free Affect Detection in Cognitive Tutor Algebra.

Initially published in Educational Data Mining 2012. Ryan Baker will give the award talk at the 18th Educational Data Mining Conference (EDM 2025).

EDM data set award

Jakub Kužílek

Senior Researcher, Computer Science Education / Computer Science and Society research group, Humboldt-Universität zu Berlin & German Research Center for Artificial Intelligence, Berlin, Germany.

Bio. Jakub Kužílek is affiliated with the Computer Science Education / Computer Science and Society research group at Humboldt-Universität zu Berlin and the Educational Technology Lab at the German Research Center for Artificial Intelligence (DFKI) as a senior researcher. His research investigates student self-regulated learning within online learning environments, collaborative group work, adaptive assessments, and feedback within the context of digital education. In the past, he developed (together with Martin Hlosta and Zdenek Zdrahal) the OU Analyse system, used to support 200,000 students of the Open University (United Kingdom) during their studies, and founded learning analytics research at Czech Technical University. He has led (and currently is doing research within) a project on AI use in assessment feedback at Humboldt-Universität. In parallel, he is leading the project on AI-driven recommendation systems in vocational education (KIPerWeb at DFKI).

Martin Hlosta

Senior Researcher, Institute for Distance Learning and eLearning Research (IFeL), Swiss Distance University of Applied Sciences (FFHS), Switzerland.

Bio. Martin Hlosta is a Senior Researcher at the Institute for Distance Learning and eLearning Research (IFeL) at the Swiss Distance University of Applied Sciences (FFHS). Before joining FFHS, he led research and development of OUAnalyse at The Open University (OU) – a Predictive Learning Analytics project deployed in all undergraduate courses, improving student retention and teachers’ practice. It is one of the world’s largest deployments of analytics systems in education, and in 2020 it was selected by UNESCO as one of the four best projects using AI in education. His subsequent work focused on identifying factors contributing to large gaps for disadvantaged students in the UK, and another study showed how the use of predictive analytics by teachers in an online course can lower these gaps for students coming from low socio-economic areas. Currently, he is leading research and teaching in Learning Analytics at FFHS and works on various strands of how learning analytics can improve feedback. His most recent project, funded by Unity and Meta to target inequalities in education, explores how immersive Virtual Reality and enhanced analytics for reflection can help future teachers in South Africa.

JEDM Presentations

The Knowledge Component Attribution Problem for Programming: Methods and Tradeoffs with Limited Labeled Data

Yang Shi	NC State University	yshi26@ncsu.edu
Robin Schmucker	Carnegie Mellon University	rschmuck@cs.cmu.edu
Keith Tran	NC State University	ktran24@ncsu.edu
John Bacher	NC State University	jtbacher@ncsu.edu
Kenneth Koedinger	Carnegie Mellon University	koedinger@cmu.edu
Thomas Price	NC State University	twprice@ncsu.edu
Min Chi	NC State University	mchi@ncsu.edu
Tiffany Barnes	NC State University	tmbarnes@ncsu.edu

Understanding students’ learning of knowledge components (KCs) is an important educational data mining task and enables many educational applications. However, in the domain of computing education, where program exercises require students to practice many KCs simultaneously, it is a challenge to attribute their errors to specific KCs and, therefore, to model student knowledge of these KCs. In this paper, we define this task as the KC attribution problem. We first demonstrate a novel approach to addressing this task using deep neural networks and explore its performance in identifying expert-defined KCs (RQ1). Because the labeling process takes costly expert resources, we further evaluate the effectiveness of transfer learning for KC attribution, using more easily acquired labels, such as problem correctness (RQ2). Finally, because prior research indicates the incorporation of educational theory in deep learning models could potentially enhance model performance, we investigated how to incorporate learning curves in the model design and evaluated their performance (RQ3). Our results show that in a supervised learning scenario, we can use a deep learning model, code2vec, to attribute KCs with a relatively high performance (AUC > 75% in two of the three examined KCs). Further using transfer learning, we achieve reasonable performance on the task without any costly expert labeling. However, the incorporation of learning curves shows limited effectiveness in this task. Our research lays important groundwork for personalized feedback for students based on which KCs they applied correctly, as well as more interpretable and accurate student models.
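The abstract above reports attribution quality as an AUC per knowledge component. As a reading aid, the following is a minimal sketch of how such a per-KC AUC can be computed from binary expert labels and model scores. It is not the authors' evaluation code: the function name `roc_auc`, the KC names, and all data values are hypothetical and chosen only for illustration.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive example is scored above a randomly chosen negative one
    (ties count as one half)."""
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical data: for each KC, expert labels (1 = KC applied incorrectly)
# and model scores for the same four student submissions.
per_kc = {
    "loops": ([0, 0, 1, 1], [0.10, 0.40, 0.35, 0.80]),
    "conditionals": ([1, 0, 1, 0], [0.90, 0.20, 0.70, 0.30]),
}

# One AUC value per KC, mirroring the per-KC reporting in the abstract.
auc_by_kc = {kc: roc_auc(labels, scores) for kc, (labels, scores) in per_kc.items()}
```

The pairwise formulation is quadratic in the number of examples; it is used here only because it makes the meaning of the metric explicit.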

Automated Evaluation of Classroom Instructional Support with LLMs and BoWs: Connecting Global Predictions to Specific Feedback

Jacob Whitehill	Worcester Polytechnic Institute	jrwhitehill@wpi.edu
Jennifer LoCasale-Crouch	Virginia Commonwealth University	locasalecrj@vcu.edu

With the aim of providing teachers with more specific, frequent, and actionable feedback about their teaching, we explore how Large Language Models (LLMs) can be used to estimate “Instructional Support” domain scores of the CLassroom Assessment Scoring System (CLASS), a widely used observation protocol. We design a machine learning architecture that uses zero-shot prompting of Meta’s Llama2 and/or a classic Bag of Words (BoW) model to classify individual utterances of teachers’ speech (transcribed automatically using OpenAI’s Whisper) for the presence of Instructional Support. Then, these utterance-level judgments are aggregated over a 15-min observation session to estimate a global CLASS score. Experiments on two CLASS-coded datasets of toddler and pre-kindergarten classrooms indicate that (1) automatic CLASS Instructional Support estimation accuracy using the proposed method (Pearson R up to 0.48) approaches human inter-rater reliability (up to R = 0.55); (2) LLMs generally yield slightly greater accuracy than BoW for this task, though the best models often combined features extracted from both LLM and BoW; and (3) for classifying individual utterances, there is still room for improvement of automated methods compared to human-level judgments. Finally, (4) we illustrate how the model’s outputs can be visualized at the utterance level to provide teachers with explainable feedback on which utterances were most positively or negatively correlated with specific CLASS dimensions.

An Approach to Improve k-Anonymization Practices in Educational Data Mining

Frank Stinar	University of Illinois Urbana–Champaign	fstinar2@illinois.edu
Zihan Xiong	University of Pennsylvania	zihanx3@seas.upenn.edu
Nigel Bosch	University of Illinois Urbana–Champaign	pnb@illinois.edu

Educational data mining has allowed for large improvements in educational outcomes and understanding of educational processes. However, there remains a constant tension between educational data mining advances and protecting student privacy while using educational datasets. Publicly available datasets have facilitated numerous research projects while striving to preserve student privacy via strict anonymization protocols (e.g., k-anonymity); however, little is known about the relationship between anonymization and utility of educational datasets for downstream educational data mining tasks, nor how anonymization processes might be improved for such tasks. We provide a framework for strictly anonymizing educational datasets with a focus on improving downstream performance in common tasks such as student outcome prediction. We evaluate our anonymization framework on five diverse educational datasets with machine learning-based downstream task examples to demonstrate both the effect of anonymization and our means to improve it. Our method improves downstream machine learning accuracy versus baseline data anonymization by 30.59%, on average, by guiding the anonymization process toward strategies that anonymize the least important information while leaving the most valuable information intact.

Exploring the Impact of Symbol Spacing and Problem Sequencing on Arithmetic Performance: An Educational Data Mining Approach

Avery Harrison Closser	Purdue University	aclosser@purdue.edu
Anthony F. Botelho	University of Florida	abotelho@coe.ufl.edu
Jenny Yun-Chen Chan	The Education University of Hong Kong	chanjyc@eduhk.hk

Experimental research on perception and cognition has shown that inherent and manipulated visual features of mathematics problems impact individuals’ problem-solving behavior and performance. In a recent study, we manipulated the spacing between symbols in arithmetic expressions to examine its effect on 174 undergraduate students’ arithmetic performance but found results that were contradictory to most of the literature (Closser et al., 2023). Here, we applied educational data mining (EDM) methods to that dataset at the problem level to investigate whether inherent features of the 32 experimental problems (i.e., problem composition, problem order) may have caused unintended effects on students’ performance. We found that students were consistently faster to correctly simplify expressions with the higher-order operator on the left, rather than right, side of the expression. Furthermore, average response times varied based on the symbol spacing of the current and preceding problem, suggesting that problem sequencing matters. However, including or excluding problem identifiers in analyses changed the interpretation of results, suggesting that the effect of sequencing may be impacted by other, undefined problem-level factors. These results advance cognitive theories on perceptual learning and provide implications for educational researchers: online experiments designed to investigate students’ performance on mathematics problems should include a variety of problems, systematically examine the effects of problem order, and consider applying different data analysis approaches to detect effects of inherent problem features. Moreover, EDM methods can be a tool to identify nuanced effects on behavior and performance in the context of data from online platforms.

Effect of Gamification on Gamers: Evaluating Interventions for Students Who Game the System

Kirk Vanacore	Worcester Polytechnic Institute	kpvanacore@wpi.edu
Ashish Gurung	Carnegie Mellon University	agurung@andrew.cmu.edu
Adam Sales	Worcester Polytechnic Institute	asales@wpi.edu
Neil Heffernan	Worcester Polytechnic Institute	nth@wpi.edu

Gaming the system is a persistent problem in Computer-Based Learning Platforms. While substantial progress has been made in identifying and understanding such behaviors, effective interventions remain scarce. This study explores the impact of two types of interventions – gamification and manipulation of assistance access – on the learning outcomes of students who tend to game the system using a method of causal moderation known as Fully Latent Principal Stratification. The results indicate that gamification does not consistently mitigate these negative behaviors. One gamified condition had a consistently positive effect on learning regardless of students’ propensity to game the system, whereas the other had a negative effect on such students. However, delaying access to hints and feedback may have a positive effect on the learning outcomes of those gaming the system. This paper also illustrates the potential of integrating detection and causal methodologies within educational data mining for understanding how to respond to behaviors effectively after they are detected.

LearnSphere: A Learning Data and Analytics Cyberinfrastructure

John Stamper	Carnegie Mellon University	jstamper@cmu.edu
Philip I. Pavlik Jr.	University of Memphis	ppavlik@memphis.edu
Steven Moore	Carnegie Mellon University	stevenmo@andrew.cmu.edu
Kenneth Koedinger	Carnegie Mellon University	koedinger@cmu.edu

LearnSphere is a web-based data infrastructure designed to transform scientific discovery and innovation in education. It supports learning researchers in addressing a broad range of issues including cognitive, social, and motivational factors in learning, educational content analysis, and educational technology innovation. LearnSphere integrates previously separate educational data and analytic resources developed by participating institutions. The web-based workflow authoring tool, Tigris, allows technical users to contribute sophisticated analytic methods, and learning researchers can adapt and apply those methods using graphical user interfaces, importantly, without additional programming. As part of our use-driven design of LearnSphere, we built a community through workshops and summer schools on educational data mining. Researchers interested in particular student levels or content domains can find student data from elementary through higher-education and across a wide variety of course content such as math, science, computing, and language learning. LearnSphere has facilitated many discoveries about learning, including the importance of active over passive learning activities and the positive association of quality discussion board posts with learning outcomes. LearnSphere also supports research reproducibility, replicability, traceability, and transparency as researchers can share their data and analytic methods along with links to research papers. We demonstrate the capabilities of LearnSphere through a series of case studies that illustrate how analytic components can be combined into research workflow combinations that can be developed and shared. We also show how open web-accessible analytics drive the creation of common formats to streamline repeated analytics and facilitate wider and more flexible dissemination of analytic tool kits.

Session-based Methods for Course Recommendation

Md Akib Zabed Khan	Florida International University	mkhan149@fiu.edu
Agoritsa Polyzou	Florida International University	apolyzou@fiu.edu

In higher education, academic advising is crucial to students’ decision-making. Data-driven models can benefit students in making informed decisions by providing insightful recommendations for completing their degrees. To suggest courses for the upcoming semester, various course recommendation models have been proposed in the literature using different data mining techniques and machine learning algorithms utilizing different data types. One important aspect of the data is that usually, courses taken together in a semester fit well with each other. If there is no correlation between the co-taken courses, students may find it more difficult to handle the workload. Based on this insight, we propose using session-based approaches to recommend a set of well-suited courses for the upcoming semester. We test three session-based course recommendation models, two based on neural networks (CourseBEACON and CourseDREAM) and one on tensor factorization (TF-CoC). Additionally, we propose a postprocessing approach to adjust the recommendation scores of any base course recommender to promote related courses. Using metrics capturing different aspects of the recommendation quality, our experimental evaluation shows that session-based methods outperform existing popularity-based, association-based, similarity-based, factorization-based, neural network-based, and Markov chain-based recommendation approaches. Effective course recommendations can result in improved student advising, which, in turn, can improve student performance, decrease dropout rates, and lead to a more positive overall student experience and greater satisfaction.

Best Paper AIED 2023 Presentation

Confusion, Conflict, Consensus: Modeling Dialogue Processes During Collaborative Learning with Hidden Markov Models

Toni V. Earle-Randell	University of Florida
Joseph B. Wiggins	University of Florida
Julianna Martinez Ruiz	University of Florida
Mehmet Celepkolu	University of Florida
Kristy Elizabeth Boyer	University of Florida
Collin F. Lynch	North Carolina State University
Maya Israel	University of Florida
Eric Wiebe	North Carolina State University

There is growing recognition that AI technologies can, and should, support collaborative learning. To provide this support, we need models of collaborative talk that reflect the ways in which learners interact. Great progress has been made in modeling dialogue for high school and college-age learners, but the dialogue processes that characterize collaborative talk between elementary learner dyads are not currently well understood. This paper reports on a study with elementary school learners (4th and 5th grade, ages 9–11 years old) who coded collaboratively in dyads. We recorded dialogue from 22 elementary school learner dyads, covering 7594 total utterances. We labeled this corpus manually with dialogue acts and then induced a hidden Markov model to identify the underlying dialogue states and the transitions between these states. The model identified six distinct hidden states which we interpret as Social Dialogue, Confusion, Frustrated Coordination, Exploratory Talk, Directive & Disagreement, and Disagreement & Self-Explanation. The HMM revealed that when students entered into a productive exploratory talk state, the primary way they transitioned out of this state was when they became confused or reached an impasse. When this occurred, the learners then moved into states of disputation and conflict before re-entering the Exploratory Talk state. These findings can inform the design of AI agents who support young learners’ collaborative talk and help agents determine when students are conflicting rather than collaborating.