Introduction to the Proceedings

Preface

The Georgia Institute of Technology is proud to host the seventeenth International Conference on Educational Data Mining (EDM) in Atlanta, Georgia, July 14-July 17, 2024. EDM is the annual flagship conference of the International Educational Data Mining Society. This year’s theme is “New tools, new prospects, new risks – educational data mining in the age of generative AI.” The theme focuses on the movement from descriptive and predictive models to generative artificial intelligence (AI) and what that means for learning environments and processes. While the new methods unlock exciting new potentials for educational data mining, they also foreground many ethical considerations and risks that are associated with all types of machine learning and artificial intelligence. This year, we additionally welcomed research in the following areas: mitigating biases and harms that may result from model use, accounting for the stereotypes that are inherent to the large models that drive generative AI, separating the hype surrounding these new technologies from their potential in educational settings, and finding ways to use these models to better understand learning processes and support learning.

The scientific programming for EDM2024 includes

The tutorials are: (1) Logistic Knowledge Tracing Tutorial, (2) Promoting Open Science in Educational Data Mining, (3) Thinking Causally in EDM, (4) Beyond Tutor Logs: Utilizing Sensor Data for Measuring Student Behavior, and (5) Tools for Planning and Analyzing Randomized Controlled Trials and A/B Tests. The workshops are: (1) 8th Educational Data Mining in Computer Science Education (CSEDM) Workshop, (2) Human-Centric eXplainable AI in Education (HEXED) Workshop, (3) Causal Inference in Educational Data Mining, (4) Leveraging Large Language Models for Next-Generation Educational Technologies, and (5) Educational Data Mining in Writing and Literacy Instruction.

The venue is the Global Learning Center of Georgia Tech, home of Georgia Tech’s Division of Lifetime Learning. The Global Learning Center is nestled in Tech Square, a burgeoning tech hub in Midtown Atlanta, attached to the Georgia Tech Hotel & Conference Center and across the street from the Tech Square Research Building and the Coda Building, home to many of the faculty of the institute’s College of Computing.

EDM 2024 received 81 submissions to the full papers track (10 pages), 68 to the short papers track (6 pages), and 29 to the poster and demo track (4 pages). The program committee accepted 21 full papers (26

The EDM 2024 industry track and the industry panel fostered exchange between industry application research and basic research. Four papers and two posters were included in the industry track. Panelists included Kristen DiCerbo (Khan Academy), Diego Zapata-Rivera (ETS Research Institute), Lewis Johnson (Alelo), and Bibi Groot (EeDI).

EDM 2024 also continued its tradition of providing opportunities for young researchers to present their work and receive feedback from their peers and senior researchers. The doctoral consortium this year features 12 such participants. EDM 2024 is especially proud to offer travel sponsorships to 15 students who will attend EDM thanks to this support.

We thank the sponsors of EDM 2024 for their generous support. We thank all the authors who submitted their work and the program committee members for their expert input. We thank the members of the organizing committee for their leadership, which made this conference possible. And a big thank you to the local organizing committee, who made this event memorable.

Carrie Demmans Epp University of Alberta, Canada Program Chair
Benjamin Paaßen Bielefeld University, Germany Program Chair
David Joyner Georgia Institute of Technology, USA General Chair

July 14th, 2024
Atlanta, GA, USA

Organizing Committee

General Chairs

Program Chairs

Equity, Diversity, and Inclusion Chairs

Accessibility Chairs

Industry Track Chairs

Poster & Demo Track Chairs

Doctoral Consortium Chairs

JEDM Track Chairs

Workshop & Tutorials Chairs

Awards Chairs

Scholarship Chairs

Student Volunteer Chair

Social Media & Publicity Chairs

Web Chairs

Proceedings Chairs

Sponsorship Chairs

Online Experience Chair

Local Organizing Team

IEDMS Officers

Tiffany Barnes, President North Carolina State University, USA
Anna Rafferty, Treasurer Carleton College, USA

IEDMS Board of Directors

Ryan Baker University of Pennsylvania, USA
Neil Heffernan Worcester Polytechnic Institute, USA
Sharon Hsiao Santa Clara University, USA
Tanja Käser EPFL, CH
Kenneth Koedinger Carnegie Mellon University, USA
Kalina Yacef University of Sydney, Australia

Senior Program Committee Members

Bita Akram North Carolina State University
Giora Alexandron Weizmann Institute of Science
Roger Azevedo University of Central Florida
Ryan Baker University of Pennsylvania
Tiffany Barnes North Carolina State University
Gautam Biswas Vanderbilt University
Ig Ibert Bittencourt Federal University of Alagoas
Nigel Bosch University of Illinois Urbana-Champaign
Anthony F. Botelho University of Florida
François Bouchet Sorbonne Université - LIP6
Alex Bowers Columbia University
Min Chi North Carolina State University
Anat Cohen Tel-Aviv University
Cristina Conati The University of British Columbia
Linda Corrin Deakin University
Alexandra Cristea Durham University
Michel Desmarais Ecole Polytechnique de Montreal
Fabiano Dorça Universidade Federal de Uberlandia
Michael Eagle George Mason University
Vanessa Echeverria Monash University
Yo Ehara Tokyo Gakugei University
Mingyu Feng WestEd
Carol Forsyth Educational Testing Service
Kobi Gal The University of Edinburgh
Praveen Garimella International Institute of Information Technology
Dragan Gasevic Monash University
Neil Heffernan Worcester Polytechnic Institute
Sharon Hsiao Santa Clara University
Xiao Hu The University of Hong Kong
Paul Salvador Inventado California State University Fullerton
Johan Jeuring Utrecht University
Srecko Joksimovic Education Future, University of South Australia
Jelena Jovanovic University of Belgrade
Tanja Käser EPFL
Enkelejda Kasneci Technical University of Munich
Hiroaki Kawashima University of Hyogo
Kirsty Kitto University of Technology, Sydney
Rene Kizilcec Cornell University
Simon Knight UTS
Kenneth Koedinger Carnegie Mellon University
Irena Koprinska The University of Sydney
Andrew Lan University of Massachusetts Amherst
Mirko Marras University of Cagliari
Roberto Angel Melendez Instituto Tecnologico Superior de Misantla
Agathe Merceron Berliner Hochschule für Technik - Berlin State University of Applied Sciences
Tanja Mitrovic Intelligent Computer Tutoring Group, University of Canterbury, Christchurch
Roger Nkambou Université du Québec à Montréal
Andrew Olney University of Memphis
Ranilson Paiva Universidade Federal de Alagoas
Luc Paquette University of Illinois at Urbana-Champaign
Zach Pardos University of California, Berkeley
Radek Pelánek Masaryk University Brno
Thomas Price North Carolina State University
Anna Rafferty Carleton College
R Rajalakshmi VIT University, Chennai Campus
Ramkumar Rajendran IIT Bombay
Steven Ritter Carnegie Learning, Inc.
Maria Mercedes T. Rodrigo Department of Information Systems and Computer Science, Ateneo de Manila University
Cristobal Romero Department of Computer Sciences and Numerical Analysis
Sherry Sahebi University at Albany - SUNY
Demetrios Sampson Curtin University
Olga Santos UNED
Avi Segal Ben Gurion University
Niels Seidel FernUniversität in Hagen
Atsushi Shimada Kyushu University
Sergey Sosnovsky Utrecht University
Tiffany Tang Wenzhou-Kean University
Jill-Jênn Vie Inria Lille
Lanqin Zheng Beijing Normal University

Program Committee Members

Mark Abdelshiheed University of Colorado Boulder
Faruk Ahmed The University of Memphis
Nazia Alam North Carolina State University
Laia Albó Universitat Pompeu Fabra
Laura Allen University of Minnesota
Isaac Alpizar Chacon Utrecht University
Mohammad Alshehri Durham University
Alvaro Alvares Federal University of the Agreste of Pernambuco
Gisele Arevalo University of Alberta
Simón Pedro Arguijo Tecnológico Nacional de México campus Misantla
T.S. Ashwin Vanderbilt University
Ayan Banerjee Arizona State University
Denilson Barbosa University of Alberta
Abhinava Barthakur University of South Australia
Prateek Basavaraj American Association of State Colleges and Universities
Marie Bexte FernUniversität in Hagen
Anis Bey La Rochelle University
Plaban Kumar Bhowmick Indian Institute of Technology Kharagpur
Nathaniel Blanchard Pontificia Universidad Católica de Esmeraldas
Maria Bolsinova Tilburg University
Conrad Borchers Carnegie Mellon University
Jesus G. Boticario UNED
Marie-Luce Bourguet Queen Mary London
Matthieu Brinkhuis Utrecht University
Ted Briscoe University of Cambridge
Julien Broisin IRIT, Université Toulouse III - Paul Sabatier, Toulouse, France
Minghao Cai University of Alberta
Renza Campagni Università degli Studi di Firenze
Jie Cao University of Colorado Boulder
Meng Cao University of Memphis
Nicolás Cardozo Universidad de los Andes
Paulo Carvalho Carnegie Mellon University
Guanliang Chen Monash University
Heeryung Choi Massachusetts Institute of Technology
Wei Chu The University of Memphis
Cheng-Yu Chung National Yang Ming Chiao Tung University
Ruth Cobos Universidad Autónoma de Madrid
Aubrey Condor University of California Berkeley
Maria de Los Angeles Constantino González Tecnológico de Monterrey Campus Laguna
Maria Cutumisu McGill University
Jesper Dannath Universität Bielefeld
Syaamantak Das Indian Institute of Technology Bombay
Alina Deriyeva University of Bielefeld
M Ali Akber Dewan Athabasca University
Nicholas Diana Colgate University
Fahima Djelil IMT Atlantique
Mohsen Dorodchi University of North Carolina Charlotte
Cristina Dumdumaya University of Southeastern Philippines
Nghia Duong-Trung German Research Centre for Artificial Intelligence
Luke Eglington Amplify
Yo Ehara Tokyo Gakugei University
Samira Elatia University of Alberta
Fahmid Morshed Fahid North Carolina State University
Yizhou Fan Peking University
Stephen Fancsali Carnegie Learning, Inc.
Effat Farhana Vanderbilt University
Márcia Fernandes Federal University of Uberlândia
Nigel Fernandez University of Massachusetts Amherst
Jeremiah Folsom-Kovarik Soar Technology, Inc.
Kazuma Fuchimoto The University of Electro-Communications
Hagit Gabbay School of Education, Tel Aviv University
Kobi Gal The University of Edinburgh
Wenbin Gan National Institute of Information and Communications Technology
Mark Gierl University of Alberta
Aldo Gordillo Universidad Politécnica de Madrid (UPM)
Guher Gorgun University of Alberta
Art Graesser University of Memphis
Sabine Graf Athabasca University
Monique Grandbastien LORIA, Universite de Lorraine
Julio Guerra Universidad Austral de Chile
Ella Haig School of Computing, University of Portsmouth
Ching Nam Hang City University of Hong Kong
Jiangang Hao Educational Testing Service
Ellie Hajarian Athabasca University
Jason Harley McGill University
Erik Harpstead Carnegie Mellon University
Fatima Harrak Sorbonne Université - LIP6
Carl Haynes-Magyar Carnegie Mellon University
Surina He University of Alberta
Sami Heikkinen LAB University of Applied Sciences
Arto Hellas Aalto University
Erik Hemberg ALFA
Nicolas Hernandez Nantes Université - LS2N CNRS UMR 6004
Martin Hlosta The Swiss Distance University of Applied Sciences
Anett Hoppe TIB Leibniz Information Centre for Science and Technology; L3S Research Centre, Leibniz Universität Hannover
Lingyun Huang The Education University of Hong Kong
Yun Huang Austral University of Chile
Paul Hur University of Illinois at Urbana-Champaign
Sébastien Iksal LIUM - Le Mans Université, France
Vladimir Ivančević University of Novi Sad, Faculty of Technical Sciences
Hyeji Jang Ewha Womans University
Emily Jensen University of Colorado Boulder
Lan Jiang University of Illinois Urbana-Champaign
David Joyner Georgia Institute of Technology
Jina Kang University of Illinois Urbana-Champaign
Tanja Käser EPFL
Mohammad Khalil University of Bergen
Ekaterina Kochmar MBZUAI
Elizabeth Koh National Institute of Education, Nanyang Technological University, Singapore
Sotiris Kotsiantis University of Patras
Vitomir Kovanovic The University of South Australia
Milos Kravcik DFKI GmbH
Swathi Krishnaraja University of Potsdam
Roland Kuhn National Research Council of Canada
Amruth Kumar Ramapo College of New Jersey
Vive Kumar Athabasca University
Vishal Kuvar University of Minnesota
Hollis Lai University of Alberta
Sébastien Lallé Sorbonne University
Andrew Lan University of Massachusetts Amherst
Juan Alfonso Lara Torralbo University of Córdoba
Mikel Larrañaga University of the Basque Country
Elise Lavoué iaelyon, Université Jean Moulin Lyon 3, LIRIS
Tai Le Quy IU International University of Applied Sciences
Vwen Yen Alwyn Lee Nanyang Technological University
Marie Lefevre LIRIS - Université Lyon 1
Juho Leinonen Aalto University
Arun Balajiee Lekshmi Narayanan University of Pittsburgh
James Lester North Carolina State University
Chenglu Li University of Florida
Jiawei Li Nanyang Technological University
Warren Li University of Michigan
Jionghao Lin Carnegie Mellon University
Qi Liu University of Science and Technology of China
Zhexiong Liu University of Pittsburgh
Sonsoles López-Pernas University of Eastern Finland
Yu Lu Beijing Normal University
Vanda Luengo Sorbonne Université - LIP6
Ivan Luković University of Belgrade, Faculty of Organizational Sciences
Collin Lynch North Carolina State University
Nick Lytle Georgia Institute of Technology
Boxuan Ma Kyushu University
Qiang Ma Kyoto Institute of Technology
Qianou Ma Carnegie Mellon University Human-Computer Interaction Institute
Jeffrey Matayoshi McGraw Hill ALEKS
Madeth May University of Maine
Gord McCalla University of Saskatchewan
Emma McDonald University of Alberta
Guilherme Medeiros Machado ECE Paris
Victor Menendez-Dominguez Universidad Autónoma de Yucatán
Donatella Merlini Università di Firenze
Caitlin Mills University of Minnesota
Tsunenori Mine Kyushu University
Tsubasa Minematsu Kyushu University
Sein Minn INRIA
Phaedra Mohammed The University of the West Indies
Luis Alberto Morales Rosales Conacyt-Universidad Michoacana de San Nicolás de Hidalgo
Matthew Moreno McGill University
Pedro Manuel Moreno-Marcos Universidad Carlos III de Madrid
Bradford Mott North Carolina State University
Kousuke Mouri Tokyo University of Agriculture and Technology
Calarina Muslimani University of Alberta
Tanya Nazaretsky EPFL
Huy Nguyen University of Pittsburgh
Narges Norouzi Berkeley
Ange Adrienne Nyamen Tato École de Technologie Supérieure
Teresa Ober Educational Testing Service
Püren Öncel University of Minnesota Twin Cities
Tounwendyam Frédéric Ouedraogo Université Norbert ZONGO
Maciej Pankiewicz Warsaw University of Life Sciences
Yeonjeong Park Honam University
Philip Pavlik University of Memphis
Jorge Poco Fundação Getulio Vargas
Paul Stefan Popescu University of Craiova
Oleksandra Poquet Technical University of Munich
Ethan Prihar École polytechnique fédérale de Lausanne
David Pritchard Massachusetts Institute of Technology
Miroslava Raspopović Faculty of Information Technology
Narjes Rohani University of Edinburgh
José Raúl Romero University of Cordoba
Daniela Rotelli University of Pisa
Mirka Saarela University of Jyväskylä
Maria Ofelia San Pedro Roblox
Sreecharan Sankaranarayanan Carnegie Mellon University
Mohammed Saqr University of Eastern Finland
Petra Sauer Beuth University of Applied Sciences
Robin Matthias Schmucker Carnegie Mellon University
Filippo Sciarrone Universitas Mercatorum
Kazuhisa Seta Osaka Metropolitan University
Lele Sha Monash University
Lei Shi Newcastle University
Yang Shi Utah State University
Jinnie Shin University of Florida
Aditi Singh Cleveland State University
Daevesh Singh Indian Institute of Technology
Stefan Slater Teachers College
Juyeong Song Ewha Womans University
Frank Stinar University of Illinois - Urbana Champaign
Vinitra Swamy EPFL
Anaïs Tack KU Leuven
Ling Tan Australian Council for Educational Research
Michelle Taub University of Central Florida
Daniela Teodorescu LMU Munich
Craig Thompson The University of British Columbia
Emiko Tsutsumi The University of Electro-Communications
Maomi Ueno The University of Electro-Communications
Maya Usher Technion
Masaki Uto The University of Electro-Communications
Sowmya Vajjala National Research Council, Canada
José Antonio Hiram Vázquez-López Tecnológico Nacional de México, campus Instituto Tecnológico Superior de Misantla
Oswaldo Velez-Langs Universidad de Cordoba
Rémi Venant Le Mans Université - LIUM
Olga Viberg KTH Royal Institute of Technology
Markel Vigo The University of Manchester
Alessandro Vivas UFVJM
Tuyet-Trinh Vu SOICT-HUST
Deliang Wang The University of Hong Kong
Zichao Wang Rice University
Christabel Wayllace New Mexico State University
Daniel Weitekamp Carnegie Mellon University
Jacob Whitehill Worcester Polytechnic Institute
Alistair Willis The Open University
Aaron Wong University of Minnesota
Chris Wong University of Technology Sydney
Jacqueline Wong Utrecht University
Beverly Park Woolf University of Massachusetts
Yi-jung Wu University of Wisconsin-Madison
Peter Wulff Heidelberg University of Education
Jia Xu Guangxi University
Elad Yacobson Weizmann Institute of Science
Seyma Yildirim-Erbasli Concordia University of Edmonton
Chengjiu Yin Kyushu University
Andrew Zamecnik University of South Australia
Diego Zapata-Rivera Educational Testing Service
Jiayi Zhang University of Pennsylvania
Lisa Zhang University of Toronto
Wenbin Zhang Florida International University
Yingbin Zhang South China Normal University
Lanqin Zheng Beijing Normal University
Stefano Zingaro Università di Bologna

Sponsors

Bronze Tier



Keynotes

Generalizable and Interpretable Models of Learning

Tanja Käser, EPFL School of Computer and Communication Sciences

Modeling learners’ knowledge, behavior, and strategies is at the heart of educational technology. Learner models serve as a basis for adapting the learning experience to students’ needs and supporting teachers in classroom orchestration. Consequently, a large body of research has focused on creating accurate models of student knowledge and behaviors. However, current modeling approaches are still limited: they are either defined for specific and well-structured domains (e.g., algebra, vocabulary learning), requiring substantial work from experts and limiting generalizability, or they lack interpretability. Recent advances in generative AI, in particular large language models (LLMs), have the potential to address these constraints. However, LLMs lack alignment with educational goals and grounded knowledge.

In this talk, I will discuss the key challenges in developing generalizable and explainable models, and our solutions to address them, including models tracking learning in open-ended environments and generalizing between different environments and populations. I will present our work on explainable AI, including a rigorous evaluation of existing approaches, the development of inherently interpretable models, as well as studies on effectively communicating model explanations. Finally, I will show some of our recent results combining “traditional” modeling approaches and LLMs to provide interpretable feedback and explanations while not compromising on model trustworthiness.

Opportunities and Challenges for LLM Agent-Based Support for Collaborative Design

Carolyn Rosé, Professor of Language Technologies and Human-Computer Interaction, School of Computer Science, Carnegie Mellon, USA

Supporting collaborative design is an ideal context for exploring the capabilities and limitations of LLM-based conversational agents. The ability to extract information in context and produce a coherent sounding text can be used to generate reflection triggers. In two recent studies, we have employed LLM-based conversational agents with the goal of triggering human reflection and learning during collaborative software design. As humans engage in collaborative design, they employ their own abilities to reason abstractly, to decompose problems, and apply principles productively. Reflection is a valuable activity for promoting human learning in these settings. However, what humans are able to do in terms of abstraction and reasoning as part of their creative problem solving is precisely what is most difficult for LLM agents to do. In contrast to claims of “super-human performance” in the media, in this talk we will explore the complementarity of human intelligence and Artificial Intelligence. We will begin with results of a classroom study where LLM-based conversational agent support for collaborative software development was successful in increasing student learning. From there we will move on to argue in favor of a research agenda for exploiting the complementarity both in terms of applying AI capabilities to the betterment of human learning as well as inspiring further extension of technical capabilities from insights derived from observation of human reflection and learning in collaborative design.

Test of Time Award Talk

Assessing Student Learning in Open Ended Learning Environments From Sequential to Multimodal Data Analysis

Gautam Biswas, Professor of Computer Science, Vanderbilt University, Nashville, TN, USA

From my early days as an AIED and EDM researcher, I have focused on understanding how students learn, especially in scenarios where they have to construct and apply their knowledge to problem-solving tasks. Collaborating with peers, we developed open-ended learning environments (OELEs) where K-12 students build scientific models and apply them to solve real-world problems. Challenges arise, as students have to navigate multiple tools in these computer-based environments. Some students overcome these challenges to become effective learners, while others struggle to progress, often applying suboptimal learning strategies. John Kinnebrew and I began analyzing learners’ activity logs to study these differences, resulting in the Differential Sequence Mining algorithm, which earned us the best paper award at EDM 2012. Expanding on this work, we developed the Contextualized Difference Mining method for understanding students’ learning behaviors, for which we are receiving this Test of Time award.

In my talk, I will review our work on Differential Sequence Mining and explore its applications in understanding students’ cognitive and metacognitive learning behaviors. We have leveraged these insights to provide adaptive support, helping students progress in our Open-Ended Learning Environments (OELEs). Beyond this, we have employed other sequential representations, such as Markov Chains and Hidden Markov Models, to analyze students’ activities and behaviors in the context of their learning and problem-solving tasks. Our OELEs have also advanced to facilitate students’ integrated learning of science, computing, and engineering problem-solving, including collaborative efforts in computer-based and embodied learning scenarios. From these richer learning environments, I will share insights into our latest efforts involving multimodal data analysis, incorporating video, speech, and activity logs. Using vision-based deep learning models and large language models (LLMs), we integrate analyses across modalities, offering a comprehensive understanding of students’ collaborative learning and problem-solving activities. In conclusion, I will discuss the potential implications of our work for shaping the future of learning in classrooms.

EDM Data Set Award

Jakub Kužílek

Senior Researcher, Computer Science Education / Computer Science and Society research group, Humboldt-Universität zu Berlin & German Research Center for Artificial Intelligence, Berlin, Germany.

Bio. Jakub Kužílek is affiliated with the Computer Science Education / Computer Science and Society research group at Humboldt-Universität zu Berlin and the Educational Technology Lab at the German Research Center for Artificial Intelligence (DFKI) as a senior researcher. His research investigates student self-regulated learning within online learning environments, collaborative group work, adaptive assessments, and feedback within the context of digital education. In the past, he developed (together with Martin Hlosta and Zdenek Zdrahal) the OU Analyse system, used to support 200,000 students of the Open University (United Kingdom) during their studies, and founded learning analytics research at Czech Technical University. He has led (and is currently doing research within) a project on AI use in assessment feedback at Humboldt-Universität. In parallel, he is leading the project on AI-driven recommendation systems in vocational education (KIPerWeb at DFKI).

Martin Hlosta


Bio. Martin Hlosta is a Senior Researcher at the Institute for Distance Learning and eLearning Research (IFeL) at the Swiss Distance University of Applied Sciences (FFHS). Before joining FFHS, he led research and development of OUAnalyse at The Open University (OU), a Predictive Learning Analytics project deployed in all undergraduate courses, improving student retention and teachers’ practice. It is one of the world’s largest deployments of analytics systems in education, and in 2020 it was selected by UNESCO as one of the four best projects using AI in education. His subsequent work focused on identifying factors contributing to large gaps for disadvantaged students in the UK, and another study showed how teachers’ use of predictive analytics in an online course can lower these gaps for students coming from low socio-economic areas. Currently, he is leading research and teaching in Learning Analytics at FFHS and works on various strands of how learning analytics can improve feedback. His most recent project, funded by Unity and Meta to target inequalities in education, explores how immersive Virtual Reality and enhanced analytics for reflection can help future teachers in South Africa.

JEDM Presentations

The Knowledge Component Attribution Problem for Programming: Methods and Tradeoffs with Limited Labeled Data

Yang Shi NC State University yshi26@ncsu.edu
Robin Schmucker Carnegie Mellon University rschmuck@cs.cmu.edu
Keith Tran NC State University ktran24@ncsu.edu
John Bacher NC State University jtbacher@ncsu.edu
Kenneth Koedinger Carnegie Mellon University koedinger@cmu.edu
Thomas Price NC State University twprice@ncsu.edu
Min Chi NC State University mchi@ncsu.edu
Tiffany Barnes NC State University tmbarnes@ncsu.edu

Understanding students’ learning of knowledge components (KCs) is an important educational data mining task and enables many educational applications. However, in the domain of computing education, where program exercises require students to practice many KCs simultaneously, it is a challenge to attribute their errors to specific KCs and, therefore, to model student knowledge of these KCs. In this paper, we define this task as the KC attribution problem. We first demonstrate a novel approach to addressing this task using deep neural networks and explore its performance in identifying expert-defined KCs (RQ1). Because the labeling process takes costly expert resources, we further evaluate the effectiveness of transfer learning for KC attribution, using more easily acquired labels, such as problem correctness (RQ2). Finally, because prior research indicates the incorporation of educational theory in deep learning models could potentially enhance model performance, we investigated how to incorporate learning curves in the model design and evaluated their performance (RQ3). Our results show that in a supervised learning scenario, we can use a deep learning model, code2vec, to attribute KCs with a relatively high performance (AUC > 75% in two of the three examined KCs). Further using transfer learning, we achieve reasonable performance on the task without any costly expert labeling. However, the incorporation of learning curves shows limited effectiveness in this task. Our research lays important groundwork for personalized feedback for students based on which KCs they applied correctly, as well as more interpretable and accurate student models.

Automated Evaluation of Classroom Instructional Support with LLMs and BoWs: Connecting Global Predictions to Specific Feedback

Jacob Whitehill Worcester Polytechnic Institute jrwhitehill@wpi.edu
Jennifer LoCasale-Crouch Virginia Commonwealth University locasalecrj@vcu.edu

With the aim to provide teachers with more specific, frequent, and actionable feedback about their teaching, we explore how Large Language Models (LLMs) can be used to estimate “Instructional Support” domain scores of the CLassroom Assessment Scoring System (CLASS), a widely used observation protocol. We design a machine learning architecture that uses either zero-shot prompting of Meta’s Llama2, and/or a classic Bag of Words (BoW) model, to classify individual utterances of teachers’ speech (transcribed automatically using OpenAI’s Whisper) for the presence of Instructional Support. Then, these utterance-level judgments are aggregated over a 15-min observation session to estimate a global CLASS score. Experiments on two CLASS-coded datasets of toddler and pre-kindergarten classrooms indicate that (1) automatic CLASS Instructional Support estimation accuracy using the proposed method (Pearson R up to 0.48) approaches human inter-rater reliability (up to R = 0.55); (2) LLMs generally yield slightly greater accuracy than BoW for this task, though the best models often combined features extracted from both LLM and BoW; and (3) for classifying individual utterances, there is still room for improvement of automated methods compared to human-level judgments. Finally, (4) we illustrate how the model’s outputs can be visualized at the utterance level to provide teachers with explainable feedback on which utterances were most positively or negatively correlated with specific CLASS dimensions.
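
As an illustration of the evaluation described in the abstract above, the sketch below aggregates utterance-level scores into one score per session and correlates those with human scores using Pearson's R. It is a minimal sketch under stated assumptions, not the authors' pipeline: the abstract does not say how utterances are aggregated, so a plain mean is assumed, and the function names (`session_score`, `pearson_r`) and all numbers are hypothetical.

```python
from math import sqrt
from statistics import mean


def session_score(utterance_scores):
    """Aggregate utterance-level scores into one session-level estimate.

    Assumption: a plain mean; the abstract does not specify the aggregation.
    """
    return mean(utterance_scores)


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    mean_x, mean_y = mean(xs), mean(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Hypothetical data: per-utterance classifier scores for four sessions,
# and made-up human-coded scores for the same four sessions.
sessions = [
    [0.1, 0.3, 0.2],
    [0.6, 0.4, 0.5],
    [0.9, 0.7, 0.8],
    [0.2, 0.5, 0.2],
]
human_scores = [2.0, 4.0, 6.0, 3.5]

predicted = [session_score(s) for s in sessions]
r = pearson_r(predicted, human_scores)
print(f"Pearson R between predicted and human scores: {r:.2f}")
```

The same correlation computed between two human raters would give the inter-rater reliability that the abstract uses as its point of comparison.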

An Approach to Improve k-Anonymization Practices in Educational Data Mining

Frank Stinar University of Illinois Urbana–Champaign fstinar2@illinois.edu
Zihan Xiong University of Pennsylvania zihanx3@seas.upenn.edu
Nigel Bosch University of Illinois Urbana–Champaign pnb@illinois.edu

Educational data mining has allowed for large improvements in educational outcomes and understanding of educational processes. However, there remains a constant tension between educational data mining advances and protecting student privacy while using educational datasets. Publicly available datasets have facilitated numerous research projects while striving to preserve student privacy via strict anonymization protocols (e.g., k-anonymity); however, little is known about the relationship between anonymization and utility of educational datasets for downstream educational data mining tasks, nor how anonymization processes might be improved for such tasks. We provide a framework for strictly anonymizing educational datasets with a focus on improving downstream performance in common tasks such as student outcome prediction. We evaluate our anonymization framework on five diverse educational datasets with machine learning-based downstream task examples to demonstrate both the effect of anonymization and our means to improve it. Our method improves downstream machine learning accuracy versus baseline data anonymization by 30.59%, on average, by guiding the anonymization process toward strategies that anonymize the least important information while leaving the most valuable information intact.
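
For readers unfamiliar with the k-anonymity criterion named in the abstract above: a table is k-anonymous when every combination of quasi-identifier values is shared by at least k records. The sketch below checks only that property. It is not the authors' anonymization framework, and the column names, function names, and toy records are hypothetical.

```python
from collections import Counter


def group_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every quasi-identifier combination occurs in at least k records."""
    return all(n >= k for n in group_sizes(records, quasi_identifiers).values())


# Hypothetical toy records; "grade" is the sensitive value, and the other
# two columns are treated as quasi-identifiers.
students = [
    {"age_band": "18-20", "region": "North", "grade": 71},
    {"age_band": "18-20", "region": "North", "grade": 85},
    {"age_band": "21-25", "region": "South", "grade": 64},
    {"age_band": "21-25", "region": "South", "grade": 90},
    {"age_band": "21-25", "region": "South", "grade": 58},
]
QUASI_IDENTIFIERS = ["age_band", "region"]

for k in (2, 3):
    print(k, is_k_anonymous(students, QUASI_IDENTIFIERS, k))
```

Raising k typically forces coarser generalization of the quasi-identifiers, which is where the trade-off with downstream utility discussed in the abstract comes from.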

Exploring the Impact of Symbol Spacing and Problem Sequencing on Arithmetic Performance: An Educational Data Mining Approach

Avery Harrison Closser Purdue University aclosser@purdue.edu
Anthony F. Botelho University of Florida abotelho@coe.ufl.edu
Jenny Yun-Chen Chan The Education University of Hong Kong chanjyc@eduhk.hk

Experimental research on perception and cognition has shown that inherent and manipulated visual features of mathematics problems impact individuals’ problem-solving behavior and performance. In a recent study, we manipulated the spacing between symbols in arithmetic expressions to examine its effect on 174 undergraduate students’ arithmetic performance but found results that were contradictory to most of the literature (Closser et al., 2023). Here, we applied educational data mining (EDM) methods to that dataset at the problem level to investigate whether inherent features of the 32 experimental problems (i.e., problem composition, problem order) may have caused unintended effects on students’ performance. We found that students were consistently faster to correctly simplify expressions with the higher-order operator on the left, rather than right, side of the expression. Furthermore, average response times varied based on the symbol spacing of the current and preceding problem, suggesting that problem sequencing matters. However, including or excluding problem identifiers in analyses changed the interpretation of results, suggesting that the effect of sequencing may be impacted by other, undefined problem-level factors. These results advance cognitive theories on perceptual learning and provide implications for educational researchers: online experiments designed to investigate students’ performance on mathematics problems should include a variety of problems, systematically examine the effects of problem order, and consider applying different data analysis approaches to detect effects of inherent problem features. Moreover, EDM methods can be a tool to identify nuanced effects on behavior and performance in the context of data from online platforms.

Effect of Gamification on Gamers: Evaluating Interventions for Students Who Game the System

Kirk Vanacore Worcester Polytechnic Institute kpvanacore@wpi.edu
Ashish Gurung Carnegie Mellon University agurung@andrew.cmu.edu
Adam Sales Worcester Polytechnic Institute asales@wpi.edu
Neil Heffernan Worcester Polytechnic Institute nth@wpi.edu

Gaming the system is a persistent problem in Computer-Based Learning Platforms. While substantial progress has been made in identifying and understanding such behaviors, effective interventions remain scarce. Using a method of causal moderation known as Fully Latent Principal Stratification, this study explores the impact of two types of interventions – gamification and manipulation of assistance access – on the learning outcomes of students who tend to game the system. The results indicate that gamification does not consistently mitigate these negative behaviors. One gamified condition had a consistently positive effect on learning regardless of students’ propensity to game the system, whereas the other had a negative effect on such students. However, delaying access to hints and feedback may have a positive effect on the learning outcomes of those gaming the system. This paper also illustrates the potential of integrating detection and causal methodologies within educational data mining for understanding how to respond to behaviors effectively after they are detected.

LearnSphere: A Learning Data and Analytics Cyberinfrastructure

John Stamper Carnegie Mellon University jstamper@cmu.edu
Philip I. Pavlik Jr. University of Memphis ppavlik@memphis.edu
Steven Moore Carnegie Mellon University stevenmo@andrew.cmu.edu
Kenneth Koedinger Carnegie Mellon University koedinger@cmu.edu

LearnSphere is a web-based data infrastructure designed to transform scientific discovery and innovation in education. It supports learning researchers in addressing a broad range of issues including cognitive, social, and motivational factors in learning, educational content analysis, and educational technology innovation. LearnSphere integrates previously separate educational data and analytic resources developed by participating institutions. The web-based workflow authoring tool, Tigris, allows technical users to contribute sophisticated analytic methods, and learning researchers can adapt and apply those methods using graphical user interfaces, importantly, without additional programming. As part of our use-driven design of LearnSphere, we built a community through workshops and summer schools on educational data mining. Researchers interested in particular student levels or content domains can find student data from elementary through higher education and across a wide variety of course content such as math, science, computing, and language learning. LearnSphere has facilitated many discoveries about learning, including the importance of active over passive learning activities and the positive association of quality discussion board posts with learning outcomes. LearnSphere also supports research reproducibility, replicability, traceability, and transparency, as researchers can share their data and analytic methods along with links to research papers. We demonstrate the capabilities of LearnSphere through a series of case studies that illustrate how analytic components can be combined into research workflows that can be developed and shared. We also show how open web-accessible analytics drive the creation of common formats to streamline repeated analytics and facilitate wider and more flexible dissemination of analytic tool kits.

Session-based Methods for Course Recommendation

Md Akib Zabed Khan Florida International University mkhan149@fiu.edu
Agoritsa Polyzou Florida International University apolyzou@fiu.edu

In higher education, academic advising is crucial to students’ decision-making. Data-driven models can help students make informed decisions by providing insightful recommendations for completing their degrees. To suggest courses for the upcoming semester, various course recommendation models have been proposed in the literature, using different data mining techniques, machine learning algorithms, and data types. One important aspect of the data is that courses taken together in a semester usually fit well with each other. If there is no correlation between the co-taken courses, students may find it more difficult to handle the workload. Based on this insight, we propose using session-based approaches to recommend a set of well-suited courses for the upcoming semester. We test three session-based course recommendation models, two based on neural networks (CourseBEACON and CourseDREAM) and one on tensor factorization (TF-CoC). Additionally, we propose a postprocessing approach to adjust the recommendation scores of any base course recommender to promote related courses. Using metrics capturing different aspects of the recommendation quality, our experimental evaluation shows that session-based methods outperform existing popularity-based, association-based, similarity-based, factorization-based, neural network-based, and Markov chain-based recommendation approaches. Effective course recommendations can result in improved student advising, which, in turn, can improve student performance, decrease dropout rates, and lead to a more positive overall student experience and greater satisfaction.

Best Paper AIED 2023 Presentation

Confusion, Conflict, Consensus: Modeling Dialogue Processes During Collaborative Learning with Hidden Markov Models

Toni V. Earle-Randell University of Florida
Joseph B. Wiggins University of Florida
Julianna Martinez Ruiz University of Florida
Mehmet Celepkolu University of Florida
Kristy Elizabeth Boyer University of Florida
Collin F. Lynch North Carolina State University
Maya Israel University of Florida
Eric Wiebe North Carolina State University

There is growing recognition that AI technologies can, and should, support collaborative learning. To provide this support, we need models of collaborative talk that reflect the ways in which learners interact. Great progress has been made in modeling dialogue for high school and college-age learners, but the dialogue processes that characterize collaborative talk between elementary learner dyads are not currently well understood. This paper reports on a study with elementary school learners (4th and 5th grade, ages 9–11 years old) who coded collaboratively in dyads. We recorded dialogue from 22 elementary school learner dyads, covering 7594 total utterances. We labeled this corpus manually with dialogue acts and then induced a hidden Markov model (HMM) to identify the underlying dialogue states and the transitions between these states. The model identified six distinct hidden states, which we interpret as Social Dialogue, Confusion, Frustrated Coordination, Exploratory Talk, Directive & Disagreement, and Disagreement & Self-Explanation. The HMM revealed that when students entered into a productive exploratory talk state, they primarily transitioned out of this state when they became confused or reached an impasse. When this occurred, the learners then moved into states of disputation and conflict before re-entering the Exploratory Talk state. These findings can inform the design of AI agents that support young learners’ collaborative talk and help agents determine when students are conflicting rather than collaborating.
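
To make the idea of hidden dialogue states concrete, the sketch below decodes the most likely state sequence for a short sequence of dialogue acts with the Viterbi algorithm. It covers decoding only, not the model induction step described in the abstract, and it uses two states and two dialogue acts with invented probabilities; none of the parameters come from the paper.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state path for a sequence of observations."""
    # best[t][s] = (probability of the best path ending in state s at time t,
    #               the state that path was in at time t - 1)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        prev = best[-1]
        step = {}
        for s in states:
            p, src = max((prev[r][0] * trans_p[r][s], r) for r in states)
            step[s] = (p * emit_p[s][obs], src)
        best.append(step)
    # Backtrack from the most probable final state.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for step in reversed(best[1:]):
        state = step[state][1]
        path.append(state)
    return path[::-1]


# Hypothetical two-state model; all probabilities are made up for illustration.
states = ["Exploratory", "Conflict"]
start_p = {"Exploratory": 0.6, "Conflict": 0.4}
trans_p = {
    "Exploratory": {"Exploratory": 0.7, "Conflict": 0.3},
    "Conflict": {"Exploratory": 0.4, "Conflict": 0.6},
}
emit_p = {
    "Exploratory": {"question": 0.8, "disagree": 0.2},
    "Conflict": {"question": 0.1, "disagree": 0.9},
}

acts = ["question", "question", "disagree", "disagree"]
path = viterbi(acts, states, start_p, trans_p, emit_p)
print(path)  # ['Exploratory', 'Exploratory', 'Conflict', 'Conflict']
```

In a fitted model, the transition table plays the role of the state-to-state dynamics the abstract reports, such as learners leaving an exploratory state and later returning to it.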