Accepted Papers: Educational Data Mining 2024 - New tools, new prospects, new risks – educational data mining in the age of generative AI

Long Papers

DISTO: Evaluating Textual Distractors for Multiple Choice Questions using a Negative Sampling based Approach – Bilal Ghanem and Alona Fyshe
Parametric Constraints for Bayesian Knowledge Tracing from First Principles – Denis Shchepakin, Sreecharan Sankaranarayanan and Dawn Zimmaro
Predicting GRE Scores from Application Materials in Test-Optional Admissions – Yijun Zhao, Zhengxin Qi, Son Tung Do, John Grossi, Jee Hun Kang and Gary Weiss
Grading and Clustering Student Programs That Produce Probabilistic Output – Yunsung Kim, Jadon Geathers and Chris Piech
Reexamining Learning Curve Analysis in Programming Education: The Value of Many Small Problems – Mehmet Arif Demirtas, Max Fowler and Kathryn Cunningham
Evaluating and Optimizing Educational Content with Large Language Model Judgments – Joy He-Yueya, Noah Goodman and Emma Brunskill
Assessing the Promise and Pitfalls of ChatGPT for Automated CS1-driven Code Generation – Muhammad Fawad Akbar Khan, Max Ramsdell, Erik Falor and Hamid Karimi
On the Selection of Positive and Negative Samples for Contrastive Math Word Problem Neural Solver – Yiyao Li, Lu Wang, Jung Jae Kim, Chor Seng Tan and Ye Luo
GPT vs. Llama2: Which Comes Closer to Human Writing? – Fernando Martinez, Gary Weiss, Miguel Palma, Haoran Xue, Alexander Borelli and Yijun Zhao
Combining Dialog Acts and Skill Modeling: What Chat Interactions Enhance Learning Rates During AI-Supported Peer Tutoring? – Conrad Borchers, Kexin Yang, Jionghao Lin, Nikol Rummel, Kenneth R. Koedinger and Vincent Aleven
A Generalized Apprenticeship Learning Framework for Modeling Heterogeneous Student Pedagogical Strategies – Md Mirajul Islam, Xi Yang, John Hostetter, Adittya Soukarjya Saha and Min Chi
When Chatting Isn’t Cheating: Mining and Evaluating Student Use of Chatbots and Other Resources During Open-Internet Exams – David Joyner, Zoey Anne Beda, Michael Cohen, Melanie Duffin, Amy Garcia Fernandez, Liz Hayes-Golding, Jonathan Hildreth, Alex Houk, Rebecca Johnson, Kayla Matchek and Ana Santos
Using Large Language Models to Detect Self-Regulated Learning in Think-Aloud Protocols – Jiayi Zhang, Conrad Borchers, Vincent Aleven and Ryan S. Baker
Propositional Extraction from Natural Speech in Small Group Collaborative Tasks – Videep Venkatesha, Abhijnan Nath, Ibrahim Khebour, Avyakta Chelle, Mariah Bradford, Jingxuan Tu, James Pustejovsky, Nathaniel Blanchard and Nikhil Krishnaswamy
Towards Generalizable Agents in Text-Based Educational Environments: A Study of Integrating RL with LLMs – Bahar Radmehr, Adish Singla and Tanja Käser
Investigating Student Ratings with Features of Automatically Generated Questions: A Large-Scale Analysis using Data from Natural Learning Contexts – Benny Johnson, Jeff Dittel and Rachel Van Campenhout
Beyond Accuracy: Embracing Meaningful Parameters in Educational Data Mining – Napol Rachatasumrit, Paulo Carvalho and Kenneth Koedinger
Says Who? How different ground truth measures of emotion impact student affective modeling – Andres Felipe Zambrano, Nidhi Nasiar, Jaclyn Ocumpaugh, Alex Goslen, Jiayi Zhang, Jonathan Rowe, Jordan Esiason, Jessica Vandenberg and Stephen Hutt
Multimodal Learning Analytics for Predicting Student Collaboration Satisfaction in Collaborative Game-Based Learning – Halim Acosta, Seung Lee, Bradford Mott, Haesol Bae, Krista Glazewski, Cindy Hmelo-Silver and James Lester
How Can I Improve? Using GPT to Highlight the Desired and Undesired Parts of Open-ended Responses – Jionghao Lin, Eason Chen, Feifei Han, Ashish Gurung, Danielle R Thomas, Wei Tan, Ngoc Dang Nguyen and Kenneth Koedinger
How Much Training is Needed? Reducing Training Time using Deep Reinforcement Learning in an Intelligent Tutor – Nazia Alam, Behrooz Mostafavi, Sutapa Dey Tithi, Min Chi and Tiffany Barnes

Industry & Short Papers

An Evaluation of a Placement Assessment for an Adaptive Learning System – Jeffrey Matayoshi, Eric Cosyn, Christopher Lechuga and Hasan Uzun
Non-Overlapping Leave Future Out Validation (NOLFO): Implications for Graduation Prediction – Lief Esbenshade, Jonathan Vitale and Ryan Baker
Examining the Algorithmic Fairness in Predicting High School Dropouts – Chenguang Pan and Zhou Zhang
Phone Use While Programming – Kaden Hart, Christopher Warren, Seth Poulsen and John Edwards
Open Science and Educational Data Mining: Which Practices Matter Most? – Ryan S. Baker, Stephen Hutt, Christopher A. Brooks, Namrata Srivastava and Caitlin Mills
Evaluating Multi-Knowledge Component Interpretability of Deep Knowledge Tracing Models in Programming – Yang Shi, Min Chi, Tiffany Barnes and Thomas Price
Promoting Theory-Building in Design-Based Research through Data-Based Models – Golnaz Arastoopour Irgens, Ibrahim Adisa, Deepika Sistla, Tolulope Famaye, Cinamon Bailey, Atefeh Behboudi and Adenike Adefisayo
More, May not the Better: Insights from Applying Deep Reinforcement Learning for Pedagogical Policy Induction – Gyuhun Jung, Markel Sanz Ausin, Tiffany Barnes and Min Chi
Retrieval-augmented Generation to Improve Math Question-Answering: Trade-offs Between Groundedness and Human Preference – Owen Henkel, Zach Levonian, Chenglu Li and Millie Postle
Navigating the Data-Rich Landscape of Online Learning: Insights and Predictions from ASSISTments – Aswani Yaramala, Soheila Farokhi and Hamid Karimi
SingPAD: A Knowledge Tracing Dataset Based on Music Performance Assessment – Ying Zhang, Yan Zhang, Wei Xu, Zhifeng Wang and Jianwen Sun
Large Language Models for In-Context Student Modeling: Synthesizing Student’s Behavior in Visual Programming – Manh Hung Nguyen, Sebastian Tschiatschek and Adish Singla
Early Prediction of Student Dropout in Higher Education using Machine Learning Models – Or Goren, Liron Cohen and Amir Rubinstein
Speaker Diarization in the Classroom: How Much Does Each Student Speak in Group Discussions? – Jiani Wang, Shiran Dudy, Xinlu He, Zhiyong Wang, Rosy Southwell and Jacob Whitehill
A page jump recommendation model and result interpretation based on structured annotation methods – Wenhao Wang, Etsuko Kumamoto and Chengjiu Yin
LOOL: Towards Personalization with Flexible & Robust Estimation of Heterogeneous Treatment Effects – Duy Pham, Kirk Vanacore, Adam Sales and Johann Gagnon-Bartsch
Problem-Solving Types and EdTech Effectiveness: A Model for Exploratory Causal Analysis – Adam Sales, Kirk Vanacore, Hyeon-Ah Kang and Tiffany Whittaker
Investigating Student Interest in a Minecraft Game-Based Learning Environment: A Changepoint Detection Analysis – Yiqiu Zhou and Luc Paquette
Feeling the Difficulty of Mathematics – Bledar Fazlija
Generative AI for Peer Assessment Helpfulness Evaluation – Chengyuan Liu, Jialin Cui, Ruixuan Shang, Qinjin Jia, Parvez Rashid and Edward Gehringer
Replicating an “Astonishing Regularity in Student Learning Rates” – Mary Ann Simpson, Kole Norberg and Stephen Fancsali
Are You an Early Dropper or Late Shopper? Mining Enrollment Transaction Data to Study Procrastination in Higher Education – Conrad Borchers, Yinuo Xu and Zachary A. Pardos
E2Vec: Feature Embedding with Temporal Information for Analyzing Student Actions in E-Book Systems – Yuma Miyazaki, Valdemar Švábenský, Yuta Taniguchi, Fumiya Okubo, Tsubasa Minematsu and Atsushi Shimada
Investigation of Behavioural Differences: Uncovering Behavioral Sources of Demographic Bias in Educational Algorithms – Jade Maï Cock, Hugues Saltini, Haoyu Sheng, Riya Ranjan, Richard Davis and Tanja Käser
Principals’ use of data analytics in Finnish schools – Ayaz Karimov, Mirka Saarela, Tommi Kärkkäinen and Sabina Aghayeva
Automatic Matchmaking in two-versus-two sports – Sören Rüttgers, Ulrike Kuhl and Benjamin Paaßen
Power Calculations for Randomized Controlled Trials with Auxiliary Observational Data – Jaylin Lowe, Charlotte Mann, Jiaying Wang, Adam Sales and Johann Gagnon-Bartsch
Plagiarism Detection Using Keystroke Logs – Scott Crossley, Yu Tian, Joon Suh Choi, Langdon Holmes and Wesley Morris
Who Should I Help Next? Simulation of Office Hours Queue Scheduling Strategy in a CS2 Course – Zhikai Gao, Gabriel Silva de Oliveira, Damilola Babalola, Collin Lynch and Sarah Heckman
On Assessing the Faithfulness of LLM-generated Feedback on Student Assignments – Qinjin Jia, Jialin Cui, Ruijie Xi, Chengyuan Liu, Parvez Rashid, Ruochi Li and Edward Gehringer
Analyzing Large Language Models for Classroom Discussion Assessment – Nhat Tran, Richard Correnti, Lindsay Clare Matsumura, Benjamin Pierce and Diane Litman
Investigating the relations between students’ affective states and the coherence in their activities in Open-Ended Learning Environments – Celestine Akpanoko, Ashwin T S, Grayson Cordell and Gautam Biswas
Using Publicly Available Auxiliary Data to Improve Precision of Treatment Effect Estimation in a Randomized Efficacy Trial – Charlotte Mann, Jiaying Wang, Adam Sales and Johann Gagnon-Bartsch
Integrating Attentional Factors and Spacing in Logistic Knowledge Tracing Models to Explore the Impact of Training Sequences on Category Learning – Meng Cao, Philip Pavlik Jr., Wei Chu and Liang Zhang
Math in Motion: Analyzing Real-Time Student Collaboration in Computer-Supported Learning Environments – Hongming Li, Shan Zhang, Seiyon Lee, Ji-Eun Lee, Zirui Zhong, Erik Weitnauer and Anthony F. Botelho
This Paper Was Written with the Help of ChatGPT: Exploring the Consequences of AI-Driven Academic Writing on Scholarly Practices – Hongming Li, Seiyon Lee and Anthony F. Botelho
Enhancing Multimodal Learning Analytics: A Comparative Study of Facial Feature Capture Using Traditional vs 360-Degree Cameras in Collaborative Learning – Robin Jephthah Rajarathinam, Christian Palaguachi and Jina Kang
De-Identifying Student Personally Identifying Information with GPT-4 – Shreya Singhal, Andres Felipe Zambrano, Maciej Pankiewicz, Xiner Liu, Chelsea Porter and Ryan S. Baker
From Reaction to Anticipation: Predicting Future Affect – Andres Felipe Zambrano, Ryan S. Baker, Sami Baral, Neil Heffernan and Andrew Lan
What metrics of participation balance predict outcomes of collaborative learning with a robot? – Yuya Asano, Diane Litman, Quentin King-Shepard, Tristan Maidment, Tyree Langley, Teresa Davison, Timothy Nokes-Malach, Adriana Kovashka and Erin Walker
Building Learner Activity Models From Log Data Using Sequence Mapping and Hidden Markov Models – Paras Sharma, Angela E.B. Stewart, Qichang Li, Krit Ravichander and Erin Walker
Multimodal, Multi-Class Bias Mitigation for Predicting Speaker Confidence – Andrew Emerson, Arti Ramesh, Patrick Houghton, Vinay Basheerabad, Navaneeth Jawahar and Chee Wee Leong
Empowering Predictions of the Social Determinants of Mental Health through Large Language Model Augmentation in Students’ Lived Experiential Essays – Mohammad Arif Ul Alam, Madhavi Pagare, Susan Davis, Geeta Verma, Ashis Biswas and Justin Barber

Posters & Demos

Mining Epistemic Actions of Programming Problem Solving with Chat-GPT – Rwitajit Majumdar, Prajish Prasad and Aamod Sane
Student Answer Forecasting: Transformer-Driven Answer Choice Prediction for Language Learning – Elena Grazia Gado, Tommaso Martorella, Luca Zunino, Paola Mejia-Domenzain, Vinitra Swamy, Jibril Frej and Tanja Käser
Connecting Blinks to Constructs: How are We Arguing for Validity in Multimodal Learning Analytics? – Gahyun Sung and Hanall Sung
The Construction and Analysis of Course Grades Across Public Universities – Hyun Jeong, Gary M. Weiss, Audrey Leung and Daniel D. Leeds
Examining the Influence of Varied Levels of Domain Knowledge Base Inclusion in GPT-based Intelligent Tutors – Blake Castleman and Mehmet Kerem Turkcan
Making Course Recommendation Explainable: A Knowledge Entity-Aware Model using Deep Learning – Tianyuan Yang, Baofeng Ren, Boxuan Ma, Md Akib Zabed Khan, Tianjia He and Shin’Ichi Konomi
How Ready Are Generative Pre-trained Large Language Models for Explaining Bengali Grammatical Errors? – Subhankar Maity, Aniket Deroy and Sudeshna Sarkar
Uncovering the Evolution of Topics about AI Painting: Dynamic Topic Modeling of 180k Discourse Data in an Online Community – Shiyao Wei and Ran Bi
Tracking Classroom Movement Patterns with Person Re-Id – Xinlu He, Jiani Wang, Viet Anh Trinh, Andrew McReynolds and Jacob Whitehill
Fair Prediction of Students’ Summative Performance Changes Using Online Learning Behavior Data – Zifeng Liu, Xinyue Jiao, Chenglu Li and Wanli Xing
Automated Scoring of Learners’ Annotations of Multiple Digital Texts – Alexandra List
Examining LLM Prompting Strategies for Automatic Evaluation of Learner-Created Computational Artifacts – Xiaoyi Tian, Amogh Mannekote, Carly E. Solomon, Yukyeong Song, Christine Fry Wise, Tom McKlin, Joanne Barrett, Kristy Elizabeth Boyer and Maya Israel
Navigating the Sky Together: Investigating Collaboration Dynamics through Annotation in an Immersive Learning Environment – Yiqiu Zhou, Philo Wang and Jina Kang
Determining Perceived Text Complexity: An Evaluation of German Sentences Through Student Assessments – Boris Thome, Friederike Hertweck and Stefan Conrad
Strategic Interface Design Can Improve Learning Efficiency in an Intelligent Tutoring System – Sutapa Dey Tithi, Behrooz Mostafavi, Arun Kumar Ramesh and Tiffany Barnes
Relation of Linguistic Indicators to Civic Engagement in Special Education – Chak Li, Scott Crossley, Meghan Burke and Zach Rossetti
Automated Assessment in Math Education: A Comparative Analysis of LLMs for Open-Ended Responses – Sami Baral, Eamon Worden, Wen-Chiang Lim, Zhuang Luo, Christopher Santorelli, Ashish Gurung and Neil Heffernan
Social Network and Self-representation in Megathread: Group Formation in a Data Science Crowdsourcing Community – Shiyao Wei and Ran Bi
Evaluating Algorithmic Bias in Models for Predicting Academic Performance of Filipino Students – Valdemar Švábenský, Mélina Verger, Maria Mercedes T. Rodrigo, Clarence James G. Monterozo, Ryan S. Baker, Miguel Zenon Nicanor Lerias Saavedra, Sébastien Lallé and Atsushi Shimada
Prioritizing the Indicators of Effective Inclusive Education Assessment Framework using TOPSIS Analysis for children with Disabilities: A Case of Delhi – Umesh Kumar and Haimanti Banerji
Towards Modeling Learner Performance with Large Language Models – Seyed Parsa Neshaei, Richard Davis, Adam Hazimeh, Bojan Lazarevski, Pierre Dillenbourg and Tanja Käser
Can Large Language Models Replicate ITS Feedback on Open-Ended Math Questions? – Hunter McNichols, Jaewook Lee, Stephen Fancsali, Steve Ritter and Andrew Lan
Be back in 5 minutes: Exploring correlations between short breaks with student performance – Yu-Chia Kao and Anthony Botelho
Predicting Cognitive Load Using Sensor Data in a Literacy Game – Minghao Cai and Carrie Demmans Epp
The Cleaned Repository of Annotated Personally Identifiable Information – Langdon Holmes, Scott Crossley, Jiahe Wang and Weixuan Zhang
Semantic Similarity of Teacher and Student Discourse Linked to Quality Ratings from Classroom Observations – Jessica Boyle and Scott Crossley
How Hard can this Question be? An Exploratory Analysis of Features Assessing Question Difficulty using LLMs – Andreea Dutulescu, Stefan Ruseti, Mihai Dascalu and Danielle McNamara
It’s All About the Prompt: Deductive Coding’s Role in AI vs. Human Performance – Jeanne McClure, Daria Smyslova, Amanda Hall and Shiyan Jiang
The Early Bird Gets the Grade: Student Use of Class Time for Ed-Tech Practice Predicts Learning – Ashish Gurung, Jionghao Lin, Zhongtian Huang, Ryan S. Baker, Vincent Aleven and Kenneth Koedinger
Same Learning Platform, Different Types of Research: A National-Level Analysis – Nidhi Nasiar, Ryan S. Baker, J. M. Alexandra Andres and Namrata Srivastava
Cultural Diversity in Team Conversations: A Deep Dive into its Effects on Cohesion and Team Performance – Mohammad Amin Samadi and Nia Nixon
An Exploratory Analysis of Students’ Problem-Solving Strategies in the Water Cycle Game – Jing Zhang and Luc Paquette
Prompting as Panacea? A Case Study of In-Context Learning Performance for Qualitative Coding of Classroom Dialog – Ananya Ganesh, Chelsea Chandler, Sidney D’Mello, Martha Palmer and Katharina Kann
Identifying Off-Task Users in a Large-Scale, Game-Based Practice Assessment – Matthew Emery, David Laing, Philip Simmons, Jacob Seybert, Katrina Yu, Erica Snow and Jack Buckley
EduQuest: Lecture Texts and Questions for Higher Education – Oliver Holl, Filipe Szolnoky Cunha, David Streuli and Timothé Laborie
Easing the Prediction of Student Dropout for everyone by integrating AutoML and Explainable Artificial Intelligence – Pamela Buñay-Guisñan, Juan Alfonso Lara, Alberto Cano, Rebeca Cerezo and Cristóbal Romero
LLM-generated Feedback in Real Classes and Beyond: Perspectives from Students and Instructors – Qinjin Jia, Jialin Cui, Haoze Du, Parvez Rashid, Ruijie Xi, Ruochi Li and Edward Gehringer
Complex Conversations: LLMs vs. Knowledge Engineered Conversation-based Assessment – Carol Forsyth, Diego Zapata-Rivera, Edith Aurora Graf and Yang Jiang
Enhancing the Accuracy of Predicting Students Grades in Open-Ended Questions through Adjustments to Attention Weights – Masaki Koike, Hirokazu Kohama, Tsubasa Hirakawa, Takayoshi Yamashita and Hironobu Fujiyoshi
Predicting Response Time of Questions Using Linear Mixed-effects Model – Luyao Peng
Tailored analysis of dropout in UBA distance postgraduate courses: first results – Antonio R. Anaya, Pablo M. Gómez and Ariel Lutenberg
Explainability in Educational Data Mining and Learning Analytics: An Umbrella Review – Sachini Gunasekara and Mirka Saarela
Comparing Clustering Methods in Group-level Test Collusion Detection – Luyao Peng
Ethical Educational Data Processing Differences of Students with Special Needs in Post-Soviet Countries – Ayaz Karimov, Mirka Saarela and Tommi Kärkkäinen
FlexEval: a customizable tool for chatbot performance evaluation and dialogue analysis – S. Thomas Christie, Baptiste Moreau-Pernet, Yu Tian and John Whitmer
Uncertainty-preserving deep knowledge tracing with state-space models – Thomas Christie, Carson Cook and Anna Rafferty
Comparative Analysis of Student Performance Predictions in Online Courses using Heterogeneous Knowledge Graphs – Thomas Trask, Michael Boyle, Ahmed Ali Abdo Abdullah Mubarak, David Joyner and Nick Lytle
Investigating the Dynamic Change of Pre- and In-service Teachers’ Experiences, Attitudes, and Perceptions through CS Autobiography Using Topic Modeling – Shan Zhang, Hai Li, Hongming Li, Anthony F. Botelho and Maya Israel
Exploring Simultaneous Knowledge and Behavior Tracing – Siqian Zhao and Sherry Sahebi
Interpreting Latent Student Knowledge Representations in Programming Assignments – Nigel Fernandez and Andrew Lan
Math Multiple Choice Question Generation via Human-Large Language Model Collaboration – Jaewook Lee, Digory Smith, Simon Woodhead and Andrew Lan
Generating Feedback-Ladders for Logical Errors in Programming using Large Language Models – Hasnain Heickal and Andrew Lan
Auditing an Automatic Grading Model with Reinforcement Learning – Aubrey Condor and Zachary Pardos
