Introduction to the Proceedings

Preface

The Indian Institute of Science is proud to host the sixteenth International Conference on Educational Data Mining (EDM), held fully in person from July 11 to 14, 2023. EDM is the annual flagship conference of the International Educational Data Mining Society.

The theme of this year’s conference is “Educational data mining for amplifying human potential.” Not all students or seekers of knowledge receive the education they need to realize their full potential, whether because of a lack of resources or a lack of access to high-quality teaching. The dearth of high-quality educational content, teaching aids, and methodologies, together with the lack of objective feedback on how to become better teachers, prevents our teachers from achieving their full potential. Administrators and policymakers lack tools for making optimal decisions on matters such as class size, class composition, and course sequencing. All of these handicap nations, particularly emerging economies, that recognize the centrality of education to their growth. EDM 2023 has striven to focus on concepts, principles, and techniques mined from educational data that amplify the potential of all stakeholders in the education system.

The highlights of EDM 2023 include the following.

The keynote speakers are Sihem Amer-Yahia (CNRS, France), Ayelet Baram-Tsabari (Israel Institute of Technology, Israel), Anand Deshpande (Persistent Systems, India), Hiroaki Ogata (Kyoto University, Japan), and Jeffrey D. Ullman (Stanford University, USA; Turing Laureate). We are honoured to have them as keynote speakers. Cristina Conati (University of British Columbia, Canada) is the plenary speaker, in recognition of the 2022 Prof. Ram Kumar EDM Test of Time Paper Award, which she won jointly with her co-author Saleema Amershi (Microsoft Research, USA). D.N. Prahlad (Surya Soft, India) is the banquet speaker.

The programme features five tutorials and four panel sessions. The tutorials are: (1) Core methods in EDM; (2) Introduction to neural networks and uses in EDM; (3) Learning through Wikipedia and generative AI technologies; (4) Data-efficient machine learning for educational content creation; and (5) How to open science: Promoting principles and reproducibility processes within the EDM community. The panels are: (1) Turing Prize-worthy research problems in EDM; (2) MOOCs: Hype or transformative force for amplifying human potential?; (3) Education in the age of generative AI; and (4) Indian national education policy: EDM opportunities.

The venue is the sylvan campus of the Indian Institute of Science, India's premier research and education institution. The host city, Bengaluru (also known as Bangalore), is variously called the Silicon Valley of India, the high-tech industry capital of India, and the startup capital of India. It is also famous for its historical and cultural roots.

EDM 2023 received 68 submissions to the full papers track (10 pages), 36 to the short papers track (6 pages), and 15 to the poster and demo track (4 pages). The program committee accepted 18 full papers, 11 short papers, and 10 posters. The conference also provided a venue for selected papers from the Journal of Educational Data Mining to be presented to a live audience.

EDM 2023 also continued its tradition of providing opportunities for young researchers to present their work and receive feedback from their peers and senior researchers. The doctoral consortium this year features 10 such presentations.

A highlight of EDM 2023 is the tremendous geographic and gender diversity among the functional chairs and keynote speakers. The conference also runs a unique Ambassador program through which young researchers and students can interact with distinguished delegates. A proud achievement of EDM 2023 is that it is providing financial support to nearly fifty first-time attendees from developing nations.

We thank the sponsors of EDM 2023 for their generous support: Indian Institute of Science; Prof. Ram Kumar Memorial Foundation; Atria University; Duolingo; Accel Ventures; Infosys Technologies; Playpower Labs; Metals CMU; Carnegie Learning; Seekh; Google Research; Microsoft Research India; and Talent Sprint. We thank all the authors who submitted their work, and the program committee members and reviewers for their expert input. We thank the functional chairs for the leadership that made this conference possible. Finally, a big thank you to the local arrangements committee, which made this event memorable.

Mingyu Feng, WestEd, Program Chair
Tanja Käser, EPFL, Program Chair
Partha Talukdar, Google Research and Indian Institute of Science, Program Chair
Rakesh Agrawal, Data Insights Laboratories, General Chair
Y. Narahari, Indian Institute of Science, General Chair
Mykola Pechenizkiy, Eindhoven University of Technology, General Chair

July 10, 2023
Bengaluru, India

Organizing Committee

General Chairs

Program Chairs

Diversity and Inclusion Chairs

Industry Track Chairs

Poster Track Chairs

Demo Track Chairs

Doctoral Consortium Chairs

JEDM Track Chairs

Workshop Chairs

Awards Chairs

Scholarship Chairs

Publicity Chairs

Web Chairs

Proceedings Chairs

Sponsorship Chairs

Local Arrangements Chairs

Local Organizing Committee

IEDMS Officers

Tiffany Barnes, President, North Carolina State University, US
Anna Rafferty, Treasurer, Carleton College, US

IEDMS Board of Directors

Ryan Baker University of Pennsylvania, US
Mingyu Feng WestEd, US
Neil Heffernan Worcester Polytechnic Institute, US
Sharon Hsiao Santa Clara University, US
Tanja Käser EPFL, CH
Kenneth Koedinger Carnegie Mellon University, US
Kalina Yacef University of Sydney, AU

Area Chairs

Giora Alexandron Weizmann Institute of Science
Gautam Biswas Vanderbilt University
Anthony F. Botelho University of Florida
Alex Bowers Columbia University
Christopher Brooks University of Michigan
Alexandra Cristea Durham University
Carol Forsyth Educational Testing Service
Dragan Gasevic Monash University
Sharon Hsiao Santa Clara University
Sébastien Lallé Sorbonne University
Andrew Lan University of Massachusetts Amherst
Collin Lynch North Carolina State University
Jaclyn Ocumpaugh University of Pennsylvania
Zach Pardos University of California, Berkeley
Philip I. Pavlik Jr. University of Memphis
Anna Rafferty Carleton College
Cristobal Romero Department of Computer Sciences and Numerical Analysis
Jonathan Rowe North Carolina State University
Shaghayegh Sahebi University at Albany – SUNY
Olga C. Santos aDeNu Research Group (UNED)
Kalina Yacef University of Sydney

Program Committee

Bita Akram North Carolina State University
Vincent Aleven Carnegie Mellon University
Laura Allen University of New Hampshire
Claudia Antunes Universidade de Lisboa
Jose Azevedo P.PORTO / ISCAP – POLITÉCNICO DO PORTO
Roger Azevedo University of Central Florida
Ryan Baker University of Pennsylvania
Abhinava Barthakur University of South Australia
Prateek Basavaraj American Association of State Colleges and Universities
Tanmay Basu Indian Institute of Science Education and Research Bhopal
Nathaniel Blanchard Colorado State University
Geoffray Bonnin Université de Lorraine – LORIA
Jesus G. Boticario UNED
Julien Broisin Université Toulouse 3 Paul Sabatier – IRIT
Armelle Brun LORIA – Université de Lorraine
Paulo Carvalho Carnegie Mellon University
Guanliang Chen Monash University
Irene-Angelica Chounta University of Duisburg-Essen
Linda Corrin Deakin University
Evandro Costa Computing Institute, Federal University of Alagoas
Carrie Demmans-Epp University of Alberta
Michel Desmarais École Polytechnique de Montréal
Spyridon Doukakis Ionian University
Jeremiah Folsom-Kovarik Soar Technology, Inc.
Sabine Graf Athabasca University
Julio Guerra University of Pittsburgh
Ella Haig School of Computing, University of Portsmouth
Jiangang Hao Educational Testing Service
Neil Heffernan Worcester Polytechnic Institute
Arto Hellas Aalto University
Erik Hemberg ALFA
Martin Hlosta The Swiss Distance University of Applied Sciences
Paul Hur University of Illinois at Urbana-Champaign
Sébastien Iksal LIUM – Le Mans Université
Paul Salvador Inventado California State University Fullerton
Seiji Isotani University of Sao Paulo
Vladimir Ivančević University of Novi Sad, Faculty of Technical Sciences
Lan Jiang University of Illinois at Urbana-Champaign
Yang Jiang Educational Testing Service
Jina Kang University of Illinois Urbana-Champaign
Kenneth Koedinger Carnegie Mellon University
Irena Koprinska The University of Sydney
Sotiris Kotsiantis University of Patras
Juho Leinonen Aalto University
James Lester North Carolina State University
Qi Liu University of Science and Technology of China
Yu Lu Beijing Normal University
Ivan Luković University of Novi Sad
Aditi Mallavarapu Digital Promise and University of Pittsburgh
Mirko Marras University of Cagliari
Noboru Matsuda North Carolina State University
Ahmad Mel IDLab, Ghent University
Victor Menendez-Dominguez Universidad Autónoma de Yucatán
Agathe Merceron Berliner Hochschule für Technik, Univ. of Applied Sciences
Donatella Merlini Università di Firenze
Caitlin Mills University of Minnesota
Tsunenori Mine Kyushu University
Adway Mitra IIT Kharagpur
Tanja Mitrovic University of Canterbury, Christchurch
Pedro Manuel Moreno-Marcos Universidad Carlos III de Madrid
Bradford Mott North Carolina State University
Matthew Myers University of Delaware
Roger Nkambou Université du Québec à Montréal
Teresa Ober Educational Testing Service
Andrew Olney University of Memphis
Tounwendyam F. Ouedraogo Université Norbert Zongo
Shalini Pandey University of Minnesota
Luc Paquette University of Illinois at Urbana-Champaign
Abelardo Pardo University of South Australia
Jasabanta Patro IISER Bhopal
Niels Pinkwart Humboldt-Universität zu Berlin
Paul Stefan Popescu University of Craiova
Sasha Poquet Technical University of Munich
Thomas Price North Carolina State University
David Pritchard Massachusetts Institute of Technology
Ramkumar Rajendran IIT Bombay
Steven Ritter Carnegie Learning, Inc.
Maria Mercedes T. Rodrigo Ateneo de Manila University
Ido Roll Technion – Israel Institute of Technology
José Raúl Romero University of Cordoba
Daniela Rotelli Università di Pisa
Vasile Rus The University of Memphis
Sreecharan Sankaranarayanan National Institute of Technology Karnataka, Surathkal
Petra Sauer Beuth University of Applied Sciences
Ana Serrano Mamolar Universidad de Burgos
Yang Shi North Carolina State University
Antonette Shibani University of Technology, Sydney
Atsushi Shimada Kyushu University
Stefan Slater Teachers College
Sergey Sosnovsky Utrecht University
Balaji Vasan Srinivasan Adobe Research Big Data Experience Lab, Bangalore
Jun-Ming Su National University of Tainan
Ling Tan Australian Council for Educational Research
Khushboo Thaker University of Pittsburgh
Stefan Trausan-Matu University Politehnica of Bucharest
Anouschka van Leeuwen Utrecht University
Oswaldo Velez-Langs Universidad de Cordoba
Rémi Venant Le Mans Université – LIUM
Tuyet-Trinh Vu SOICT-HUST
Shuai Wang Shanghai Jiaotong University
Stephan Weibelzahl Private University of Applied Sciences Göttingen
Jacob Whitehill Worcester Polytechnic Institute
Beverly Park Woolf University of Massachusetts
Amelia Zafra Gómez Department of Computer Sciences and Numerical Analysis
Diego Zapata-Rivera Educational Testing Service
Yingbin Zhang University of Illinois at Urbana-Champaign
Wenbin Zhang Michigan Technological University
Jia Zhu Florida International University
Amal Zouaq École Polytechnique de Montréal

Sponsors

Diamond

Gold

Silver

Keynotes

The Gradiance Automated Homework System

Jeffrey D. Ullman, Turing Laureate, Stanford W. Ascherman Professor of Computer Science (Emeritus), Stanford University, US

We shall describe a free automated homework system, in particular the way it tries to combat cheating and its method for giving guidance as well as assessment. Central to this effort is the idea of a “root question,” a way of phrasing multiple-choice questions that lets students who answer incorrectly be given advice and then take the same homework again, without eventually discovering the correct answers by process of elimination.
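To make the root-question idea concrete, here is a minimal Python sketch. It rests on an assumption that goes beyond the abstract: a root question stores a pool of correct statements and a larger pool of incorrect statements, each incorrect one paired with a hint, and every attempt draws one correct choice and several distractors at random. The names `RootQuestion`, `instantiate`, and `grade` are hypothetical and are not taken from Gradiance.

```python
import random
from dataclasses import dataclass


@dataclass
class RootQuestion:
    """Hypothetical structure (assumption): pools of true and false statements."""
    stem: str
    correct: list[str]         # several acceptable right answers
    incorrect: dict[str, str]  # wrong answer -> hint shown when it is chosen


def instantiate(q: RootQuestion, n_wrong: int, rng: random.Random) -> list[str]:
    """Draw one correct choice and n_wrong distinct distractors, then shuffle."""
    choices = [rng.choice(q.correct)]
    choices += rng.sample(list(q.incorrect), n_wrong)
    rng.shuffle(choices)
    return choices


def grade(q: RootQuestion, answer: str) -> tuple[bool, str]:
    """Return (is_correct, feedback); a wrong answer gets its attached hint."""
    if answer in q.correct:
        return True, "Correct."
    return False, q.incorrect.get(answer, "Incorrect.")


# Usage: two attempts at the same homework will generally see different choices.
q = RootQuestion(
    stem="Which of these numbers is prime?",
    correct=["7", "11", "13"],
    incorrect={
        "9": "9 = 3 * 3.",
        "15": "15 = 3 * 5.",
        "21": "21 = 3 * 7.",
        "25": "25 = 5 * 5.",
        "27": "27 = 3 * 9.",
    },
)
rng = random.Random(0)
first_attempt = instantiate(q, 3, rng)
second_attempt = instantiate(q, 3, rng)
```

Because both the correct choice and the distractors are redrawn on every attempt, ruling out options on one attempt says little about the next, provided the pools are reasonably large; the hint attached to each wrong choice supplies the guidance.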

Towards AI-Powered Data-Informed Education

Sihem Amer-Yahia, CNRS Research Director, FR

The COVID-19 health crisis saw increased use of digital work platforms, from videoconferencing systems to MOOC-style educational platforms and crowdsourcing and freelancing marketplaces. These levers for sharing knowledge and learning lay the groundwork for the future of work. Educational technologies coupled with AI hold the promise of helping learners and teachers. However, they remain limited in terms of social interaction, user experience, and learning opportunities. I will describe research at the intersection of data-informed recommendations and education theory and conclude with ethical considerations in building educational platforms.

LEAF: Learning and Evidence Analytics Framework in Japan: Connecting Researchers, Practitioners and Policy-makers

Hiroaki Ogata, Professor at Kyoto University, JP

The LEAF system is a Learning and Evidence Analytics infrastructure that supports the collection, analysis, and use of learning logs. It consists of a Learning Management System (LMS), an eBook reader (BookRoll), a Learning Record Store (LRS), and a Learning Analytics tool (Log Palette). BookRoll acts as a behavior sensor and records student log data. Log Palette analyzes and visualizes the log data obtained from BookRoll and the LMS. The log data can then be used for interactive lectures, reflection, recommendations, and class improvement. The LEAF system has been used in over 120 educational institutions, from elementary schools to universities, in eight countries and regions. Our goal is to analyze these data scientifically, support teachers and students, and shift from “education and learning based on their experiences” to “education based on data and evidence.” This talk will introduce: (1) research supporting data- and evidence-informed education, (2) practices of data-informed education with LEAF in K-12 schools and universities, and (3) policies for educational data use in Japan.

Challenges and Opportunities in Higher Education

Anand Deshpande, Founder and Chairman, Persistent Systems, IN

As a practitioner and recruiter of college graduates, I will share perspectives on changes in the job market and on how students and colleges can explore new ways to thrive in an ever-changing world.

Communicating science for amplifying human potential in a post-truth era

Ayelet Baram-Tsabari, Professor at Israel Institute of Technology, IL

Science is a communication-driven endeavor: without communication, we cannot build on previous research, collaborate with practitioners, or convince policymakers and stakeholders, such as parents and students, to use the resulting technology or its outcomes. In this talk, we’ll explore what science communication is and why it is crucial. What do people know, and how is that related to what they do? How do we decide whom to believe? How is our worldview, the things we love and value, related to what we know? Do people need to know what they are talking about to form an opinion? More specifically, what do people know about AI, and how can we communicate the results of AI research to diverse audiences? Finally, we will discuss what can be done to help people make informed decisions about scientific issues. To put it more practically: what works and what doesn’t when it comes to communicating science to diverse audiences?

The Prof. Ram Kumar Educational Data Mining Test of Time Award:
Combining unsupervised and supervised classification to build user models for exploratory learning environments (ELE)

Cristina Conati, Professor at University of British Columbia, CA

In this talk, I will present the approach, proposed in the paper that received the Prof. Ram Kumar Educational Data Mining Test of Time Award, for building data-driven user models that can drive real-time support for students interacting with exploratory learning environments (ELEs). I will summarize the results we obtained over the past 12 years in applying extensions of this approach to a variety of ELEs, and then discuss lessons learned and opportunities for future research.

JEDM Presentations

Using Demographic Data as Predictor Variables: a Questionable Choice

Ryan S. Baker University of Pennsylvania
Lief Esbenshade Google
Jonathan Vitale Google
Shamya Karumbaiah University of Wisconsin

Predictive analytics methods in education are seeing widespread use and are producing increasingly accurate predictions of students’ outcomes. With the increased use of predictive analytics comes increasing concern about fairness for specific subgroups of the population. One approach that has been proposed to increase fairness is using demographic variables directly in models, as predictors. In this paper we explore issues of fairness in the use of demographic variables as predictors of long-term student outcomes, studying the arguments for and against this practice in the contexts where this literature has been published. We analyze arguments for the inclusion of demographic variables, specifically claims that this approach improves model performance and charges that excluding such variables amounts to a form of ‘color-blind’ racism. We also consider arguments against including demographic variables as predictors, including reduced actionability of predictions, risk of reinforcing bias, and limits of categorization. We then discuss how contextual factors of predictive models should influence case-specific decisions for the inclusion or exclusion of demographic variables and discuss the role of proxy variables. We conclude that, on balance, there are greater benefits to fairness if demographic variables are used to validate fairness rather than as predictors within models.
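The paper's conclusion, that demographic variables are better used to validate fairness than as predictors, can be illustrated with a short audit. The sketch below is an assumption for illustration, not the authors' procedure: the group labels never enter the model, and are used only to compute accuracy separately for each group and report the largest gap. The function names are hypothetical, and in practice a metric such as AUC may be preferred.

```python
from collections import defaultdict


def accuracy_by_group(y_true, y_pred, groups):
    """Accuracy of a model's predictions within each demographic group.

    The group labels are used only for auditing, never as model inputs.
    """
    hits = defaultdict(int)
    counts = defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        counts[g] += 1
        hits[g] += int(t == p)
    return {g: hits[g] / counts[g] for g in counts}


def max_gap(metric_by_group):
    """Largest difference in the metric between any two groups."""
    values = list(metric_by_group.values())
    return max(values) - min(values)


# Usage with made-up predictions for two groups, A and B.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 1]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
per_group = accuracy_by_group(y_true, y_pred, groups)  # {'A': 0.75, 'B': 0.5}
gap = max_gap(per_group)  # 0.25
```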

Using Auxiliary Data to Boost Precision in the Analysis of A/B Tests on an Online Educational Platform: New Data and New Results

Adam C. Sales Worcester Polytechnic Institute
Ethan B. Prihar Worcester Polytechnic Institute
Johann A. Gagnon-Bartsch University of Michigan
Neil T. Heffernan Worcester Polytechnic Institute

Randomized A/B tests within online learning platforms represent an exciting direction in the learning sciences. With minimal assumptions, they allow causal effect estimation without confounding bias, and exact statistical inference even in small samples. However, experimental samples and/or treatment effects are often small, so A/B tests are underpowered and effect estimates are overly imprecise. Recent methodological advances have shown that power and statistical precision can be substantially boosted by coupling design-based causal estimation with machine-learning models of rich log data from historical users who were not in the experiment (the “remnant”). Estimates using these techniques remain unbiased, and inference remains exact, without any additional assumptions. This paper reviews those methods and applies them to a new dataset of over 250 randomized A/B comparisons conducted within ASSISTments, an online learning platform. We compare results across experiments using four novel deep-learning models of auxiliary data, and show that incorporating auxiliary data into causal estimates is roughly equivalent to increasing the sample size by 20% on average (and by as much as 50-80% in some cases) relative to t-tests, and by about 10% on average (and as much as 30-50%) relative to cutting-edge unbiased machine-learning estimators that use only data from the experiments. We show that the gains can be even larger when estimating subgroup effects, that they hold even when the remnant is unrepresentative of the A/B test sample, and that they extend to post-stratification population effect estimators.
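The core idea of borrowing precision from auxiliary data can be shown with a simplified, hypothetical sketch that is not the paper's estimator: predictions from a model trained only on the remnant are subtracted from the outcomes, and the difference in means is taken on the residuals. Because those predictions do not depend on the random assignment, the estimate stays unbiased, while its variance shrinks when the predictions track the outcomes. The simulated data, the function names, and the stand-in predictor are all assumptions.

```python
import random
import statistics


def diff_in_means(y, z):
    """Difference in mean outcome between treated (z == 1) and control (z == 0)."""
    treated = [yi for yi, zi in zip(y, z) if zi == 1]
    control = [yi for yi, zi in zip(y, z) if zi == 0]
    return statistics.fmean(treated) - statistics.fmean(control)


def residualized_diff_in_means(y, z, y_hat):
    """Difference in means of the residuals y - y_hat.

    y_hat must come from a model fit only on remnant (non-experimental) users,
    so it does not depend on the random assignment z; under complete
    randomization the estimator then remains unbiased for the average effect.
    """
    residuals = [yi - pi for yi, pi in zip(y, y_hat)]
    return diff_in_means(residuals, z)


def simulate(n_reps=2000, n=40, effect=0.5, seed=0):
    """Repeat a small simulated A/B test; return the mean and variance of the
    plain estimates, then the mean and variance of the residualized ones."""
    rng = random.Random(seed)
    plain, adjusted = [], []
    for _ in range(n_reps):
        x = [rng.gauss(0, 1) for _ in range(n)]  # hypothetical prior-performance covariate
        y_hat = [2.0 * xi for xi in x]  # stand-in for remnant-model predictions
        z = [1] * (n // 2) + [0] * (n - n // 2)
        rng.shuffle(z)  # complete randomization: exactly half treated
        y = [2.0 * xi + effect * zi + rng.gauss(0, 0.5) for xi, zi in zip(x, z)]
        plain.append(diff_in_means(y, z))
        adjusted.append(residualized_diff_in_means(y, z, y_hat))
    return (
        statistics.fmean(plain),
        statistics.variance(plain),
        statistics.fmean(adjusted),
        statistics.variance(adjusted),
    )


m_plain, v_plain, m_adj, v_adj = simulate()
```

With these settings both estimators should center on the true effect of 0.5, but the residualized one varies far less across simulated experiments; the gain shrinks as the remnant model's predictions become less accurate.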

AIED 2022 Best Paper Presentation

CurriculumTutor: An Adaptive Algorithm for Mastering a Curriculum

K. M. Shabana Indian Institute of Technology Palakkad
Chandrashekar Lakshminarayanan Indian Institute of Technology Madras
Jude K. Anil Indian Institute of Technology Palakkad

An important problem in intelligent tutoring systems (ITS) is the adaptive sequencing of learning activities in a personalised manner so as to improve learning gains. In this paper, we consider intelligent tutoring in the learning by doing (LbD) setting, wherein the concepts to be learned, along with their inter-dependencies, are available as a curriculum graph, and a given concept is learned by performing an activity related to that concept (such as solving a problem or answering a question). For this setting, recent works have proposed algorithms based on multi-armed bandits (MAB), where activities are adaptively sequenced using the student's responses to those activities as direct feedback. We propose CurriculumTutor, a novel technique that combines a MAB algorithm with a change point detection algorithm for the problem of adaptive activity sequencing. Our algorithm improves upon prior MAB algorithms for the LbD setting by (i) providing better learning gains, and (ii) reducing the number of hyper-parameters, thereby improving personalisation. We show that our tutoring algorithm significantly outperforms prior approaches in the benchmark domain of two-operand addition with up to four digits.