8th Educational Data Mining in Computer Science Education (CSEDM) Workshop
Yang Shi
North Carolina State University
Utah State University
yshi26@ncsu.edu
Peter Brusilovsky
University of Pittsburgh
peterb@pitt.edu
Bita Akram
North Carolina State University
bakram@ncsu.edu
Thomas W. Price
North Carolina State University
twprice@ncsu.edu
Juho Leinonen
Aalto University
juho.2.leinonen@aalto.fi
Kenneth R. Koedinger
Carnegie Mellon University
koedinger@cmu.edu
Andrew Lan
University of Massachusetts Amherst
andrewlan@cs.umass.edu

ABSTRACT

There is a growing community of researchers at the intersection of data mining, AI, and computing education research. The objective of the CSEDM workshop is to facilitate a discussion among this research community, with a focus on how data mining can be uniquely applied in computing education research. For example, what new techniques are needed to analyze program code and CS log data? How do results from CS education inform our analysis of this data? The workshop is meant to be an interdisciplinary event at the intersection of EDM and Computing Education Research. Researchers, faculty, and students are encouraged to share their AI- and data-driven approaches, methodologies, and experiences where data transforms how students learn Computer Science (CS) skills. This full-day hybrid workshop will feature paper presentations and discussions to promote collaboration.

Keywords

Computer Science Education, Educational Data Mining, AI in Education, Learning Analytics

1. WORKSHOP GOALS

Computing is an increasingly fundamental skill for students across disciplines. It enables them to solve complex, real, and challenging problems and make a positive impact on the world. Yet, the field of computing education is still facing a range of problems, from high failure and attrition rates to challenges in training and recruiting teachers to the under-representation of women and students of color.

Advanced learning technologies, which use data and AI to improve student learning outcomes, have the potential to address these problems. However, CS education presents novel, domain-specific challenges for applying these techniques, such as helping students effectively use tools like compilers and debuggers and supporting complex, open-ended problems with many possible solutions. CS also offers unique opportunities for developing learning technologies, such as abundant and rich log data, including code traces that capture each detail of how students’ solutions evolved.

These domain-specific challenges and opportunities suggest the need for a specialized community of researchers working at the intersection of AI, data mining, and computing education research. The goal of this Educational Data Mining for Computer Science Education (CSEDM) workshop is to bring this community together to share insights for supporting and understanding learning in the domain of CS using data. This field is nascent but growing, with research in computing education increasingly using data analysis approaches and researchers in the EDM community increasingly studying CS datasets. This workshop will help these researchers learn from each other and develop the growing sub-field of CSEDM.

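The workshop will build on seven successful prior CSEDM workshops, held at EDM 2018 [1], LAK 2019 [2], AIED 2019 [3], EDM 2020 [4], EDM 2021 [5], EDM 2022 [6], and LAK 2023 [7].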

Each of these workshops was productive. Our past in-person workshops have been well attended, and our virtual events have had over 100 people registered and over 70 simultaneous attendees. The proceedings were published in CEUR [8] and Zenodo [9].

We hope to keep our momentum with an 8th CSEDM Workshop, returning to EDM in 2024. Compared with prior editions of CSEDM, we have added a focus area on contributions involving large language models and generative AI in the workshop paper presentations, which aligns with the theme of EDM 2024.

The CSEDM workshop is funded by the CS-SPLICE project [10]. It will serve as a hub for researchers in the EDM community to discuss potential collaborations and identify EDM challenges in computing education. We plan to provide need-based funding support for participants, covering the cost of lodging on the workshop day and the registration fees.

2. RELEVANT TOPICS

The workshop encourages contributions from the following topics of interest:

We will invite researchers who are interested in exploring, contributing to, and collaborating on the development of data- and AI-driven techniques for building educational tools for Computer Science to submit papers on any of these topics.

3. WORKSHOP ORGANIZATION

The workshop will be organized by a team with a history of CSEDM research:

Yang Shi is a Ph.D. student and Goodnight Doctoral Fellow at North Carolina State University. He has been working towards building data-driven methods for representing program code to enhance the ability of Intelligent Tutoring Systems and benefit student modeling processes for computing education. With a focus on DM/ML approaches applied to CS education, his research interests also include Programming Language Processing, Software Analysis, and Deep Learning. He has been serving as a program committee (PC) member for conferences across multiple disciplines, including EDM, LAK, KDD, AAAI, EAAI, SIGCSE, and ITiCSE.

Peter Brusilovsky is a Professor of Information Science and Intelligent Systems at the University of Pittsburgh, where he also directs the Personalized Adaptive Web Systems (PAWS) lab. He has been working in the field of adaptive educational systems, user modeling, and intelligent user interfaces for more than 30 years. He has published numerous papers and edited several books on adaptive hypermedia and the adaptive Web. He is a founder of CS-SPLICE and has advanced research and infrastructure for CSEDM.

Bita Akram is a research assistant professor in the Department of Computer Science at North Carolina State University. Her research lies at the intersection of artificial intelligence and advanced learning technologies, with applications to improving the access to and quality of CS education. She has been actively developing data-driven approaches for assessing students’ CS competencies as demonstrated through their interactions with educational programming activities. She has served as an organizer and program committee member for venues focused on educational data mining, including EDM and CSEDM.

Thomas Price is an Assistant Professor of Computer Science at North Carolina State University. His primary research goal is to develop learning environments that automatically support students through AI and data-driven help features. His work has focused on the domain of computing education, where he has developed techniques for automatically generating programming hints and feedback for students in real time by leveraging student data. He has helped organize a number of efforts at the intersection of AIED, Data Mining, and CS Education, including the CS-SPLICE working group on programming snapshot representation and prior CSEDM and CS-SPLICE workshops.

Juho Leinonen is an Academy Research Fellow at Aalto University. His research focuses on creating better insight into students’ learning with fine-grained learning analytics; using educational technology and artificial intelligence for personalizing course content; and using learnersourcing to create ample learning opportunities for distinct student needs. He has served on the program committee of both computing education focused and educational data mining focused conferences.

Ken Koedinger is a professor of Human-Computer Interaction and Psychology at Carnegie Mellon University. Dr. Koedinger has an M.S. in Computer Science, a Ph.D. in Cognitive Psychology, and experience teaching in an urban high school. His multidisciplinary background supports his research goals of understanding human learning and creating educational technologies that increase student achievement. His research has contributed new principles and techniques for the design of educational software and has produced basic cognitive science research results on the nature of student thinking and learning. Koedinger directs LearnLab, which started with 10 years of National Science Foundation funding and is now the scientific arm of CMU’s Simon Initiative. LearnLab builds on the past success of Cognitive Tutors, an approach to online personalized tutoring that is in use in thousands of schools and has been repeatedly demonstrated to increase student achievement, for example, doubling what algebra students learn in a school year. He was a co-founder of Carnegie Learning, Inc., which has brought Cognitive Tutor-based courses to millions of students since it was formed in 1998. Dr. Koedinger has authored over 250 peer-reviewed publications and has been a project investigator on over 45 grants. In 2017, he received the Hillman Professorship of Computer Science, and in 2018, he was recognized as a fellow of Cognitive Science.

Andrew (Shiting) Lan is an assistant professor in the Manning College of Information and Computer Sciences, University of Massachusetts Amherst. Before that, he was a postdoctoral research associate in the EDGE Lab at the Department of Electrical Engineering, Princeton University, and received his M.S. and Ph.D. degrees in Electrical and Computer Engineering in May 2014 and May 2016, respectively, from the Digital Signal Processing (DSP) group at Rice University. His research focuses on the development of artificial intelligence (AI) and especially natural language processing (NLP) methods to enable scalable and effective personalized learning in education, covering areas such as learner modeling, personalization, content generation, and human-in-the-loop AI.

3.1 Program Committee

The 8th CSEDM Workshop’s program committee will draw from members of prior program committees, including:

4. CALL FOR PARTICIPATION

We will solicit three types of research contributions:

8-page Research Papers: Original, unpublished work, addressing any of the topics of interest above.

6-page Position Papers or Work-in-progress Papers:

2-page Descriptions of CS Tools/Datasets/Infrastructure: Researchers will present their work at CSEDM in a conversational format. Presentations might include:

4.1 Timeline

The CFP will be released as soon as the workshop is accepted. An approximate timeline is as follows:

5. WORKSHOP ACTIVITIES

The workshop will run for a full day. It will primarily consist of paper presentations and discussions to facilitate collaboration. Interactive sessions will include multiple short, parallel presentations, where participants can move among the presentations they are interested in, similar to a poster session (see Section 6 for details on remote attendees).

A tentative schedule is as follows:

6. PLANS FOR SUPPORTING REMOTE ATTENDEES

We propose a hybrid format, where participants, including presenters, can participate remotely as needed. If the conference is held fully online, we can switch to a fully online format, as we did for CSEDM 2022. We will take the following steps to support remote attendees:

7. SOLICITATION PLAN

Building on our growing network of contributors to prior workshops, we intend to solicit participation in the workshop through the following mailing lists and research networks:

We will also reach out to prior contributors to CSEDM Workshops to solicit additional submissions.

[1] http://sites.google.com/asu.edu/csedm-ws-edm-2018/

[2] http://sites.google.com/asu.edu/csedm-ws-lak-2019/

[3] http://sites.google.com/asu.edu/csedm-ws-aied-2019/

[4] http://sites.google.com/ncsu.edu/csedm-ws-edm-2020/

[5] http://sites.google.com/ncsu.edu/csedm-workshop-edm21/

[6] http://sites.google.com/ncsu.edu/csedm-workshop-edm22/

[7] http://sites.google.com/ncsu.edu/csedm-workshop-lak23/

[8] http://ceur-ws.org/Vol-3051/

[9] http://zenodo.org/communities/csedm23/

[10] http://cssplice.github.io/