Core Methods in Educational Data Mining

July 11, 2023

Ryan Baker

Abstract: This half-day tutorial will introduce core methods in educational data mining (EDM). Attendees will learn what the most common methods are and what purposes they are used for, as well as how the methods all fit together into the conceptual structure introduced by Baker and Yacef (2009) and revised in Baker & Siemens (2014, 2022). The tutorial will discuss how EDM differs from traditional statistical and psychometric approaches, and when using EDM is and is not warranted. Key considerations on how to use methods appropriately and validly will also be presented.

This tutorial will not cover how to code these methods in Python, R, or other packages, a topic for which extensive online resources exist. However, attendees will emerge with the conceptual understanding necessary to select and use methods appropriately within EDM, and will be prepared to take other EDM tutorials or online courses (including the free course “Big Data and Education” on edX).

Introduction to Neural Networks and Uses in EDM

July 11, 2023

Agathe Merceron

Ange Adrienne Nyamen Tato

Abstract: In this half-day tutorial, participants first explore the fundamentals of feed-forward neural networks, such as the backpropagation mechanism; the subsequent introduction to the more complex Long Short-Term Memory neural networks builds on this knowledge. The tutorial also covers the basics of the attention mechanism, Transformer neural networks, and their application in education with Deep Knowledge Tracing. There will be some hands-on applications on open educational datasets. Participants should leave the tutorial with the ability to use neural networks in their research. A laptop capable of installing and running Python and the Keras library is required for full participation in this half-day tutorial.

Learning through Wikipedia and Generative AI Technologies

July 14, 2023

Praveen Garimella

Vasudeva Varma

Abstract: This tutorial will examine the use of Wikipedia and generative AI technologies in asynchronous learning environments. Participants will learn about the research on accountable talk and its impact on student learning, as well as the challenges of implementing the learning principles using Wikipedia in asynchronous settings. The tutorial will also showcase the potential of generative AI technologies, such as chatbots and language models, to facilitate accountable talk and support student-led discussions in asynchronous learning environments. By the end of the tutorial, participants will have a solid understanding of the potential of generative AI technologies to enhance student learning and scale accountable talk in asynchronous learning environments.

Data Efficient Machine Learning for Educational Content Creation

July 14, 2023

Ganesh Ramakrishnan

Ayush Maheshwari

Abstract: Machine learning has found several use cases in educational applications. Specifically, neural machine translation (NMT) systems are socially significant, with the potential to help make information accessible to a diverse set of users in multilingual societies. NMT systems have helped translate audio, video, and textual content into vernacular languages, aiding both students and teachers. However, translation of higher education and technical textbooks and courses requires MT systems to adhere to the lexicon of the source and target domains.

In this tutorial, we provide insights from our translation ecosystem (https://udaanproject.org), which has helped translate hundreds of diploma and engineering books, each into more than 11 Indian languages.

We will provide the audience with a holistic view of:

  1. How to build a domain-specific lexicon in 11 Indian languages from a small seed dictionary by utilising the innate connection across languages
  2. How to build a multilingual NMT model that ingests a domain-specific lexicon without affecting the fluency of the predicted sentence
  3. How to build a human-in-the-loop AI post-editing tool that
    a. benefits from complex OCR (optical character recognition) and layout analysis to preserve bounding boxes in the source document
    b. learns from user edits and calibrates the output for subsequent occurrences
  4. What insights were gathered from translating a sample of 50 books across 11 languages

The ecosystem at https://udaanproject.org presented in this tutorial is backed by several peer-reviewed publications (see https://udaanproject.org/Publications), grouped into three areas:

  1. Translation and Post-Editing
  2. Dictionary Organization and Constraint Induction
  3. Optical Character Recognition (OCR)

Finally, in the annexure, we present a brief background on this work and its impact so far, as acknowledged by AICTE (https://www.aicte-india.org/content/aicte-acknowledgement-udaan-team-iit-bombay-technical-book-writing-scheme), the Government of Maharashtra, Bank of Baroda, and others.

How to Open Science: Promoting Principles and Reproducibility Practices within the EDM Community

July 14, 2023

Aaron Haim

Stacy Shaw

Neil Heffernan

Abstract: Over the past decade, open science has gained momentum, making research more openly available and reproducible. Educational data mining, as a subfield of education technology, has been expanding in scope as well, developing and providing a better understanding of large amounts of data within education. However, open science and educational data mining do not often intersect, which makes it difficult to reuse methodologies, datasets, and analyses for replication, reproduction, or an entirely separate end goal. In this tutorial, we will provide an overview of open science principles and their benefits and mitigation strategies within research. In the second part of this tutorial, we will provide an example of using the Open Science Framework to create, collaborate on, and share projects. The final part of this tutorial will go over some mitigation strategies for releasing datasets and materials so that other researchers can easily reproduce the work. Participants in this tutorial will gain a better understanding of open science, how it is used, and how to apply it themselves.