The CommonLit Ease of Readability (CLEAR) Corpus

Scott Crossley, Aron Heintz, Joon Choi, Jordan Batchelor, Mehrnoush Karimi, Agnes Malatinszky

Jun 30, 2021 20:40 UTC+2 — Session PS1 — Gather Town

Keywords: Corpus Linguistics, Text Readability, Natural Language Processing

Abstract: In this paper, we introduce the CommonLit Ease of Readability (CLEAR) corpus. The corpus provides researchers within the educational data mining community with a resource from which to develop and test readability metrics and to model text readability. The CLEAR corpus offers a number of improvements over previous readability corpora, including its size (N = ~5,000 reading excerpts), the breadth of the excerpts available, which cover over 250 years of writing in two different genres, and the readability criterion used (teachers’ ratings of text difficulty for their students). This paper discusses the development of the corpus and presents reliability metrics as well as initial analyses of readability.


© EDM 2021 Organization Committee