Comparison of Learning Behaviors on an e-Book System in 2019 Onsite and 2020 Online Courses
Hiroaki Kawashima
Graduate School of Information Science, University of Hyogo, Kobe, Japan
kawashima@gsis.u-hyogo.ac.jp

ABSTRACT

In this study, we compare e-Book log data from onsite face-to-face courses in 2019 and synchronous online courses in 2020 to elucidate the difference in students’ learning behaviors. We focus on short periods before and after class time, motivated by the locational and temporal constraints of onsite courses, i.e., students must be physically present in a classroom and leave the room after each class. Because online courses are free from such constraints, learners’ behavior in 2020 exhibits characteristic patterns before and after class time, and we show that students’ activity in those periods relates to their final grades.

Keywords

Learning analytics, onsite vs. online, e-book system

1. INTRODUCTION

The learning environment has changed drastically due to the COVID-19 pandemic. In particular, online classes have been widely used when the number of infections is high. There are several styles of online classes, such as synchronous and asynchronous [3]. In asynchronous courses, students watch prerecorded videos on demand, whereas synchronous online courses often require students to connect to a video-conferencing system and attend class online from their homes at a specified time. In addition, the media transmitted over the video-conferencing system are not limited to camera video of the lecturer; a variety of other styles are used. For example, if class materials (e.g., slides) are shared by other means, transmitting only the audio of the lecturer’s speech is sufficient.

How have learning styles changed with the introduction of such synchronous online classes? This study aims to clarify changes in learning behaviors and their relationship to final grades (e.g., A, B, and C) by analyzing browsing log data from 2019 onsite face-to-face classes and 2020 synchronous online classes in university courses where an e-Book system has been introduced. We focus on the fact that the face-to-face format imposes locational and temporal constraints that require students to go to the classroom at a specific time and leave the room just after class, whereas the online format frees students from such constraints. Based on these considerations, we analyze students’ activity in short periods just before and after class time to understand the difference in learning styles between 2019 and 2020. The contributions of this study are twofold:

  1. We show that onsite face-to-face and online teaching formats cause different patterns of students’ activity during short periods of time (e.g., 15 minutes) before and after class time.
  2. We also show that the activity patterns before and after class time relate to students’ final grades.

2. RELATED WORK

Clickstreams from e-Book systems are often used in the field of learning analytics to analyze in detail how students view course materials [16, 14]. Such log data is used for a wide range of applications, including feedback to instructors during classes [13], grade prediction [10, 8], and summarization of teaching materials [15]. For final grade or score prediction, methods for identifying important features [5] and for predicting final grades with various machine learning models have been proposed [10, 8]. Such log data can also be used for early detection of at-risk students [2].

Comparisons between onsite (face-to-face) and online classes have been studied in the education of various fields. Aggarwal et al. [1] conducted a randomized study to test the difference in learning outcomes between onsite and online courses on biostatistics and research ethics and found no significant differences. Jones and Long [6] compared onsite and online students in the same mathematics course from 2005 to 2011 and showed that onsite students performed better over the full ten semesters but that the two groups were comparable when the analysis was limited to the last seven semesters. Paul and Jefferson [12] conducted a comparative analysis in an environmental science course from 2009 to 2016 and reported no difference in academic performance between the onsite and online formats.

Several studies have been conducted after the onset of the COVID-19 pandemic. For example, Yang et al. [17] examined the viewing time and completion rates of online dental education courses and found that the completion rate was related to when the content was first visited (within 60 minutes before the start of the class, earlier than that, or after the start). Amin et al. [3] analyzed end-of-course surveys in electronic circuit courses from 2007 to 2021 and reported that online classes provided learning experiences as good as or better than those of onsite classes.

Many of the existing studies mentioned above compared onsite and online courses based on grades or questionnaires, or analyzed students’ learning behaviors such as content-viewing duration and first-access time. In contrast, this study seeks a new perspective for comparing onsite and online students’ behaviors by quantifying the amount of activity from detailed operation logs of an e-Book system during short periods before and after classes.

3. LECTURE COURSES AND DATASET

3.1 Courses and Teaching Formats

The dataset used in this study was made available for the Data Challenge at the International Conference on Learning Analytics & Knowledge 2022 [9, 4, 7] and provided on a request basis. It consists of four courses whose course IDs are A-2019, A-2020, B-2019, and B-2020, where "A" and "B" denote the subjects of the courses, and 2019 and 2020 denote the year offered. For simplicity, we denote the pair of courses X-2019 and X-2020 as "course X" in the following. Each of courses A and B covers the same topics in both years (e.g., B-2019 and B-2020 dealt with the same topics).

In 2019, the teaching format was onsite face-to-face: each student operated the e-Book system in a classroom on his or her own laptop while listening to a slide presentation by a lecturer at the podium. In 2020, on the other hand, the teaching format was synchronous online: lecturers gave audio-only lectures during each class time, and students operated the e-Book system on their own PCs at home or from other locations.

3.2 Class Weeks and Materials

While the topics are essentially the same within each of courses A and B, the number of weeks and the materials differ between the two years. First, both courses A and B were offered for eight weeks in 2019 and seven weeks in 2020. In course A, the content of Weeks 5 and 6 in 2019 was taught in Week 5 in 2020, and in course B, the content of Weeks 2 and 3 in 2019 was taught in Week 2 in 2020. In addition, the course materials were updated to some extent in 2020. In course A-2019, a summarized version of the materials was also distributed on the e-Book system. Basic information about each course is shown in Table 1.

Table 1: Basic information of the courses in the dataset.
Course ID   Weeks   # of students   Class time
A-2019      8       50              8:40-12:00 (Mon.)
A-2020      7       62              8:40-12:00 (Mon.)
B-2019      8       164             14:50-16:20 (Tue.)
B-2020      7       93              14:50-16:20 (Tue.)

3.3 e-Book Log Data

The data used in this study are students’ operation logs obtained through an e-Book system [9, 4]. Students browse the provided class materials (e.g., slides) by performing operations such as page forwarding and marking. Each line of the log data records which operation was performed by which student on which page of which course material, along with a timestamp consisting of date, hour, minute, and second. Types of operations include page transitions, adding notes and markers, and bookmarking. However, as described later, this paper focuses on the number of operations, which we also refer to as the operation count or frequency, as the amount of activity. All data are anonymized, and timestamps are slightly perturbed on the scale of seconds while preserving their original order.
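For illustration, the following minimal Python sketch shows one way to load such an operation log with pandas. The file name and the column names (userid, contentsid, operationname, pageno, eventtime) are assumptions for illustration and may differ from the actual dataset schema.

    import pandas as pd

    def load_logs(csv_path: str) -> pd.DataFrame:
        """Read an operation log and parse its timestamps into datetime objects."""
        logs = pd.read_csv(csv_path)
        logs["eventtime"] = pd.to_datetime(logs["eventtime"])  # date + hour:minute:second
        return logs

    logs = load_logs("A-2019_EventStream.csv")  # hypothetical file name
    print(logs[["userid", "contentsid", "operationname", "pageno", "eventtime"]].head())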

Each student’s final grade is recorded as A, B, C, D, or F in every course, where A is the highest grade, and F is the lowest grade. The records are also included in the dataset.

4. ANALYSIS

4.1 Visualization of Activity by Time of Day

To examine the overall trend of students’ activity for each course, we first visualize the total activity at each hour of each day of the week. We here define "activity" as the frequency of operations without distinguishing operation types; page transition operations (moving to the previous or next page) account for the majority of this activity.

Figure 1 shows the visualization of total activity in each hour. The color of each time slot shows the total frequency of operations over the course period, normalized by dividing by the number of students in the course. As for the colormap range, we capped the values at 300 (colored yellow). The figure shows that the time slots with the highest activity correspond to the class time of each course.

Figure 1: Visualization of activity in each day of the week (0: Monday, 6: Sunday). Active periods correspond to the class time of each course. See also Table 1.

In this kind of heatmap visualization, when the value of a particular cell is very high, the other cells become indistinguishable. Therefore, in Fig. 2 we visualize the same data with a colormap ranging from 0 to 40, i.e., each yellow time slot has an average of 40 operations or more per student. We can see in this figure that the amount of activity on Sundays is relatively high in both 2019 and 2020 for course A. This may be due to the deadlines of assignments. In addition, both courses A and B show more activity outside of class time on the day of the class in 2020 than in 2019.
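A minimal sketch of how such heatmaps can be produced (assuming the logs DataFrame loaded above) is given below. The normalization by the number of students and the colormap caps (300 for Fig. 1, 40 for Fig. 2) follow the description in the text, while the variable names are illustrative.

    import matplotlib.pyplot as plt

    def activity_heatmap(logs, n_students, vmax=300):
        """Plot operations per student for each (day of week, hour of day) slot."""
        counts = (logs.groupby([logs["eventtime"].dt.dayofweek,
                                logs["eventtime"].dt.hour])
                      .size()
                      .unstack(fill_value=0)) / n_students
        # Make sure all 7 days x 24 hours appear, even if some slots are empty.
        counts = counts.reindex(index=range(7), columns=range(24), fill_value=0)
        plt.imshow(counts, aspect="auto", cmap="viridis", vmin=0, vmax=vmax)
        plt.xlabel("Hour of day")
        plt.ylabel("Day of week (0: Mon., 6: Sun.)")
        plt.colorbar(label="Operations per student")
        plt.show()

    activity_heatmap(logs, n_students=50)           # colormap capped at 300, as in Fig. 1
    activity_heatmap(logs, n_students=50, vmax=40)  # colormap capped at 40, as in Fig. 2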

Figure 2: Visualization of activity in each day of the week (0: Monday, 6: Sunday) with a colormap ranging from 0 to 40.
Figure 3: Activity patterns around the class time in courses A and B. Vertical lines show the start and end of class.

To detail the activity patterns around class time, we aggregated activity into 10-minute intervals over the class time plus a margin of two hours before and after it, as shown in Fig. 3. Compared to 2019, the patterns in 2020 are characterized by activity before and after class time. In A-2020 and B-2020, activity continues for a certain period after the class, which we refer to as "post-class activity." In addition, in course B-2020, activity increases before the class starts, which we refer to as "pre-class activity." On the other hand, A-2020 shows no notable pre-class activity, which may be because course A was scheduled in the early morning while course B was taught in the afternoon.
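The aggregation behind Fig. 3 can be sketched as follows. The class start and end times come from Table 1, whereas the class dates and helper names are illustrative assumptions.

    import pandas as pd

    def around_class_profile(logs, class_dates, start_hm, end_hm, margin_h=2, bin_min=10):
        """Sum operation counts in 10-minute bins around class time over all class days."""
        profile = None
        for d in class_dates:  # e.g., ["2020-10-06", ...] (hypothetical dates)
            t0 = pd.Timestamp(f"{d} {start_hm}") - pd.Timedelta(hours=margin_h)
            t1 = pd.Timestamp(f"{d} {end_hm}") + pd.Timedelta(hours=margin_h)
            day = logs[(logs["eventtime"] >= t0) & (logs["eventtime"] < t1)]
            bin_ids = ((day["eventtime"] - t0).dt.total_seconds() // (bin_min * 60)).astype(int)
            n_bins = int((t1 - t0).total_seconds() // (bin_min * 60))
            counts = bin_ids.value_counts().reindex(range(n_bins), fill_value=0).sort_index()
            profile = counts if profile is None else profile + counts
        return profile  # index 0 corresponds to two hours before the class starts

    # Example for course B (14:50-16:20, Tuesdays); the date is hypothetical.
    profile = around_class_profile(logs, ["2020-10-06"], "14:50", "16:20")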

4.2 Relationship to Final Grades

To further investigate the pre-/post-class activity, we divided students into groups based on their final grades and examined the differences in the amount of pre-/post-class activity. Figure 4 shows the average pre-/post-class activity at each grade level. Specifically, we first calculated each student’s pre-/post-class activity as the total operation frequency during a specific period (i.e., 15 minutes before or after class) over all class days. We then averaged the students’ pre-/post-class activity within each grade level. Error bars indicate 90% confidence intervals based on t-distributions. We find that, in 2020, post-class activity tends to be higher for students with better final grades.
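A minimal sketch of this computation is shown below, reusing the logs DataFrame above and assuming, for illustration, a grades Series of final grades indexed by student ID.

    import pandas as pd
    from scipy import stats

    def window_activity(logs, class_dates, hm, minutes=15, before=True):
        """Per-student total operations in the 15 minutes before (or after) a reference time."""
        total = pd.Series(0.0, index=logs["userid"].unique())
        for d in class_dates:
            t = pd.Timestamp(f"{d} {hm}")
            t0, t1 = (t - pd.Timedelta(minutes=minutes), t) if before \
                     else (t, t + pd.Timedelta(minutes=minutes))
            day = logs[(logs["eventtime"] >= t0) & (logs["eventtime"] < t1)]
            total = total.add(day.groupby("userid").size(), fill_value=0)
        return total

    def grade_mean_ci(activity, grades, level, alpha=0.10):
        """Mean and 90% t-based confidence interval of activity for one grade level."""
        x = activity.reindex(grades.index[grades == level], fill_value=0)
        mean, sem = x.mean(), stats.sem(x)
        half = sem * stats.t.ppf(1 - alpha / 2, df=len(x) - 1)
        return mean, (mean - half, mean + half)

    # pre  = window_activity(logs, dates, "14:50", before=True)   # 15 min before class starts
    # post = window_activity(logs, dates, "16:20", before=False)  # 15 min after class ends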


Figure 4: Average pre-/post-class activity (the number of operations) in each grade level. Blue bars: pre-class activity; orange bars: post-class activity. Error bars show 90% confidence intervals.

For each course, we conducted a statistical test for differences in the amount of post-class activity among grade levels. The results (p-values) of the Kruskal-Wallis tests are shown in Table 2, indicating significant differences among grade levels in 2020 for both courses A and B. While the computation of the confidence intervals in Fig. 4 assumes that each population is normally distributed, the actual activity distributions tend to be skewed toward lower values and often contain outliers. We therefore used nonparametric tests here and in what follows.
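For reference, the test can be run with scipy as in the following sketch, assuming post is a per-student Series of post-class activity and grades is a Series of grade levels sharing the same student-ID index (both illustrative names).

    from scipy import stats

    levels = ["A", "B", "C", "D", "F"]
    # Collect the post-class activity of each grade level that has at least one student.
    samples = [post[grades == g] for g in levels if (grades == g).any()]
    stat, p = stats.kruskal(*samples)
    print(f"Kruskal-Wallis p-value: {p:.4f}")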

Table 2: Results of the Kruskal-Wallis tests for post-class activity. The rightmost column shows the sample size of each grade.
Course ID   p-value   n (A, B, C, D, F)
A-2019      0.7023    (24, 6, 4, 6, 10)
A-2020      0.0067    (22, 24, 6, 3, 7)
B-2019      0.4088    (26, 104, 30, 2, 2)
B-2020      0.0078    (37, 38, 12, 2, 4)

To examine in detail which grade groups show a gap in activity, we conducted Mann-Whitney U-tests separately for three two-group splits: {A} vs. {B, C, D, F}, {A, B} vs. {C, D, F}, and {A, B, C} vs. {D, F}. The reason for grouping the grade levels in this way is that grading criteria may differ from course to course; for example, two grade levels in one course may correspond to a single grade level in another course. We did not treat grade F alone as a group because of its small sample size.
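A sketch of these grouped tests with scipy, reusing the post and grades Series above, is given below; the two-sided alternative is our assumption, as the paper does not state the test direction.

    from scipy import stats

    splits = [({"A"}, {"B", "C", "D", "F"}),
              ({"A", "B"}, {"C", "D", "F"}),
              ({"A", "B", "C"}, {"D", "F"})]
    for upper, lower in splits:
        x = post[grades.isin(upper)]  # higher-grade group
        y = post[grades.isin(lower)]  # lower-grade group
        stat, p = stats.mannwhitneyu(x, y, alternative="two-sided")
        print(f"{sorted(upper)} vs. {sorted(lower)}: p = {p:.4f}")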

The results are shown in Table 3. We can see from the table that some grade groups show significant gaps in post-class activity. For example, in course A-2020, there is a significant difference between grade groups {A} and {B, C, D, F}, which corresponds to the gap in orange-bar heights in Fig. 4. In course B-2020, a significant difference appears between {A, B} and {C, D, F}.

Table 3: Results (p-values) of the Mann-Whitney U-tests for the post-class activity. Vertical lines in the column names show how the grade levels were divided into two groups.
Course ID   A | BCDF   AB | CDF   ABC | DF
A-2019      0.6221     0.2676     0.5449
A-2020      0.0005     0.0139     0.0317
B-2019      0.8662     0.0786     0.3716
B-2020      0.4232     0.0020     0.0023

5. DISCUSSION

The results of the analyses in the previous section suggest that the change of teaching format from onsite face-to-face classes in 2019 to synchronous online classes in 2020 freed students from the constraints of class time and location. As a result, some students continued to examine class materials even after class time, which we observed as post-class activity. Such student behavior on a relatively short time scale (e.g., 15 minutes after class) may provide important clues for understanding students’ learning in online environments.

The pre-/post-class activity might be an indicator of students’ self-regulation [11] to some extent. However, this should be examined in further studies that combine other clues, such as learning behaviors (e.g., the frequency of reviewing class materials), learning strategies, and motivation.

Some limitations exist in our analyses in the previous section. One is the effect of class attendance. Because we computed the amount of activity from the total frequency of operations in a specific time period, it also reflects attendance and absence. For example, if a student is absent from a class, post-class activity may also be close to zero on that day. While we decided to include such effects in the present study, investigating pre-/post-class activity conditioned on students who actually attended would require further analysis that estimates whether each student attended on each class day.

Another limitation is that the objective of this study is not the utilization of the features, e.g., improving the accuracy of grade-level prediction. Therefore, it is unclear whether features extracted from the pre-/post-class activity are useful, and other activity (e.g., in-class activity) may be more important, for example, for final-grade prediction. To directly compare the importance of pre-class, post-class, and in-class activity, we trained gradient boosting classifiers, which are widely used in data challenges, including those on educational data.

Figure 5 shows the feature importance obtained through model training. Here, we used the LightGBM implementation. Note that our objective here is not the evaluation of prediction accuracy itself but a brief comparison of feature importances. We therefore trained a model on all data of each course. We also constrained model complexity to a certain degree with the hyperparameters (maximum tree depth: 3, minimum number of data in a leaf: 10, number of iterations: 20) to avoid excessive overfitting. As for the in-class activity feature, we computed the operation frequency during class time in the same manner as the pre-/post-class activity.
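A minimal sketch of this setup with the LightGBM Python API is given below. The synthetic data, feature construction, and label encoding are illustrative assumptions; the hyperparameters match those stated above.

    import lightgbm as lgb
    import numpy as np

    # Synthetic stand-ins for illustration: three activity features per student and
    # grade labels encoded as integers (A=0, ..., F=4).
    rng = np.random.default_rng(0)
    X = rng.poisson(20, size=(100, 3)).astype(float)
    y = rng.integers(0, 5, size=100)

    feature_names = ["pre_class", "post_class", "in_class"]
    params = {"objective": "multiclass", "num_class": 5,
              "max_depth": 3, "min_data_in_leaf": 10, "verbosity": -1}

    train_set = lgb.Dataset(X, label=y, feature_name=feature_names)
    model = lgb.train(params, train_set, num_boost_round=20)  # 20 boosting iterations

    # Feature importance by total gain, as used in Fig. 5.
    for name, gain in zip(feature_names, model.feature_importance(importance_type="gain")):
        print(f"{name}: {gain:.1f}")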


Figure 5: Feature importances computed by the gain of gradient boosting models.

The figure suggests that in-class activity is the most important feature for the grade prediction task in both 2019 and 2020. Meanwhile, it is worth noting that the importance of pre-/post-class activity substantially increased in 2020 compared to 2019.

While a proper evaluation of the grade-prediction task requires cross-validation with a larger amount of data, the results of this study show the potential of designing and combining various detailed features from out-of-class time periods in the context of online courses.

6. CONCLUSION

This study analyzed student behavior in 2019 and 2020, before and during the COVID-19 pandemic, based on operation logs obtained through an e-Book system, for two courses whose teaching format changed from onsite face-to-face to synchronous online between the two years. Because online classes do not require students to leave a lecture room after class, post-class activity, which was hardly observed in 2019, was often observed in 2020. In addition, our study suggested that this activity has some relation to students’ final grades.

Acknowledgements

This work was supported by JSPS KAKENHI Grant Number JP19H04226.

7. REFERENCES

  1. R. Aggarwal, N. Gupte, N. Kass, H. Taylor, J. Ali, A. Bhan, A. Aggarwal, S. D. Sisson, S. Kanchanaraksa, J. McKenzie-White, J. McGready, P. Miotti, and R. C. Bollinger. A comparison of online versus on-site training in health research methodology: A randomized study. BMC Medical Education, 11(1):1–10, 2011.
  2. G. Akçapınar, M. N. Hasnine, R. Majumdar, B. Flanagan, and H. Ogata. Developing an early-warning system for spotting at-risk students by using eBook interaction logs. Smart Learning Environments, 6(1), 2019.
  3. M. Amin, B. R. Sinha, and P. P. Dey. Online versus onsite teaching performance analysis of an introductory electrical circuit class. Asian Journal of Education and e-Learning, 9(4), 2021.
  4. B. Flanagan and H. Ogata. Learning analytics platform in higher education in Japan. Journal of Knowledge Management & E-Learning (KM&EL), 10(4):469–484, 2018.
  5. S. Hirokawa and C. Yin. Feature engineering for learning log analysis. In Companion Proceedings of the 9th International Conference on Learning Analytics & Knowledge (LAK19), pages 1–8, 2018.
  6. S. J. Jones and V. M. Long. Learning equity between online and on-site mathematics courses. Journal of Online Learning & Teaching, 9(1):1–12, 2013.
  7. LAK22DataChallenge. The 4th Workshop on Predicting Performance Based on the Analysis of Reading Behavior, 2022. https://sites.google.com/view/lak22datachallenge.
  8. S. Leelaluk, T. Minematsu, Y. Taniguchi, F. Okubo, and A. Shimada. Predicting student performance based on lecture materials data using neural network models. In Proceedings of the 4th Workshop on Predicting Performance Based on the Analysis of Reading Behavior, 2022.
  9. H. Ogata, M. Oi, K. Mohri, F. Okubo, A. Shimada, M. Yamada, J. Wang, and S. Hirokawa. Learning analytics for e-book-based educational big data in higher education. In Smart Sensors at the IoT Frontier, pages 327–350. Springer, Cham, 2017.
  10. F. Okubo, A. Shimada, T. Yamashita, and H. Ogata. A neural network approach for students’ performance prediction. In Proceedings of ACM International Conference on Learning Analytics & Knowledge (LAK), pages 598–599, 2017.
  11. E. Panadero. A review of self-regulated learning: Six models and four directions for research. Frontiers in Psychology, 8(APR):1–28, 2017.
  12. J. Paul and F. Jefferson. A comparative analysis of student performance in an online vs. face-to-face environmental science course from 2009 to 2016. Frontiers in Computer Science, 1(November), 2019.
  13. A. Shimada, S. Konomi, and H. Ogata. Real-time learning analytics system for improvement of on-site lectures. Interactive Technology and Smart Education, 15(4):314–331, 2018.
  14. A. Shimada, K. Mouri, Y. Taniguchi, H. Ogata, R. I. Taniguchi, and S. Konomi. Optimizing assignment of students to courses based on learning activity analytics. In Proceedings of the 12th International Conference on Educational Data Mining (EDM), pages 178–187, 2019.
  15. A. Shimada, F. Okubo, C. Yin, and H. Ogata. Automatic summarization of lecture slides for enhanced student preview –Technical report and user study–. IEEE Transactions on Learning Technologies, 11(2):165–178, 2018.
  16. A. Shimada, Y. Taniguchi, F. Okubo, S. Konomi, and H. Ogata. Online change detection for monitoring individual student behavior via clickstream data on e-book system. In Proceedings of ACM International Conference on Learning Analytics & Knowledge (LAK), pages 446–450, 2018.
  17. X. Yang, D. Li, X. Liu, and J. Tan. Learner behaviors in synchronous online prosthodontic education during the 2020 COVID-19 pandemic. Journal of Prosthetic Dentistry, 126(5):653–657, 2021.


© 2022 Copyright is held by the author(s). This work is distributed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) license.