# ABSTRACT

Knowledge tracing (KT) is an essential task in online education, which dynamically assesses students’ mastery of concepts by predicting the probability that they correctly answer questions. One of the most effective families of solutions for knowledge tracing is graph-based methods. They maintain multiple vectors to represent students’ mastery of concepts, and use these vectors to predict the probability of students correctly answering questions. To give more accurate predictions, graph-based methods require concept relations to update these vectors once students answer questions. However, the concept relations usually require manual annotation in real-world scenarios, limiting the application of graph-based methods. In this paper, we propose a method called Automatical Graph-based Knowledge Tracing (AGKT), a graph-based method that updates these vectors without requiring manually annotated concept relations. We evaluate our method on four public datasets and compare it with ten advanced methods. The experiment results demonstrate that AGKT yields superior performance.

# 1. INTRODUCTION

Knowledge tracing is a task that estimates students’ mastery of knowledge by predicting the probability that they correctly answer questions. It plays an important role in online educational applications, such as exercise recommendation and knowledge diagnosis. The input of knowledge tracing is the question-answering history of a student and a new question, and the output is the probability of the student correctly answering the question. For instance, at step 4 in Figure 1, knowledge tracing aims to use the student’s performance on ${q}_{1},{q}_{2},{q}_{3}$ to predict the probability of the student correctly answering ${q}_{4}$.

To address the knowledge tracing problem, many outstanding methods [11, 17, 8, 19, 7, 1, 5, 3, 10, 18, 12, 9, 14] have been proposed, which can be grouped into graph-free methods [11, 17, 8, 19, 7, 1, 5, 3, 10, 18, 12] and graph-based methods [9, 14]. Graph-free methods are built directly on sequential models, like auto-regressive methods [13, 6, 2] and the Transformer [15]. They maintain one or multiple vectors to represent students’ mastery of knowledge concepts, denoted as knowledge states, and they predict students’ performance based on these vectors. As students’ knowledge states change over time, to maintain the latest knowledge states for prediction, they update the vectors representing the knowledge states according to the questions and the question-concept relation immediately after students answer questions. However, according to previous research, concepts in a specific domain are correlated with each other [7, 19]. Thus, when a student gains a deeper understanding of one concept, his or her mastery of the correlated concepts also changes. For instance, answering ${q}_{1}$ enhances the student’s mastery of *addition* in Figure 1. Since *addition* is correlated with *multiplication* in the concept relation graph, the deeper understanding of *addition* enhances his or her mastery of *multiplication*. Hence, the knowledge states are influenced not only by questions and question-concept relations, but also by the relations among concepts.

For this reason, graph-based methods introduce the concept-relation graph when they update students’ knowledge states. Though graph-based methods achieve gains in performance and interpretability, the concept-relation graph usually relies on manual annotation, which is almost impossible for cases with thousands of concepts. To address this issue while keeping the superiority of graph-based methods, we propose a method called Automatical Graph-based Knowledge Tracing (AGKT), which is independent of a manually annotated concept relation, but can update students’ knowledge states like graph-based methods by considering the question, the question-concept relation, and the concept-concept relation. We perform experiments on four real-world datasets and compare our method with ten excellent models. The experiment results reveal that our method has superior performance to the other methods.

# 2. PROBLEM FORMULATION

Suppose the learning history of a student is $\mathcal{X}_{t-1}=\{(q_{1}, r_{1}),(q_{2}, r_{2}),\dots,(q_{t-1}, r_{t-1})\}$. Here, $q_{i}$ denotes the question the student answers at step $i$, and $r_{i}$ denotes the correctness of the student’s response to $q_{i}$:

$$r_{i}=\begin{cases}1, & \text{if the student's answer is right;}\\ 0, & \text{otherwise.}\end{cases} \tag{1}$$

Each question is related to one or more concepts. We denote the set of concepts as $C={\{{c}_{j}\}}_{j=1}^{\left|C\right|}$, the set of concepts related to ${q}_{i}$ as ${C}_{i}$, and the set of concepts unrelated to ${q}_{i}$ as ${D}_{i}$. Obviously, ${C}_{i}\cup {D}_{i}=C$.
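As a concrete sketch of this formulation, the learning history and the question-concept relation can be represented with plain Python containers. The question and concept names below are illustrative, not taken from the paper:

```python
# A sketch of the problem setup in Sec. 2, using plain Python containers.
concepts = {"addition", "multiplication", "fractions"}   # the concept set C

question_concepts = {       # question-concept relation: q_i -> C_i
    "q1": {"addition"},
    "q2": {"addition", "multiplication"},
}

history = [("q1", 1), ("q2", 0)]   # X_{t-1}: pairs (q_i, r_i)

def related_and_unrelated(q):
    """Return (C_i, D_i): the concepts related / unrelated to question q."""
    C_i = question_concepts[q]
    D_i = concepts - C_i            # complement, so that C_i ∪ D_i = C
    return C_i, D_i

C_2, D_2 = related_and_unrelated("q2")
```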

The task of knowledge tracing is formulated as estimating the probability that the student correctly answers a new question ${q}_{t}$ given the question-answering history ${\mathcal{X}}_{t-1}$, *i*.*e*., $P({r}_{t}=1|{q}_{t},{\mathcal{X}}_{t-1})$. We approach this by learning a function to estimate the probability:

$${\overline{r}}_{t} = {f}_{\Theta}({q}_{t}, {\mathcal{X}}_{t-1}), \tag{2}$$

where ${\overline{r}}_{t}=P({r}_{t}=1|{q}_{t},{\mathcal{X}}_{t-1})$ and the input of ${f}_{\Theta}(\cdot )$ represents the features used to predict the correctness of the student.

# 3. METHOD

As discussed previously, students’ latest knowledge states are necessary to predict the probability of students correctly answering questions. Moreover, considering the question, the question-concept relation, and the concept-concept relation in the knowledge state update benefits performance. Nevertheless, annotating the concept-concept relation is almost impossible. To preserve the superiority of graph-based methods when a manually annotated concept-concept relation is absent, we propose a method called Automatical Graph-based Knowledge Tracing (AGKT). It is composed of three components, as Figure 2(a) illustrates: the Automatical Graph (AG), the state update module, and the prediction module. The AG is constructed from the question-concept relation, and replaces the concept relation graph to assist the state update module in updating students’ knowledge states. The prediction module predicts the probability of students correctly answering questions from students’ knowledge states and the question information. In the following, we discuss the AG first. Then, we present how the state update module updates students’ knowledge states based on the AG. Finally, we discuss the prediction module.

## 3.1 Automatical Graph (AG)

Each question corresponds to an Automatical Graph (AG). Figure 2(b) illustrates the AG corresponding to ${q}_{1}$ in Figure 1. We denote the graph corresponding to ${q}_{t}$ as ${\mathcal{G}}_{t}=\{{\mathcal{V}}_{t}, {\mathcal{E}}_{t}\}$. ${\mathcal{V}}_{t}$ is the node set, and ${\mathcal{V}}_{t}=\{{V}_{t}^{c}, {V}_{t}^{d}, s\}$. Each node in ${V}_{t}^{c}$ represents a question-related concept, i.e., a concept in ${C}_{t}$, and each node in ${V}_{t}^{d}$ represents a question-unrelated concept, i.e., a concept in ${D}_{t}$. For ease of presentation, we refer to these nodes as concepts when there is no ambiguity. $s$ is the supernode, which connects the question-related and question-unrelated concepts. ${\mathcal{E}}_{t}$ represents the edges of ${\mathcal{G}}_{t}$, and ${\mathcal{E}}_{t}=\{{E}_{c}, {E}_{d}\}$, where ${E}_{c}$ represents the edges between ${V}_{t}^{c}$ and $s$, and ${E}_{d}$ represents the edges between ${V}_{t}^{d}$ and $s$.
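Since the AG depends only on the question-concept relation, it can be built mechanically. A minimal sketch of constructing this star-shaped graph is below; the dictionary-based representation is a hypothetical choice, as the paper does not prescribe a data structure:

```python
def build_ag(question_concepts, all_concepts):
    """Build the star-shaped AG for one question: a supernode 's'
    connected to every concept (a sketch, not the paper's implementation)."""
    V_c = set(question_concepts)             # question-related concepts
    V_d = set(all_concepts) - V_c            # question-unrelated concepts
    E_c = [(c, "s") for c in sorted(V_c)]    # edges V_t^c -> supernode
    E_d = [("s", c) for c in sorted(V_d)]    # edges supernode -> V_t^d
    return {"V": V_c | V_d | {"s"}, "E": E_c + E_d}

ag = build_ag({"addition"}, {"addition", "multiplication", "division"})
```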

## 3.2 The state update module

Answering a question requires students to utilize their knowledge of the question-related concepts, which increases their comprehension of those concepts and causes their knowledge states on them to change once they answer. As concepts are correlated, these changes also influence the knowledge states of some question-unrelated concepts. Based on this, we maintain a vector for each concept to represent a student’s knowledge state on it. When a student answers a question, we first update the knowledge states of the question-related concepts, and then update the knowledge states of the question-unrelated concepts according to the AG.

### 3.2.1 The state update of question-related concepts

Suppose a student interacts with question ${q}_{t}$ at step $t$. After the student answers ${q}_{t}$, we update the knowledge states of the question-related concepts using the concepts’ attributes and the student’s response to ${q}_{t}$. Specifically, for a concept ${c}_{i}\in {C}_{t}$, we integrate its embedding and the student’s correctness by:

$$\mathbf{z}_{t}^{i}=\begin{cases}\mathbf{e}_{i}^{c}\oplus \mathbf{0}, & r_{t}=1,\\ \mathbf{0}\oplus \mathbf{e}_{i}^{c}, & r_{t}=0,\end{cases} \tag{3}$$

where $\mathbf{0}=(0,0,\dots,0)$ is a $d$-dimensional zero vector, ${\mathbf{e}}_{i}^{c}\in {\mathbf{R}}^{d}$ denotes the embedding of ${c}_{i}$, and $\oplus$ denotes concatenation. Then, we feed the concatenation into the cell of a recurrent neural network (RNN) to update the knowledge state of ${c}_{i}$:

$$\mathbf{h}_{t+1}^{i}=\mathit{GRU}({\mathbf{z}}_{t}^{i},{\mathbf{h}}_{t}^{i}), \tag{4}$$

where ${\mathbf{h}}_{t}^{i}$ denotes the knowledge state on concept ${c}_{i}$ at step $t$; it is the $i$-th row of the knowledge state matrix ${\mathbf{H}}_{t}$. GRU denotes the gated recurrent unit [2].
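Eqs. 3–4 can be sketched as follows, with a minimal hand-rolled GRU cell standing in for the learned one. The weights here are random, purely for illustration; in AGKT they are learned parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4   # state/embedding dimension (illustrative)

def gru_cell(z, h, W, U, b):
    """Minimal GRU cell (Eq. 4). W, U, b stack the parameters of the
    update gate, reset gate, and candidate state, in that order."""
    sig = lambda x: 1.0 / (1.0 + np.exp(-x))
    u = sig(W[0] @ z + U[0] @ h + b[0])             # update gate
    r = sig(W[1] @ z + U[1] @ h + b[1])             # reset gate
    c = np.tanh(W[2] @ z + U[2] @ (r * h) + b[2])   # candidate state
    return (1 - u) * h + u * c

def input_vec(e_c, r_t):
    """Eq. 3: place the concept embedding in the first or second half
    of the input depending on the correctness r_t."""
    zero = np.zeros_like(e_c)
    return np.concatenate([e_c, zero] if r_t == 1 else [zero, e_c])

W = rng.standard_normal((3, d, 2 * d))   # random weights, illustration only
U = rng.standard_normal((3, d, d))
b = np.zeros((3, d))

e_c = rng.standard_normal(d)   # embedding e_i^c of a related concept
h = np.zeros(d)                # knowledge state h_t^i
h_next = gru_cell(input_vec(e_c, 1), h, W, U, b)   # h_{t+1}^i
```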

### 3.2.2 The state update of question-unrelated concepts

The knowledge state of one concept will influence the concepts correlated to it; *e*.*g*., the knowledge state of *multiplication* is influenced by the knowledge state of *addition* after the student answers ${q}_{1}$ in Figure 1. Thus, the update of question-unrelated concepts should be based on the relation among concepts. Since the concept relation is unknown in many cases, we update the knowledge states of question-unrelated concepts based on the AG (Figure 2(b)). Since the question-related concepts connect to the supernode, and the supernode connects to the question-unrelated concepts in the AG, the supernode first integrates the messages about the knowledge states of all the question-related concepts. Then it transmits the integrated message to the question-unrelated concepts. Finally, the question-unrelated concepts update their knowledge states according to the message transmitted from the supernode. Specifically, we take the following steps to update the knowledge states of the question-unrelated concepts.

First, we represent the original message that a question-related concept ${c}_{i}$ sends to the supernode by:

$$\mathbf{m}_{t}^{i}=\mathbf{h}_{t+1}^{i}\oplus \mathbf{o}_{i}, \tag{5}$$

where ${\mathbf{o}}_{i}$ is the one-hot encoding of concept ${c}_{i}$, which denotes the identity of the message source. The supernode aggregates the messages from the question-related concepts by:

$$\mathbf{h}^{s}=\frac{1}{\left|{C}_{t}\right|}\sum _{{c}_{i}\in {C}_{t}}{f}_{s}(\mathbf{m}_{t}^{i}), \tag{6}$$

where ${f}_{s}$ denotes a multi-layer perceptron (MLP), and ${\mathbf{h}}^{s}$ denotes the state of the supernode.

Then, the supernode transmits the message it receives to the question-unrelated concepts. We represent the message obtained by the question-unrelated concept ${c}_{j}$ (${c}_{j}\in {D}_{t}$) as

$$\mathbf{m}_{t}^{j}={f}_{d,j}(\mathbf{h}^{s}), \tag{7}$$

where ${f}_{d,j}$ denotes the mapping function, implemented by an MLP, which extracts the message of interest to ${c}_{j}$, *e*.*g*., the knowledge states of its correlative concepts. Note that the mapping functions in Eq. 7 differ across question-unrelated concepts. However, for the same question-unrelated concept at different time steps, the mapping function is shared to guarantee that the patterns are compatible across all steps.

Finally, we update the knowledge state on question-unrelated concept ${c}_{j}$ by:

$$\mathbf{h}_{t+1}^{j}=\mathit{GRU}(\mathbf{m}_{t}^{j},\mathbf{h}_{t}^{j}). \tag{8}$$

Table 1. Dataset statistics.

Dataset | ASSIST09 | ASSIST12 | EdNet | Junyi Math
---|---|---|---|---
Students | 2,968 | 22,422 | 50,000 | 1,146
Records | 185,110 | 1,839,429 | 3,266,010 | 101,854
Questions | 15,003 | 45,543 | 12,077 | 1,145
Concepts | 121 | 99 | 189 | 39
Questions Per Concept | 150.76 | 460.03 | 144.01 | 36.28
Concepts Per Question | 1.22 | 1.0 | 2.67 | 1.0
Attempts Per Question | 12.34 | 40.39 | 270.43 | 71.98
Attempts Per Concept | 1,914.21 | 18,580.10 | 39,959.14 | 2,611.64
Positive Label Rates | 63.80% | 69.60% | 59.54% | 66.78%
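The supernode message passing of Eqs. 5–7 can be sketched as below; a two-layer perceptron with random weights stands in for the learned $f_s$ and $f_{d,j}$, and all shapes and values are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
d, n_concepts = 4, 3
related = [0, 1]    # indices of the question-related concepts C_t

def mlp(x, W1, W2):
    """Two-layer perceptron with ReLU, standing in for f_s and f_{d,j}."""
    return W2 @ np.maximum(W1 @ x, 0.0)

H = rng.standard_normal((n_concepts, d))   # updated states h_{t+1}^i
onehots = np.eye(n_concepts)               # o_i: message-source identity

# Eqs. 5-6: each related concept sends h ⊕ one-hot; the supernode
# averages the transformed messages.
W1_s = rng.standard_normal((d, d + n_concepts))
W2_s = rng.standard_normal((d, d))
h_s = np.mean(
    [mlp(np.concatenate([H[i], onehots[i]]), W1_s, W2_s) for i in related],
    axis=0,
)

# Eq. 7: a per-concept mapping f_{d,j} extracts the message for an
# unrelated concept c_j; its output would feed a GRU cell as in Eq. 8.
W1_d = rng.standard_normal((d, d))
W2_d = rng.standard_normal((d, d))
m_j = mlp(h_s, W1_d, W2_d)
```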

## 3.3 The prediction module

To predict the probability of students correctly answering questions, we consider the attributes of question ${q}_{t}$, the level at which the student has mastered the question-related concepts, and his or her knowledge background. We represent the attributes of question ${q}_{t}$ by the question embedding ${\mathbf{e}}_{t}^{q}\in {\mathbf{R}}^{d}$. We represent the level at which the student has mastered the question-related concepts by the mean knowledge state:

$$\mathbf{h}_{m}=\frac{1}{\left|{C}_{t}\right|}\sum _{{c}_{i}\in {C}_{t}}\mathbf{h}_{t}^{i}. \tag{9}$$

We represent the student’s knowledge background as

$$\mathbf{h}_{c}=\frac{1}{\left|C\right|}\sum _{{c}_{i}\in C}\mathbf{h}_{t}^{i}, \tag{10}$$

and then we predict the probability of the student correctly answering ${q}_{t}$ by

$$\overline{r}_{t}=\delta ({f}_{r}(\mathbf{h}_{c}\oplus \mathbf{e}_{t}^{q}\oplus \mathbf{h}_{m})), \tag{11}$$

where ${f}_{r}$ denotes an MLP, and $\delta$ denotes the sigmoid function.
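A sketch of the prediction module (Eqs. 9–11), with a single linear map standing in for the MLP $f_r$ and random values for the states and embeddings (all illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
d, n_concepts = 4, 5
related = [1, 3]                           # indices of C_t

H = rng.standard_normal((n_concepts, d))   # knowledge state matrix H_t
e_q = rng.standard_normal(d)               # question embedding e_t^q

h_m = H[related].mean(axis=0)              # Eq. 9: mastery of related concepts
h_c = H.mean(axis=0)                       # Eq. 10: knowledge background

# Eq. 11: a single linear map stands in for the MLP f_r (illustrative).
w = rng.standard_normal(3 * d)
logit = w @ np.concatenate([h_c, e_q, h_m])
r_bar = 1.0 / (1.0 + np.exp(-logit))       # sigmoid -> predicted probability
```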

## 3.4 Model Learning

The objective function of our model is to minimize the negative log-likelihood of the observed sequence, *i.e.*, the question-answering history of the student from step 1 to $T$. The learnable parameters of our method are the embeddings of concepts and questions, the weights in the GRU, and the parameters of all the MLPs. The parameters are jointly learned by minimizing the cross-entropy between the predicted probability ${\overline{r}}_{t}$ and the student’s actual correctness ${r}_{t}$:

$$\mathcal{L}=-\sum _{i=1}^{T}\left({r}_{i}\log {\overline{r}}_{i}+(1-{r}_{i})\log (1-{\overline{r}}_{i})\right). \tag{12}$$
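Eq. 12 is the standard binary cross-entropy summed over the sequence; a minimal sketch:

```python
import numpy as np

def nll(r, r_bar):
    """Eq. 12: negative log-likelihood (binary cross-entropy) over a
    length-T sequence of responses r and predicted probabilities r_bar."""
    r, r_bar = np.asarray(r, dtype=float), np.asarray(r_bar, dtype=float)
    return -np.sum(r * np.log(r_bar) + (1 - r) * np.log(1 - r_bar))

# Confident, mostly-correct predictions give a small loss.
loss = nll([1, 0, 1], [0.9, 0.2, 0.8])
```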

# 4. EXPERIMENT

## 4.1 Dataset

We evaluate our method on four public datasets: ASSIST09, ASSIST12, EdNet, and Junyi Math. These datasets record the question-answering history of students. We take the questions, the concepts related to the questions, and the correctness of students’ responses from the records. The maximum length of students’ question-answering history is set to 200. We split 80% of the data for training and validation, and 20% for testing. The statistics of the four datasets are shown in Table 1. Note that the statistics describe the actual samples we use in our experiments after preprocessing, which differ from the statistics of the raw data.

Table 2. AUC of all compared models on the four datasets.

Model | ASSIST09 | ASSIST12 | EdNet | Junyi Math
---|---|---|---|---
BKT | 0.6815 | 0.6142 | 0.5729 | 0.6293
KTM | 0.6734 | 0.6881 | 0.7071 | 0.7207
SAKT | 0.6884 | 0.6914 | 0.7313 | 0.7422
SAINT | 0.6901 | 0.6917 | 0.7336 | 0.8079
AKT-NR | 0.6940 | 0.7098 | 0.7450 | 0.7705
DKT | 0.6769 | 0.6884 | 0.7496 | 0.8397
DHKT | 0.7499 | 0.6966 | 0.7547 | 0.8594
EERNNA | 0.7244 | 0.7000 | 0.7437 | 0.8196
DKVMN | 0.7235 | 0.6664 | 0.7009 | 0.8426
GKT | 0.7488 | 0.6857 | 0.7039 | 0.8257
AGKT | 0.7765 | 0.7232 | 0.7589 | 0.8656

## 4.2 Baselines

To evaluate the effectiveness of our model, we compare our method with graph-free methods and graph-based methods. The graph-free methods trace students’ knowledge states without considering concept relations, and they can be grouped into four groups. The first group are traditional methods, *i*.*e*., BKT [4] and KTM [16], which trace students’ knowledge states according to the factors that affect students’ learning. The second group are single-state methods, *i*.*e*., DKT [11], DHKT [17], and EERNNA [7], which use one vector to represent students’ knowledge states on all concepts. The third group are multi-state methods, *i*.*e*., DKVMN [19], which maintain a vector for each concept to represent students’ knowledge states on them. The fourth group are state-free methods, *i*.*e*., SAKT [10], SAINT [3], and AKT-NR [5], which maintain no explicit vector to represent students’ knowledge states. The graph-based methods apply a concept-relation graph in the knowledge state update; here we choose GKT [9].

## 4.3 Student Response Prediction

We measure AUC to evaluate the performance of the models; a higher AUC indicates better performance. Table 2 reports the converged AUC. According to Table 2, on ASSIST09 our method exceeds the best baseline, DHKT, by 2.66% AUC. On ASSIST12, our model outperforms the best baseline, AKT-NR, by 1.34% AUC. On EdNet, our model exceeds the best baseline, DHKT, by 0.42% AUC. On Junyi Math, our method exceeds the best baseline, DHKT, by 0.62% AUC. Thus, the results in Table 2 demonstrate that our method is effective.
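AUC here is equivalent to the probability that a randomly chosen positive (correct) response receives a higher predicted score than a randomly chosen negative one. A small self-contained sketch of computing it, for illustration only:

```python
def auc(labels, scores):
    """AUC as the probability that a random positive outranks a random
    negative, counting ties as half a win (Wilcoxon-Mann-Whitney form)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```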

## 4.4 Ablation Study

To further verify the contribution of AG, we conduct experiments on ASSIST09 with three comparative settings:

- **MovGraph** removes the AG from AGKT; this setting removes the concept-concept relation.
- **DenGraph** replaces the AG in AGKT with the dense graph in GKT [9].
- **TransGraph** replaces the AG with the transition graph in GKT.

The experimental results are shown in Figure 3. We can find
that: (1) *AGKT*, *DenGraph *and *TransGraph *have better
performance than *MovGraph*. That means the consideration of
concept-concept relation is beneficial; (2) Replacing AG in
AGKT with other graphs (DenGraph, TransGraph) decreases
the performance. That means the AG is more effective in the
framework of AGKT.

## 4.5 Case Study

To investigate whether the state update of question-unrelated concepts is interpretable, we randomly pick two question-answering records, and visualize the state of the supernode and the messages obtained by question-unrelated concepts. We expect that, when AGKT converges, the knowledge states of question-unrelated concepts are automatically updated according to their actual relationship with the question-related concepts. Thus, for the concepts correlated with the question-related concepts, the message they receive from the supernode should be correlated with the state of the supernode.

The results are presented in Figure 4. We can observe that the message obtained by *triangle properties* (No. 21) is highly close to the state of the supernode when the student answers a question related to *congruent triangles* (No. 9), and the message obtained by *rates and ratios* is highly close to the state of the supernode when the student answers a question related to *ratio-percentage* (No. 29). Thus, the state update of question-unrelated concepts is interpretable in our method, which demonstrates that our method has good interpretability.

# 5. CONCLUSION

In this paper, we proposed a method called Automatical Graph-based Knowledge Tracing (AGKT). Different from previous graph-based methods, it adopts the Automatical Graph (AG) to automatically update students’ knowledge states without requiring manually annotated concept relations. We evaluated the performance of our method on four public real-world datasets, and compared it with ten methods. The experiment results reveal that our method is effective in tracing students’ knowledge states.

# 6. ACKNOWLEDGMENTS

The corresponding author Yong Yu is supported by National Natural Science Foundation of China (62177033). The work is also sponsored by Huawei Innovation Research Program.

# References

- G. Abdelrahman and Q. Wang. Knowledge tracing
with sequential
key-value memory networks. In
*Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval*, pages 175–184, 2019. - K. Cho, B. Van Merriënboer, D. Bahdanau, and
Y. Bengio. On the properties of neural machine
translation: Encoder-decoder approaches.
*arXiv preprint arXiv:1409.1259*, 2014. - Y. Choi, Y. Lee, J. Cho, J. Baek, B. Kim, Y. Cha,
D. Shin, C. Bae, and J. Heo. Towards an appropriate
query, key, and value computation for knowledge
tracing. In
*Proceedings of the Seventh ACM Conference on Learning@ Scale*, pages 341–344, 2020. - A. T. Corbett and J. R. Anderson. Knowledge tracing:
Modeling the acquisition of procedural knowledge.
*User modeling and user-adapted interaction*, 4(4):253–278, 1994. - A. Ghosh, N. Heffernan, and
A. S. Lan. Context-aware attentive knowledge tracing.
In
*Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining*, pages 2330–2339, 2020. - S. Hochreiter and J. Schmidhuber. Long short-term
memory.
*Neural computation*, 9(8):1735–1780, 1997. - Q. Liu, Z. Huang, Y. Yin, E. Chen, H. Xiong, Y. Su,
and G. Hu. Ekt: Exercise-aware knowledge tracing for
student performance prediction.
*IEEE Transactions on Knowledge and Data Engineering*, 33(1):100–115, 2019. - K. Nagatani, Q. Zhang, M. Sato,
Y.-Y. Chen, F. Chen, and T. Ohkuma. Augmenting
knowledge tracing by considering forgetting behavior.
In
*The World Wide Web Conference*, pages 3101–3107, 2019. - H. Nakagawa, Y. Iwasawa, and
Y. Matsuo. Graph-based knowledge tracing: Modeling
student proficiency using graph neural network. In
*2019 IEEE/WIC/ACM International Conference on Web Intelligence (WI)*, pages 156–163. IEEE, 2019. - S. Pandey and G. Karypis. A self-attentive model for
knowledge tracing.
*arXiv preprint arXiv:1907.06837*, 2019. - C. Piech, J. Bassen, J. Huang, S. Ganguli, M. Sahami, L. J. Guibas, and J. Sohl-Dickstein. Deep knowledge tracing. In *Advances in Neural Information Processing Systems*, volume 28, pages 505–513, 2015.
- S. Shen,
Q. Liu, E. Chen, H. Wu, Z. Huang, W. Zhao, Y. Su,
H. Ma, and S. Wang. Convolutional knowledge
tracing: Modeling individualization in student learning
process. In
*Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval*, pages 1857–1860, 2020. - S. Sukhbaatar, A. Szlam, J. Weston, and R. Fergus.
End-to-end memory networks.
*arXiv preprint arXiv:1503.08895*, 2015. - S. Tong, Q. Liu, W. Huang, Z. Huang, E. Chen,
C. Liu, H. Ma, and S. Wang. Structure-based
knowledge tracing: An influence propagation view. In
*2020 IEEE International Conference on Data Mining (ICDM)*, pages 541–550. IEEE, 2020. - A. Vaswani,
N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N.
Gomez, L. Kaiser, and I. Polosukhin. Attention is all
you need.
*arXiv preprint arXiv:1706.03762*, 2017. - J.-J. Vie and H. Kashima. Knowledge
tracing machines: Factorization machines for knowledge
tracing. In
*Proceedings of the AAAI Conference on Artificial Intelligence*, volume 33, pages 750–757, 2019. - T. Wang, F. Ma, and J. Gao. Deep hierarchical
knowledge tracing. In
*12th International Conference on Educational Data Mining, EDM 2019*, pages 671–674. International Educational Data Mining Society, 2019. - S. Yang, M. Zhu, J. Hou, and X. Lu. Deep
knowledge tracing with convolutions.
*arXiv preprint arXiv:2008.01169*, 2020. - J. Zhang, X. Shi, I. King, and D.-Y. Yeung. Dynamic
key-value memory networks for knowledge tracing. In
*Proceedings of the 26th international conference on World Wide Web*, pages 765–774, 2017.

© 2022 Copyright is held by the author(s). This work is distributed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) license.