Abstract: Knowledge Tracing (KT) is the task of modeling students' knowledge based on their coursework interactions within an Interactive Learning System (ILS). Recently, Deep Neural Networks (DNNs) have shown superior performance over classical methods on multiple benchmark datasets. However, while most Deep Neural Network Knowledge Tracing (DNNKT) models are optimized for general objective metrics such as accuracy or AUC on benchmark data, proper deployment of the service requires additional qualities. Moreover, the black-box nature of DNN models makes them particularly difficult to diagnose or improve when unexpected behavior is encountered. In this context, we adopt the idea of black-box (behavioral) testing from software engineering: we (1) define ideal KT model behaviors and (2) propose a KT model assessment framework that measures a model's consistency and robustness. We evaluate three state-of-the-art DNNKT models on seven datasets using the proposed framework. The results highlight the impact of dataset size and model architecture on a model's behavioral quality. The assessment results can serve as an auxiliary measure of model performance on their own, and can also guide model improvements via data augmentation, architecture design, and loss formulation.