This repository contains the code for the paper Exploring Knowledge Tracing in Tutor-Student Dialogues using LLMs. The primary contributions here include code for 1) our language model-based LLMKT and DKT-Sem models, 2) running DKT family and BKT models on dialogue knowledge tracing, and 3) automatically annotating dialogues with knowledge component and correctness labels using the OpenAI API.
If you use our code or find this work useful in your research, please cite us!
@inproceedings{scarlatos2024exploringknowledgetracingtutorstudent,
title={Exploring Knowledge Tracing in Tutor-Student Dialogues using LLMs},
author={Alexander Scarlatos and Ryan S. Baker and Andrew Lan},
year={2025},
booktitle={Proceedings of the 15th Learning Analytics and Knowledge Conference, {LAK} 2025, Dublin, Ireland, March 3-7, 2025},
publisher={{ACM}},
}
Annotated versions of the CoMTA and MathDial datasets (i.e., including per-turn knowledge component and correctness labels) are available in data/annotated and can be loaded as-is during knowledge tracing training.
These versions of the datasets are subject to their original licenses. The license for CoMTA is available in data/annotated/COMTA_LICENSE.txt, and MathDial is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License.
Downloading the source data is not necessary to reproduce our knowledge tracing results, since we release the annotated data in data/annotated. However, you can follow the steps below to replicate our workflow or to experiment with custom data annotation.
Achieve the Core (ATC): Download the ATC HuggingFace dataset and put standards.jsonl and domain_groups.json under data/src/ATC/. At the time of releasing this code, the data was not accessible via HuggingFace due to a bug. If the data is still not accessible, you can contact us or the authors of the paper to send you a copy.
CoMTA: Download the CoMTA data file and put it under data/src.
MathDial: Clone the MathDial repo and put the root under data/src.
We used Python 3.10.12 in the development of this work. Run the following to set up a Python environment:
python -m venv dk
source dk/bin/activate
pip install -r requirements.txt
Also add the following to your environment:
export OPENAI_API_KEY=<your key here> # For automated annotation via OpenAI
export CUBLAS_WORKSPACE_CONFIG=:4096:8 # For enabling deterministic operations
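Before launching a long run, it can help to confirm that both variables above are actually visible to Python. The following is a minimal sketch and is not part of this repository; the helper name `missing_env_vars` is hypothetical. Note that OPENAI_API_KEY is only needed if you run the automated annotation.

```python
import os

# Variable names taken from the export lines above.
REQUIRED_VARS = ("OPENAI_API_KEY", "CUBLAS_WORKSPACE_CONFIG")


def missing_env_vars(env=None, required=REQUIRED_VARS):
    """Return the names of required variables that are unset or empty.

    `env` defaults to the real process environment; passing a plain dict
    makes the function easy to check without touching os.environ.
    """
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("Environment looks complete.")
```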
The annotation step below is not necessary to reproduce our results because we release the annotated datasets; it is documented here for reference.
Dialogue KT requires each dialogue turn to be annotated with correctness and knowledge component (KC) labels. We automate this process with LLM prompting via the OpenAI API. You can run the following to tag correctness and ATC standard KCs on the two datasets:
python main.py annotate --mode collect --openai_model gpt-4o --dataset comta
python main.py annotate --mode collect --openai_model gpt-4o --dataset mathdial
To see statistics on the resulting labels, run:
python main.py annotate --mode analyze --dataset comta
python main.py annotate --mode analyze --dataset mathdial
Each of the following runs a train/test cross-validation on the CoMTA data for a different model:
python main.py train --dataset comta --crossval --model_type lmkt --model_name lmkt_comta # LLMKT
python main.py train --dataset comta --crossval --model_type dkt-sem --model_name dkt-sem_comta # DKT-Sem
python main.py train --dataset comta --crossval --model_type dkt --model_name dkt_comta # DKT
python main.py train --dataset comta --crossval --model_type dkvmn --model_name dkvmn_comta # DKVMN
python main.py train --dataset comta --crossval --model_type akt --model_name akt_comta # AKT
python main.py train --dataset comta --crossval --model_type saint --model_name saint_comta # SAINT
python main.py train --dataset comta --crossval --model_type simplekt --model_name simplekt_comta # simpleKT
python main.py train --dataset comta --crossval --model_type bkt # BKT
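If you want to run every model in sequence, the commands above can be generated programmatically. This is a hedged sketch rather than a script shipped with the repository: the function `crossval_commands` is hypothetical, and it only reuses the flags and the `<model_type>_<dataset>` naming pattern that appear in the commands listed above.

```python
import shlex

# Model types that take a --model_name in the commands above.
NAMED_MODEL_TYPES = ["lmkt", "dkt-sem", "dkt", "dkvmn", "akt", "saint", "simplekt"]


def crossval_commands(dataset="comta"):
    """Build one cross-validation training command per model as an argv list."""
    base = ["python", "main.py", "train", "--dataset", dataset, "--crossval"]
    commands = []
    for model_type in NAMED_MODEL_TYPES:
        commands.append(
            base + ["--model_type", model_type,
                    "--model_name", f"{model_type}_{dataset}"]
        )
    # BKT is listed above without a --model_name, so none is added here.
    commands.append(base + ["--model_type", "bkt"])
    return commands


if __name__ == "__main__":
    # Print the commands for inspection instead of executing them.
    for cmd in crossval_commands():
        print(shlex.join(cmd))
```

To execute the commands rather than print them, each argv list can be passed to `subprocess.run(cmd, check=True)`.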
Check the results folder for metric summaries and turn-level predictions for analysis.
To see all training options, run:
python main.py train --help
We run a grid search to find the optimal hyperparameters for the DKT family models. For example, to run a search for DKT on CoMTA, run the following (crossval is inferred and model_name is set automatically):
python main.py train --dataset comta --hyperparam_sweep --model_type dkt
The output will indicate the model that achieved the highest validation AUC. To get its performance on the test folds, run:
python main.py test --dataset comta --crossval --model_type dkt --model_name <copy from output> --emb_size <get from model_name>
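For readers unfamiliar with the selection metric, AUC is the probability that a randomly chosen correct turn receives a higher predicted score than a randomly chosen incorrect one. The sketch below implements that standard pairwise definition in plain Python for illustration. It is an assumption-free textbook version, not the code this repository uses to compute AUC, and the function name `roc_auc` is hypothetical.

```python
def roc_auc(labels, scores):
    """Compute ROC AUC as the fraction of (positive, negative) pairs that
    are ranked correctly, counting ties as half a correct pair.

    labels: iterable of 0/1 correctness labels.
    scores: iterable of predicted probabilities, same length as labels.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")

    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Three of the four positive/negative pairs are ordered correctly.
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

This quadratic version is meant for clarity on small inputs; library implementations use sorting to get the same value more efficiently.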
The best hyperparameters we found for each dataset are listed below.
CoMTA:
- DKT-Sem: lr=2e-4, emb_size=256
- DKT: lr=1e-3, emb_size=32
- DKVMN: lr=1e-4, emb_size=16
- AKT: lr=5e-3, emb_size=32
- SAINT: lr=1e-3, emb_size=32
- simpleKT: lr=2e-4, emb_size=16
MathDial:
- DKT-Sem: lr=2e-3, emb_size=512
- DKT: lr=5e-3, emb_size=256
- DKVMN: lr=1e-3, emb_size=128
- AKT: lr=2e-4, emb_size=64
- SAINT: lr=2e-4, emb_size=64
- simpleKT: lr=5e-4, emb_size=256
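The values listed above can be kept in a small lookup table so that scripts do not need to hard-code them. This is a sketch under assumptions: the dictionary and the helper `best_hyperparams` are hypothetical and not part of the repository, and the keys simply reuse the --dataset and --model_type identifiers from the commands in this README.

```python
# Values transcribed from the per-dataset lists above.
BEST_HYPERPARAMS = {
    "comta": {
        "dkt-sem": {"lr": 2e-4, "emb_size": 256},
        "dkt": {"lr": 1e-3, "emb_size": 32},
        "dkvmn": {"lr": 1e-4, "emb_size": 16},
        "akt": {"lr": 5e-3, "emb_size": 32},
        "saint": {"lr": 1e-3, "emb_size": 32},
        "simplekt": {"lr": 2e-4, "emb_size": 16},
    },
    "mathdial": {
        "dkt-sem": {"lr": 2e-3, "emb_size": 512},
        "dkt": {"lr": 5e-3, "emb_size": 256},
        "dkvmn": {"lr": 1e-3, "emb_size": 128},
        "akt": {"lr": 2e-4, "emb_size": 64},
        "saint": {"lr": 2e-4, "emb_size": 64},
        "simplekt": {"lr": 5e-4, "emb_size": 256},
    },
}


def best_hyperparams(dataset, model_type):
    """Look up the listed learning rate and embedding size for a model."""
    try:
        return BEST_HYPERPARAMS[dataset][model_type]
    except KeyError:
        raise KeyError(
            f"No listed hyperparameters for {model_type!r} on {dataset!r}"
        ) from None


if __name__ == "__main__":
    print(best_hyperparams("comta", "dkt"))
```

The emb_size value is the one to pass to --emb_size in the test command; run `python main.py train --help` to find the options for the remaining settings.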
To generate the learning curve graphs, run the following (they will be placed in results):
python main.py visualize --dataset comta --model_name <trained model to visualize predictions for>