# This repository hosts the models and code for our paper "Rethinking Unlearning in Law Large Language Model: Datasets, Methods".
The paper is currently under review at a journal.
After the paper is accepted, we will release the pre-training corpus (in JSONL format) and the incrementally pre-trained large language model (based on Yi-6B base) on Hugging Face.
We will also release on Hugging Face three forgetting datasets (legal texts Df_law, judicial interpretations Df_int, judicial judgments Df_jud, all in Parquet format) and one forgetting test set (Law300, also in Parquet format).
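
As a minimal sketch of how the released files could be read once they are available (the local file names below are hypothetical placeholders, not the final Hugging Face paths):

```python
import json

import pandas as pd

# Pre-training corpus: one JSON object per line (JSONL).
corpus = []
with open("pretrain_corpus.jsonl", "r", encoding="utf-8") as f:  # hypothetical file name
    for line in f:
        corpus.append(json.loads(line))

# Forgetting datasets and the Law300 test set: Parquet files, readable with pandas.
df_law = pd.read_parquet("Df_law.parquet")    # hypothetical file name
law300 = pd.read_parquet("Law300.parquet")    # hypothetical file name

print(len(corpus), len(df_law), len(law300))
```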
We will also publish the preprocessing code, the incremental pre-training code and procedure, the forgetting procedure, the forgetting test procedure, and related materials in this GitHub repository.
Thank you for your attention!