This is the repository for Reportedly LLMs, which aims to build task-specific LLMs for medical proofreading.
This repository is temporarily available for review purposes; a finalized version will be released after publication.
Figure 1: The overall workflow of Reportedly LLMs (to be added after the manuscript is published).
Our work consists of three parts:
(1) Dataset Construction
(2) Model Development
(3) Evaluation
We constructed a dataset consisting of two parts.
The first part includes 1,656 synthetic radiology reports generated by GPT-4 with specified prompts: 828 error-free synthetic reports and 828 synthetic reports containing errors.
Please refer to Prompts_for_Synthetic.txt for these prompts.
The second part comprises 614 reports: 307 error-free reports from the MIMIC-CXR database and 307 corresponding error-containing reports generated by GPT-4 from these MIMIC-CXR reports with specified prompts.
Please refer to Prompts_for_MIMIC.txt for these prompts.
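For illustration, below is a minimal sketch of how such error-containing reports can be generated. It assumes the OpenAI Python SDK (openai>=1.0); the actual prompt text lives in the prompt files above, and the example report string is a placeholder.

```python
# Minimal sketch: inject errors into an error-free report with GPT-4.
# Assumptions: openai>=1.0 SDK and OPENAI_API_KEY set in the environment;
# the example report below is a placeholder, not real data.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# The real prompts are provided in Prompts_for_MIMIC.txt / Prompts_for_Synthetic.txt.
error_injection_prompt = open("Prompts_for_MIMIC.txt").read()
error_free_report = "..."  # an error-free MIMIC-CXR report (placeholder)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": error_injection_prompt},
        {"role": "user", "content": error_free_report},
    ],
)
report_with_errors = response.choices[0].message.content
print(report_with_errors)
```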
We fine-tune our models with the Firefly framework.
Please refer to [Firefly](https://github.com/yangjianxin1/Firefly).
Llama-3-8B-Instruct and Llama-3-70B-Instruct are fine-tuned on the training set with the following hyperparameters:
| Hyperparameter | Llama-3-8B-Instruct | Llama-3-70B-Instruct |
|---|---|---|
| Batch size | 1 | 1 |
| Learning rate | 3e-4 | 3e-4 |
| Epochs | 3 | 3 |
| Max length (tokens) | 512 | 512 |
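The sketch below shows a LoRA-style fine-tuning loop with these hyperparameters using the Hugging Face transformers/peft stack. It is illustrative only, not the repository's actual Firefly configuration: the dataset file (train.jsonl), its "text" field, and the LoRA settings are assumptions.

```python
# Illustrative sketch (not the actual Firefly setup): LoRA fine-tuning of
# Llama-3-8B-Instruct with the hyperparameters from the table above.
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

MODEL = "meta-llama/Meta-Llama-3-8B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(MODEL)
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(MODEL, torch_dtype=torch.bfloat16)
# LoRA rank/alpha are assumptions; the repository may use different values.
model = get_peft_model(model, LoraConfig(task_type="CAUSAL_LM", r=64, lora_alpha=16))

# "train.jsonl" with a "text" field is a hypothetical stand-in for the training set.
dataset = load_dataset("json", data_files="train.jsonl", split="train")
dataset = dataset.map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),  # Max length 512
    remove_columns=dataset.column_names,
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="out",
        per_device_train_batch_size=1,  # Batch size 1
        learning_rate=3e-4,             # Learning rate 3e-4
        num_train_epochs=3,             # Epochs 3
        bf16=True,
    ),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```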
We evaluated the performance of models such as Llama-3 and GPT-4 on the test set.
Please refer to demo.ipynb for the relevant code.
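As a rough illustration of what such an evaluation can look like (demo.ipynb is the authoritative reference; the checkpoint path, test file, JSON fields, and prompt wording below are all assumptions):

```python
# Minimal sketch: score binary error-detection accuracy on a test set.
# Assumptions: a fine-tuned checkpoint at "path/to/finetuned-llama-3-8b"
# and a "test.jsonl" with "report" (str) and "has_error" (bool) fields.
import json
from transformers import pipeline

generator = pipeline("text-generation",
                     model="path/to/finetuned-llama-3-8b",
                     device_map="auto")

examples = [json.loads(line) for line in open("test.jsonl")]
correct = 0
for ex in examples:
    prompt = ("Does the following radiology report contain an error? "
              "Answer Yes or No.\n\n" + ex["report"] + "\nAnswer:")
    out = generator(prompt, max_new_tokens=4, do_sample=False)[0]["generated_text"]
    pred = "yes" in out[len(prompt):].lower()  # parse the model's answer
    correct += int(pred == ex["has_error"])

print(f"Accuracy: {correct / len(examples):.3f}")
```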
XXX
Please cite the repo if you use the data or code in this repository.
```
@misc{XXX2024llm,
  author = {XXX},
  title = {Reportedly LLMs: Generative Large Language Models for Proofreading Errors in Radiology Reports},
  year = {2024},
  publisher = {XXX},
  journal = {XXX},
}
```
XXX