
Identifying Hallucinations in LLMs

Setup

Set up the conda environment by running setup.sh. It brings in basic plotting packages as well as captum, which is needed for collecting the token attributions.
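
A quick way to confirm the environment is usable is to import the main dependencies; a minimal sketch, assuming the env also provides torch and transformers (the exact package set is defined by setup.sh itself):

```python
# Sanity check for the environment created by setup.sh. Assumes torch and
# transformers are also installed, since the collection scripts rely on them.
import captum
import matplotlib
import torch
import transformers

print("captum", captum.__version__)
print("torch", torch.__version__)
print("transformers", transformers.__version__)
```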

Data sources

An overview of the datasets/models used can be found in Section 4 (Experimental setup) of the paper. In particular, while result_collector.py uses TriviaQA directly, for TREX we save a sampling in the form of founders/capitals/place_of_birth.csv. Run trex_parser.py to create these data files.
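
Once trex_parser.py has been run, the CSVs can be inspected directly; a minimal sketch, assuming the files land next to the script (the actual schema is whatever trex_parser.py writes):

```python
# Hedged example: peek at one of the generated TREX samples. The path and the
# column layout are assumptions; trex_parser.py defines the real output.
import pandas as pd

capitals = pd.read_csv("capitals.csv")
print(capitals.head())
```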

Artifact data collection

Classifiers and plots are built on model-derived artifacts like activations, attention, softmax output, and attributions. Artifact data collection is done in result_collector.py; it is VERY time consuming and best done on a powerful machine. It writes pickle files and gathers more data than is used in the paper (in the paper we look at last-layer activations, etc.). Once acquired, however, the same data can be used for a broader analysis if so desired.

We use models/tokenizers from Hugging Face. Softmax/logits are collected directly from the model output; attributions are collected using the integrated gradients (IG) method available in Captum; and activations and attentions (model-internal states) are collected using the register_forward_hook functionality.
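
A minimal sketch of these three collection paths, assuming a Falcon-style Hugging Face causal LM (the model name, the layer path model.transformer.h, and the prompt below are illustrative assumptions, not the exact code in result_collector.py):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from captum.attr import LayerIntegratedGradients

model_name = "tiiuae/falcon-7b"  # assumption: any causal LM from the Hub works similarly
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

captured = {}

def save_activation(module, inputs, output):
    # register_forward_hook receives (module, inputs, output); keep the hidden states
    hidden = output[0] if isinstance(output, tuple) else output
    captured["last_layer"] = hidden.detach()

# assumption: the decoder blocks live under model.transformer.h (this varies by architecture)
hook = model.transformer.h[-1].register_forward_hook(save_activation)

inputs = tokenizer("The capital of France is", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_attentions=True)
softmax_scores = torch.softmax(out.logits[0, -1], dim=-1)  # next-token distribution
attentions = out.attentions                                # per-layer attention maps
hook.remove()

# IG attributions over the input embeddings for the predicted next token
def next_token_logits(input_ids):
    return model(input_ids).logits[:, -1, :]

target = int(softmax_scores.argmax())
lig = LayerIntegratedGradients(next_token_logits, model.get_input_embeddings())
attributions = lig.attribute(inputs["input_ids"], target=target, n_steps=16)
```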

Plots

Data analysis (the plots in the paper) is done in plots_tsne.ipynb and plots_entropy_and_pca.ipynb. It corresponds to Section 5.1 (Qualitative analysis) of the paper; however, most plots are collected in the appendix.
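
As an illustration of the kind of plot involved, here is a minimal t-SNE sketch over last-layer activations; the arrays, labels, and styling are placeholders, not the notebooks' actual data handling:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# assumption: activations is (n_samples, hidden_dim) and labels marks correct (1)
# vs. hallucinated (0) answers; here both are random stand-ins
activations = np.random.randn(200, 4096)
labels = np.random.randint(0, 2, size=200)

embedded = TSNE(n_components=2, perplexity=30, init="pca", random_state=0).fit_transform(activations)
for value, name in [(1, "correct"), (0, "hallucinated")]:
    mask = labels == value
    plt.scatter(embedded[mask, 0], embedded[mask, 1], s=8, label=name)
plt.legend()
plt.title("t-SNE of last-layer activations")
plt.show()
```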

Once data is collected, we are interested in comparative plots of softmax/IG attributions/activations across the models and datasets. This is why we collect the large dicts at the beginning of both notebooks. This is also a time-consuming process, but note that the notebook(s) can also be used on one model/dataset for fast experimentation. Example: the data source directory (in our case results) would contain only capitals/falcon-40b_capitals_7_18.pickle while founders, trivia, and place_of_birth stay empty.
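
A minimal sketch of how such a dict could be assembled, assuming the results/<dataset>/<model>_<dataset>_<date>.pickle layout from the example above (the pickle contents themselves are whatever result_collector.py wrote):

```python
import pickle
from pathlib import Path

results = {}
for path in Path("results").glob("*/*.pickle"):
    dataset = path.parent.name            # e.g. "capitals"
    model = path.stem.split("_")[0]       # e.g. "falcon-40b" (assumed file naming)
    with open(path, "rb") as f:
        results.setdefault(dataset, {})[model] = pickle.load(f)

print({dataset: sorted(models) for dataset, models in results.items()})
```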

Classifiers

We train classifiers on IG, softmax, attention scores, and FCC activations across the models/datasets. The results are in Tables 2 and 3 in the Results section of the paper. classifier_model.ipynb creates basic models and trains them on the data collected by result_collector.py.
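
In the same spirit, a minimal sketch of a per-artifact classifier; the feature matrix, labels, and choice of logistic regression are placeholders rather than the exact models in classifier_model.ipynb:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# assumption: X holds one artifact type per row (e.g. last-layer activations)
# and y marks whether the answer was correct (1) or hallucinated (0)
X = np.random.randn(500, 4096)
y = np.random.randint(0, 2, size=500)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("AUC:", roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1]))
```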

SelfCheckGPT

We try selfcheckgpt and compare it to our results; a notebook is included. SelfCheckGPT does not perform well with our models; we hypothesize that this is because the models we use are small and their output at nonzero temperature is often subpar. We use the BERTScore and n-gram methods from the SelfCheckGPT paper in self_check_gpt.ipynb and report the results in Appendix B (Additional results) of the paper.
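
For reference, a minimal sketch of how those two variants are typically invoked via the selfcheckgpt package; the class names and predict signatures below are assumptions based on that package's documentation, and the sentences/samples are toy placeholders:

```python
from selfcheckgpt.modeling_selfcheck import SelfCheckBERTScore, SelfCheckNgram

# assumption: one greedily decoded answer, split into sentences, plus several
# temperature-sampled answers for the same question
sentences = ["Paris is the capital of France."]
sampled_passages = [
    "The capital of France is Paris.",
    "France's capital city is Lyon.",
    "Paris is the capital of France.",
]

bertscore_checker = SelfCheckBERTScore()
bert_scores = bertscore_checker.predict(sentences=sentences, sampled_passages=sampled_passages)

ngram_checker = SelfCheckNgram(n=1)
ngram_scores = ngram_checker.predict(
    sentences=sentences,
    passage=" ".join(sentences),
    sampled_passages=sampled_passages,
)
print(bert_scores, ngram_scores)
```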