
metadata #1

Open
legaltextai opened this issue Aug 29, 2024 · 1 comment

Comments

@legaltextai

Thank you for making this research available on GitHub.

"Our design exploits the fact that while LLMs are known to have been trained on the raw text of American case law, which is in the public domain (Henderson et al. 2022), they have likely not been trained on these cases’ attendant metadata, which exist separately from the cases’ textual content and which we have aggregated from disparate sources.
These metadata enable us to construct reference-based queries for the first nine of our tasks (Table 2)."

Do I understand correctly that the authors knew these models had likely not been trained on the metadata, but still proceeded to evaluate these older base models on their knowledge of such data?

What am I missing?

@mattdahl
Member

Yes, that's right. We assumed that the foundation models we tested were not trained on tabular case metadata (but were trained on the corpus of American case law itself), meaning that when a model provided a correct answer, it was evidence of its emergent knowledge/reasoning ability and not simply memorization. You've probably also seen our other paper where we look at some RAG systems that do provide some of this metadata directly to the LLM (https://arxiv.org/abs/2405.20362). Feel free to message me on the FLP Slack if you want to talk more.
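
To make the setup concrete, here is a minimal sketch of how a reference-based query might be assembled from a metadata record. The field names, example values, and the `make_author_query` helper are illustrative assumptions for this sketch, not the paper's actual schema or task definitions.

```python
# Hypothetical sketch of a reference-based query built from case metadata.
# All field names and values below are invented for illustration.

case_metadata = {
    "case_name": "Example v. Illustration",
    "citation": "123 U.S. 456",
    "year": 1900,
    "author": "Justice Example",
}

def make_author_query(meta: dict) -> tuple[str, str]:
    """Return (prompt, gold_answer) for one reference-based task:
    asking which judge authored the opinion."""
    prompt = (
        f"Who wrote the majority opinion in {meta['case_name']}, "
        f"{meta['citation']} ({meta['year']})?"
    )
    # The gold answer is taken from the metadata table, not from the
    # opinion text the model was presumably trained on.
    return prompt, meta["author"]

prompt, gold = make_author_query(case_metadata)
# The prompt is sent to the LLM, and the model's answer is scored against `gold`.
```

The point is that the gold answer comes from the separately aggregated metadata rather than from the opinion text, which is the distinction the comment above relies on.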
