Thank you for making this research available on github.
"Our design exploits the fact that while LLMs are known to have been trained on the raw text of American case law, which is in the public domain (Henderson et al. 2022), they have likely not been trained on these cases’ attendant metadata, which exist separately from the cases’ textual content and which we have aggregated from disparate sources.
These metadata enable us to construct reference-based queries for the first nine of our tasks (Table 2)."
Do I understand correctly that the authors knew these models had likely not been trained on this metadata, yet still proceeded to evaluate these older base models on their knowledge of such data?
What am I missing?
Yes, that's right. We assumed that the foundation models we tested were not trained on tabular case metadata (but were trained on the corpus of American case law itself), meaning that when a model provided a correct answer, it was evidence of emergent knowledge/reasoning derived from the case texts rather than simple memorization of the metadata. You've probably also seen our other paper, where we look at some RAG systems that do provide some of this metadata directly to the LLM (https://arxiv.org/abs/2405.20362). Feel free to message me on the FLP Slack if you want to talk more.