diff --git a/_data/authors.yml b/_data/authors.yml index f67019e663..227cfe0286 100644 --- a/_data/authors.yml +++ b/_data/authors.yml @@ -59,6 +59,7 @@ active-authors: - jarnstein - jbeckles - jbickleywallace + - jcarstairs - jcwright - jdunleavy - jgrant @@ -502,6 +503,10 @@ authors: picture: picture.jpg jcardy: name: "Jonathan Cardy" + jcarstairs: + name: "Joe Carstairs" + author-summary: "
Developer at Scott Logic. Find me on joeac.net.
" + picture: picture.webp jcwright: name: "Jay Wright" author-summary: "I'm a developer at Scott Logic, based in Newcastle. My primary interest is in backend development, primarily in Java, and cloud technologies. I also have experience building web applications in React.
\n
Away from work I'm a keen petrol head and avid Formula One fan. I support McLaren, in case you were wondering.
" diff --git a/_posts/2024-08-29-llms-dont-hallucinate.md b/_posts/2024-08-29-llms-dont-hallucinate.md new file mode 100644 index 0000000000..7575af5038 --- /dev/null +++ b/_posts/2024-08-29-llms-dont-hallucinate.md @@ -0,0 +1,458 @@ +--- +author: jcarstairs +title: LLMs don't 'hallucinate' +date: 2024-08-28 00:00:00 Z +summary: Describing LLMs as 'hallucinating' fundamentally distorts how LLMs work. We can do better. +category: AI +image: "/jcarstairs/assets/mirage.jpg" +layout: default_post +--- + +It's well known that LLMs make stuff up. + +LLMs can confidently tell you all about +[the winners of non-existent sporting events](https://medium.com/@gcentulani/understanding-hallucination-in-llms-causes-consequences-and-mitigation-strategies-b5e1d0268069). +They can [invent legal cases](https://www.bbc.co.uk/news/world-us-canada-65735769), +and [fabricate fake science](https://www.technologyreview.com/2022/11/18/1063487/meta-large-language-model-ai-only-survived-three-days-gpt-3-science/). +These kinds of outputs are often called 'hallucinations'. + ++ +
+ +Thankfully, the greatest minds of the human race are hard at work ironing out +this problem. Solutions abound: +[improving the quality of your training data](https://medium.com/@gcentulani/understanding-hallucination-in-llms-causes-consequences-and-mitigation-strategies-b5e1d0268069), +[testing the system diligently](https://www.ibm.com/topics/ai-hallucinations), +[fine-tuning the model with human feedback](https://openai.com/index/gpt-4/), +[integrating external data sources](https://arxiv.org/pdf/2104.07567), +or even just +[asking the LLM to evaluate its own answer](https://aclanthology.org/2023.findings-emnlp.123.pdf) +are just some of the ingenious techniques engineers and scientists have +proposed. + +According to OpenAI, +[GPT-4 reduces hallucination by 40%](https://openai.com/index/gpt-4) compared +to its predecessors; and surely, LLM proponents say, that pattern will continue, +Moore's-Law-style, until LLM hallucination is a distant memory, like the +[inefficiencies in early steam engines](https://www.thehistorycorner.org/articles-by-the-team/a-dream-too-lofty-why-the-steam-engine-failed-in-the-18th-century). + +But is it so simple? A growing chorus of academics, engineers and journalists +are calling this narrative into question.[^3] Maybe hallucination isn't a +solvable problem at all. + ++ +
+ +Indeed, it's increasingly clear that the word 'hallucination' fundamentally +distorts the way LLMs actually work. This means our judgements of what LLMs are +capable of could be systematically off the mark. + +I think we need to do better. What we need is **a new word**, one which captures +how LLMs behave without baking in false assumptions about how they work. But +before we get to the solution, we need to properly understand the problem. + +## What is 'hallucination'? + +Since about 2017, researchers in natural language processing have used the word +'hallucination' to describe a variety of behaviours. Since ChatGPT propelled +LLMs to public awareness, this word has become commonplace. But what does it +actually mean?[^1] + +Sometimes, it refers to **unfaithfulness**, a jargon word in natural language +processing. This means that the LLM's output contains information which is not +epistemically supported[^6] by the input. + +For example, if you try to get an LLM to summarise a meeting transcript, and its +summary includes events which weren't described in the transcript, the LLM has +been 'unfaithful'. + +When academics talk about 'hallucination', they most often mean something like +'unfaithfulness' - though other meanings are common, too. + +Sometimes, 'hallucination' refers to **unfactuality**: producing outputs which +contain falsehoods, or other falsehood-like things, such as epistemically +unjustifiable claims, or claims which contradict the LLM's training data. This +is probably the most common usage in non-academic circles. + +By the way, sometimes, 'hallucination' has another meaning altogether: +**nonsense**. Nonsense is one of those things which is hard to define, but you +know it when you see it. This isn't my focus for this article, so let's leave +nonsense aside for proper treatment another day. + +So what people mean by 'hallucination' is pretty diverse. But all of these +different usages have something in common: 'hallucination' is regarded as +**abnormal behaviour**. + +### 'Hallucination' as abnormal behaviour: what this means + +Treating hallucination as abnormal is very significant. Let's take a minute to +consider what this entails. + +First of all, it entails that **every hallucination can be explained**, at least +in principle. + +This is because, when the environment and inputs are normal, systems behave +normally. So, whenever an LLM hallucinates, we should expect to find something +abnormal about the environment or the input which explains why the LLM +hallucinated. + +Secondly, if hallucination is abnormal behaviour, that implies that not +hallucinating is normal. + +That in turn means **there must be some mechanism**, some causal explanation +contained in how LLMs work, which explains why, under normal conditions, LLMs +don't hallucinate. + +These points are pretty subtle, so let's explore a couple of examples. + +Think about dice. I could roll six sixes in a row, and that would be really +unusual behaviour for the die. + ++ +
+ +But firstly, this wouldn't demand an explanation. It's not a bug, it's just +statistically improbable. + +And secondly, there's no mechanism which can explain why, under normal +conditions, a die shouldn't roll six sixes in a row. Very, very occasionally, +it's just what dice do. + +That's because it's normal (albeit uncommon) behaviour for a die to roll six +sixes in a row. + +In contrast, if I secure a picture to the wall with a picture hook, and it +falls off, I can reasonably expect there to be an explanation. + ++ +
+ +Maybe the nail was bent. Maybe I didn't hammer it in straight. Maybe I didn't +quite catch the string on the hook when I put it up. Point is, something must +have gone wrong for the picture hook to behave in this way. + +And again, we should expect there to be a mechanism which can explain why, under +normal conditions, pictures hooked to the wall don't fall off. + +And indeed there is such a mechanism: the hook catches the string and transfers +the weight of the picture to the wall, instead of leaving that weight to pull +the picture down towards the ground. + +That's because the picture falling off the wall is abnormal behaviour +for a picture hook. + +So, to recap, if LLM hallucination is abnormal behaviour, we should expect two +things: + +1. There should be some abnormal conditions which explain why LLMs hallucinate + in some cases but not others. +2. There should be some mechanism which explains why, under normal conditions, + LLMs don't hallucinate. + +## Why the word 'hallucination' distorts how LLMs work + +**The word 'hallucination' doesn't satisfy either of those two expectations** +listed above. Therefore, the word 'hallucination' fundamentally distorts how +LLMs really work. + +Firstly, when an LLM hallucinates, we should expect there to be some abnormal +conditions which explain why it did so. But often, **there is no such +explanation**. + +Occasionally, LLMs produce unfaithful and untrue content for no good reason. +Often, the problem goes away by making meaningless tweaks in the wording of the +prompt, or even simply by running the algorithm again. + +Secondly, if hallucination is abnormal behaviour, we should expect there to be +some mechanism which causes LLMs not to hallucinate under normal conditions. +But **there is no such mechanism**. + +LLMs are predictive text machines, not oracles. They are not trained to target +faithfulness. They are not trained to target truth. They are trained to guess +the most plausible next word in the given sequence. So the case where the LLM +'hallucinates' and the case where it doesn't are causally indistinguishable. + ++ +
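To make that concrete, here's a minimal sketch of the sampling step in Python. The prompt, the tiny vocabulary and the probabilities are all invented for illustration - a real LLM performs the same weighted draw over tens of thousands of tokens, using billions of learned weights - but the moral survives: the run that produces a falsehood is generated by exactly the same procedure as the run that produces a fact.

```python
import random

# Invented numbers for illustration only: probabilities a model *might* assign
# to the next token after the prompt "The first person to walk on the Moon was".
next_token_probs = {
    " Neil": 0.78,   # this continuation happens to be true
    " Buzz": 0.15,   # this continuation happens to be false
    " Yuri": 0.07,   # this continuation happens to be false
}

def sample_next_token(probs: dict[str, float]) -> str:
    """Pick the next token at random, weighted by the model's probabilities."""
    return random.choices(list(probs), weights=list(probs.values()), k=1)[0]

# Run it a few times: some completions are true, some are false, but the
# procedure is identical every time. Nothing 'goes wrong' on the false runs,
# which is why re-running, or rewording the prompt so the probabilities shift
# slightly, can make the problem appear or disappear.
for _ in range(5):
    print("The first person to walk on the Moon was"
          + sample_next_token(next_token_probs))
```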
+ +There were some important and difficult points there, so let's elaborate. + +By comparison, for a human to produce false or unfaithful content would indeed be +abnormal behaviour. + +But there are explanations for why humans sometimes produce such outputs: maybe +they had an incentive to lie, or they were playing a game, or they were joking. +Or indeed, perhaps they were 'hallucinating' in the traditional sense! + +In contrast, an LLM can never have an incentive to lie, can never play a game, +and can never tell a joke. It's a pile of matrices. + +And there are mechanisms which explain why humans normally don't produce +unfaithful or unfactual content. We have mental models of the external world. We +continually amend those models in response to evidence. We receive that evidence +through perception. And we use language, among other things, as a means of +communicating our mental models to other people. + +But an LLM doesn't have any of these mechanisms. It doesn't have a model of +the external world. It doesn't respond to evidence. It doesn't gather evidence +through perception (or, indeed, by any other means). And it doesn't attempt to +use language as a means of communicating information with other minds. + +So the concept 'hallucination' fundamentally mischaracterises the nature of LLMs +by assuming that false or unfaithful outputs are abnormal for LLMs. In fact, +such outputs, while rare (at least in recent models), are nonetheless normal, +like a die rolling six sixes in a row. + +## Why this matters + +OK, so 'hallucination' is not a great word. It assumes that unfaithful or +unfactual output is abnormal output for an LLM, when in fact this just isn't +true. + +But why does this matter? Can't we afford a little fudge? + +I think the use of this word has serious consequences. + +In academia, researchers are spilling considerable amounts of ink trying to +ascertain the causes of 'hallucination'[^5]. + +Not only is this enterprise wholly in vain, but it risks spreading the +misperception that hallucination is incidental, and will disappear once we find +better ways of designing LLMs. That in turn could lead decision-makers to invest +in LLM applications that just won't work, under the false impression that the +'hallucination' problem will disappear in future. + +This isn't a hypothetical concern. LLMs have been set to +[writing entire travel articles](https://arstechnica.com/information-technology/2023/08/microsoft-ai-suggests-food-bank-as-a-cannot-miss-tourist-spot-in-canada), +[making legal submissions](https://www.texaslawblog.law/2023/06/sanctions-handed-down-to-lawyers-who-cited-fake-cases-relying-on-chatgpt), +and [providing customers with information on company policies](https://bc.ctvnews.ca/air-canada-s-chatbot-gave-a-b-c-man-the-wrong-information-now-the-airline-has-to-pay-for-the-mistake-1.6769454), +with catastrophic results. + +And think about some of these common or proposed use cases: + +- Helping write submissions to academic journals +- Generating meeting minutes +- 'Chatting with data' +- Summarising non-structured data in social research +- Writing computer code +- Question-answering +- All-purpose 'AI assistants' + +Given that LLMs will, rarely but inevitably, output garbage, all of these will +need to be done with enormous care to avoid causing serious damage. + +Furthermore, there's a risk that LLMs may act as a so-called 'moral crumple +zone', effectively soaking up responsibility for failures on behalf of the +humans who were really responsible.
And where there's a lack of accountability, +organisations are unable to make good decisions or learn from their +mistakes[^2]. + ++ +
+ + +## How to understand LLM behaviour besides 'hallucinating' + +So, the word 'hallucination' isn't fit for purpose. But we do legitimately need +ways of understanding how LLMs behave, including their tendencies to produce +false or unfaithful outputs. + +### Confabulation: a step forward, but still flawed + +One alternative suggestion is to +[refer to the behaviour as 'confabulation' instead of 'hallucination'](https://journals.plos.org/digitalhealth/article?id=10.1371/journal.pdig.0000388). + +While, as the authors linked above rightly point out, this does address some +of the problems with the word 'hallucination', it still has the fundamental +flaw we've been discussing. It still assumes that 'hallucination', or +'confabulation', is a causally distinguishable abnormal behaviour. + +Humans sometimes confabulate, but normally they don't. When humans do +confabulate, we can expect there to be an explanation, like a head injury or +mental illness. And we can expect there to be mechanisms in how the human mind +works which explain why humans don't normally confabulate. + +But as we have seen, this is not an accurate analogy for LLMs. For an LLM, it is +just as normal to produce outputs which are false or unfaithful +as it is to produce outputs which are true or faithful. + +### Bulls\*\*t: a better solution + +However, three philosophers at the University of Glasgow have come up with an +ingenious solution. + +[**LLMs are bulls\*\*t machines**](https://link.springer.com/article/10.1007/s10676-024-09775-5). + +Please don't be alarmed. Despite appearances, 'bulls\*\*t' +is a technical term in analytic philosophy, developed in +[Harry Frankfurt's 2005 classic, _On Bulls\*\*t_](https://doi.org/10.2307/j.ctt7t4wr). +Following Frankfurt's original usage, it's generally taken to mean something +like this: + +> **bulls\*\*tting**: speaking with a reckless disregard for the truth. + +Canonical examples include: + +- Pretending you know something about whisky to try and impress a client, + colleague, or crush +- Bizarre advertising slogans, like 'Red Bull gives you wings' +- Any time Paul Merton opens his mouth on [Just A Minute](https://www.bbc.co.uk/programmes/b006s5dp) + +Why is this a good match for describing how LLMs behave? Well, LLMs are trained +to imitate language patterns from their training data. An LLM has succeeded at +its task when its output is linguistically plausible. Not when it's true, +well-evidenced, or faithful to the input - just when it's linguistically +plausible. In other words, its goal is to 'sound like' what a human would say. + +This is just like what a human does when they bulls\*\*t. When someone +bulls\*\*ts, they're not aiming to produce anything true, faithful or coherent, +they're just trying to sound like they know what they're talking about. If I +can convince my crush that I'm a sophisticated gentleman who knows his Islays +from his Arrans, she just might consider going on a date with me. Actually +saying something true or meaningful about whisky is not the task at hand. + ++ +
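To see how little truth has to do with it, here's a toy sketch of the training signal. The sentences and probabilities are made up, and real training involves gradient descent over billions of tokens, but the shape of the objective is the point: the model is scored purely on how much probability it gave to whatever token actually came next in its training text.

```python
import math

def next_token_loss(predicted_probs: dict[str, float], actual_next_token: str) -> float:
    """Negative log-likelihood of the token that actually followed in the training text.

    This is the whole training signal: reward high probability on whatever came
    next in the corpus. No term anywhere checks whether the text was true.
    """
    return -math.log(predicted_probs.get(actual_next_token, 1e-12))

# A factual sentence and a bit of advertising bulls**t are scored identically:
# all that matters is how plausible the model found the actual continuation.
print(next_token_loss({" Celsius": 0.9, " Fahrenheit": 0.1}, " Celsius"))  # "...water boils at 100 degrees Celsius"
print(next_token_loss({" wings": 0.8, " energy": 0.2}, " wings"))          # "...Red Bull gives you wings"
```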
+ +The word 'bulls\*\*t' is a huge step forward: + +- **It captures LLMs' tendency to produce false or unfaithful output.** Since + truth and faithfulness are not goals, it's perfectly possible for an LLM to + produce false or unfaithful output purely by chance. +- **It explains why LLMs often produce true and faithful output**, even if it is + accidental. After all, the best way to sound like you know what you're talking + about is to actually say true things! +- **It correctly identifies false and unfaithful outputs as normal for an LLM**, + unlike 'hallucination'. + +Previously, we thought that an LLM reliably outputs true and faithful content, +except when it occasionally hallucinates. Now, we recognise that an LLM in fact +always outputs bulls\*\*t. + +### The benefits of bulls\*\*t + +So far, the word 'hallucination' has trapped us into talking pretty negatively +about LLMs. But the word 'bulls\*\*t', perhaps surprisingly, frees us to talk +honestly about the positive applications of LLMs as well. + +As several authors have pointed out before[^4], the tendency of LLMs to produce +false and misleading output corresponds closely to their tendency to produce +output which might be described, for want of a better word, as 'creative'. + +Why 'hallucinating' should go together with 'creativity' is a +profound mystery. But it's obvious that 'bulls\*\*tting' is in some sense a +'creative' activity by nature. (Just ask Paul Merton!) + +Therefore, because we have a more accurate description of how LLMs work, we +can make more robust decisions both about where LLM behaviour is desirable, and +about where it isn't. + +Would you employ a professional bulls\*\*tter to write your company minutes? Or +write large chunks of computer code? Or summarise a legal contract? + +Or what about redrafting an email in a formal tone? Or suggesting a more +descriptive variable name in your code? Or suggesting a couple of extra Acceptance +Criteria on your Scrum Issue that you might have forgotten? Are these +applications where 'sounds about right' is good enough? + +## Recap + +The word 'hallucination' has gripped the popular narrative about LLMs. But the +word was never suited to describing how LLMs behave, and using it puts us at +risk of making bad bets on how LLMs ought to be applied. + +By using the word 'bulls\*\*t', we recognise that while false and unfaithful +outputs may be rare, they should occasionally be expected as part of the normal +operation of the LLM. + +As a result, we are empowered to recognise the applications where LLMs can +really make a positive difference, as well as quickly recognise the applications +where they can't. + +## Footnotes + +[^1]: If you want a fuller treatment of how the word 'hallucination' has been used historically in the field, see [my personal blog post](https://joeac.net/blog/2024/07/16/word_hallucination_with_reference_to_llms) on the topic. +[^2]: For more on this idea, look into [Madeleine Clare Elish's concept of moral crumple zones](https://doi.org/10.17351/ests2019.260).
+[^3]: For example, see [a Harvard study on hallucination in the legal context](https://doi.org/10.48550/arXiv.2405.20362), this [philosophical article](https://arxiv.org/pdf/2212.03551) and this [Forbes article](https://www.forbes.com/sites/lanceeliot/2022/08/24/ai-ethics-lucidly-questioning-this-whole-hallucinating-ai-popularized-trend-that-has-got-to-stop) expressing concern about the use of the word 'hallucination', this [BBC journalist on using LLMs in journalism](https://www.bbc.co.uk/rd/blog/2024-06-mitigating-llm-hallucinations-in-text-summarisation), and this [PBS article which quotes an academic saying 'this isn't fixable'](https://www.pbs.org/newshour/science/chatbots-can-make-things-up-can-we-fix-ais-hallucination-problem). +[^4]: See for example [Jiang, Xuhui et al 2024. A Survey on Large Language Model Hallucination via a Creativity Perspective](https://arxiv.org/abs/2402.06647). +[^5]: See for example [Huang, Lei et al 2023. A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions](https://arxiv.org/abs/2311.05232). +[^6]: One thing 'epistemically supports' another thing just in case, if you already knew the first thing, you'd have good reason to believe the second thing, too. For example, that my flatmate, Sophie, is cross, provides epistemic support for the notion that her job interview didn't go well earlier today. diff --git a/jcarstairs/assets/ai-safety-summit.jpg b/jcarstairs/assets/ai-safety-summit.jpg new file mode 100644 index 0000000000..3c28947369 Binary files /dev/null and b/jcarstairs/assets/ai-safety-summit.jpg differ diff --git a/jcarstairs/assets/crumple-zone.jpg b/jcarstairs/assets/crumple-zone.jpg new file mode 100644 index 0000000000..8d4743a09c Binary files /dev/null and b/jcarstairs/assets/crumple-zone.jpg differ diff --git a/jcarstairs/assets/dice.jpg b/jcarstairs/assets/dice.jpg new file mode 100644 index 0000000000..68ac9d696a Binary files /dev/null and b/jcarstairs/assets/dice.jpg differ diff --git a/jcarstairs/assets/mirage.jpg b/jcarstairs/assets/mirage.jpg new file mode 100644 index 0000000000..807a0d18e0 Binary files /dev/null and b/jcarstairs/assets/mirage.jpg differ diff --git a/jcarstairs/assets/picture-falling-off-wall.jpg b/jcarstairs/assets/picture-falling-off-wall.jpg new file mode 100644 index 0000000000..9f619a6eca Binary files /dev/null and b/jcarstairs/assets/picture-falling-off-wall.jpg differ diff --git a/jcarstairs/assets/predictive-text.jpg b/jcarstairs/assets/predictive-text.jpg new file mode 100644 index 0000000000..d4f96f9426 Binary files /dev/null and b/jcarstairs/assets/predictive-text.jpg differ diff --git a/jcarstairs/assets/whisky.jpg b/jcarstairs/assets/whisky.jpg new file mode 100644 index 0000000000..46c2a1f546 Binary files /dev/null and b/jcarstairs/assets/whisky.jpg differ diff --git a/jcarstairs/atom.xml b/jcarstairs/atom.xml new file mode 100644 index 0000000000..775e00ab53 --- /dev/null +++ b/jcarstairs/atom.xml @@ -0,0 +1,4 @@ +--- +author: jcarstairs +layout: atom_feed +--- diff --git a/jcarstairs/feed.xml b/jcarstairs/feed.xml new file mode 100644 index 0000000000..f68b624392 --- /dev/null +++ b/jcarstairs/feed.xml @@ -0,0 +1,4 @@ +--- +author: jcarstairs +layout: rss_feed +--- diff --git a/jcarstairs/index.html b/jcarstairs/index.html new file mode 100644 index 0000000000..ac48c465ac --- /dev/null +++ b/jcarstairs/index.html @@ -0,0 +1,5 @@ +--- +title: Joe Carstairs +author: jcarstairs +layout: default_author +--- diff --git a/jcarstairs/picture.webp 
b/jcarstairs/picture.webp new file mode 100644 index 0000000000..0c2a15b7ff Binary files /dev/null and b/jcarstairs/picture.webp differ