Research Update - 2024-09-28 - Trying "debate" epistemics #51
Replies: 3 comments · 1 reply
-
To be fair, o1 still fails at this task too. Perhaps Claude's initial success was a fluke, as I cannot reproduce the results.
-
Trying this in the workbench too.
-
Adding "verify step by step" to the instructions seems to help
and the answer:
This only took two turns in the workbench https://console.anthropic.com/workbench/9eb25f31-1828-478e-8cfc-507daa279234 Unable to reproduce in the web interface though, potentially due to temperature too high. |
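For anyone who wants to try this outside the workbench, here is a minimal sketch of what the "verify step by step" instruction looks like as an API call with the Anthropic Python SDK. The model name, temperature, prompt wording, and example task are assumptions, not the exact workbench settings linked above.

```python
# Minimal sketch: adding "verify step by step" to the system instructions
# via the Anthropic Python SDK. Model name, temperature, and exact prompt
# wording are assumptions, not the settings used in the workbench above.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-3-5-sonnet-20240620",   # assumed model
    max_tokens=1024,
    temperature=0.0,                      # low temperature, per the note above
    system=(
        "Think through the problem with Chain of Thought, "
        "then verify step by step before giving your final answer."
    ),
    messages=[{"role": "user", "content": "How many r's are in 'strawberry'?"}],  # placeholder task
)
print(response.content[0].text)
```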
-
Context
I was watching MLST (Machine Learning Street Talk), specifically this episode: https://youtu.be/WlWAhjPfROU?si=pc-MzEBFDBjk_7lj
They talk about using debate to increase truthfulness and reduce hallucination. It occurred to me, based on my conversations with various folks building and running multi-agent frameworks, that this could be a good way to generate training data. For instance, in my conversation with Dr. Chi Wang (lead of Microsoft AutoGen), he said that agents need to specialize (https://youtu.be/aJGdt9q7sS0?si=LQrNcERncq4UZW2p) and that this specialization produces much better output.
Experiment
I ran some experiments with this. The idea was to use different models as well as different agents (e.g. Anthropic vs. OpenAI), since they have different strengths and weaknesses. Documented here: https://youtu.be/8oF3jykyDHw?si=nGy6PhyykRASxU56
I also tried an Anthropic/Anthropic pairing.
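For reference, here is a minimal sketch of one cross-provider debate round, assuming the Anthropic and OpenAI Python SDKs. Model names, prompt wording, and the example task are placeholders, not the exact setup documented in the video.

```python
# Minimal sketch of one cross-provider "debate" round: an Anthropic model
# answers, an OpenAI model critiques, and the Anthropic model revises.
# Model names, prompts, and the task are placeholders.
import anthropic
from openai import OpenAI

anthropic_client = anthropic.Anthropic()
openai_client = OpenAI()

QUESTION = "How many r's are in 'strawberry'?"  # placeholder task

def claude_answer(prompt: str) -> str:
    msg = anthropic_client.messages.create(
        model="claude-3-5-sonnet-20240620",  # assumed model
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    return msg.content[0].text

def gpt_critique(question: str, answer: str) -> str:
    resp = openai_client.chat.completions.create(
        model="gpt-4o",  # assumed model
        messages=[
            {"role": "system", "content": "Critique the proposed answer. Point out any errors."},
            {"role": "user", "content": f"Question: {question}\n\nProposed answer: {answer}"},
        ],
    )
    return resp.choices[0].message.content

draft = claude_answer(QUESTION)
critique = gpt_critique(QUESTION, draft)
revised = claude_answer(
    f"Question: {QUESTION}\n\nYour previous answer: {draft}\n\n"
    f"A reviewer said: {critique}\n\nRevise your answer if needed."
)
print(revised)
```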
Agent Roles
Ultimately, I created two distinct agent roles as defined by the following SYSTEM prompts:
Agent 1: The Critical Thinker
Agent 2: The Critical Partner
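The actual prompt text is not reproduced in this post, so the SYSTEM prompts below are purely hypothetical stand-ins for the two roles. The loop shows the Anthropic/Anthropic variant, with both roles served by the same assumed model.

```python
# Hypothetical sketch of the two-role, Anthropic/Anthropic variant.
# The SYSTEM prompts are stand-ins; the original prompts are not
# reproduced in this post.
import anthropic

client = anthropic.Anthropic()
MODEL = "claude-3-5-sonnet-20240620"  # assumed model

CRITICAL_THINKER = (
    "You are the Critical Thinker. Reason through the task step by step "
    "and propose an answer."
)
CRITICAL_PARTNER = (
    "You are the Critical Partner. Scrutinize the other agent's reasoning, "
    "point out mistakes, and push back rather than simply agreeing."
)

def run_turn(system_prompt: str, transcript: str) -> str:
    msg = client.messages.create(
        model=MODEL,
        max_tokens=1024,
        system=system_prompt,
        messages=[{"role": "user", "content": transcript}],
    )
    return msg.content[0].text

task = "How many r's are in 'strawberry'?"  # placeholder task
transcript = f"Task: {task}"
for _ in range(3):  # three debate rounds
    proposal = run_turn(CRITICAL_THINKER, transcript)
    transcript += f"\n\nCritical Thinker: {proposal}"
    critique = run_turn(CRITICAL_PARTNER, transcript)
    transcript += f"\n\nCritical Partner: {critique}"
print(transcript)
```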
Results
Discussion
There is something qualitatively different about the feedback I gave Claude in the original discussion. There, I demonstrated that simply instructing Claude to "think through with Chain of Thought" worked very well, and I then gave it very light feedback like "Close but not quite." That light critical feedback, combined with the original instructions from the first experiment, was enough to enable Claude to succeed.
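A minimal sketch of that light-feedback pattern as a two-turn conversation, again assuming the Anthropic Python SDK; the model name, prompts, and task are placeholders rather than the original experiment's exact settings.

```python
# Minimal sketch of the light-feedback pattern: keep the original CoT
# instruction, then feed back a short critique like "Close but not quite."
# Model name, prompts, and task are assumptions.
import anthropic

client = anthropic.Anthropic()
SYSTEM = "Think through the problem with Chain of Thought before answering."
MODEL = "claude-3-5-sonnet-20240620"  # assumed model

history = [{"role": "user", "content": "How many r's are in 'strawberry'?"}]  # placeholder task

first = client.messages.create(
    model=MODEL, max_tokens=1024, system=SYSTEM, messages=history
)
history.append({"role": "assistant", "content": first.content[0].text})

# The very light critical feedback from the original experiment.
history.append({"role": "user", "content": "Close but not quite."})

second = client.messages.create(
    model=MODEL, max_tokens=1024, system=SYSTEM, messages=history
)
print(second.content[0].text)
```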
Future Work
Data
Here are some examples of the failures, where Claude agreed with itself despite failing the task:
Followed by
Observation
Some of the problems could be due to tokenization; e.g., LLMs often have a hard time counting letters in a sentence. However, in the original conversation I had with Claude, it still succeeded, making me think that tokenization is not necessarily the problem (see the original experiment documented here: #10).
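As a rough illustration of the tokenization point: the model operates on tokens rather than characters, so it never directly "sees" the individual letters it is asked to count. Claude's tokenizer is not public, so this sketch uses OpenAI's tiktoken purely to show the general phenomenon.

```python
# Rough illustration of why letter counting is hard for LLMs: the model
# operates on tokens, not characters. tiktoken is used here because
# Claude's tokenizer is not public; the general phenomenon is the same.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
word = "strawberry"
token_ids = enc.encode(word)
pieces = [enc.decode_single_token_bytes(t).decode("utf-8") for t in token_ids]
print(token_ids)  # a handful of token ids, not ten characters
print(pieces)     # the multi-letter chunks the model actually sees
```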