v0.2.5: Chat Turns, LLM Scorers #110
ianarawjo
announced in
Announcements
Replies: 0 comments
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
-
We're excited to release two new nodes: Chat Turns and LLM Scorers. These nodes came from feedback during user sessions:
We describe these new nodes below, as well as a few quality-of-life improvements.
🗣️ Chat Turn nodes
Chat models are all the rage (in fact, they are so important that OpenAI announced it would no longer support plain-old text generation models going forward.) Yet strikingly, very few prompt engineering tools let you evaluate LLM outputs beyond a prompt.
Now with Chat Turn nodes, you can continue conversations beyond a single prompt. In fact, you can:
Continue multiple conversations simultaneously across multiple LLMs
Just connect the Chat Turn to your initial Prompt Node, and voilà:
Here, I've first prompted four chat models: GPT3.5, GPT-4, Claude-2, and PaLM with the question: "What was the first {game} game?". Then I ask a follow-up question, "What was the second?" By default, Chat Turns continue the conversation with all LLMs that were used before, allowing you to follow-up on LLM responses in parallel. (You can also toggle that off, if you want to query different models --more details below).
Template chat messages, just like prompts
You can do everything you can with Chat Turns that you could with Prompt Nodes, including prompt templating and adding input variables. For instance, here's a prompt template as a follow-up message:
Start a conversation with one LLM, and continue it with a different LLM
Chat Turns include a toggle of whether you'd like to continue chatting with the same LLMs, or query different ones, passing chat context to the new models. With this, you can start a conversation with one LLM and continue it with another (or several):
Supported chat models
Simple in concept, chat turns were the result of 2 weeks' work, revising many parts of the ChainForge backend to store and carry chat context. Chat history is automatically translated to the appropriate format for a number of providers:
microsoft/DialoGPT
. Go to the HuggingFace site to find more!)Let us know what you think!
🤖 LLM Scorer nodes
More commonly called "LLM evaluators", LLM scorer nodes allow you to use an LLM to 'grade'/score outputs of other LLMs:
Although ChainForge supported this functionality before via prompt chaining, it was not straightforward and required an additional chain to a code evaluator node for postprocessing. You can now connect the output of the scorer directly to a Vis Node to plot outputs. For instance, here's GPT-4 scoring whether different LLM responses apologized for a mistake:
Note that LLM scores are finicky --if one score isn't in the right format (true/false), visualization nodes won't work properly, because they'll think the outputs are notof boolean type but categorical. We'll work on improving this, but, for now, enjoy LLM scorers!
❗ Why we're not calling LLM scorers 'LLM evaluators'
We thought long and hard about what to call LLMs that score outputs of other LLMs. Ultimately, using LLMs to score outputs is helpful, and can save time when it's hard to write code to achieve the same effect. However, LLMs are imperfect. Although the AI community currently uses the term 'LLM evaluator,' we ultimately decided not to use that term, for a few reasons:
Fundamentally, then, we disagree with the positions taken by projects like LangChain, which tend to emphasize LLM scorers as the go-to solution for evaluation. We believe this is a massive mistake that tends to mislead people and causing them to over-trust AI outputs, including ML researchers at MIT. In choosing the term Scorers, we aim to --at the very least --distance ourselves from such positions.
Other changes
Future Work
Chat Turns opened up a whole new can of worms, both for the UI, and for evaluation. Some open questions are:
We hope to prioritize such features based on user feedback. If you use Chat Turns or LLM Scorers, let us know what you think --open an Issue or start a Discussion! 👍
This discussion was created from the release v0.2.5: Chat Turns, LLM Scorers.
Beta Was this translation helpful? Give feedback.
All reactions