Allow Naomi to use an LLM #436
base: naomi-dev
Conversation
The news and Hacker News plugins were using a test mic that did not have the new use_llm property.
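As a rough sketch of what that fix involves (the class and attribute layout here are illustrative, not Naomi's exact test-mic API):

```python
class TestMic:
    """Minimal stand-in microphone for plugin tests (illustrative names)."""

    def __init__(self, inputs=None, use_llm=False):
        self.inputs = list(inputs or [])
        self.outputs = []
        # New flag: plugins now check this before routing their output
        # through the LLM, so the test double must define it too.
        self.use_llm = use_llm

    def say(self, phrase):
        # Record what the plugin "spoke" so the test can assert on it.
        self.outputs.append(phrase)
```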
self.completion_url = completion_url
self.prompt_headers = {'Authorization': api_key or "Bearer your_api_key_here"}
self._messages = personality_preprompt
self.template = Template(TEMPLATES[template])
Check warning (Code scanning / CodeQL): Jinja2 templating with autoescape=False (Medium)
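If any of these templates ever render model- or user-supplied text into HTML, the standard remedy for this warning is to enable autoescaping; for plain-text prompt templates it is arguably a false positive. A sketch of the escaping variant, assuming `TEMPLATES` maps names to Jinja2 source strings as in the diff:

```python
from jinja2 import Template

# autoescape=True makes Jinja2 HTML-escape substituted values, which
# satisfies the CodeQL check. For prompts that are never rendered as
# HTML this is harmless but not strictly necessary.
self.template = Template(TEMPLATES[template], autoescape=True)
```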
    line = json.loads(line)
    return line

def _process_line(self, line):
Check notice (Code scanning / CodeQL): Explicit returns mixed with implicit (fall through) returns (Note)
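This notice means the function returns a value on some paths and falls off the end (implicitly returning None) on others. A minimal sketch of the usual fix; the guard condition is assumed here, since the full diff is not shown:

```python
import json

def _process_line(self, line):
    # Guard is illustrative; the real condition is in the PR diff.
    if line:
        line = json.loads(line)
        return line
    # Explicit return so CodeQL no longer sees a mix of explicit and
    # implicit (fall-through) returns.
    return None
```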
@@ -396,6 +419,8 @@
        else:
            self.say_i_do_not_understand()
        handled = True
    if not self.Continue:
        quit()
Check warning (Code scanning / CodeQL): Use of exit() or quit() (Warning)
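CodeQL flags `quit()` because it is injected by Python's `site` module for interactive use and may not exist when the interpreter runs with `-S`. The conventional fix is `sys.exit()`, which raises a catchable `SystemExit`:

```python
import sys

if not self.Continue:
    # SystemExit propagates like a normal exception, so callers and
    # test harnesses can intercept the shutdown if they need to.
    sys.exit()
```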
Description
This change gives Naomi the ability to use an LLM when crafting responses. The LLM is contacted over a web API, so any OpenAI-compatible LLM server should work. I am currently running llama.cpp on a system with an NVIDIA GeForce RTX 3060 with 12 GB of VRAM, using the following model:
https://huggingface.co/mav23/Llama_3.2_1B_Intruct_Tool_Calling_V2-GGUF/blob/main/llama_3.2_1b_intruct_tool_calling_v2.Q4_K_M.gguf This is only a 1B-parameter model, so it should run fine on cards with less VRAM. I have also tried running it on an Intel graphics card on my laptop using SYCL, which seems to work fine.
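To illustrate what "OpenAI-compatible" means here: the server only has to accept a standard chat-completions POST. The endpoint URL, API key, and model name below are placeholders, not values taken from this PR:

```python
import requests

# llama.cpp's server exposes an OpenAI-style /v1/chat/completions route;
# URL, key, and model name are placeholders.
response = requests.post(
    "http://localhost:8080/v1/chat/completions",
    headers={"Authorization": "Bearer your_api_key_here"},
    json={
        "model": "llama_3.2_1b_intruct_tool_calling_v2",
        "messages": [
            {"role": "system", "content": "You are Naomi, a voice assistant."},
            {"role": "user", "content": "What time is it?"},
        ],
    },
    timeout=60,
)
print(response.json()["choices"][0]["message"]["content"])
```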
This is very much experimental. The basic idea is that I use the existing TTI parser to activate a plugin, then pass that plugin's output to the LLM as a system message. This lets the LLM use current data to answer questions like "What time is it?" or "Are we expecting rain on Thursday?" I'm also looking into function calling, which would allow the LLM itself to act as an intent parser and might make plugin activation more integrated, but that has its own set of problems, including that different models define and use tools differently.
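A sketch of that flow; the parser, plugin, and client names are hypothetical, since the PR wires this up inside Naomi's existing conversation loop:

```python
def answer_with_llm(utterance, parser, llm_client):
    """Hypothetical glue: the TTI parser picks a plugin, the plugin's
    output becomes a system message, and the LLM phrases the reply."""
    plugin = parser.match(utterance)          # e.g. the clock plugin
    current_data = plugin.handle(utterance)   # e.g. "It is 3:42 PM"
    messages = [
        {"role": "system",
         "content": f"Use this current data in your answer: {current_data}"},
        {"role": "user", "content": utterance},
    ]
    return llm_client.chat(messages)          # hypothetical client method
```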
I have also included some notebooks. I plan to build these into a set that will hopefully help people who want to better understand how Naomi works internally.
Related Issue
Integrate Naomi with LLM #435
Motivation and Context
With the rise of LLM chatbots, I wanted to give Naomi more capabilities for carrying on a conversation. Whether this is a good idea or not, I'm not sure. The book "It is better to be a good computer than a bad person" would argue that Naomi performs its function perfectly well and that adding another layer of NLP just complicates things. At the same time, having played with LLMs for a while now, I find them fun to experiment with, and they are definitely a step toward Doctor Who's K-9 or Star Wars' C-3PO type entities. I'm interested to see what people think if they play with this: what works, and what doesn't?
How Has This Been Tested?
Screenshots (if appropriate):
Types of changes
Checklist: