Allow Naomi to use an LLM #436

Draft
wants to merge 3 commits into naomi-dev

Conversation

aaronchantrill
Contributor

Description

This change gives Naomi the ability to use an LLM when crafting responses. The LLM is contacted over a web API, so any OpenAI-compatible LLM server should work. I am currently running llama.cpp on a system with an NVIDIA GeForce RTX 3060 with 12 GB of VRAM, using the following model:
https://huggingface.co/mav23/Llama_3.2_1B_Intruct_Tool_Calling_V2-GGUF/blob/main/llama_3.2_1b_intruct_tool_calling_v2.Q4_K_M.gguf
This is only a 1B parameter model, so it should run fine on cards with less VRAM. I have also tried running it on an Intel graphics card in my laptop using SYCL, which seems to work fine.
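
For anyone who wants to try a similar setup, here is a minimal sketch of calling an OpenAI-compatible chat completions endpoint such as the one llama.cpp's server exposes. The URL, port, model name, and placeholder API key are assumptions for illustration, not values taken from this change.

import requests

# Assumed local llama.cpp server address; adjust host/port to your setup.
completion_url = "http://localhost:8080/v1/chat/completions"
headers = {"Authorization": "Bearer your_api_key_here"}
payload = {
    # Many OpenAI-compatible servers ignore or only loosely check this field.
    "model": "llama_3.2_1b_intruct_tool_calling_v2",
    "messages": [
        {"role": "system", "content": "You are Naomi, a friendly voice assistant."},
        {"role": "user", "content": "What time is it?"},
    ],
}
response = requests.post(completion_url, headers=headers, json=payload)
print(response.json()["choices"][0]["message"]["content"])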

This is very much experimental. The basic idea is that I use the existing TTI (text-to-intent) parser to activate a plugin, then pass the output from that plugin to the LLM as a system message. This lets the LLM work with current data and answer questions like "What time is it?" or "Are we expecting rain on Thursday?" I'm also looking into function calling, which would allow the LLM itself to act as the intent parser and might make plugin activation more integrated, but that has its own set of problems, including that different models define and use tool calling differently. A rough sketch of the current flow follows below.
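
As an illustration of that flow (the helper and client names below are hypothetical, not Naomi's actual API), the plugin output becomes an extra system message before the user's question is sent to the LLM:

# Hypothetical sketch: the TTI parser has already chosen a plugin, the plugin
# has produced current data, and that data is handed to the LLM as a system
# message so the model can ground its answer. Names are illustrative only.
def answer_with_llm(llm_client, plugin_output, user_utterance):
    messages = [
        {"role": "system", "content": "You are Naomi, a voice assistant."},
        {"role": "system", "content": f"Current data from plugin: {plugin_output}"},
        {"role": "user", "content": user_utterance},
    ]
    return llm_client.chat(messages)

# e.g. answer_with_llm(client, "It is 3:42 PM on Thursday.", "What time is it?")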

I have also included some notebooks. I plan to grow these into a set of notebooks that will hopefully be helpful to people who want to better understand how Naomi works internally.

Related Issue

Integrate Naomi with LLM #435

Motivation and Context

With the rise of LLM chatbots, I wanted to give Naomi more capability for carrying on a conversation. Whether this is a good idea or not, I'm not sure. The book "It is better to be a good computer than a bad person" would argue that Naomi already performs its function perfectly well and that adding another layer of NLP just complicates things. At the same time, having played with LLMs for a while now, I find them fun, and they are definitely a step toward Doctor Who's K-9 or Star Wars' C-3PO type of entity. I'm interested to see what people think if they play with this. What works, what doesn't?

How Has This Been Tested?

Screenshots (if appropriate):

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

The news and hacker news plugins were using a test mic, which did
not have the new use_llm property.
naomi/llama_client.py
self.completion_url = completion_url
self.prompt_headers = {'Authorization': api_key or "Bearer your_api_key_here"}
self._messages = personality_preprompt
self.template = Template(TEMPLATES[template])

Check warning

Code scanning / CodeQL

Jinja2 templating with autoescape=False Medium

Using jinja2 templates with autoescape=False can potentially allow XSS attacks.
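
If the rendered output could ever end up in an HTML context, one hedged way to address this is to enable autoescaping when constructing the Template; if these templates only build plain-text LLM prompts, escaping may be unwanted and the alert could be dismissed instead. Sketch only, reusing the TEMPLATES lookup from the snippet above:

from jinja2 import Template

# Assumes TEMPLATES and the template name come from the surrounding class,
# as in the snippet above; only the autoescape flag is changed here.
template = Template(TEMPLATES[template], autoescape=True)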
line = json.loads(line)
return line

def _process_line(self, line):

Check notice

Code scanning / CodeQL

Explicit returns mixed with implicit (fall through) returns Note

Mixing implicit and explicit returns may indicate an error as implicit returns always return None.
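
A hedged sketch of the usual fix, with an illustrative helper rather than the actual Naomi function, is to make the fall-through path return explicitly:

import json

# Illustrative only: the condition and helper name are not copied from the
# pull request. Every branch now states its result instead of falling through.
def _parse_json_line(raw_line):
    stripped = raw_line.strip()
    if stripped:
        line = json.loads(stripped)
        return line
    return None  # explicit return instead of an implicit fall-through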
@@ -396,6 +419,8 @@
else:
self.say_i_do_not_understand()
handled = True
if not self.Continue:
quit()

Check warning

Code scanning / CodeQL

Use of exit() or quit() Warning

The 'quit' site.Quitter object may not exist if the 'site' module is not loaded or is modified.
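
A hedged alternative that avoids the site module entirely is sys.exit(); the wrapper below is illustrative, not the code in this diff:

import sys

# Mirrors the "if not self.Continue: quit()" branch shown above, but uses
# sys.exit(), which does not depend on the site module's Quitter object.
def stop_if_done(keep_running):
    if not keep_running:
        sys.exit()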
aaronchantrill self-assigned this Dec 16, 2024