Allow Naomi to use an LLM #436

Draft
wants to merge 3 commits into naomi-dev

Conversation

aaronchantrill
Contributor

Description

This change gives Naomi the ability to use an LLM when crafting responses. The LLM is contacted over a web API, so any OpenAI-compatible LLM server should work. I am currently running llama.cpp on a system with an NVIDIA GeForce RTX 3060 with 12 GB of VRAM, using the following model:
https://huggingface.co/mav23/Llama_3.2_1B_Intruct_Tool_Calling_V2-GGUF/blob/main/llama_3.2_1b_intruct_tool_calling_v2.Q4_K_M.gguf
This is only a 1B parameter model, so it should run fine on cards with less VRAM. I have also tried running it on an Intel graphics card in my laptop using SYCL, which seems to work fine.
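
For anyone who wants to try a similar setup, here is a minimal sketch of calling an OpenAI-compatible chat completions endpoint such as the one llama.cpp's server exposes. The URL, port, model name, and placeholder API key are assumptions for illustration, not values taken from this change.

import requests

# Assumed local llama.cpp server address; adjust host/port to your setup.
completion_url = "http://localhost:8080/v1/chat/completions"
headers = {"Authorization": "Bearer your_api_key_here"}
payload = {
    # Many OpenAI-compatible servers ignore or only loosely check this field.
    "model": "llama_3.2_1b_intruct_tool_calling_v2",
    "messages": [
        {"role": "system", "content": "You are Naomi, a friendly voice assistant."},
        {"role": "user", "content": "What time is it?"},
    ],
}
response = requests.post(completion_url, headers=headers, json=payload)
print(response.json()["choices"][0]["message"]["content"])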

This is very much experimental. The basic idea is that I use the existing TTI (text-to-intent) parser to activate a plugin, then pass the output from that plugin to the LLM as a system message. This lets the LLM work with current data and answer questions like "What time is it?" or "Are we expecting rain on Thursday?" I'm also looking into function calling, which would allow the LLM itself to act as the intent parser and might make plugin activation more integrated, but that has its own set of problems, including that different models define and use tool calling differently. A rough sketch of the current flow follows below.
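
As an illustration of that flow (the helper and client names below are hypothetical, not Naomi's actual API), the plugin output becomes an extra system message before the user's question is sent to the LLM:

# Hypothetical sketch: the TTI parser has already chosen a plugin, the plugin
# has produced current data, and that data is handed to the LLM as a system
# message so the model can ground its answer. Names are illustrative only.
def answer_with_llm(llm_client, plugin_output, user_utterance):
    messages = [
        {"role": "system", "content": "You are Naomi, a voice assistant."},
        {"role": "system", "content": f"Current data from plugin: {plugin_output}"},
        {"role": "user", "content": user_utterance},
    ]
    return llm_client.chat(messages)

# e.g. answer_with_llm(client, "It is 3:42 PM on Thursday.", "What time is it?")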

I have also included some notebooks. I plan to grow these into a set of notebooks that will hopefully be helpful to people who want to better understand how Naomi works internally.

Related Issue

Integrate Naomi with LLM #435

Motivation and Context

With the rise of LLM chatbots, I wanted to give Naomi more capability for carrying on a conversation. Whether this is a good idea or not, I'm not sure. The book "It is better to be a good computer than a bad person" would argue that Naomi already performs its function perfectly well and that adding another layer of NLP just complicates things. At the same time, having played with LLMs for a while now, I find them fun, and they are definitely a step toward Doctor Who's K-9 or Star Wars' C-3PO type of entity. I'm interested to see what people think if they play with this. What works, what doesn't?

How Has This Been Tested?

Screenshots (if appropriate):

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

The news and hacker news plugins were using a test mic, which did
not have the new use_llm property.
naomi/llama_client.py
self.completion_url = completion_url
self.prompt_headers = {'Authorization': api_key or "Bearer your_api_key_here"}
self._messages = personality_preprompt
self.template = Template(TEMPLATES[template])

Check warning

Code scanning / CodeQL

Jinja2 templating with autoescape=False Medium

Using jinja2 templates with autoescape=False can potentially allow XSS attacks.
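
If the rendered output could ever end up in an HTML context, one hedged way to address this is to enable autoescaping when constructing the Template; if these templates only build plain-text LLM prompts, escaping may be unwanted and the alert could be dismissed instead. Sketch only, reusing the TEMPLATES lookup from the snippet above:

from jinja2 import Template

# Assumes TEMPLATES and the template name come from the surrounding class,
# as in the snippet above; only the autoescape flag is changed here.
template = Template(TEMPLATES[template], autoescape=True)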
line = json.loads(line)
return line

def _process_line(self, line):

Check notice

Code scanning / CodeQL

Explicit returns mixed with implicit (fall through) returns Note

Mixing implicit and explicit returns may indicate an error as implicit returns always return None.
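
A hedged sketch of the usual fix, with an illustrative helper rather than the actual Naomi function, is to make the fall-through path return explicitly:

import json

# Illustrative only: the condition and helper name are not copied from the
# pull request. Every branch now states its result instead of falling through.
def _parse_json_line(raw_line):
    stripped = raw_line.strip()
    if stripped:
        line = json.loads(stripped)
        return line
    return None  # explicit return instead of an implicit fall-through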
@@ -396,6 +419,8 @@
else:
self.say_i_do_not_understand()
handled = True
if not self.Continue:
quit()

Check warning

Code scanning / CodeQL

Use of exit() or quit() Warning

The 'quit' site.Quitter object may not exist if the 'site' module is not loaded or is modified.
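
A hedged alternative that avoids the site module entirely is sys.exit(); the wrapper below is illustrative, not the code in this diff:

import sys

# Mirrors the "if not self.Continue: quit()" branch shown above, but uses
# sys.exit(), which does not depend on the site module's Quitter object.
def stop_if_done(keep_running):
    if not keep_running:
        sys.exit()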
aaronchantrill self-assigned this Dec 16, 2024