Replies: 58 comments 156 replies
-
I have tried to run MemGPT using the Chat Completion API in LM Studio with a few different models that have been fine-tuned on function-calling datasets. The model rizerphe/CodeLlama-function-calling-6320-7b-Instruct-GGUF seems like it might work, but LM Studio throws an error regarding the function-calling syntax. I've heard somewhere that falcon-180b can correctly do function calls; I believe this is because falcon-180b-chat is fine-tuned on the airoboros dataset, which features examples of function calling... so perhaps airoboros models will work as well.
-
I'd love to see this as well.
-
In a utopian world, someone explains to me how to resolve this function-calling issue within LM Studio by selecting an LLM that actually runs on my machine, and then turns it into an absolutely savage set of agents with AutoGen, which is also on the roadmap. Patience is a virtue, it seems.
-
Attempting to run MemGPT with the oobabooga OpenAI endpoint gives a step() failed with openai.InvalidRequestError, but didn't recognize the error message: 'functions is not supported.' It would seem that the API endpoint emulator needs to be updated to support OpenAI-style function calling (https://github.com/oobabooga/text-generation-webui/blob/main/extensions/openai/script.py), or perhaps MemGPT could be changed to do its function calling in a different format? A sketch of the latter idea is below.
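Roughly what the second route could look like on the client side: describe the function in the system prompt instead of sending a `functions` field, and parse the JSON reply yourself. This is a hedged sketch; the endpoint URL, function spec, and reply shape are all assumptions for illustration, not the actual MemGPT or oobabooga internals.

```python
import json
import requests

# Assumed local endpoint for an OpenAI-compatible emulator.
ENDPOINT = "http://localhost:5000/v1/chat/completions"

# Hypothetical function spec, loosely modeled on MemGPT's send_message.
FUNCTION_SPEC = {
    "name": "send_message",
    "description": "Send a visible message to the user.",
    "parameters": {"message": "string"},
}

def chat_without_functions_param(user_text: str) -> dict:
    """Emulate OpenAI-style function calling on a backend that rejects the
    `functions` parameter: describe the function in the system prompt and
    ask the model to answer with a JSON function call we parse ourselves."""
    system = (
        "You may call this function by replying with JSON only:\n"
        + json.dumps(FUNCTION_SPEC)
        + '\nReply exactly like: {"function": "send_message", "params": {...}}'
    )
    resp = requests.post(ENDPOINT, json={
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": user_text},
        ],
        # note: no "functions" field, so the emulated endpoint won't reject us
    })
    # The model's text reply should itself be the JSON function call.
    return json.loads(resp.json()["choices"][0]["message"]["content"])
```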
-
It seems like it should be possible to implement MemGPT by replacing the function-calling API with open-interpreter and CodeLlama 34B.
-
What about using LocalAI? If MemGPT supports the OPENAI_API_HOST/BASE env vars, it should "just work" for the most part, as LocalAI provides a drop-in replacement for the OpenAI chat/completion/function-calling APIs and provides the Go bindings for llama.cpp: https://github.com/go-skynet/LocalAI, so it can run on just about anything.
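For the env-var route, the client-side change is tiny. A minimal sketch using the openai v0.x Python client (the port is LocalAI's default and the model name is just an example of whatever your instance is serving):

```python
import os
import openai  # openai-python v0.x, matching MemGPT at the time of writing

# Point the client at a LocalAI server instead of api.openai.com.
os.environ["OPENAI_API_BASE"] = "http://localhost:8080/v1"
openai.api_base = os.environ["OPENAI_API_BASE"]
openai.api_key = "sk-local"  # LocalAI ignores the key, but the client requires one

resp = openai.ChatCompletion.create(
    model="ggml-gpt4all-j",  # whatever model your LocalAI instance serves
    messages=[{"role": "user", "content": "Hello"}],
)
print(resp["choices"][0]["message"]["content"])
```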
-
So is it planned to run MemGPT with your own fine-tuned GPT-3.5 models?
-
https://huggingface.co/THUDM/agentlm-7b (also 13b, 70b): this project looks promising for local, instruction-tuned LLMs; I wonder if it could mesh into your testing? How goes fine-tuning a Mistral 7B? I'm very excited at the prospect of a local model being able to perform tasks that currently depend on AutoGPT API access... Local data handling by internal LLMs is very appealing to some. :)
-
⭐ We've added support for running MemGPT with local LLMs! You can find instructions on how to set this up in this README. The tl;dr: we include an example of running MemGPT with Airoboros, along with example function-call parsers, so you can just set up Airoboros behind WebUI and it should "work" (we did some very limited testing, and it can make basic calls). There are still things we're working on (contributions welcome!).
This is just a start, and we're excited to work with the community to make MemGPT with open models a viable alternative to GPT-4 🚀
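For anyone curious what one of these function-call parsers has to do, here's a minimal sketch. It is not the actual wrapper code; the dict shapes and key names are simplified assumptions for illustration.

```python
import json
import re

def parse_function_call(raw_output: str) -> dict:
    """Sketch of what a wrapper's output parser has to do: pull a JSON
    object out of the raw model text and map it to an OpenAI-style
    function_call dict. The real wrappers are model-specific and far
    more defensive than this."""
    match = re.search(r"\{.*\}", raw_output, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in model output")
    call = json.loads(match.group(0))
    return {
        "role": "assistant",
        "content": call.get("inner_thoughts", ""),
        "function_call": {
            "name": call["function"],
            "arguments": json.dumps(call.get("params", {})),
        },
    }

# A well-behaved Airoboros-style reply:
print(parse_function_call('{"function": "send_message", "params": {"message": "Hi!"}}'))
```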
-
Maybe integrate AutoGPT with MemGPT? Or make a fork of it and refactor it to work only as a function caller/parser with extension functionality. They have pretty decent features, along with API integrations and hooks/callbacks (i.e., module integration).
-
Also they are using ...
-
Please consider prioritizing the wrapper for LiteLLM: https://github.com/BerriAI/litellm
-
I am interested in fine-tuning a model specifically for MemGPT, though I am not exactly sure of the best way to do this. It seems that we would need to gather raw communication logs from a variety of MemGPT use cases, probably using the gpt-4-32k model?
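If those logs were collected, converting them into fine-tuning data could be as simple as the sketch below. The record layout here is hypothetical; adapt it to whatever MemGPT actually logs.

```python
import json

def logs_to_finetune_jsonl(records: list[dict], out_path: str) -> None:
    """Convert raw conversation logs into OpenAI-style fine-tuning examples,
    one JSON object per line. Assumes each record holds the message list that
    was sent to gpt-4 plus the reply it gave (hypothetical key names)."""
    with open(out_path, "w") as f:
        for rec in records:
            example = {"messages": rec["messages"] + [rec["assistant_reply"]]}
            f.write(json.dumps(example) + "\n")
```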
-
Currently testing Zephyr 7B beta. The problem I am getting is that sometimes the model fails to realize that the inner thoughts are not supposed to appear in the conversation. For example: [screenshot] As you can see there, after I asked whether I could give them another name, the agent should have given me a yes/no answer, which it did in its inner thoughts, but it failed to say it out loud; instead it just continued its inner monologue.
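To illustrate the failure mode (the exact schema depends on which wrapper you configured, so treat these shapes as assumptions):

```python
# What a well-formed turn should roughly contain: private inner thoughts
# plus an explicit send_message call, since only send_message is visible.
good_turn = {
    "inner_thoughts": "They want to rename me. I should say yes out loud.",
    "function": "send_message",
    "params": {"message": "Yes, of course! What would you like to call me?"},
}

# The failure mode described above: the yes/no answer lands in the private
# monologue and no send_message call is emitted, so the user sees nothing.
bad_turn = {
    "inner_thoughts": "Yes, they can give me another name. I like that idea.",
    "function": None,
    "params": None,
}
```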
-
❯ python ../tests/test_cli.py |
-
Since MemGPT has been significantly upgraded, could you also update this guide to show the correct steps for the env vars, for memgpt configure, and for memgpt run? Many thanks.
-
I'm giving up. Maybe I can look at this again in 6 months? |
-
At this point, what is the most feature-rich and reliable backend to use? LM Studio is not an option for me since it's closed source.
-
Having unexpected trouble with the backend: Ollama 1.8. What happened: checking the Ollama server log, I found that Ollama apparently ran out of memory. More than that, it looks like MemGPT (or at least this current instance) is ...
The Ollama server log is same-ish every time I run memgpt.
-
The agentlm-70b Q4_K_M GGUF model works great with MemGPT. Also, this dataset might be useful for training the MemGPT model: https://github.com/THUDM/AgentTuning
-
This just got released yesterday; I'm going to see how it holds up. It's supposed to be specific to function calling, which I could see helping a lot.
-
A quick question: does anyone have a Colab notebook on how to run this with a local LLM without AutoGen? I'm using ollama and litellm, but any other setup that calls a local endpoint would be much appreciated. Thank you.
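Not a notebook, but the litellm side of an ollama setup can be this small. A minimal sketch, assuming `ollama serve` is running on its default port and the model has already been pulled:

```python
from litellm import completion

resp = completion(
    model="ollama/mistral",             # any model you have pulled locally
    messages=[{"role": "user", "content": "Hello"}],
    api_base="http://localhost:11434",  # Ollama's default port
)
print(resp["choices"][0]["message"]["content"])
```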
-
Is there a way to get this to work on vLLM? I attempted this but was unsuccessful.
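vLLM can expose an OpenAI-compatible server, so in principle you can point an OpenAI-style client at it; the catch is that it doesn't honor the `functions` field, so MemGPT would still need one of its local-LLM wrappers in front. A client-side sketch, assuming a recent vLLM whose server you launched yourself (the model name is just an example):

```python
# Started with something like:
#   python -m vllm.entrypoints.openai.api_server --model mistralai/Mistral-7B-Instruct-v0.1
import openai  # openai-python v0.x style

openai.api_base = "http://localhost:8000/v1"  # vLLM's default port
openai.api_key = "EMPTY"  # vLLM doesn't check the key

resp = openai.ChatCompletion.create(
    model="mistralai/Mistral-7B-Instruct-v0.1",
    messages=[{"role": "user", "content": "Hello"}],
)
print(resp["choices"][0]["message"]["content"])
```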
-
NexusRaven-V2-13B (https://huggingface.co/Nexusflow/NexusRaven-V2-13B): a new model just dropped today. Idk how I missed the first one, but anyway, version two works decently with the current version of MemGPT on default settings. What they did is very impressive, and I think MemGPT could benefit a lot from this. They have a slightly different prompting style and syntax (see the sketch below), and I think it would perform much better with some modifications to the MemGPT code. On a sort-of-related note: I've always thought that writing prompts with Python-style indentation would somehow help, since most models are familiar with that.
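For reference, this is roughly how NexusRaven-V2 wants to be prompted, as I read the model card (paraphrased from memory, so double-check the exact template on the Hugging Face page):

```python
# Functions are declared as Python signatures with docstrings, and the
# model completes with a single "Call:" line of executable Python.
prompt = '''
Function:
def send_message(message: str):
    """
    Sends a visible message to the human user.

    Args:
    message (str): Message contents.
    """

User Query: Say hello to the user.<human_end>
'''

# Expected completion, one parseable line:
# Call: send_message(message="Hello!")
```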
-
I think ...
-
Yeah, I have also headed back here to check whether MemGPT is already working fine with Mixtral 8x7B and other fine-tuned versions of it. I think it's ready for MemGPT's function-calling requirements.
-
Can we get support for TabbyAPI in the future? It's a local backend made by the author of exl2: https://github.com/theroyallab/tabbyAPI
-
If anyone is looking for the documentation, the correct link is https://memgpt.readme.io/docs/local_llm
-
⭐ We've added support for running MemGPT with open/local LLMs!
Instructions on how to connect MemGPT to open/local LLMs can be found on our docs page.
🙋 Need help with local LLMs? Check Discord!
If you need help, visit our Discord server and post in the #support channel.
You can also use this GitHub discussions page, but the Discord server is the official support channel and is monitored more actively.
To help us (and the entire MemGPT community) help you, please provide the following information when asking a new question about debugging a local model:
- the exact model file you're using, e.g. dolphin-2.1-mistral-7b.Q6_K.gguf (not just dolphin-2.1-mistral)
Managing memory in MemGPT requires a lot of instruction following (the LLM needs to follow instructions in the system prompt about how to use the memory functions). GPT-4 can do this well, but even the best open LLMs may struggle to do it correctly, so you will likely observe MemGPT + open LLMs not working very well. This problem gets worse as the LLM gets worse; e.g., if you're trying a small quantized Llama 2 model, expect MemGPT to perform very poorly.
If your model outputs bad function calls / bad JSON, things will fail. And even if the model outputs good JSON, if you don't parse it correctly, things will also fail.
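A hedged sketch of the kind of defensive parsing that helps (not the actual MemGPT wrapper code, which is model-specific):

```python
import json

def extract_json(raw: str) -> dict:
    """Defensive parsing sketch: take the outermost {...} span, then fall
    back through one common fixup. This just shows why careful parsing
    of local-LLM output matters."""
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("model output contains no JSON object")
    candidate = raw[start:end + 1]
    try:
        return json.loads(candidate)
    except json.JSONDecodeError:
        # Frequent local-LLM mistake: single quotes instead of double quotes.
        # (Naive fixup: will break on apostrophes inside strings.)
        return json.loads(candidate.replace("'", '"'))
```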