Run with Local LLM Models #25
Comments
Does this project support third-party OpenAI interfaces (such as poe.com)? If it does, are there any other requirements for these interfaces, such as message format, context memory, and number of conversations? |
@wingeva1986 Previously this repository was based on the API provided by xtekky/gpt4free. The problem was that (rightly so) some API went down every day, and our repository was flooded with issues related not to the project but to the cracked xtekky API. At the moment, the solution based on free and legal calls to chat.openai.com is the most stable one. You could try to reverse engineer sites or portals in a legal way. For example, HuggingChat is a free service open to all. It would be interesting to find the HuggingChat endpoint and integrate it into the project. |
https://huggingface.co/CRD716/ggml-vicuna-1.1-quantized/blob/main/ggml-vicuna-7b-1.1-q4_0.bin |
We can't expect LLaMA models to be as competitive as GPT; keep in mind that response quality depends on the number of parameters of the trained model... I've tried many models in my language, and they all generate stupid responses, like the GPT4All model based on parrot, alpaca. I have tested the Vicuna 13b quantized model and, despite weighing 4 GB, it can hold a fluent conversation while consuming fewer resources... I am running it on a 4-core ARM Ampere server with 32 GB of RAM; it uses more CPU than RAM and responds correctly. I also managed to integrate it into a WhatsApp chat using the Baileys library. I wrote this answer with a translator; my native language is Spanish. |
Have you tested mosaicml/mpt-7b-chat, or mosaicml/mpt-7b-instruct? Seems promising |
@Therealkorris We haven't tried it yet, but we believe that mpt-7b-instruct and Lamini-gpt can give better results than other open-source models. Have you already managed to implement a pipeline to generate text with mpt-7b-instruct? If so, what hardware do you have? Would you like to share your pipeline? |
@HirCoir Have you already implemented a pipeline to generate the text? What hardware does it run on? |
What do you think about Cerebras? |
Is any other LLM supported? I'm trying to use the new mega13b. |
@GoZippy @wingeva1986 @Therealkorris @HirCoir We all know the open-source models, more or less. The problem is that a new one comes out every day, and most lack the performance of GPT-3. If you want to help us, share here the code to run inference with the models you recommend, so that we can test them easily. For example, @GoZippy, share the code you use to run inference on the mega13b model. Then we can create a custom LLM wrapper with LangChain and run AutoGPT; if it gives good results, we will upload everything to the repository ❤ Thanks for the help |
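As a rough illustration of the wrapper idea above, here is a minimal sketch that assumes the local model can be exposed as a plain function from prompt to text. `LocalLLM`, `echo_backend`, and the stop-sequence handling are hypothetical names chosen for this example; they are not LangChain's or this repository's API. A real integration would put equivalent logic inside a LangChain custom LLM class and call the actual model in the backend.

```python
from typing import Callable, List, Optional


class LocalLLM:
    """Hypothetical wrapper: turns any prompt -> text function into a callable LLM."""

    def __init__(self, backend: Callable[[str], str]) -> None:
        self.backend = backend

    def __call__(self, prompt: str, stop: Optional[List[str]] = None) -> str:
        text = self.backend(prompt)
        # Cut the output at the earliest stop sequence, if any were requested.
        if stop:
            cut = len(text)
            for s in stop:
                idx = text.find(s) if s else -1  # ignore empty stop strings
                if idx != -1:
                    cut = min(cut, idx)
            text = text[:cut]
        return text


def echo_backend(prompt: str) -> str:
    """Stand-in for real inference (assumption: replace with a local model call)."""
    return f"Thought: handle '{prompt}'\nObservation: done"


llm = LocalLLM(echo_backend)
reply = llm("list files", stop=["\nObservation:"])
```

Keeping the backend as a swappable function means each recommended model only needs its own prompt-to-text function to be tested behind the same interface.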
Starling is currently the best 7B model: |
Any progress on this? I'll be home shortly and will look into this again, but I have been using other tools lately... I lost track of where AutoGPT was going with all the Forge stuff... a year ago... |
Same here; I have been too busy, so I stopped keeping up. Recently, though, I found an AI agent called evo.ninja. It has a workspace and a great interface, and it is currently ranked as the top AutoGPT agent. Unfortunately, it requires the OpenAI API, so I looked into alternatives, and that is how I ended up here. |
We tried many local models, such as LLaMA, Vicuna, OpenAssistant, and GPT4All, in their 7B versions. None seems to give results comparable to the ChatGPT API.
We would like to test new models that can be loaded in at most 16 GB of RAM, so that the project stays accessible to everyone.
Any advice on LLMs fine-tuned for high performance on instructions?
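To make the 16 GB constraint concrete, here is a back-of-the-envelope sketch for checking whether a model's weights are likely to fit. The function names and the fixed 1 GB overhead for context and buffers are assumptions for illustration; real quantization formats use somewhat more than their nominal bits per weight, so treat the numbers as rough lower bounds.

```python
def estimated_ram_gb(n_params_billion: float, bits_per_weight: float,
                     overhead_gb: float = 1.0) -> float:
    """Approximate RAM in GB: weight storage plus an assumed fixed overhead."""
    # 1 billion parameters at 8 bits per weight occupy about 1 GB.
    weights_gb = n_params_billion * bits_per_weight / 8
    return weights_gb + overhead_gb


def fits(n_params_billion: float, bits_per_weight: float,
         budget_gb: float = 16.0) -> bool:
    """True if the estimate stays within the RAM budget."""
    return estimated_ram_gb(n_params_billion, bits_per_weight) <= budget_gb


for name, params, bits in [("7B, 4-bit", 7, 4), ("13B, 4-bit", 13, 4), ("13B, 16-bit", 13, 16)]:
    print(f"{name}: ~{estimated_ram_gb(params, bits):.1f} GB, fits in 16 GB: {fits(params, bits)}")
```

Under these assumptions, 4-bit quantized 7B and 13B models fit within the budget, while a 13B model in 16-bit precision does not.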