The purpose of this repository is to let people use many open-source, instruction-following, fine-tuned LLM models as a chatbot service. Because different models behave differently and require differently formatted prompts, I made a very simple library, Ping Pong, for model-agnostic conversation and context management. I also made GradioChat, a UI that looks similar to HuggingChat but is built entirely in Gradio. Those two projects are fully integrated to power this project.
Different models might have different strategies to manage context, so if you want to know the exact strategy applied to each model, take a look at the chats directory. However, here are the basic ideas I came up with initially. I have found that long prompts eventually slow down the generation process a lot, so prompts should be kept as short as possible while preserving as much information as possible. In the previous version, I accumulated all of the past conversations, and that didn't go well.
- In every turn of the conversation, the past `N` conversations are kept. Think of `N` as a hyper-parameter. As an experiment, currently only the past 2-3 conversations are kept for all models (see the sketch after this list).
- (TBD) In every turn of the conversation, summarize or extract information. The summarized information will be carried into every subsequent turn of the conversation.
Check out the list of models:
- tloen/alpaca-lora-7b: the original 7B Alpaca-LoRA checkpoint by tloen (updated on 4/4/2023)
- LLMs/Alpaca-LoRA-7B-elina: the 7B Alpaca-LoRA checkpoint by Chansung (updated on 5/1/2023)
- LLMs/Alpaca-LoRA-13B-elina: the 13B Alpaca-LoRA checkpoint by Chansung (updated on 5/1/2023)
- LLMs/Alpaca-LoRA-30B-elina: the 30B Alpaca-LoRA checkpoint by Chansung (updated on 5/1/2023)
- LLMs/Alpaca-LoRA-65B-elina: the 65B Alpaca-LoRA checkpoint by Chansung (updated on 5/1/2023)
- LLMs/AlpacaGPT4-LoRA-7B-elina: the 7B Alpaca-LoRA checkpoint trained on a GPT-4-generated Alpaca-style dataset by Chansung (updated on 5/1/2023)
- LLMs/AlpacaGPT4-LoRA-13B-elina: the 13B Alpaca-LoRA checkpoint trained on a GPT-4-generated Alpaca-style dataset by Chansung (updated on 5/1/2023)
- stabilityai/stablelm-tuned-alpha-7b: StableLM based fine-tuned model
- beomi/KoAlpaca-Polyglot-12.8B: Polyglot based Alpaca style instruction fine-tuned model
- declare-lab/flan-alpaca-xl: Flan XL(3B) based Alpaca style instruction fine-tuned model.
- declare-lab/flan-alpaca-xxl: Flan XXL(11B) based Alpaca style instruction fine-tuned model.
- OpenAssistant/stablelm-7b-sft-v7-epoch-3: StableLM(7B) based OpenAssistant's oasst1 instruction fine-tuned model.
- Writer/camel-5b-hf: Palmyra-base based instruction fine-tuned model. The foundation model and the data are from its creator, Writer.
- lmsys/fastchat-t5-3b-v1.0: T5(3B) based Vicuna style instruction fine-tuned model on ShareGPT by lm-sys
- LLMs/Stable-Vicuna-13B: Stable Vicuna(13B) from CarperAI and Stability AI. This is not a delta weight, so use it at your own risk. I will make this repo private soon and add a Hugging Face token field.
- LLMs/Vicuna-7b-v1.1: Vicuna(7B) from FastChat. This is not a delta weight, so use it at your own risk. I will make this repo private soon and add a Hugging Face token field.
- LLMs/Vicuna-13b-v1.1: Vicuna(13B) from FastChat. This is not a delta weight, so use it at your own risk. I will make this repo private soon and add a Hugging Face token field.
- togethercomputer/RedPajama-INCITE-Chat-7B-v0.1: RedPajama INCITE Chat(7B) from Together.
- mosaicml/mpt-7b-chat: MPT-7B from MosaicML.
- teknium/llama-deus-7b-v3-lora: LLaMA 7B based Alpaca style instruction fine-tuned model. The only difference from Alpaca is that this model is fine-tuned on more data, including the Alpaca dataset, GPTeacher, General Instruct, Code Instruct, Roleplay Instruct, Roleplay V2 Instruct, GPT4-LLM Uncensored, Unnatural Instructions, WizardLM Uncensored, CamelAI's 20k Biology, 20k Physics, 20k Chemistry, 50k Math GPT4 Datasets, and CodeAlpaca.
- HuggingFaceH4/starchat-alpha: StarCoder(15.5B) based instruction fine-tuned model. This model is particularly good at answering questions about coding.
- LLMs/Vicuna-LoRA-EvolInstruct-7B: LLaMA 7B based Vicuna style instruction fine-tuned model. The dataset used to fine-tune this model is WizardLM's Evol-Instruct dataset.
- LLMs/Vicuna-LoRA-EvolInstruct-13B: LLaMA 13B based Vicuna style instruction fine-tuned model. The dataset used to fine-tune this model is WizardLM's Evol-Instruct dataset.
- project-baize/baize-v2-7b: LLaMA 7B based Baize
- project-baize/baize-v2-13b: LLaMA 13B based Baize
- timdettmers/guanaco-7b: LLaMA 7B based Guanaco, fine-tuned on the OASST1 dataset with the QLoRA technique introduced in the "QLoRA: Efficient Finetuning of Quantized LLMs" paper.
- timdettmers/guanaco-13b: LLaMA 13B based Guanaco, fine-tuned on the OASST1 dataset with the QLoRA technique introduced in the "QLoRA: Efficient Finetuning of Quantized LLMs" paper.
- timdettmers/guanaco-33b-merged: LLaMA 30B based Guanaco, fine-tuned on the OASST1 dataset with the QLoRA technique introduced in the "QLoRA: Efficient Finetuning of Quantized LLMs" paper.
- tiiuae/falcon-7b-instruct: Falcon 7B based instruction fine-tuned model on Baize, GPT4All, GPTeacher, and RefinedWeb-English datasets.
- tiiuae/falcon-40b-instruct: Falcon 40B based instruction fine-tuned model on Baize and RefinedWeb-English datasets.
- Prerequisites
Note that the code only works with `Python >= 3.9` and `gradio >= 3.32.0`.
```console
$ conda create -n llm-serve python=3.9
$ conda activate llm-serve
```
- Install dependencies. `flash-attn` and `triton` are included to support `MPT` models. If you don't want to use `MPT`, comment them out; otherwise you will face two `module not found` errors, and you will have to install the `packaging` and `torch` packages while resolving them.
```console
$ cd LLM-As-Chatbot
$ pip install -r requirements.txt
```
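If the `flash-attn`/`triton` build does fail, one recovery path consistent with the note above is to install the missing build dependencies first and retry:

```console
$ pip install packaging torch
$ pip install -r requirements.txt
```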
- Run the Gradio application
```console
$ python app.py
```
Follow these steps to bring your own models into this project.
- Add your model spec in `model_cards.json`. If you don't have a thumbnail image, just leave it as a blank string (`""`).
- Add the button for your model in `app.py`. Don't forget to give it a name in the `gr.Button` and `gr.Markdown`. For placeholders, their names are omitted. Assign the `gr.Button` to a variable with the name of your choice.
- Add the button variable to the button list in `app.py`.
- Determine the model type in `global_vars.py`. If you think your model is similar to one of the existing ones, just add a filtering rule (`if-else`) and give it the same name.
- (Optional) If your model is a totally new one, you need to give a new `model_type` in `global_vars.py` and make changes accordingly in `utils.py` and `chats/central.py`. A rough sketch of steps 2-4 is shown after this list.
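The sketch below illustrates steps 2-4; the identifiers are illustrative, not the exact variables or functions used in `app.py` and `global_vars.py`:

```python
import gradio as gr

with gr.Blocks() as demo:
    # Step 2: name the model next to its button and assign the button to a variable.
    gr.Markdown("My-New-Model-7B")
    my_model_btn = gr.Button("Select")

# Step 3: register the button in the button list (stands in for the one in app.py).
model_buttons = [my_model_btn]

# Step 4: filtering rule mapping a model name to a model_type (as in global_vars.py).
def get_model_type(model_name: str) -> str:
    if "alpaca" in model_name.lower():
        return "alpaca"       # reuse an existing model_type for similar models
    return "my-new-model"     # otherwise introduce a new model_type
```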
- Gradio components to control the generation configurations
- Flan based Alpaca models
- Multiple conversation management
- Implement a server-only option w/ FastAPI
- ChatGPT plugin-like features
- I am thankful to Jarvislabs.ai, who generously provided free GPU resources to experiment with Alpaca-LoRA deployment and share it with the community.
- I am thankful to Common Computer, who generously provided an A100(40G) x 8 DGX workstation for fine-tuning the models.