Llama3 in AgentScope

AgentScope supports Llama3 now! You can

  • 🚀 Set up Llama3 model service in AgentScope! Both CPU and GPU inference are supported!
  • 🔧 Test Llama3 in AgentScope built-in examples!
  • 🖋 Use Llama3 to build your own multi-agent applications!

Follow the guidance below to use Llama3 in AgentScope!

Contents

  • CPU Inference
      • Setup Llama3 Service
      • Use Llama3 in AgentScope
  • GPU Inference
      • Setup Llama3 Service
      • Use Llama3 in AgentScope

CPU Inference

Setup Llama3 Service

AgentScope supports Llama3 CPU inference with the help of ollama. Note that the llama3 models in ollama are quantized to 4 bits.

  1. Download and install ollama from the official website (https://ollama.com).

  2. Start the ollama application, or execute the following command in a terminal

    ollama serve
  3. Pull the llama3 model with the following command

# llama3 8b model
ollama pull llama3

# llama3 70b model
ollama pull llama3:70b
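
(Optional) You can check that the local service is reachable and that the models have been pulled by querying ollama's REST API, for example with the following sketch (assuming ollama is listening on its default port 11434):

import json
from urllib.request import urlopen

# List the models that the local ollama service has pulled
with urlopen("http://localhost:11434/api/tags") as resp:
    models = json.load(resp)["models"]

print([m["name"] for m in models])  # e.g. ['llama3:latest', 'llama3:70b']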

Use Llama3 in AgentScope

Use the llama3 models with the following model configurations in AgentScope

llama3_8b_ollama_model_configuration = {
    "config_name": "ollama_llama3_8b",
    "model_type": "ollama_chat",
    "model_name": "llama3",
    "options": {
        "temperature": 0.5,
        "seed": 123
    },
    "keep_alive": "5m"
}

llama3_70b_ollama_model_configuration = {
    "config_name": "ollama_llama3_70b",
    "model_type": "ollama_chat",
    "model_name": "llama3:70b",
    "options": {
        "temperature": 0.5,
        "seed": 123
    },
    "keep_alive": "5m"
}
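
To make both model sizes available in the same application, you can pass the configurations together as a list (a minimal sketch; agentscope.init also accepts a single configuration dict, as in the example below):

import agentscope

# Register both ollama-backed llama3 configurations at once
agentscope.init(
    model_configs=[
        llama3_8b_ollama_model_configuration,
        llama3_70b_ollama_model_configuration,
    ],
)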

After that, you can experience llama3 with our built-in examples! For example, start a conversation with the llama3-8b model using the following code:

import agentscope
from agentscope.agents import UserAgent, DialogAgent

# Register the llama3-8b configuration defined above
agentscope.init(model_configs=llama3_8b_ollama_model_configuration)

user = UserAgent("user")
agent = DialogAgent(
    "assistant",
    sys_prompt="You're a helpful assistant.",
    model_config_name="ollama_llama3_8b",
)

# Alternate between the agent and the user until the user types "exit"
x = None
while True:
    x = agent(x)
    x = user(x)
    if x.content == "exit":
        break

GPU Inference

Setup Llama3 Service

If you have a GPU, you can quickly set up a llama3 model service with the help of Flask and Transformers.

Note that you need to apply for permission to download the llama3 models from the Hugging Face model hub.

  1. Install Flask, Transformers and PyTorch

pip install flask transformers torch

  2. Apply for model permission, and log in to your Hugging Face account in the terminal

huggingface-cli login

  3. Then run the flask server with the following command in the scripts directory:

# 8B model
python flask_transformers/setup_hf_service.py --model_name_or_path meta-llama/Meta-Llama-3-8B-Instruct --port 8000

# 70B model
python flask_transformers/setup_hf_service.py --model_name_or_path meta-llama/Meta-Llama-3-70B-Instruct --port 8000

Use Llama3 in AgentScope

In AgentScope, use the following model configuration

llama3_flask_model_configuration = {
    "model_type": "post_api_chat",
    "config_name": "llama-3",
    "api_url": "http://127.0.0.1:8000/llm/",
    "json_args": {
        "max_length": 4096,
        "temperature": 0.5,
        # currently the llama3 model configuration on Hugging Face misses these eos_token_id values
        "eos_token_id": [128001, 128009]
    }
}
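
This configuration can then be used by agents in the same way as the ollama configuration above, for example (a sketch that reuses the DialogAgent pattern from the CPU section):

import agentscope
from agentscope.agents import UserAgent, DialogAgent

agentscope.init(model_configs=llama3_flask_model_configuration)

user = UserAgent("user")
agent = DialogAgent(
    "assistant",
    sys_prompt="You're a helpful assistant.",
    model_config_name="llama-3",  # matches "config_name" above
)

x = None
while True:
    x = agent(x)
    x = user(x)
    if x.content == "exit":
        break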