Name		Name	Last commit message	Last commit date
parent directory ..
README.md		README.md
flask_llama3_model_configurations.json		flask_llama3_model_configurations.json
ollama_llama3_model_configurations.json		ollama_llama3_model_configurations.json

README.md

Llama3 in AgentScope

AgentScope supports Llama3 now! You can

🚀 Set up Llama3 model service in AgentScope! Both CPU and GPU inference are supported!
🔧 Test Llama3 in AgentScope built-in examples!
🖋 Use Llama3 to build your own multi-agent applications!

Follow the guidance below to use Llama3 in AgentScope!

CPU Inference

Setup Llama3 Service

AgentScope supports Llama3 CPU inference with the help of ollama. Note the llama3 models in ollama are quantized into 4 bits.

Download ollama from here.
Start ollama software, or execute the following command in terminal
```
ollama serve
```
Pull llama3 model by the following command

# llama3 8b model
ollama pull llama3

# llama3 70b model
ollama pull llama3:70b

Use Llama3 in AgentScope

Use llama3 model with the following model configuration in AgentScope

llama3_8b_ollama_model_configuration = {
   "config_name": "ollama_llama3_8b",
   "model_type": "ollama_chat",
   "model_name": "llama3",
   "options": {
       "temperature": 0.5,
       "seed": 123
   },
   "keep_alive": "5m"
}

llama3_70b_ollama_model_configuration = {
   "config_name": "ollama_llama3_70b",
   "model_type": "ollama_chat",
   "model_name": "llama3:70b",
   "options": {
       "temperature": 0.5,
       "seed": 123
   },
   "keep_alive": "5m"
}

After that, you can experience llama3 with our built-in examples! For example, start a conversation with llama3-8b model by the following code:

import agentscope
from agentscope.agents import UserAgent, DialogAgent

agentscope.init(model_configs=llama3_8b_ollama_model_configuration)

user = UserAgent("user")
agent = DialogAgent("assistant", sys_prompt="You're a helpful assistant.", model_config_name="ollama_llama3_8b")

x = None
while True:
    x = agent(x)
    x = user(x)
    if x.content == "exit":
        break

GPU Inference

Setup Llama3 Service

If you have a GPU, you can set up llama3 model service with the help of Flask and Transformers quickly.

Note you need to apply for permission to download the llama3 model from Hugging Face model hub.

Install Flask and Transformers

pip install flask transformers torch

Apply for model permission, and log in your huggingface account in terminal

huggingface-cli login

Then run flask server by the following command in scripts directory:

# 8B model
python flask_transformers/setup_hf_service.py --model_name_or_path meta-llama/Meta-Llama-3-8B-Instruct --port 8000

# 70B model
python flask_transformers/setup_hf_service.py --model_name_or_path meta-llama/Meta-Llama-3-70B-Instruct --port 8000

Use Llama3 in AgentScope

In AgentScope, use the following model configurations

llama3_flask_model_configuration = {
  "model_type": "post_api_chat",
  "config_name": "llama-3",
  "api_url": "http://127.0.0.1:8000/llm/",
  "json_args": {
    "max_length": 4096,
    "temperature": 0.5,
    "eos_token_id": [128001, 128009] # currently the model configuration in huggingface misses eos_token_id
  }
}

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

model_llama3

model_llama3

README.md

Llama3 in AgentScope

Contents

CPU Inference

Setup Llama3 Service

Use Llama3 in AgentScope

GPU Inference

Setup Llama3 Service

Use Llama3 in AgentScope

Files

model_llama3

Directory actions

More options

Directory actions

More options

Latest commit

History

model_llama3

Folders and files

parent directory

README.md

Llama3 in AgentScope

Contents

CPU Inference

Setup Llama3 Service

Use Llama3 in AgentScope

GPU Inference

Setup Llama3 Service

Use Llama3 in AgentScope