Running GenAI on Intel AI Laptops and Simple LLM Inference on CPU and Fine-Tuning of LLM Models using Intel® OpenVINO™
-> Fine-tuned the Llama2-7b model on a custom Intel Products and Services FAQ dataset.
-> Converted it to the OpenVINO IR format for optimized inference, making it 56% faster than the original model.
📺 Demo (YouTube)
The dataset was prepared by scraping Intel Products and Services information from Intel's FAQ and help websites. The model's capability is limited to this dataset, which covers the Intel products listed below; a sketch of how the scraped Q&A pairs might be formatted for fine-tuning follows the list.
- 🚀 Intel Gaudi
- 🔧 POP Intel
- ⚡ Intel Optane
- 🛠️ IPP Intel
- 🔗 Intel MPI Library
- 🧠 Intel OpenVINO
- 🛡️ Product Support FAQ
- 📦 Product Installation FAQ
- 🌐 General Intel Information
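The project's actual scraping and preprocessing scripts are not reproduced here. As a rough illustration of the preparation step, the sketch below arranges scraped question-answer pairs into the Llama-2 `[INST] ... [/INST]` template that the model expects at inference time; the file names and field names are assumptions, not the project's code.

```python
# Hypothetical sketch: formatting scraped FAQ question-answer pairs into the
# Llama-2 instruction template used by this project. File and field names
# ("intel_faq_scraped.json", "question", "answer") are assumptions.
import json

def format_example(question: str, answer: str) -> str:
    # Mirrors the [INST] ... [/INST] prompt format used at inference time below
    return f"[INST] {question.strip()} [/INST] {answer.strip()}"

with open("intel_faq_scraped.json") as f:  # assumed file of scraped Q&A pairs
    records = json.load(f)

with open("intel_faq_train.jsonl", "w") as f:
    for r in records:
        f.write(json.dumps({"text": format_example(r["question"], r["answer"])}) + "\n")
```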
- Install the packages required to use the Optimum Intel integration with the OpenVINO backend:
pip install "optimum[openvino]"
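The published checkpoint `OjasPatil/intel-llama2-7b-ov` is already in OpenVINO IR format, so no conversion is needed to follow the steps below. For reference, converting a fine-tuned PyTorch checkpoint to IR with Optimum Intel could look roughly like this sketch; the local model path is an assumption, not the project's actual layout.

```python
# Sketch (not the authors' exact script): exporting a fine-tuned Hugging Face
# checkpoint to OpenVINO IR format using Optimum Intel.
from optimum.intel.openvino import OVModelForCausalLM

# export=True converts the PyTorch checkpoint to OpenVINO IR on load
ov_model = OVModelForCausalLM.from_pretrained(
    "path/to/fine-tuned-llama2-7b",  # assumed local path to the fine-tuned model
    export=True,
)
ov_model.save_pretrained("intel-llama2-7b-ov")  # writes the IR (.xml/.bin) files
```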
- Import and initialize the model from Hugging Face:
from transformers import AutoTokenizer
from optimum.intel.openvino import OVModelForCausalLM

# Load the tokenizer and the OpenVINO-optimized fine-tuned model from the Hugging Face Hub
model_name = "OjasPatil/intel-llama2-7b-ov"
tokenizer = AutoTokenizer.from_pretrained(model_name)
base_model = OVModelForCausalLM.from_pretrained(model_name)
- Perform inference with the OpenVINO-optimized fine-tuned Intel Virtual Assistant:
# Wrap the user question in the Llama-2 instruction template
message = "What is Intel OpenVINO?"
prompt = f"[INST] {message} [/INST]"

# Tokenize, generate, and strip the echoed prompt from the decoded output
inputs = tokenizer(prompt, return_tensors="pt")
outputs = base_model.generate(**inputs, max_new_tokens=50)
response = tokenizer.decode(outputs[0], skip_special_tokens=True).replace(prompt + " ", "")
print(response)
The OpenVINO IR format model runs 56% faster than the original model.
Figure: Performance comparison between the OpenVINO IR Format Model and the original model.
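As a rough guide only, a latency comparison of this kind could be reproduced along the lines of the sketch below. The baseline model id, prompt, token budget, and timing method are assumptions, not the authors' exact benchmark setup.

```python
# Sketch of a simple latency comparison between the OpenVINO IR model and an
# assumed PyTorch baseline; not the project's actual benchmark.
import time
from transformers import AutoModelForCausalLM, AutoTokenizer
from optimum.intel.openvino import OVModelForCausalLM

ov_name = "OjasPatil/intel-llama2-7b-ov"
baseline_name = "meta-llama/Llama-2-7b-chat-hf"  # assumed original model

tokenizer = AutoTokenizer.from_pretrained(ov_name)
inputs = tokenizer("[INST] What is Intel OpenVINO? [/INST]", return_tensors="pt")

def time_generate(model, n_tokens=50):
    # Wall-clock time for a single generation of n_tokens new tokens
    start = time.perf_counter()
    model.generate(**inputs, max_new_tokens=n_tokens)
    return time.perf_counter() - start

ov_time = time_generate(OVModelForCausalLM.from_pretrained(ov_name))
hf_time = time_generate(AutoModelForCausalLM.from_pretrained(baseline_name))

print(f"OpenVINO IR: {ov_time:.1f} s, original: {hf_time:.1f} s, "
      f"latency reduction: {(hf_time - ov_time) / hf_time:.0%}")
```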
The output quality of the model is also evaluated using ROUGE scores:
-> ROUGE-1: 35.23
-> ROUGE-2: 18.97
-> ROUGE-L: 28.82
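These figures come from the project's own evaluation. Purely as an illustration of how such scores can be computed, the sketch below uses the Hugging Face `evaluate` library; the prediction and reference strings are placeholders, not the project's evaluation set.

```python
# Illustrative ROUGE computation with the Hugging Face `evaluate` library.
# The prediction/reference pairs here are placeholders, not the real eval data.
import evaluate

rouge = evaluate.load("rouge")

predictions = ["OpenVINO is an Intel toolkit for optimizing and deploying AI inference."]
references = ["Intel OpenVINO is a toolkit for optimizing and deploying deep learning inference."]

scores = rouge.compute(predictions=predictions, references=references)
# Scale to 0-100 to match the figures reported above
print({k: round(v * 100, 2) for k, v in scores.items()})
```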
Demo Video Link: Project Demo
- Harinee J
- Mhanjhusriee Baskar
- Amit Das
- Ojas Patil