Merge pull request #92 from zenml-io/feature/llm-lora-finetuning
Feature/llm lora finetuning
schustmi authored Mar 11, 2024
2 parents 6ca7ba5 + f6643bc commit 94a0518
Showing 61 changed files with 11,532 additions and 0 deletions.
9 changes: 9 additions & 0 deletions llm-lora-finetuning/.dockerignore
@@ -0,0 +1,9 @@
*
!/pipelines/**
!/steps/**
!/materializers/**
!/evaluate/**
!/finetune/**
!/generate/**
!/lit_gpt/**
!/scripts/**
54 changes: 54 additions & 0 deletions llm-lora-finetuning/README.md
@@ -0,0 +1,54 @@
# ☮️ Fine-tuning open source LLMs using MLOps pipelines

The goal of this project is to use [ZenML](https://github.com/zenml-io/zenml) to write reusable MLOps pipelines to fine-tune various open-source LLMs.

Using these pipelines, we can run data preparation and model finetuning with a single command, while using YAML files for [configuration](https://docs.zenml.io/user-guide/production-guide/configure-pipeline) and letting ZenML take care of tracking our metadata and [containerizing our pipelines](https://docs.zenml.io/user-guide/advanced-guide/infrastructure-management/containerize-your-pipeline).

## :earth_americas: Inspiration and Credit

This project heavily relies on the [Lit-GPT project](https://github.com/Lightning-AI/litgpt) of the amazing people at Lightning AI. We used [this blog post](https://lightning.ai/pages/community/lora-insights/#toc14) to get started with LoRA and QLoRA and modified the commands they recommend to make them work with ZenML.

## 🏃 How to run

In this repository, we provide a few predefined configuration files for finetuning the [Mistral-7B-Instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1) model on the [Alpaca](https://huggingface.co/datasets/tatsu-lab/alpaca) dataset. You can change both the base model and the dataset by modifying the configuration files.

If you want to push any of your finetuned adapters or merged models to the Hugging Face Hub, you will need to register a ZenML secret containing your Hugging Face access token as follows:
```shell
zenml secret create huggingface_credentials --token=<HUGGINGFACE_TOKEN>
```
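
Inside your own steps or scripts, the token can then be read back through ZenML's Python client. A minimal sketch, assuming the secret was registered exactly as shown above:

```python
from zenml.client import Client

# Look up the secret registered with the CLI command above and
# read the token stored under the "token" key.
secret = Client().get_secret("huggingface_credentials")
huggingface_token = secret.secret_values["token"]
```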

### Combined feature engineering and finetuning pipeline

The easiest way to get started is to run the finetuning pipeline with the `finetune-mistral-alpaca.yaml` configuration file, which performs both feature engineering and finetuning in a single run:

```shell
python run.py --finetuning-pipeline --config finetune-mistral-alpaca.yaml
```

When running the pipeline like this, the trained adapter will be stored in the ZenML artifact store. You can optionally upload the adapter, the merged model, or both to the Hugging Face Hub by specifying the `adapter_output_repo` and `merged_output_repo` parameters in the configuration file.
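
Because the adapter is versioned in the artifact store, you can also fetch it again later from Python. A hedged sketch using ZenML's client; the artifact name `adapter` is an assumption, so check the step outputs of your pipeline run for the actual name:

```python
from zenml.client import Client

# Load the most recent version of the finetuned adapter artifact.
# "adapter" is a hypothetical artifact name; inspect your pipeline
# run (e.g. in the ZenML dashboard) for the real output name.
adapter = Client().get_artifact_version("adapter").load()
```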


### Evaluation pipeline

Before running this pipeline, you will need to fill in the `adapter_repo` value in the `eval-mistral.yaml` configuration file. It should point to a Hugging Face repository that contains the finetuned adapter produced by the finetuning pipeline.

```shell
python run.py --eval-pipeline --config eval-mistral.yaml
```
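
After the run finishes, you can inspect its status through ZenML's client. A small sketch; the pipeline name used here is hypothetical, so check `zenml pipeline list` for the actual registered name:

```python
from zenml.client import Client

# Fetch the most recent run of the evaluation pipeline.
# "llm_lora_evaluation" is a hypothetical pipeline name.
run = Client().get_pipeline("llm_lora_evaluation").last_run
print(run.status)
```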

### Merging pipeline

If you have trained an adapter using the finetuning pipeline, you can merge it with the base model by filling in the `adapter_repo` and `output_repo` parameters in the `merge-mistral.yaml` file and then running:

```shell
python run.py --merge-pipeline --config merge-mistral.yaml
```
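
Conceptually, merging folds the low-rank update back into the base weights: for every adapted layer, `W' = W + (alpha / r) * B @ A`, after which the adapter is no longer needed at inference time. A schematic sketch of that operation (not the actual Lit-GPT merge code):

```python
import torch

def merge_lora_layer(
    weight: torch.Tensor,  # base weight W, shape (out_features, in_features)
    lora_a: torch.Tensor,  # LoRA matrix A, shape (r, in_features)
    lora_b: torch.Tensor,  # LoRA matrix B, shape (out_features, r)
    alpha: float,
    r: int,
) -> torch.Tensor:
    # W' = W + (alpha / r) * (B @ A)
    return weight + (alpha / r) * (lora_b @ lora_a)
```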

### Feature Engineering followed by Finetuning

If you want to finetune your model on a different dataset, you can do so by running the feature engineering pipeline followed by the finetuning pipeline. To define your dataset, take a look at the `scripts/prepare_*` scripts and set the dataset name in the `feature-mistral-alpaca.yaml` config file.

```shell
python run.py --feature-pipeline --config feature-mistral-alpaca.yaml
python run.py --finetuning-pipeline --config finetune-mistral-from-dataset.yaml
```
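
The `scripts/prepare_*` files all follow the same basic pattern: load raw instruction records, render them into prompt strings, then tokenize and split them. A schematic sketch of the prompt-rendering part, using the Alpaca field names (`instruction`, `input`); the real scripts also handle tokenization, label masking, and the train/test split:

```python
import json
from pathlib import Path

def render_prompt(example: dict) -> str:
    # Alpaca-style prompt template with an optional input section.
    if example.get("input"):
        return (
            "Below is an instruction that describes a task, paired with an input "
            "that provides further context. Write a response that appropriately "
            "completes the request.\n\n"
            f"### Instruction:\n{example['instruction']}\n\n"
            f"### Input:\n{example['input']}\n\n### Response:"
        )
    return (
        "Below is an instruction that describes a task. Write a response that "
        "appropriately completes the request.\n\n"
        f"### Instruction:\n{example['instruction']}\n\n### Response:"
    )

# Hypothetical local copy of the raw dataset.
examples = json.loads(Path("alpaca_data.json").read_text())
prompts = [render_prompt(ex) for ex in examples]
```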
15 changes: 15 additions & 0 deletions llm-lora-finetuning/configs/eval-mistral.yaml
@@ -0,0 +1,15 @@
model:
name: mistral-7b-lora
description: "Fine-tune `mistralai/Mistral-7B-Instruct-v0.1`."
tags:
- llm
- lora
- mistral

steps:
evaluate:
parameters:
config:
model_repo: mistralai/Mistral-7B-Instruct-v0.1
adapter_repo: ...
precision: bf16-true
15 changes: 15 additions & 0 deletions llm-lora-finetuning/configs/feature-mistral-alpaca.yaml
@@ -0,0 +1,15 @@
model:
name: mistral-7b-lora
description: "Fine-tune `mistralai/Mistral-7B-Instruct-v0.1`."
tags:
- llm
- lora
- mistral
- alpaca

steps:
feature_engineering:
parameters:
config:
model_repo: mistralai/Mistral-7B-Instruct-v0.1
dataset_name: alpaca
23 changes: 23 additions & 0 deletions llm-lora-finetuning/configs/finetune-mistral-alpaca.yaml
@@ -0,0 +1,23 @@
model:
name: mistral-7b-lora
description: "Fine-tune `mistralai/Mistral-7B-Instruct-v0.1`."
tags:
- llm
- lora
- mistral
- alpaca

steps:
finetune:
parameters:
config:
base_model_repo: mistralai/Mistral-7B-Instruct-v0.1
precision: bf16-true
# merged_output_repo:
# adapter_output_repo:
training:
save_interval: 1
epochs: 5
epoch_size: 50000
global_batch_size: 128
learning_rate: 3e-4
21 changes: 21 additions & 0 deletions llm-lora-finetuning/configs/finetune-mistral-from-dataset.yaml
@@ -0,0 +1,21 @@
parameters:
dataset_artifact_name: dataset

model:
name: mistral-7b-lora
version: latest

steps:
finetune:
parameters:
config:
base_model_repo: mistralai/Mistral-7B-Instruct-v0.1
precision: bf16-true
# merged_output_repo:
# adapter_output_repo:
training:
save_interval: 1
epochs: 5
epoch_size: 50000
global_batch_size: 128
learning_rate: 3e-4
16 changes: 16 additions & 0 deletions llm-lora-finetuning/configs/merge-mistral.yaml
@@ -0,0 +1,16 @@
model:
name: mistral-7b-lora
description: "Fine-tune `mistralai/Mistral-7B-Instruct-v0.1`."
tags:
- llm
- lora
- mistral

steps:
merge:
parameters:
config:
base_model_repo: mistralai/Mistral-7B-Instruct-v0.1
adapter_repo: ...
output_repo: ...
precision: bf16-true
231 changes: 231 additions & 0 deletions llm-lora-finetuning/evaluate/lm_eval_harness.py
@@ -0,0 +1,231 @@
# Copyright Lightning AI. Licensed under the Apache License 2.0, see LICENSE file.

import json
import sys
from pathlib import Path
from typing import Dict, List, Literal, Optional

import lightning as L
import torch
from lightning.fabric.plugins import BitsandbytesPrecision
from lm_eval import base, evaluator, tasks
from lm_eval.base import BaseLM

# support running without installing as a package
wd = Path(__file__).parent.parent.resolve()
sys.path.append(str(wd))

from generate.base import generate
from lit_gpt import GPT, Config, Tokenizer
from lit_gpt.utils import (
CLI,
check_valid_checkpoint_dir,
get_default_supported_precision,
load_checkpoint,
)


class EvalHarnessBase(BaseLM):
# Credits:
# https://github.com/EleutherAI/gpt-neox/blob/main/eval_tasks/eval_adapter.py
def __init__(
self,
fabric: L.Fabric,
model: GPT,
tokenizer: Tokenizer,
batch_size: int,
):
super().__init__()
self.fabric = fabric
self.model = model
self.tokenizer = tokenizer
self.batch_size_per_gpu = batch_size
with fabric.init_tensor():
model.set_kv_cache(batch_size=batch_size)

@classmethod
def create_from_arg_string(cls, arg_string, additional_config=None):
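        # Parse a comma-separated "key=value" string into keyword arguments.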
kwargs = {
el.split("=")[0]: el.split("=")[1] for el in arg_string.split(",")
}
return cls(**kwargs, **additional_config)

@property
def eot_token_id(self):
# we use EOT because end of *text* is more accurate for what we're doing than end of *sentence*
return self.tokenizer.eos_id

@property
def max_length(self):
return self.model.max_seq_length

@property
def vocab_size(self):
return self.tokenizer.vocab_size

@property
def max_gen_toks(self):
return 256

@property
def batch_size(self):
return self.batch_size_per_gpu * self.fabric.world_size

@property
def device(self):
return self.fabric.device

def tok_encode(self, string: str) -> List[int]:
return self.tokenizer.encode(string, bos=False, eos=False).tolist()

def tok_decode(self, tokens: List[int]) -> str:
t = torch.tensor(tokens)
return self.tokenizer.decode(t)

@torch.inference_mode()
def _model_call(self, inps):
return self.model(inps)

@torch.inference_mode()
def _model_generate(
self, context, max_length, eos_token_id
) -> torch.Tensor:
# this only supports batch size 1
assert context.shape[0] == 1
out = generate(self.model, context[0], max_length, eos_id=eos_token_id)
for block in self.model.transformer.h:
block.attn.kv_cache.reset_parameters()
return out.unsqueeze(0)

@torch.inference_mode()
def run_eval(
self,
eval_tasks: List[str],
num_fewshot: int,
limit: Optional[int],
bootstrap_iters: int,
no_cache: bool,
) -> Dict:
# Returns a list containing all values of the task registry that
# match at least one of the patterns
import fnmatch

def pattern_match(patterns, source_list):
task_names = set()
for pattern in patterns:
for matching in fnmatch.filter(source_list, pattern):
task_names.add(matching)
return list(task_names)

eval_tasks = pattern_match(eval_tasks, tasks.ALL_TASKS)
print(f"Found tasks: {eval_tasks}")

# **HACK INCOMING**:
# first get task dict on local main rank
# the tasks are downloaded *as they are initialized*, and the downloads don't like multithreading.
# so we download them once on the local main rank, wait, and then initialize them on all other ranks, which *should* load from the cache.
if self.fabric.local_rank == 0:
tasks.get_task_dict(eval_tasks)
# torch barrier
self.fabric.barrier()
tasks.get_task_dict(eval_tasks)

lm = self
if not no_cache:
lm = base.CachingLM(lm, "lm_cache/lit-gpt.db")

results = evaluator.evaluate(
lm=lm,
task_dict=tasks.get_task_dict(eval_tasks),
num_fewshot=num_fewshot,
limit=limit,
bootstrap_iters=bootstrap_iters,
)
results["config"] = dict(
model=self.model.config.name,
batch_size=self.batch_size,
device=str(self.device),
num_fewshot=num_fewshot,
limit=limit,
bootstrap_iters=bootstrap_iters,
no_cache=no_cache,
)
return results


@torch.inference_mode()
def run_eval_harness(
checkpoint_dir: Path,
precision: Optional[str] = None,
quantize: Optional[
Literal["bnb.nf4", "bnb.nf4-dq", "bnb.fp4", "bnb.fp4-dq", "bnb.int8"]
] = None,
eval_tasks: List[str] = [
"arc_challenge",
"piqa",
"hellaswag",
"hendrycksTest-*",
],
save_filepath: Optional[Path] = None,
num_fewshot: int = 0,
limit: Optional[int] = None,
bootstrap_iters: int = 100000,
no_cache: bool = True,
):
if precision is None:
precision = get_default_supported_precision(training=False)

plugins = None
if quantize is not None and quantize.startswith("bnb."):
if "mixed" in precision:
raise ValueError(
"Quantization and mixed precision is not supported."
)
dtype = {
"16-true": torch.float16,
"bf16-true": torch.bfloat16,
"32-true": torch.float32,
}[precision]
plugins = BitsandbytesPrecision(quantize[4:], dtype)
precision = None

fabric = L.Fabric(devices=1, precision=precision, plugins=plugins)

check_valid_checkpoint_dir(checkpoint_dir)
tokenizer = Tokenizer(checkpoint_dir)

config = Config.from_json(checkpoint_dir / "lit_config.json")

checkpoint_path = checkpoint_dir / "lit_model.pth"

print(
f"Loading model {str(checkpoint_path)!r} with {config.__dict__}",
file=sys.stderr,
)
with fabric.init_module(empty_init=True):
model = GPT(config)

model.eval()
model = fabric.setup_module(model)

load_checkpoint(fabric, model, checkpoint_path)

eval_harness = EvalHarnessBase(fabric, model, tokenizer, 1)

results = eval_harness.run_eval(
eval_tasks, num_fewshot, limit, bootstrap_iters, no_cache
)
if save_filepath is None:
print(results)
else:
print(f"Saving results to {str(save_filepath)!r}")
save_filepath.parent.mkdir(parents=True, exist_ok=True)
data = json.dumps(results)
with open(save_filepath, "w") as fw:
fw.write(data)


if __name__ == "__main__":
torch.set_float32_matmul_precision("high")

CLI(run_eval_harness)
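
Because `run_eval_harness` is exposed through Lit-GPT's `CLI` wrapper, it can also be called directly from Python. A hedged usage sketch; the checkpoint path is hypothetical and must point to a converted Lit-GPT checkpoint directory:

```python
from pathlib import Path

from evaluate.lm_eval_harness import run_eval_harness

run_eval_harness(
    checkpoint_dir=Path("checkpoints/mistralai/Mistral-7B-Instruct-v0.1"),
    precision="bf16-true",
    eval_tasks=["arc_challenge", "piqa"],
    limit=100,  # evaluate on a small subset for a quick smoke test
    save_filepath=Path("out/eval_results.json"),
)
```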