Merge pull request #92 from zenml-io/feature/llm-lora-finetuning
Feature/llm lora finetuning
schustmi authored Mar 11, 2024
2 parents 6ca7ba5 + f6643bc commit 94a0518
Showing 61 changed files with 11,532 additions and 0 deletions.
9 changes: 9 additions & 0 deletions llm-lora-finetuning/.dockerignore
@@ -0,0 +1,9 @@
*
!/pipelines/**
!/steps/**
!/materializers/**
!/evaluate/**
!/finetune/**
!/generate/**
!/lit_gpt/**
!/scripts/**
54 changes: 54 additions & 0 deletions llm-lora-finetuning/README.md
@@ -0,0 +1,54 @@
# ☮️ Fine-tuning open source LLMs using MLOps pipelines

The goal of this project is to use [ZenML](https://github.com/zenml-io/zenml) to write reusable MLOps pipelines to fine-tune various open-source LLMs.

Using these pipelines, we can run data preparation and model finetuning with a single command, while using YAML files for [configuration](https://docs.zenml.io/user-guide/production-guide/configure-pipeline) and letting ZenML take care of tracking our metadata and [containerizing our pipelines](https://docs.zenml.io/user-guide/advanced-guide/infrastructure-management/containerize-your-pipeline).

## :earth_americas: Inspiration and Credit

This project heavily relies on the [Lit-GPT project](https://github.com/Lightning-AI/litgpt) of the amazing people at Lightning AI. We used [this blog post](https://lightning.ai/pages/community/lora-insights/#toc14) to get started with LoRA and QLoRA and modified the commands they recommend to make them work with ZenML.

## 🏃 How to run

In this repository, we provide a few predefined configuration files for finetuning the [Mistral-7B-Instruct-v0.1](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1) model on the [Alpaca](https://huggingface.co/datasets/tatsu-lab/alpaca) dataset. You can change both the base model and the dataset by modifying the configuration files.

If you want to push any of your finetuned adapters or merged models to the Hugging Face Hub, you will need to register a ZenML secret containing your Hugging Face access token as follows:
```shell
zenml secret create huggingface_credentials --token=<HUGGINGFACE_TOKEN>
```
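
Inside your own steps or scripts, the token can then be read back through ZenML's Python client. A minimal sketch, assuming the secret was registered exactly as shown above:

```python
from zenml.client import Client

# Look up the secret registered with the CLI command above and
# read the token stored under the "token" key.
secret = Client().get_secret("huggingface_credentials")
huggingface_token = secret.secret_values["token"]
```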

### Combined feature engineering and finetuning pipeline

The easiest way to get started is to run the finetuning pipeline with the `finetune-mistral-alpaca.yaml` configuration file, which performs both feature engineering and finetuning in a single run:

```shell
python run.py --finetuning-pipeline --config finetune-mistral-alpaca.yaml
```

When running the pipeline like this, the trained adapter will be stored in the ZenML artifact store. You can optionally upload the adapter, the merged model, or both to the Hugging Face Hub by specifying the `adapter_output_repo` and `merged_output_repo` parameters in the configuration file.
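
Because the adapter is versioned in the artifact store, you can also fetch it again later from Python. A hedged sketch using ZenML's client; the artifact name `adapter` is an assumption, so check the step outputs of your pipeline run for the actual name:

```python
from zenml.client import Client

# Load the most recent version of the finetuned adapter artifact.
# "adapter" is a hypothetical artifact name; inspect your pipeline
# run (e.g. in the ZenML dashboard) for the real output name.
adapter = Client().get_artifact_version("adapter").load()
```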


### Evaluation pipeline

Before running this pipeline, you will need to fill in the `adapter_repo` value in the `eval-mistral.yaml` configuration file. It should point to a Hugging Face repository that contains the finetuned adapter produced by the finetuning pipeline.

```shell
python run.py --eval-pipeline --config eval-mistral.yaml
```
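
After the run finishes, you can inspect its status through ZenML's client. A small sketch; the pipeline name used here is hypothetical, so check `zenml pipeline list` for the actual registered name:

```python
from zenml.client import Client

# Fetch the most recent run of the evaluation pipeline.
# "llm_lora_evaluation" is a hypothetical pipeline name.
run = Client().get_pipeline("llm_lora_evaluation").last_run
print(run.status)
```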

### Merging pipeline

If you have trained an adapter using the finetuning pipeline, you can merge it with the base model by filling in the `adapter_repo` and `output_repo` parameters in the `merge-mistral.yaml` file and then running:

```shell
python run.py --merge-pipeline --config merge-mistral.yaml
```
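
Conceptually, merging folds the low-rank update back into the base weights: for every adapted layer, `W' = W + (alpha / r) * B @ A`, after which the adapter is no longer needed at inference time. A schematic sketch of that operation (not the actual Lit-GPT merge code):

```python
import torch

def merge_lora_layer(
    weight: torch.Tensor,  # base weight W, shape (out_features, in_features)
    lora_a: torch.Tensor,  # LoRA matrix A, shape (r, in_features)
    lora_b: torch.Tensor,  # LoRA matrix B, shape (out_features, r)
    alpha: float,
    r: int,
) -> torch.Tensor:
    # W' = W + (alpha / r) * (B @ A)
    return weight + (alpha / r) * (lora_b @ lora_a)
```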

### Feature Engineering followed by Finetuning

If you want to finetune your model on a different dataset, you can do so by running the feature engineering pipeline followed by the finetuning pipeline. To define your dataset, take a look at the `scripts/prepare_*` scripts and set the dataset name in the `feature-mistral-alpaca.yaml` config file.

```shell
python run.py --feature-pipeline --config feature-mistral-alpaca.yaml
python run.py --finetuning-pipeline --config finetune-mistral-from-dataset.yaml
```
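
The `scripts/prepare_*` files all follow the same basic pattern: load raw instruction records, render them into prompt strings, then tokenize and split them. A schematic sketch of the prompt-rendering part, using the Alpaca field names (`instruction`, `input`); the real scripts also handle tokenization, label masking, and the train/test split:

```python
import json
from pathlib import Path

def render_prompt(example: dict) -> str:
    # Alpaca-style prompt template with an optional input section.
    if example.get("input"):
        return (
            "Below is an instruction that describes a task, paired with an input "
            "that provides further context. Write a response that appropriately "
            "completes the request.\n\n"
            f"### Instruction:\n{example['instruction']}\n\n"
            f"### Input:\n{example['input']}\n\n### Response:"
        )
    return (
        "Below is an instruction that describes a task. Write a response that "
        "appropriately completes the request.\n\n"
        f"### Instruction:\n{example['instruction']}\n\n### Response:"
    )

# Hypothetical local copy of the raw dataset.
examples = json.loads(Path("alpaca_data.json").read_text())
prompts = [render_prompt(ex) for ex in examples]
```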
15 changes: 15 additions & 0 deletions llm-lora-finetuning/configs/eval-mistral.yaml
@@ -0,0 +1,15 @@
model:
name: mistral-7b-lora
description: "Fine-tune `mistralai/Mistral-7B-Instruct-v0.1`."
tags:
- llm
- lora
- mistral

steps:
evaluate:
parameters:
config:
model_repo: mistralai/Mistral-7B-Instruct-v0.1
adapter_repo: ...
precision: bf16-true
15 changes: 15 additions & 0 deletions llm-lora-finetuning/configs/feature-mistral-alpaca.yaml
@@ -0,0 +1,15 @@
model:
name: mistral-7b-lora
description: "Fine-tune `mistralai/Mistral-7B-Instruct-v0.1`."
tags:
- llm
- lora
- mistral
- alpaca

steps:
feature_engineering:
parameters:
config:
model_repo: mistralai/Mistral-7B-Instruct-v0.1
dataset_name: alpaca
23 changes: 23 additions & 0 deletions llm-lora-finetuning/configs/finetune-mistral-alpaca.yaml
@@ -0,0 +1,23 @@
model:
name: mistral-7b-lora
description: "Fine-tune `mistralai/Mistral-7B-Instruct-v0.1`."
tags:
- llm
- lora
- mistral
- alpaca

steps:
finetune:
parameters:
config:
base_model_repo: mistralai/Mistral-7B-Instruct-v0.1
precision: bf16-true
# merged_output_repo:
# adapter_output_repo:
training:
save_interval: 1
epochs: 5
epoch_size: 50000
global_batch_size: 128
learning_rate: 3e-4
21 changes: 21 additions & 0 deletions llm-lora-finetuning/configs/finetune-mistral-from-dataset.yaml
@@ -0,0 +1,21 @@
parameters:
dataset_artifact_name: dataset

model:
name: mistral-7b-lora
version: latest

steps:
finetune:
parameters:
config:
base_model_repo: mistralai/Mistral-7B-Instruct-v0.1
precision: bf16-true
# merged_output_repo:
# adapter_output_repo:
training:
save_interval: 1
epochs: 5
epoch_size: 50000
global_batch_size: 128
learning_rate: 3e-4
16 changes: 16 additions & 0 deletions llm-lora-finetuning/configs/merge-mistral.yaml
@@ -0,0 +1,16 @@
model:
name: mistral-7b-lora
description: "Fine-tune `mistralai/Mistral-7B-Instruct-v0.1`."
tags:
- llm
- lora
- mistral

steps:
merge:
parameters:
config:
base_model_repo: mistralai/Mistral-7B-Instruct-v0.1
adapter_repo: ...
output_repo: ...
precision: bf16-true
231 changes: 231 additions & 0 deletions llm-lora-finetuning/evaluate/lm_eval_harness.py
@@ -0,0 +1,231 @@
# Copyright Lightning AI. Licensed under the Apache License 2.0, see LICENSE file.

import json
import sys
from pathlib import Path
from typing import Dict, List, Literal, Optional

import lightning as L
import torch
from lightning.fabric.plugins import BitsandbytesPrecision
from lm_eval import base, evaluator, tasks
from lm_eval.base import BaseLM

# support running without installing as a package
wd = Path(__file__).parent.parent.resolve()
sys.path.append(str(wd))

from generate.base import generate
from lit_gpt import GPT, Config, Tokenizer
from lit_gpt.utils import (
CLI,
check_valid_checkpoint_dir,
get_default_supported_precision,
load_checkpoint,
)


class EvalHarnessBase(BaseLM):
# Credits:
# https://github.com/EleutherAI/gpt-neox/blob/main/eval_tasks/eval_adapter.py
def __init__(
self,
fabric: L.Fabric,
model: GPT,
tokenizer: Tokenizer,
batch_size: int,
):
super().__init__()
self.fabric = fabric
self.model = model
self.tokenizer = tokenizer
self.batch_size_per_gpu = batch_size
with fabric.init_tensor():
model.set_kv_cache(batch_size=batch_size)

@classmethod
def create_from_arg_string(cls, arg_string, additional_config=None):
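        # Parse a comma-separated "key=value" string into keyword arguments.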
kwargs = {
el.split("=")[0]: el.split("=")[1] for el in arg_string.split(",")
}
return cls(**kwargs, **additional_config)

@property
def eot_token_id(self):
# we use EOT because end of *text* is more accurate for what we're doing than end of *sentence*
return self.tokenizer.eos_id

@property
def max_length(self):
return self.model.max_seq_length

@property
def vocab_size(self):
return self.tokenizer.vocab_size

@property
def max_gen_toks(self):
return 256

@property
def batch_size(self):
return self.batch_size_per_gpu * self.fabric.world_size

@property
def device(self):
return self.fabric.device

def tok_encode(self, string: str) -> List[int]:
return self.tokenizer.encode(string, bos=False, eos=False).tolist()

def tok_decode(self, tokens: List[int]) -> str:
t = torch.tensor(tokens)
return self.tokenizer.decode(t)

@torch.inference_mode()
def _model_call(self, inps):
return self.model(inps)

@torch.inference_mode()
def _model_generate(
self, context, max_length, eos_token_id
) -> torch.Tensor:
# this only supports batch size 1
assert context.shape[0] == 1
out = generate(self.model, context[0], max_length, eos_id=eos_token_id)
for block in self.model.transformer.h:
block.attn.kv_cache.reset_parameters()
return out.unsqueeze(0)

@torch.inference_mode()
def run_eval(
self,
eval_tasks: List[str],
num_fewshot: int,
limit: Optional[int],
bootstrap_iters: int,
no_cache: bool,
) -> Dict:
# Returns a list containing all values of the task registry that
# match at least one of the patterns
import fnmatch

def pattern_match(patterns, source_list):
task_names = set()
for pattern in patterns:
for matching in fnmatch.filter(source_list, pattern):
task_names.add(matching)
return list(task_names)

eval_tasks = pattern_match(eval_tasks, tasks.ALL_TASKS)
print(f"Found tasks: {eval_tasks}")

# **HACK INCOMING**:
# first get task dict on local main rank
# the tasks are downloaded *as they are initialized*, and the downloads don't like multithreading.
# so we download them once on the local main rank, wait, and then initialize them on all other ranks, which *should* load from the cache.
if self.fabric.local_rank == 0:
tasks.get_task_dict(eval_tasks)
# torch barrier
self.fabric.barrier()
tasks.get_task_dict(eval_tasks)

lm = self
if not no_cache:
lm = base.CachingLM(lm, "lm_cache/lit-gpt.db")

results = evaluator.evaluate(
lm=lm,
task_dict=tasks.get_task_dict(eval_tasks),
num_fewshot=num_fewshot,
limit=limit,
bootstrap_iters=bootstrap_iters,
)
results["config"] = dict(
model=self.model.config.name,
batch_size=self.batch_size,
device=str(self.device),
num_fewshot=num_fewshot,
limit=limit,
bootstrap_iters=bootstrap_iters,
no_cache=no_cache,
)
return results


@torch.inference_mode()
def run_eval_harness(
checkpoint_dir: Path,
precision: Optional[str] = None,
quantize: Optional[
Literal["bnb.nf4", "bnb.nf4-dq", "bnb.fp4", "bnb.fp4-dq", "bnb.int8"]
] = None,
eval_tasks: List[str] = [
"arc_challenge",
"piqa",
"hellaswag",
"hendrycksTest-*",
],
save_filepath: Optional[Path] = None,
num_fewshot: int = 0,
limit: Optional[int] = None,
bootstrap_iters: int = 100000,
no_cache: bool = True,
):
if precision is None:
precision = get_default_supported_precision(training=False)

plugins = None
if quantize is not None and quantize.startswith("bnb."):
if "mixed" in precision:
raise ValueError(
"Quantization and mixed precision is not supported."
)
dtype = {
"16-true": torch.float16,
"bf16-true": torch.bfloat16,
"32-true": torch.float32,
}[precision]
plugins = BitsandbytesPrecision(quantize[4:], dtype)
precision = None

fabric = L.Fabric(devices=1, precision=precision, plugins=plugins)

check_valid_checkpoint_dir(checkpoint_dir)
tokenizer = Tokenizer(checkpoint_dir)

config = Config.from_json(checkpoint_dir / "lit_config.json")

checkpoint_path = checkpoint_dir / "lit_model.pth"

print(
f"Loading model {str(checkpoint_path)!r} with {config.__dict__}",
file=sys.stderr,
)
with fabric.init_module(empty_init=True):
model = GPT(config)

model.eval()
model = fabric.setup_module(model)

load_checkpoint(fabric, model, checkpoint_path)

eval_harness = EvalHarnessBase(fabric, model, tokenizer, 1)

results = eval_harness.run_eval(
eval_tasks, num_fewshot, limit, bootstrap_iters, no_cache
)
if save_filepath is None:
print(results)
else:
print(f"Saving results to {str(save_filepath)!r}")
save_filepath.parent.mkdir(parents=True, exist_ok=True)
data = json.dumps(results)
with open(save_filepath, "w") as fw:
fw.write(data)


if __name__ == "__main__":
torch.set_float32_matmul_precision("high")

CLI(run_eval_harness)
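
Because `run_eval_harness` is exposed through Lit-GPT's `CLI` wrapper, it can also be called directly from Python. A hedged usage sketch; the checkpoint path is hypothetical and must point to a converted Lit-GPT checkpoint directory:

```python
from pathlib import Path

from evaluate.lm_eval_harness import run_eval_harness

run_eval_harness(
    checkpoint_dir=Path("checkpoints/mistralai/Mistral-7B-Instruct-v0.1"),
    precision="bf16-true",
    eval_tasks=["arc_challenge", "piqa"],
    limit=100,  # evaluate on a small subset for a quick smoke test
    save_filepath=Path("out/eval_results.json"),
)
```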