Integration of CTransformers for benchmarks. #70

Merged · 28 commits · Dec 2, 2023

Commits (changes shown from 17 of 28 commits):
fd03e39
Feat: Adding the integration for CTransformers for benchmarks.
Nov 24, 2023
acec0f6
Adding sh file to run the benchmarks for CTransformers
Nov 24, 2023
88d4ef8
adding requirements to install dependencies for ctransformers
Nov 24, 2023
2f65ffa
Refactor: Bench CTransformers by removing model_type.
Nov 27, 2023
73ccb4f
Added benchmark bash script to run benchmarking.
Nov 27, 2023
995441b
Added a requirements file for installing CTransformers
Nov 27, 2023
93cbfc8
Removing setup.sh for device
Nov 27, 2023
8fcfa94
Added numpy in requirements.txt
Nov 27, 2023
d4468ee
added custom dependency installation in benchmark script for cuda
Nov 27, 2023
7d19c23
fix: in time calculation, token length count is excluded.
Anindyadeep Nov 29, 2023
c949737
fix: typo
Anindyadeep Nov 29, 2023
271d392
Added ctransformers benchmark results for A100 and CPU
Nov 29, 2023
99246ed
added latest benchmark info for ctransformers, m2(cpu, gpu), a100.
Nov 29, 2023
ebd3ba4
Merge pull request #1 from premAI-io/main
Anindyadeep Dec 1, 2023
c8c861b
Update <LAST_UPDATE> placeholder in llama2.md
actions-user Dec 1, 2023
722422c
Merge branch 'main' of https://github.com/Anindyadeep/benchmarks
Dec 1, 2023
50ca40e
merge from main
Dec 1, 2023
964e2cd
revert default docs to latest changes in main
Dec 1, 2023
2147899
Merge pull request #2 from premAI-io/main
Anindyadeep Dec 1, 2023
db6c91c
Merge branch 'main' of https://github.com/Anindyadeep/benchmarks
Dec 1, 2023
09fa593
Merge branch 'main' into anindya/ctransformers
Dec 1, 2023
c651694
added setup.sh file for installing dependencies for ctransformers
Dec 1, 2023
ae02965
Refactor: benchmarks bash file.
Dec 1, 2023
8ebfd59
removed ctransformers in requirements file
Dec 1, 2023
de40909
added ctransformers results inside llama2.md.template
Dec 1, 2023
99707b3
fix: quiet installation for metal devices
Dec 1, 2023
4cf5787
fix: syntax
Anindyadeep Dec 2, 2023
0f332b9
Refactor: setup script
Dec 2, 2023
112 changes: 112 additions & 0 deletions bench_ctransformers/bench.py
@@ -0,0 +1,112 @@
import argparse
import logging
import sys
import time
from collections import defaultdict
from typing import Optional

import numpy as np
from ctransformers import AutoModelForCausalLM

logging.getLogger("ctransformers").setLevel(logging.ERROR)
logging.basicConfig(
    stream=sys.stdout,
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s",
)


class LlamaCTransformersBenchmark:
    def __init__(
        self,
        model_path: str,
        device: Optional[str] = "cpu",
    ) -> None:
        self.model_path, self.device = model_path, device
        self.results = []

    def load_model(self):
        # FIXME: Not sure how to get num layers for each model to know how many to fit into VRAM.
        self.model = AutoModelForCausalLM.from_pretrained(
            self.model_path,
            model_type="llama",
            gpu_layers=50 if self.device in ["cuda", "metal"] else 0,
        )
        return self

    def run_model(self, prompt: str, max_tokens: int) -> float:
        start = time.time()
        output = self.model(prompt, max_new_tokens=max_tokens)
        delta = time.time() - start
        tokens = len(self.model.tokenize(output))
        return tokens / delta

    def benchmark(self, prompt: str, max_tokens: int, repetitions: int) -> None:
        for i in range(repetitions):
            logging.info(
                f"Running repetition [{str(i + 1).zfill(len(str(repetitions)))}/{repetitions}]"
            )
            tokens_per_second = self.run_model(prompt, max_tokens)
            self.results.append(tokens_per_second)


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="CTransformers Benchmark.")
    parser.add_argument(
        "--prompt",
        type=str,
        help="The prompt for the model.",
    )
    parser.add_argument("--max_tokens", type=int, help="The maximum number of tokens.")
    parser.add_argument(
        "--repetitions",
        type=int,
        help="The number of repetitions for the benchmark.",
    )
    parser.add_argument(
        "--device",
        help="Device to use for the benchmark.",
    )
    parser.add_argument(
        "--log_file",
        type=str,
        help="Path to the log file for writing logs (in append mode).",
    )
    parser.add_argument(
        "--models_dir",
        type=str,
        help="Path to the models directory.",
    )
    args = parser.parse_args()
    logging.info(
        f"Running benchmark with: max_tokens={args.max_tokens} prompt={args.prompt} "
        + f"repetitions={args.repetitions} device={args.device}"
    )
    report = defaultdict(lambda: defaultdict(float))
    for quantize in ("Q8_0", "Q4_0"):
        logging.info(f"Running CTransformer benchmark on Llama with {quantize}")
        llama_ctransformers_bench = LlamaCTransformersBenchmark(
            f"{args.models_dir}/llama-2-7b-gguf/llama-2-7b.{quantize}.gguf",
            device=args.device,
        ).load_model()
        llama_ctransformers_bench.benchmark(
            max_tokens=args.max_tokens, prompt=args.prompt, repetitions=args.repetitions
        )
        q = "int8" if quantize == "Q8_0" else "int4"
        report["llama_ctransformers"][q] = {
            "mean": np.mean(llama_ctransformers_bench.results),
            "std": np.std(llama_ctransformers_bench.results),
        }

    logging.info("Benchmark report")
    with open(args.log_file, "a") as file:
        for framework, quantizations in report.items():
            for quantization, stats in quantizations.items():
                logging.info(
                    f"{framework}, {quantization}: {stats['mean']:.2f} ± {stats['std']:.2f}"
                )
                print(
                    f"{framework}, {quantization}: {stats['mean']:.2f} ± {stats['std']:.2f}",
                    file=file,
                )
148 changes: 148 additions & 0 deletions bench_ctransformers/bench.sh
@@ -0,0 +1,148 @@
#!/bin/bash

########################################################################################################
# Script: bench.sh
# Description: This script runs the ctransformers llama benchmark.
#
# Usage: ./bench.sh [OPTIONS]
# OPTIONS:
#   -p, --prompt      Prompt for benchmarks (default: 'Explain what is a transformer')
#   -r, --repetitions Number of repetitions for benchmarks (default: 10)
#   -m, --max_tokens  Maximum number of tokens for benchmarks (default: 100)
#   -d, --device      Device for benchmarks (possible values: 'cuda', 'metal', and 'cpu', default: 'cpu')
#   -lf, --log_file   Logging file name.
#   -md, --models_dir Models directory.
#   -h, --help        Show this help message
########################################################################################################

set -euo pipefail

SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"

print_usage() {
    echo "Usage: $0 [OPTIONS]"
    echo "OPTIONS:"
    echo "  -p, --prompt      Prompt for benchmarks (default: 'Explain what is a transformer')"
    echo "  -r, --repetitions Number of repetitions for benchmarks (default: 10)"
    echo "  -m, --max_tokens  Maximum number of tokens for benchmarks (default: 100)"
    echo "  -d, --device      Device for benchmarks (possible values: 'cuda', 'metal', and 'cpu', default: 'cpu')"
    echo "  -lf, --log_file   Logging file name."
    echo "  -md, --models_dir Models directory."
    echo "  -h, --help        Show this help message"
    exit 1
}

check_cuda() {
    if command -v nvcc &> /dev/null; then
        echo -e "\nUsing CUDA"
        nvcc --version
        pip install ctransformers[cuda] numpy
    else
        echo -e "\nCUDA is not available."
        exit 1
    fi
}

check_platform() {
    local platform
    platform=$(uname -s)
    if [[ "$platform" == "Linux" ]]; then
        echo "Running on Linux."
        pip install -r requirements.txt
    elif [[ "$platform" == "Darwin" ]]; then
        echo "Running on Mac OS."
        echo "Installing CTransformers on metal"
        export CT_METAL=1
        pip install ctransformers --no-binary ctransformers
    else
        echo "Unknown platform."
        exit 1
    fi
}

check_python() {
    if command -v python &> /dev/null; then
        echo -e "\nUsing $(python --version)."
    else
        echo -e "\nPython does not exist."
        exit 1
    fi
}

run_benchmarks() {
    local PROMPT="$1"
    local REPETITIONS="$2"
    local MAX_TOKENS="$3"
    local DEVICE="$4"
    local LOG_FILENAME="$5"
    local MODELS_DIR="$6"

    python "$SCRIPT_DIR"/bench.py \
        --prompt "$PROMPT" \
        --repetitions "$REPETITIONS" \
        --max_tokens "$MAX_TOKENS" \
        --log_file "$LOG_FILENAME" \
        --models_dir "$MODELS_DIR" \
        --device "$DEVICE"
}

# Parse command-line arguments
while [ "$#" -gt 0 ]; do
    case "$1" in
        -p|--prompt)
            PROMPT="$2"
            shift 2
            ;;
        -r|--repetitions)
            REPETITIONS="$2"
            shift 2
            ;;
        -m|--max_tokens)
            MAX_TOKENS="$2"
            shift 2
            ;;
        -d|--device)
            DEVICE="$2"
            case "$DEVICE" in
                "cuda" | "metal" | "cpu")
                    ;;
                *)
                    echo "Invalid value for --device. Please use 'cuda', 'metal' or 'cpu'."
                    print_usage
                    ;;
            esac
            if [ "$DEVICE" == "cuda" ]; then
                check_cuda
            fi
            shift 2
            ;;
        -lf|--log_file)
            LOG_FILENAME="$2"
            shift 2
            ;;
        -md|--models_dir)
            MODELS_DIR="$2"
            shift 2
            ;;
        -h|--help)
            print_usage
            ;;
        *)
            echo "Unknown option: $1"
            print_usage
            ;;
    esac
done

# Set default values if not provided
PROMPT="${PROMPT:-"Explain what is a transformer"}"
REPETITIONS="${REPETITIONS:-10}"
MAX_TOKENS="${MAX_TOKENS:-100}"
DEVICE="${DEVICE:-cpu}"
LOG_FILENAME="${LOG_FILENAME:-"benchmark_$(date +'%Y%m%d%H%M%S').log"}"
MODELS_DIR="${MODELS_DIR:-"./models"}"

check_platform
check_python
run_benchmarks "$PROMPT" "$REPETITIONS" "$MAX_TOKENS" "$DEVICE" "$LOG_FILENAME" "$MODELS_DIR"
2 changes: 2 additions & 0 deletions bench_ctransformers/requirements.txt
Collaborator review comment: Pin at least the major versions.
@@ -0,0 +1,2 @@
ctransformers
numpy
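Following the review suggestion to pin at least the major versions, the requirements file could constrain both dependencies. The version bounds below are illustrative assumptions, not the versions actually used in this PR:

```text
ctransformers>=0.2,<0.3
numpy>=1.24,<2.0
```

Pinning keeps future major releases (with potentially breaking API changes) from silently altering benchmark behavior.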
4 changes: 2 additions & 2 deletions docs/llama2.md
@@ -17,7 +17,7 @@
| tinygrad | - | 20.32 ± 0.06 | - | - |
| onnx | - | 54.16 ± 3.15 | - | - |

*(Data updated: `30th November 2023`)
*(Data updated: `1st December 2023`)


## M2 MAX 32GB Inference Bench:
@@ -53,4 +53,4 @@
| tinygrad | - | 29.78 ± 1.18 | - | - |
| onnx | - | - | - | - |

*(Data updated: `30th November 2023`)
*(Data updated: `1st December 2023`)