# Update README.md last update date #63

Merged 2 commits on Nov 20, 2023.
## `.github/workflows/update_readme.yaml` (new file, +29 lines)

```yaml
name: Update README

on:
  push:
    branches: ["main"]
    paths:
      - README.md.template

jobs:
  update-readme:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout Code Repository
        uses: actions/checkout@v3

      - name: Update README
        run: sed "s|<LAST_UPDATE>|$(date -u +"%dth %B %Y")|g" README.md.template > README.md

      - name: Commit changes
        run: |
          git config --global user.email "[email protected]"
          git config --global user.name "GitHub Actions"
          git add README.md
          git commit -m "Update <LAST_UPDATE> placeholder in README.md" || true

      - name: Push changes
        uses: ad-m/github-push-action@master
        with:
          github_token: ${{ secrets.GITHUB_TOKEN }}
```
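The substitution performed by the `Update README` step can be reproduced locally. The snippet below is a minimal sketch using a throwaway template in a temp directory (the paths are illustrative, not the repository's); note that `%dth` hard-codes the "th" suffix, so days like the 1st or 22nd render as "01th" or "22th".

```shell
# Sketch of the workflow's sed substitution, run against a throwaway template.
tmpdir=$(mktemp -d)
printf 'Last updated: <LAST_UPDATE>\n' > "$tmpdir/README.md.template"

# Same command as the workflow step, pointed at the temp files.
sed "s|<LAST_UPDATE>|$(date -u +"%dth %B %Y")|g" \
    "$tmpdir/README.md.template" > "$tmpdir/README.md"

cat "$tmpdir/README.md"   # e.g. "Last updated: 20th November 2023"
```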
## `README.md.template` (new file, +118 lines)
# benchmarks
Benchmarks of MLOps engines, frameworks, and languages across mainstream AI models.

## Tool

The benchmarking tool comprises three main scripts:
- `benchmark.sh` runs the end-to-end benchmark
- `download.sh`, used internally by the benchmark script to download the needed model files based on a configuration
- `setup.sh` installs dependencies and performs the needed format conversions

### benchmark

This script runs benchmarks for a transformer model using both Rust and Python implementations. It provides options to customize the benchmarks, such as the prompt, repetitions, maximum tokens, device, and NVIDIA flag.

```bash
./benchmark.sh [OPTIONS]
```
where `OPTIONS`:
- `-p, --prompt`: Prompt for benchmarks (default: 'Explain what is a transformer')
- `-r, --repetitions`: Number of repetitions for benchmarks (default: 2)
- `-m, --max_tokens`: Maximum number of tokens for benchmarks (default: 100)
- `-d, --device`: Device for benchmarks (possible values: 'gpu' or 'cpu', default: 'cpu')
- `--nvidia`: Use NVIDIA for benchmarks (default: false)
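Options like these are commonly handled with a `while`/`case` loop in POSIX shell. The sketch below is illustrative only (the function and variable names are not taken from `benchmark.sh`) but shows how the defaults listed above would be applied and overridden:

```shell
# Illustrative option parsing for the flags listed above; names are hypothetical.
parse_benchmark_args() {
  PROMPT='Explain what is a transformer'   # -p / --prompt default
  REPETITIONS=2                            # -r / --repetitions default
  MAX_TOKENS=100                           # -m / --max_tokens default
  DEVICE=cpu                               # -d / --device default
  USE_NVIDIA=false                         # --nvidia default
  while [ $# -gt 0 ]; do
    case "$1" in
      -p|--prompt)      PROMPT="$2";      shift 2 ;;
      -r|--repetitions) REPETITIONS="$2"; shift 2 ;;
      -m|--max_tokens)  MAX_TOKENS="$2";  shift 2 ;;
      -d|--device)      DEVICE="$2";      shift 2 ;;
      --nvidia)         USE_NVIDIA=true;  shift ;;
      *) echo "Unknown option: $1" >&2;   return 1 ;;
    esac
  done
}

parse_benchmark_args --device gpu --nvidia -r 10
echo "$DEVICE $USE_NVIDIA $REPETITIONS"   # prints: gpu true 10
```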

### download

Downloads files from a list of URLs specified in a JSON file. The JSON file should contain an array of objects, each with a 'url', 'file', and 'folder' property. The script checks if the file already exists before downloading it.

```bash
./download.sh --models <json_file> --cache <cache_file> --force-download
```
where `OPTIONS`:
- `--models`: JSON file specifying the models to download (default: models.json)
- `--cache`: Cache file to keep track of downloaded files (default: cache.log)
- `--force-download`: Force download of all files, removing existing files and cache
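The "skip if already present" behaviour described above can be sketched as a small predicate: download only when the destination file is missing and the cache log does not already record it. The function and file names below are illustrative, not taken from `download.sh`:

```shell
# Hypothetical sketch of the cache check: skip when the file exists on disk
# or its path is already recorded (as a whole line) in the cache log.
should_download() {
  dest="$1"; cache="$2"
  if [ -f "$dest" ] || { [ -f "$cache" ] && grep -qxF "$dest" "$cache"; }; then
    return 1   # already have it: skip
  fi
  return 0     # missing everywhere: download
}

tmpdir=$(mktemp -d)
cache="$tmpdir/cache.log"
touch "$tmpdir/model-a.bin"              # present on disk
echo "$tmpdir/model-b.bin" > "$cache"    # recorded in the cache

should_download "$tmpdir/model-a.bin" "$cache" && echo "download a" || echo "skip a"
should_download "$tmpdir/model-b.bin" "$cache" && echo "download b" || echo "skip b"
should_download "$tmpdir/model-c.bin" "$cache" && echo "download c" || echo "skip c"
# prints: skip a / skip b / download c
```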

### setup
1. Creates a Python virtual environment `venv` and installs project requirements.
2. Converts and stores models in different formats.

```bash
./setup.sh
```
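Step 1 above boils down to the standard `venv` workflow. A minimal sketch, using a temp directory rather than the repository's actual layout:

```shell
# Sketch of the virtual-environment step; paths are illustrative.
tmpdir=$(mktemp -d)
python3 -m venv "$tmpdir/venv"
"$tmpdir/venv/bin/python" --version

# Installing the project requirements would then be (needs network, not run here):
#   "$tmpdir/venv/bin/pip" install -r requirements.txt
```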

## ML Engines: Feature Table

| Features | pytorch | burn | llama.cpp | candle | tinygrad | onnxruntime | CTranslate2 |
| --------------------------- | ------- | ---- | --------- | ------ | -------- | ----------- | ----------- |
| Inference support | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| 16-bit quantization support | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| 8-bit quantization support | ✅ | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ |
| 4-bit quantization support | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | ❌ |
| 2/3bit quantization support | ✅ | ❌ | ✅ | ✅ | ❌ | ❌ | ❌ |
| CUDA support | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| ROCM support | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| Intel OneAPI/SYCL support | ✅** | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ |
| Mac M1/M2 support | ✅ | ✅ | ✅ | ⭐ | ✅ | ✅ | ⭐ |
| BLAS support(CPU) | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ |
| Model Parallel support | ✅ | ❌ | ❌ | ✅ | ❌ | ❌ | ✅ |
| Tensor Parallel support | ✅ | ❌ | ❌ | ✅ | ❌ | ❌ | ✅ |
| ONNX format support         | ✅      | ✅   | ✅        | ✅     | ✅       | ✅          | ❌          |
| Training support | ✅ | 🌟 | ❌ | 🌟 | ❌ | ❌ | ❌ |

⭐ = No Metal Support
🌟 = Partial Support for Training (Finetuning already works, but training from scratch may not work)

## Benchmarking ML Engines

### A100 80GB Inference Bench:

Model: LLAMA-2-7B

CUDA Version: 11.7

Command: `./benchmark.sh --repetitions 10 --max_tokens 100 --device gpu --nvidia --prompt 'Explain what is a transformer'`

| Engine | float32 | float16 | int8 | int4 |
|-------------|--------------|--------------|--------------|--------------|
| burn | 13.28 ± 0.79 | - | - | - |
| candle | - | 26.30 ± 0.29 | - | - |
| llama.cpp | - | - | 67.64 ± 22.57| 106.21 ± 2.21|
| ctranslate | - | 58.54 ± 13.24| 34.22 ± 6.29 | - |
| tinygrad | - | 20.13 ± 1.35 | - | - |

*(data updated: <LAST_UPDATE>)*


### M2 MAX 32GB Inference Bench:

#### CPU

Model: LLAMA-2-7B

CUDA Version: NA

Command: `./benchmark.sh --repetitions 10 --max_tokens 100 --device cpu --prompt 'Explain what is a transformer'`

| Engine | float32 | float16 | int8 | int4 |
|-------------|--------------|--------------|--------------|--------------|
| burn | 0.30 ± 0.09 | - | - | - |
| candle | - | 3.43 ± 0.02 | - | - |
| llama.cpp | - | - | 14.41 ± 1.59 | 20.96 ± 1.94 |
| ctranslate | - | - | 2.11 ± 0.73 | - |
| tinygrad | - | 4.21 ± 0.38 | - | - |

#### GPU (Metal)

Command: `./benchmark.sh --repetitions 10 --max_tokens 100 --device gpu --prompt 'Explain what is a transformer'`

| Engine | float32 | float16 | int8 | int4 |
|-------------|--------------|--------------|--------------|--------------|
| burn | - | - | - | - |
| candle | - | - | - | - |
| llama.cpp | - | - | 31.24 ± 7.82 | 46.75 ± 9.55 |
| ctranslate | - | - | - | - |
| tinygrad | - | 29.78 ± 1.18 | - | - |

*(data updated: <LAST_UPDATE>)*