changes made per discussion
samuel100 committed Oct 30, 2024
1 parent e608d77 commit e6f45d2
Showing 1 changed file with 43 additions and 29 deletions: src/routes/blogs/olive-shared-cache/+page.svx
@@ -15,8 +15,8 @@ authorsLink:
'https://www.linkedin.com/in/devangpatel/',
'https://www.linkedin.com/in/samuel-kemp-a9253724/'
]
image: 'https://iili.io/d95Pwcx.png'
imageSquare: 'https://iili.io/d95Pwcx.png'
image: 'https://artwork.lfaidata.foundation/projects/onnx/stacked/color/onnx-stacked-color.png'
imageSquare: 'https://artwork.lfaidata.foundation/projects/onnx/stacked/color/onnx-stacked-color.png'
url: 'https://onnxruntime.ai/blogs/olive-shared-cache'
---

@@ -29,19 +29,26 @@ Efficiency in machine learning not only relies on the effectiveness of algorithm

This blog post delves into how Olive’s shared cache feature can help you save time and costs, illustrated with practical examples.

### Prerequisites

- An Azure Storage Account. For details on how to create an Azure Storage Account, read [Create an Azure Storage Account](https://learn.microsoft.com/azure/storage/common/storage-account-create?tabs=azure-portal).
- Once you have created your Azure Storage Account, you'll need to create a storage container (a container organizes a set of blobs, similar to a directory in a file system). For more details on how to create a storage container, read [Create a container](https://learn.microsoft.com/azure/storage/blobs/blob-containers-portal#create-a-container).
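
If you prefer the command line, the snippet below is a minimal sketch of creating these resources with the Azure CLI; the resource group, storage account, and container names are placeholders for illustration, so substitute your own (the portal steps linked above achieve the same result).

```bash
# Placeholder names - replace with your own resource group, storage account, and container
az group create --name my-olive-rg --location eastus

az storage account create \
   --name myolivesharedcache \
   --resource-group my-olive-rg \
   --location eastus \
   --sku Standard_LRS

az storage container create \
   --name olive-cache \
   --account-name myolivesharedcache \
   --auth-mode login
```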

## 🤝 Team collaboration during optimization process

User A begins the optimization process by using Olive’s quantize command to optimize the [Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct) model with the AWQ algorithm. This step is executed with the following command:

```bash
olive quantize \
--model Microsoft/Phi-3-mini-4k-instruct \
--algorithm awq \
--account-name {AZURE_STORAGE_ACCOUNT} \
--container-name {STORAGE_CONTAINER_NAME} \
--log_level 1
```
<pre><code>olive quantize \
--model_name_or_path Microsoft/Phi-3-mini-4k-instruct \
--algorithm awq \
--account_name &lbrace;AZURE_STORAGE_ACCOUNT&rbrace; \
--container_name &lbrace;STORAGE_CONTAINER_NAME&rbrace; \
--log_level 1
</code></pre>

> **Note:**
> - The `--account_name` should be set to your Azure Storage Account name.
> - The `--container_name` should be set to the container name in the Azure Storage Account.
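
As a concrete, hypothetical illustration, if the storage account you created were named `myolivesharedcache` and the container `olive-cache`, you could export them as environment variables and pass them to the same command:

```bash
# Hypothetical values - substitute the storage account and container you created
export AZURE_STORAGE_ACCOUNT=myolivesharedcache
export STORAGE_CONTAINER_NAME=olive-cache

olive quantize \
   --model_name_or_path Microsoft/Phi-3-mini-4k-instruct \
   --algorithm awq \
   --account_name $AZURE_STORAGE_ACCOUNT \
   --container_name $STORAGE_CONTAINER_NAME \
   --log_level 1
```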

The optimization process generates a log that confirms the cache has been saved in a shared location in Azure:

@@ -58,14 +65,13 @@ This shared cache is a pivotal element, as it stores the optimized model, making

User B, another active team member in the optimization project, reaps the benefits of User A’s efforts. By using the same quantize command to optimize the [Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct) with the AWQ algorithm, User B’s process is significantly expedited. The command is identical, and User B leverages the same Azure Storage account and container:

```bash
olive quantize \
--model Microsoft/Phi-3-mini-4k-instruct \
--algorithm awq \
--account-name {AZURE_STORAGE_ACCOUNT} \
--container-name {STORAGE_CONTAINER_NAME} \
--log_level 1
```
<pre><code>olive quantize \
--model_name_or_path Microsoft/Phi-3-mini-4k-instruct \
--algorithm awq \
--account_name &lbrace;AZURE_STORAGE_ACCOUNT&rbrace; \
--container_name &lbrace;STORAGE_CONTAINER_NAME&rbrace; \
--log_level 1
</code></pre>

A critical part of this step is the following log output, which highlights that the quantized model is retrieved from the shared cache rather than re-computed with AWQ quantization.

@@ -90,33 +96,31 @@ Optimization is not limited to quantization alone. Olive’s Automatic optimizer

User A leverages the automatic optimizer to optimize the [Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct/tree/main) model for CPU. The command for this task is:

```bash
olive auto-opt \
<pre><code>olive auto-opt \
--model_name_or_path meta-llama/Llama-3.2-1B-Instruct \
--trust_remote_code \
--output_path optimized-model \
--device cpu \
--provider CPUExecutionProvider \
--precision int4 \
--account-name {AZURE_STORAGE_ACCOUNT} \
--container-name {STORAGE_CONTAINER_NAME} \
--account_name &lbrace;AZURE_STORAGE_ACCOUNT&rbrace; \
--container_name &lbrace;STORAGE_CONTAINER_NAME&rbrace; \
--log_level 1
```
</code></pre>

For each task executed by the automatic optimizer - for example, model download, ONNX conversion, ONNX graph optimization, quantization, and so on - the intermediate model is stored in the shared cache for reuse on different hardware targets. For example, if User B later wants to optimize the same model for a different target (say, the GPU of a Windows device), they would execute the following command:

```bash
olive auto-opt \
<pre><code>olive auto-opt \
--model_name_or_path meta-llama/Llama-3.2-1B-Instruct \
--trust_remote_code \
--output_path optimized-model \
--device gpu \
--provider DmlExecutionProvider \
--precision int4 \
--account-name {AZURE_STORAGE_ACCOUNT} \
--container-name {STORAGE_CONTAINER_NAME} \
--account_name &lbrace;AZURE_STORAGE_ACCOUNT&rbrace; \
--container_name &lbrace;STORAGE_CONTAINER_NAME&rbrace; \
--log_level 1
```
</code></pre>

The intermediate steps shared with User A’s CPU optimization - such as ONNX conversion and ONNX graph optimization - will be reused, saving User B time and cost.

@@ -145,4 +149,14 @@ To try the quantization and Auto Optimizer commands with the shared-cache featur
```bash
pip install olive-ai[auto-opt,shared-cache] autoawq
```

> **Note:** Quantizing a model using the AWQ algorithm requires a CUDA GPU device. If you only have access to a CPU device, start with the Olive Auto Optimizer, which will quantize using round-to-nearest.
Quantizing a model using the AWQ algorithm requires a CUDA GPU device. If you only have access to a CPU device and do not have an Azure subscription, you can run the automatic optimizer on the CPU and use the local disk as the cache:

<pre><code>olive auto-opt \
--model_name_or_path meta-llama/Llama-3.2-1B-Instruct \
--trust_remote_code \
--output_path optimized-model \
--device cpu \
--provider CPUExecutionProvider \
--precision int4 \
--log_level 1
</code></pre>
