changes made per discussion
samuel100 committed Oct 30, 2024
1 parent e608d77 commit e6f45d2
Showing 1 changed file with 43 additions and 29 deletions: src/routes/blogs/olive-shared-cache/+page.svx
@@ -15,8 +15,8 @@ authorsLink:
'https://www.linkedin.com/in/devangpatel/',
'https://www.linkedin.com/in/samuel-kemp-a9253724/'
]
image: 'https://iili.io/d95Pwcx.png'
imageSquare: 'https://iili.io/d95Pwcx.png'
image: 'https://artwork.lfaidata.foundation/projects/onnx/stacked/color/onnx-stacked-color.png'
imageSquare: 'https://artwork.lfaidata.foundation/projects/onnx/stacked/color/onnx-stacked-color.png'
url: 'https://onnxruntime.ai/blogs/olive-shared-cache'
---

@@ -29,19 +29,26 @@ Efficiency in machine learning not only relies on the effectiveness of algorithm

This blog post delves into how Olive’s shared cache feature can help you save time and costs, illustrated with practical examples.

### Prerequisites

- An Azure Storage Account. For details on how to create an Azure Storage Account, read [Create an Azure Storage Account](https://learn.microsoft.com/azure/storage/common/storage-account-create?tabs=azure-portal).
- Once you have created your Azure Storage Account, you'll need to create a storage container (a container organizes a set of blobs, similar to a directory in a file system). For more details on how to create a storage container, read [Create a container](https://learn.microsoft.com/azure/storage/blobs/blob-containers-portal#create-a-container).
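
If you prefer the command line, the snippet below is a minimal sketch of creating these resources with the Azure CLI; the resource group, storage account, and container names are placeholders for illustration, so substitute your own (the portal steps linked above achieve the same result).

```bash
# Placeholder names - replace with your own resource group, storage account, and container
az group create --name my-olive-rg --location eastus

az storage account create \
   --name myolivesharedcache \
   --resource-group my-olive-rg \
   --location eastus \
   --sku Standard_LRS

az storage container create \
   --name olive-cache \
   --account-name myolivesharedcache \
   --auth-mode login
```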

## 🤝 Team collaboration during optimization process

User A begins the optimization process by using Olive’s quantize command to optimize the [Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct) model with the AWQ algorithm. This step is executed with the following command:

```bash
olive quantize \
--model Microsoft/Phi-3-mini-4k-instruct \
--algorithm awq \
--account-name {AZURE_STORAGE_ACCOUNT} \
--container-name {STORAGE_CONTAINER_NAME} \
--log_level 1
```
<pre><code>olive quantize \
--model_name_or_path Microsoft/Phi-3-mini-4k-instruct \
--algorithm awq \
--account_name &lbrace;AZURE_STORAGE_ACCOUNT&rbrace; \
--container_name &lbrace;STORAGE_CONTAINER_NAME&rbrace; \
--log_level 1
</code></pre>

> **Note:**
> - The `--account_name` should be set to your Azure Storage Account name.
> - The `--container_name` should be set to the container name in the Azure Storage Account.
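
As a concrete, hypothetical illustration, if the storage account you created were named `myolivesharedcache` and the container `olive-cache`, you could export them as environment variables and pass them to the same command:

```bash
# Hypothetical values - substitute the storage account and container you created
export AZURE_STORAGE_ACCOUNT=myolivesharedcache
export STORAGE_CONTAINER_NAME=olive-cache

olive quantize \
   --model_name_or_path Microsoft/Phi-3-mini-4k-instruct \
   --algorithm awq \
   --account_name $AZURE_STORAGE_ACCOUNT \
   --container_name $STORAGE_CONTAINER_NAME \
   --log_level 1
```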

The optimization process generates a log that confirms the cache has been saved in a shared location in Azure:

@@ -58,14 +65,13 @@ This shared cache is a pivotal element, as it stores the optimized model, making

User B, another active team member in the optimization project, reaps the benefits of User A’s efforts. By using the same quantize command to optimize the [Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct) with the AWQ algorithm, User B’s process is significantly expedited. The command is identical, and User B leverages the same Azure Storage account and container:

```bash
olive quantize \
--model Microsoft/Phi-3-mini-4k-instruct \
--algorithm awq \
--account-name {AZURE_STORAGE_ACCOUNT} \
--container-name {STORAGE_CONTAINER_NAME} \
--log_level 1
```
<pre><code>olive quantize \
--model_name_or_path Microsoft/Phi-3-mini-4k-instruct \
--algorithm awq \
--account_name &lbrace;AZURE_STORAGE_ACCOUNT&rbrace; \
--container_name &lbrace;STORAGE_CONTAINER_NAME&rbrace; \
--log_level 1
</code></pre>

A critical part of this step is the following log output, which highlights that the quantized model is retrieved from the shared cache rather than re-computed with AWQ quantization.

@@ -90,33 +96,31 @@ Optimization is not limited to quantization alone. Olive’s Automatic optimizer

User A leverages the automatic optimizer to optimize the [Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct/tree/main) model for CPU. The command for this task is:

```bash
olive auto-opt \
<pre><code>olive auto-opt \
--model_name_or_path meta-llama/Llama-3.2-1B-Instruct \
--trust_remote_code \
--output_path optimized-model \
--device cpu \
--provider CPUExecutionProvider \
--precision int4 \
--account-name {AZURE_STORAGE_ACCOUNT} \
--container-name {STORAGE_CONTAINER_NAME} \
--account_name &lbrace;AZURE_STORAGE_ACCOUNT&rbrace; \
--container_name &lbrace;STORAGE_CONTAINER_NAME&rbrace; \
--log_level 1
```
</code></pre>

For each task executed by the automatic optimizer - for example, model download, ONNX conversion, ONNX graph optimization, quantization, and so on - the intermediate model is stored in the shared cache for reuse on different hardware targets. For example, if User B later wants to optimize the same model for a different target (say, the GPU of a Windows device), they would execute the following command:

```bash
olive auto-opt \
<pre><code>olive auto-opt \
--model_name_or_path meta-llama/Llama-3.2-1B-Instruct \
--trust_remote_code \
--output_path optimized-model \
--device gpu \
--provider DmlExecutionProvider \
--precision int4 \
--account-name {AZURE_STORAGE_ACCOUNT} \
--container-name {STORAGE_CONTAINER_NAME} \
--account_name &lbrace;AZURE_STORAGE_ACCOUNT&rbrace; \
--container_name &lbrace;STORAGE_CONTAINER_NAME&rbrace; \
--log_level 1
```
</code></pre>

The intermediate steps shared with User A’s CPU optimization - such as ONNX conversion and ONNX graph optimization - will be reused, saving User B time and cost.

@@ -145,4 +149,14 @@ To try the quantization and Auto Optimizer commands with the shared-cache featur
```bash
pip install olive-ai[auto-opt,shared-cache] autoawq
```

> **Note:** Quantizing a model using the AWQ algorithm requires a CUDA GPU device. If you only have access to a CPU device, start with the Olive Auto Optimizer, which will quantize using round-to-nearest.
Quantizing a model using the AWQ algorithm requires a CUDA GPU device. If you only have access to a CPU device and do not have an Azure subscription, you can run the automatic optimizer on the CPU and use the local disk as the cache:

<pre><code>olive auto-opt \
--model_name_or_path meta-llama/Llama-3.2-1B-Instruct \
--trust_remote_code \
--output_path optimized-model \
--device cpu \
--provider CPUExecutionProvider \
--precision int4 \
--log_level 1
</code></pre>
