Make inputs and outputs more flexible in model builder #261

Merged
merged 3 commits into from
Apr 12, 2024
2 changes: 1 addition & 1 deletion VERSION_INFO
@@ -1 +1 @@
0.2.0-dev
57 changes: 41 additions & 16 deletions src/python/py/models/README.md
@@ -3,18 +3,21 @@
This folder contains the model builder, which creates optimized and quantized ONNX models that run with ONNX Runtime GenAI in just a few minutes.

# Contents
- [Current Support](#current-support)
- [Usage](#usage)
- [Full Usage](#full-usage)
- [Original PyTorch Model from Hugging Face](#original-pytorch-model-from-hugging-face)
- [Original PyTorch Model from Disk](#original-pytorch-model-from-disk)
- [Customized or Finetuned PyTorch Model](#customized-or-finetuned-pytorch-model)
- [GGUF Model](#gguf-model)
- [Extra Options](#extra-options)
- [Config Only](#config-only)
- [Exclude Embedding Layer](#exclude-embedding-layer)
- [Exclude Language Modeling Head](#exclude-language-modeling-head)
- [Unit Testing Models](#unit-testing-models)
- [Option 1: Use the model builder directly](#option-1-use-the-model-builder-directly)
- [Option 2: Edit the config.json file](#option-2-edit-the-configjson-file-on-disk-and-then-run-the-model-builder)
- [Design](#design)

## Current Support
The tool currently supports the following model architectures.
@@ -89,7 +92,7 @@ python3 builder.py -m model_name -o path_to_output_folder -p precision -e execut
```
To see all available options through `--extra_options`, please use the `help` commands in the `Full Usage` section above.

#### Config Only
This scenario is for when you already have your optimized and/or quantized ONNX model and you need to create the config files to run with ONNX Runtime GenAI.
```
# From wheel:
@@ -101,6 +104,28 @@ python3 builder.py -m model_name -o path_to_output_folder -p precision -e execut

Afterwards, please open the `genai_config.json` file in the output folder and modify the fields as needed for your model. You should store your ONNX model in the output folder as well.
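Editing the generated `genai_config.json` can also be scripted with the standard library. A minimal sketch, using a temporary folder as a stand-in for the output folder you passed via `-o`; the field names shown are illustrative only, so check your generated file for the real ones:

```python
import json
from pathlib import Path
from tempfile import TemporaryDirectory

# Stand-in for the model builder's output folder; in practice, point this at
# the folder you passed via -o. The field names below are illustrative only.
with TemporaryDirectory() as out_dir:
    config_path = Path(out_dir) / "genai_config.json"
    config_path.write_text(json.dumps({"model": {"context_length": 2048}}))

    config = json.loads(config_path.read_text())
    config["model"]["context_length"] = 4096      # edit fields as needed
    config_path.write_text(json.dumps(config, indent=4))

    config = json.loads(config_path.read_text())  # re-read to confirm the edit
```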

#### Exclude Embedding Layer
This scenario is for when you want to exclude the embedding layer from your ONNX model (for example, so that you can compute the input embeddings separately and feed them to the model directly).

```
# From wheel:
python3 -m onnxruntime_genai.models.builder -i path_to_local_folder_on_disk -o path_to_output_folder -p precision -e execution_provider -c cache_dir_to_store_temp_files --extra_options exclude_embeds=true

# From source:
python3 builder.py -i path_to_local_folder_on_disk -o path_to_output_folder -p precision -e execution_provider -c cache_dir_to_store_temp_files --extra_options exclude_embeds=true
```
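With the embedding layer excluded, the exported model no longer maps token IDs to vectors itself; the caller supplies precomputed embeddings instead. Conceptually this is just a table lookup. A minimal NumPy sketch with toy dimensions (in practice the embedding table would come from your model's actual weights):

```python
import numpy as np

# Toy dimensions; real models use far larger vocabulary/hidden sizes.
vocab_size, hidden_size = 1000, 64
rng = np.random.default_rng(0)
embedding_table = rng.standard_normal((vocab_size, hidden_size)).astype(np.float32)

input_ids = np.array([[1, 42, 7]])          # (batch_size, sequence_length)
inputs_embeds = embedding_table[input_ids]  # (batch_size, sequence_length, hidden_size)
```

These precomputed embeddings are then what you pass to the model in place of token IDs.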

#### Exclude Language Modeling Head
This scenario is for when you want to exclude the language modeling head from your ONNX model (for example, so that the model outputs the final hidden states instead of logits).

```
# From wheel:
python3 -m onnxruntime_genai.models.builder -i path_to_local_folder_on_disk -o path_to_output_folder -p precision -e execution_provider -c cache_dir_to_store_temp_files --extra_options exclude_lm_head=true

# From source:
python3 builder.py -i path_to_local_folder_on_disk -o path_to_output_folder -p precision -e execution_provider -c cache_dir_to_store_temp_files --extra_options exclude_lm_head=true
```
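With the language modeling head excluded, the model stops at the final hidden states, and turning those into token logits is a single linear projection you apply yourself. A minimal NumPy sketch with toy dimensions (in practice the projection weight would come from your model's actual LM head):

```python
import numpy as np

# Toy dimensions; real models use far larger vocabulary/hidden sizes.
hidden_size, vocab_size = 64, 1000
rng = np.random.default_rng(0)
lm_head_weight = rng.standard_normal((vocab_size, hidden_size)).astype(np.float32)

# Stand-in for the final hidden states returned by the truncated model.
hidden_states = rng.standard_normal((1, 3, hidden_size)).astype(np.float32)
logits = hidden_states @ lm_head_weight.T   # (batch_size, sequence_length, vocab_size)
next_token = int(logits[0, -1].argmax())    # greedy pick at the last position
```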

### Unit Testing Models
This scenario is where your PyTorch model is already downloaded locally (either in the default Hugging Face cache directory or in a local folder on disk). If it is not already downloaded locally, here is an example of how you can download it.

@@ -117,7 +142,7 @@ tokenizer = AutoTokenizer.from_pretrained(model_name, cache_dir=cache_dir)
tokenizer.save_pretrained(cache_dir)
```
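Both options below work by shrinking `num_hidden_layers`, and the manual edit in Option 2 can also be scripted. A minimal standard-library sketch, using a temporary folder as a stand-in for the model folder on disk (a real `config.json` contains many more fields than shown):

```python
import json
from pathlib import Path
from tempfile import TemporaryDirectory

# Stand-in for the folder holding the downloaded PyTorch model's config.json.
with TemporaryDirectory() as model_dir:
    config_path = Path(model_dir) / "config.json"
    config_path.write_text(json.dumps({"num_hidden_layers": 32}))

    config = json.loads(config_path.read_text())
    config["num_hidden_layers"] = 4               # shrink the model for unit tests
    config_path.write_text(json.dumps(config, indent=2))

    config = json.loads(config_path.read_text())  # re-read to confirm the edit
```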

#### Option 1: Use the model builder directly
This option is the simplest, but it will download another copy of the PyTorch model onto disk to accommodate the change in the number of hidden layers.
```
# From wheel:
@@ -127,11 +152,11 @@ python3 -m onnxruntime_genai.models.builder -m model_name -o path_to_output_fold
python3 builder.py -m model_name -o path_to_output_folder -p precision -e execution_provider --extra_options num_hidden_layers=4
```

#### Option 2: Edit the config.json file on disk and then run the model builder

1. Navigate to where the PyTorch model and its associated files are saved on disk.
2. Modify `num_hidden_layers` in `config.json` to your desired target (e.g. 4 layers).
3. Run the below command for the model builder.

```
# From wheel: