Update the multi-tenants example to support torch.compile and disable lazy mode #47

Open · wants to merge 8 commits into base: master
27 changes: 26 additions & 1 deletion PyTorch/examples/multi_tenants/README.md
@@ -15,6 +15,10 @@ For further information on training deep learning models using Gaudi, refer to [
Please follow the instructions provided in the [Gaudi Installation Guide](https://docs.habana.ai/en/latest/Installation_Guide/GAUDI_Installation_Guide.html) to set up the
environment including the `$PYTHON` environment variable. The guide will walk you through the process of setting up your system to run the model on Gaudi.

### Create Docker Container and Set up Python

Please follow the instructions provided in [Run Using Containers on Habana Base AMI](https://docs.habana.ai/en/latest/AWS_User_Guides/Habana_Deep_Learning_AMI.html#run-using-containers-on-habana-base-ami) to pull the Docker image and launch the container. Make sure to set up Python inside the container by following [Model References Requirements](https://docs.habana.ai/en/latest/AWS_User_Guides/Habana_Deep_Learning_AMI.html#model-references-requirements).

### Clone Intel Gaudi Model-References

In the docker container, clone this repository and switch to the branch that matches your Intel Gaudi software version. You can run the [`hl-smi`](https://docs.habana.ai/en/latest/Management_and_Monitoring/System_Management_Tools_Guide/System_Management_Tools.html#hl-smi-utility-options) utility to determine the Intel Gaudi software version.
@@ -62,10 +66,31 @@ You can run multiple jobs in parallel using the script described in the following

### multi_tenants_resnet_pt.sh

#### Run 2 ResNet50 Jobs on a Total of 8 HPUs with torch.compile Enabled

Running the script without any arguments invokes 2 ResNet50 jobs in parallel, each using 4 Gaudis.

```bash
bash multi_tenants_resnet_pt.sh
```

#### Run 2 ResNet50 Jobs on a Total of 4 HPUs with torch.compile Enabled

You can also provide two sets of module IDs as script arguments. The following command invokes 2 jobs in parallel, each using 2 Gaudis.

```bash
bash multi_tenants_resnet_pt.sh "0,1" "2,3"
```
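The essence of the parallel invocation above is standard shell job control. The sketch below illustrates the assumed pattern (the `run_job` function, log file names, and the `echo` standing in for the real training command are all invented for illustration): each job runs in the background with its own `HABANA_VISIBLE_MODULES` value, and `wait` blocks until both finish.

```shell
# Hypothetical sketch of the parallel-launch pattern: one background job per
# module set, then wait for both. `echo` stands in for the training command.
run_job() {
  local modules="$1" log="$2"
  HABANA_VISIBLE_MODULES="$modules" echo "training on modules $modules" > "$log" &
}
run_job "0,1" job0.log
run_job "2,3" job1.log
wait  # do not return until both background jobs complete
cat job0.log job1.log
```

Because each job is scoped to its own module set, the two trainings do not contend for the same Gaudi cards.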

#### `HABANA_VISIBLE_MODULES`

Running `hl-smi -Q index,module_id -f csv` outputs, in CSV format, the mapping of each card index to its module ID. Use this to determine which module IDs are available for parallel training. The `HABANA_VISIBLE_MODULES` environment variable and the model Python script arguments must be set to different values for each of the two jobs.
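The CSV output can be processed mechanically; here is a minimal sketch (the sample text below is invented for illustration — on a real system it would come from `hl-smi -Q index,module_id -f csv`):

```shell
# Hypothetical sample of `hl-smi -Q index,module_id -f csv` output; on a real
# system, replace the literal with the command's actual output.
SAMPLE='index, module_id
0, 6
1, 7'
# Skip the header row and print "index -> module_id" pairs.
MAPPING=$(echo "$SAMPLE" | awk -F', *' 'NR > 1 { printf "%s -> %s\n", $1, $2 }')
echo "$MAPPING"
```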

`HABANA_VISIBLE_MODULES` is an environment variable holding a comma-separated list of module IDs, each a single-digit integer. The same integer must not be used by multiple jobs running in parallel:
- For jobs with 4 Gaudis, it is recommended to set this to "0,1,2,3" or "4,5,6,7".
- For jobs with 2 Gaudis, it is recommended to set this to "0,1", "2,3", "4,5", or "6,7".
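Since the variable is a plain comma-separated list, a launcher can derive the per-job card count from it directly. This is an assumed pattern, not taken verbatim from `multi_tenants_resnet_pt.sh`:

```shell
# Split HABANA_VISIBLE_MODULES on commas to count how many cards this job
# will use, e.g. to pass a matching world size to the training launcher.
export HABANA_VISIBLE_MODULES="4,5,6,7"
IFS=',' read -r -a MODULES <<< "$HABANA_VISIBLE_MODULES"
NUM_CARDS=${#MODULES[@]}
echo "Job will use $NUM_CARDS cards: ${MODULES[*]}"
```

Deriving the count from the variable keeps the module list as the single source of truth, so the card count and the visible modules cannot drift apart.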

## Changelog
### 1.16.2
- Added torch.compile support to improve training performance.
- Lazy mode support is deprecated for this example.
3 changes: 2 additions & 1 deletion PyTorch/examples/multi_tenants/multi_tenants_resnet_pt.sh
@@ -1,4 +1,5 @@
#!/bin/bash
export PT_HPU_LAZY_MODE=0
export MASTER_ADDR=localhost

SCRIPT_DIR=`dirname $(readlink -e ${BASH_SOURCE[0]})`
@@ -53,7 +54,7 @@ function run() {
--dl-time-exclude=False \
--custom-lr-values ${LR_VALUES} \
--custom-lr-milestones ${LR_MILESTONES} \
--seed=123 --run-lazy-mode=False --use_torch_compile 1> $STDOUT_LOG 2> $STDERR_LOG &

echo "Job ${JOB_ID} starts with ${NUM} cards, stdout: ${STDOUT_LOG}, stderr: ${STDERR_LOG}"
JOB_ID=$((JOB_ID+1))