From 754f2b0a021b1011ca45691d46ef2e6b00726755 Mon Sep 17 00:00:00 2001
From: Sonali Saha
Date: Thu, 22 Jul 2021 04:01:58 -0400
Subject: [PATCH 1/2] [GSC] Add dockerfile and manifest file for tensorflow
 ResNet50 and BERT models gsc image

Signed-off-by: Sonali Saha
---
 Tools/gsc/test/tensorflow/README.md           | 190 ++++++++++++++++++
 .../ubuntu18.04-tensorflow-bert.dockerfile    |  25 +++
 ...ubuntu18.04-tensorflow-resnet50.dockerfile |  19 ++
 .../gsc/test/ubuntu18.04-tensorflow.manifest  |   6 +
 4 files changed, 240 insertions(+)
 create mode 100644 Tools/gsc/test/tensorflow/README.md
 create mode 100755 Tools/gsc/test/ubuntu18.04-tensorflow-bert.dockerfile
 create mode 100644 Tools/gsc/test/ubuntu18.04-tensorflow-resnet50.dockerfile
 create mode 100755 Tools/gsc/test/ubuntu18.04-tensorflow.manifest

diff --git a/Tools/gsc/test/tensorflow/README.md b/Tools/gsc/test/tensorflow/README.md
new file mode 100644
index 0000000000..03f480c123
--- /dev/null
+++ b/Tools/gsc/test/tensorflow/README.md
@@ -0,0 +1,190 @@
+# Inference on TensorFlow BERT and ResNet50 models:
+The ``../test`` directory contains the dockerfiles and the manifest file needed to run inference
+with the TensorFlow BERT and ResNet50 sample workloads on GSC. Both examples use pre-trained
+models to run inference. These examples were tested on Ubuntu 18.04 with Python 3.6.
+
+## Bidirectional Encoder Representations from Transformers (BERT):
+BERT is a method of pre-training language representations and then using the trained model for
+downstream NLP tasks such as question answering. BERT is an unsupervised, deeply bidirectional
+system for pre-training NLP. In this BERT sample, we use the 'BERT-Large, Uncased (Whole Word
+Masking)' model and perform int8 inference. More details about BERT can be found at
+https://github.com/google-research/bert.
+
+## Residual Network (ResNet):
+ResNet50 is a convolutional neural network that is 50 layers deep. In this ResNet50 (v1.5)
+sample, we use a pre-trained model and perform int8 inference. More details about ResNet50 can
+be found at
+https://github.com/IntelAI/models/tree/icx-launch-public/benchmarks/image_recognition/tensorflow/resnet50v1_5.
+
+## System settings:
+Linux systems use a CPU frequency scaling governor that scales the CPU frequency either for best
+performance or for power savings, depending on the selected policy. To achieve the best
+performance, set the CPU frequency scaling governor to performance mode (as root):
+
+```
+for ((i=0; i<$(nproc); i++)); \
+do echo 'performance' > /sys/devices/system/cpu/cpu$i/cpufreq/scaling_governor; done
+```
+
+## Common build steps:
+1. ``cd $(GRAPHENE_DIR)/Tools/gsc``
+
+2. Create a configuration file: ``cp config.yaml.template config.yaml``.
+Manually adapt config.yaml to the installed Intel SGX driver and the desired Graphene
+repository/version.
+
+3. Generate the signing key: ``openssl genrsa -3 -out enclave-key.pem 3072``
+
+## Build a graphenized Docker image and run BERT inference:
+1. Build the Docker image:
+```
+cd test
+docker build --rm -t ubuntu18.04-tensorflow-bert -f ubuntu18.04-tensorflow-bert.dockerfile \
+../../../Examples
+```
+
+2. Graphenize the Docker image using ``gsc build``:
+```
+cd ..
+./gsc build --insecure-args ubuntu18.04-tensorflow-bert test/ubuntu18.04-tensorflow.manifest
+```
+
+3. Sign the graphenized Docker image using ``gsc sign-image``:
+```
+./gsc sign-image ubuntu18.04-tensorflow-bert enclave-key.pem
+```
+
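+Optionally, inspect the SGX signature of the signed image before running it. This check assumes
+your gsc version provides the ``info-image`` subcommand; skip it if yours does not:
+```
+# Print the SGX signature (SIGSTRUCT) fields, e.g. MRENCLAVE and MRSIGNER, of the signed image
+./gsc info-image gsc-ubuntu18.04-tensorflow-bert
+```
+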
+4. To run int8 inference on GSC:
+```
+docker run --device=/dev/sgx_enclave --cpuset-cpus="0-35" --env OMP_NUM_THREADS=36 \
+--env KMP_AFFINITY=granularity=fine,noverbose,compact,1,0 \
+gsc-ubuntu18.04-tensorflow-bert \
+models/models/language_modeling/tensorflow/bert_large/inference/run_squad.py \
+--init_checkpoint=data/bert_large_checkpoints/model.ckpt-3649 \
+--vocab_file=data/wwm_uncased_L-24_H-1024_A-16/vocab.txt \
+--bert_config_file=data/wwm_uncased_L-24_H-1024_A-16/bert_config.json \
+--predict_file=data/wwm_uncased_L-24_H-1024_A-16/dev-v1.1.json \
+--precision=int8 \
+--predict_batch_size=32 \
+--experimental_gelu=True \
+--optimized_softmax=True \
+--input_graph=data/asymmetric_per_channel_bert_int8.pb \
+--do_predict=True \
+--mode=benchmark \
+--inter_op_parallelism_threads=1 \
+--intra_op_parallelism_threads=36 \
+--output_dir=output/bert-squad-output
+```
+
+5. To run int8 inference in a native container:
+```
+docker run --cpuset-cpus="0-35" --env OMP_NUM_THREADS=36 \
+--env KMP_AFFINITY=granularity=fine,noverbose,compact,1,0 \
+ubuntu18.04-tensorflow-bert \
+models/models/language_modeling/tensorflow/bert_large/inference/run_squad.py \
+--init_checkpoint=data/bert_large_checkpoints/model.ckpt-3649 \
+--vocab_file=data/wwm_uncased_L-24_H-1024_A-16/vocab.txt \
+--bert_config_file=data/wwm_uncased_L-24_H-1024_A-16/bert_config.json \
+--predict_file=data/wwm_uncased_L-24_H-1024_A-16/dev-v1.1.json \
+--precision=int8 \
+--predict_batch_size=32 \
+--experimental_gelu=True \
+--optimized_softmax=True \
+--input_graph=data/asymmetric_per_channel_bert_int8.pb \
+--do_predict=True \
+--mode=benchmark \
+--inter_op_parallelism_threads=1 \
+--intra_op_parallelism_threads=36 \
+--output_dir=output/bert-squad-output
+```
+
+6. The above commands are for a 36-core system. For optimal performance, set the following
+options according to your system (a shell sketch for deriving these values appears before the
+ResNet50 run command below).
+   - OMP_NUM_THREADS='Core(s) per socket'
+   - --cpuset-cpus to 'Core(s) per socket'
+   - intra_op_parallelism_threads='Core(s) per socket'
+   - If hyperthreading is enabled: use ``KMP_AFFINITY=granularity=fine,verbose,compact,1,0``
+   - If hyperthreading is disabled: use ``KMP_AFFINITY=granularity=fine,verbose,compact``
+   - **NOTE** To get 'Core(s) per socket', run ``lscpu | grep 'Core(s) per socket'``. \
+   OMP_NUM_THREADS sets the maximum number of threads to use for OpenMP parallel regions. \
+   KMP_AFFINITY binds OpenMP threads to physical processing units.
+
+## Build a graphenized Docker image and run ResNet50 inference:
+1. Build the Docker image:
+```
+cd test
+docker build --rm -t ubuntu18.04-tensorflow-resnet50 -f ubuntu18.04-tensorflow-resnet50.dockerfile \
+../../../Examples
+```
+
+2. Graphenize the Docker image using ``gsc build``:
+```
+cd ..
+./gsc build --insecure-args ubuntu18.04-tensorflow-resnet50 test/ubuntu18.04-tensorflow.manifest
+```
+
+3. Sign the graphenized Docker image using ``gsc sign-image``:
+```
+./gsc sign-image ubuntu18.04-tensorflow-resnet50 enclave-key.pem
+```
+
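+The ResNet50 run commands below (like the BERT commands above) assume 36 physical cores on one
+socket. The following shell sketch (an illustration only; ``CORES`` and ``LAST`` are local
+helper variables, not gsc options) derives suitable values for your machine:
+```
+# Physical cores per socket, as reported by lscpu
+CORES=$(lscpu | awk -F: '/Core\(s\) per socket/ {gsub(/ /, "", $2); print $2}')
+LAST=$((CORES - 1))
+echo "Pass --cpuset-cpus=\"0-$LAST\" --env OMP_NUM_THREADS=$CORES to docker run"
+```
+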
+4. To run inference on GSC:
+```
+docker run --device=/dev/sgx_enclave --cpuset-cpus="0-35" --env OMP_NUM_THREADS=36 \
+--env KMP_AFFINITY=granularity=fine,noverbose,compact,1,0 \
+gsc-ubuntu18.04-tensorflow-resnet50 \
+models/models/image_recognition/tensorflow/resnet50v1_5/inference/eval_image_classifier_inference.py \
+--input-graph=resnet50v1_5_int8_pretrained_model.pb \
+--num-inter-threads=1 \
+--num-intra-threads=36 \
+--batch-size=32 \
+--warmup-steps=50 \
+--steps=500
+```
+**NOTE**: If an out-of-memory (OOM) error occurs, set the environment variable
+``TF_MKL_ALLOC_MAX_BYTES`` to an upper bound on memory allocation. For example, on a machine
+with 32 GB of memory, pass ``--env TF_MKL_ALLOC_MAX_BYTES=17179869184`` (16 GB) to the
+``docker run`` command.
+
+5. To run inference in a native container:
+```
+docker run --cpuset-cpus="0-35" --env OMP_NUM_THREADS=36 \
+--env KMP_AFFINITY=granularity=fine,noverbose,compact,1,0 \
+ubuntu18.04-tensorflow-resnet50 \
+models/models/image_recognition/tensorflow/resnet50v1_5/inference/eval_image_classifier_inference.py \
+--input-graph=resnet50v1_5_int8_pretrained_model.pb \
+--num-inter-threads=1 \
+--num-intra-threads=36 \
+--batch-size=32 \
+--warmup-steps=50 \
+--steps=500
+```
+
+6. The above commands are for a 36-core system. For optimal performance, set the following
+options according to your system (see the shell sketch above for deriving these values).
+   - OMP_NUM_THREADS='Core(s) per socket'
+   - --cpuset-cpus to 'Core(s) per socket'
+   - num-intra-threads='Core(s) per socket'
+   - If hyperthreading is enabled: use ``KMP_AFFINITY=granularity=fine,verbose,compact,1,0``
+   - If hyperthreading is disabled: use ``KMP_AFFINITY=granularity=fine,verbose,compact``
+   - The ``--batch-size``, ``--warmup-steps`` and ``--steps`` options can be varied.
+   - **NOTE** To get 'Core(s) per socket', run ``lscpu | grep 'Core(s) per socket'``. \
+   OMP_NUM_THREADS sets the maximum number of threads to use for OpenMP parallel regions. \
+   KMP_AFFINITY binds OpenMP threads to physical processing units.
+
+## Performance considerations:
+- The preheat manifest option pre-faults the enclave memory and moves the performance penalty to
+graphene-sgx invocation (before the workload starts executing). To use the preheat option, add
+``sgx.preheat_enclave = 1`` to the manifest template.
+- TCMalloc and mimalloc are memory allocator libraries from Google and Microsoft, respectively,
+that can significantly improve performance for some workloads. Only one of these allocators can
+be used at a time.
+  - TCMalloc (update the library location and name if they differ from the defaults):
+    - Install TCMalloc: ``sudo apt-get install google-perftools``
+    - Add the following lines to the manifest template:
+      - ``loader.env.LD_PRELOAD = "/usr/lib/x86_64-linux-gnu/libtcmalloc.so.4"``
+      - ``sgx.trusted_files.libtcmalloc = "file:/usr/lib/x86_64-linux-gnu/libtcmalloc.so.4"``
+      - ``sgx.trusted_files.libunwind = "file:/usr/lib/x86_64-linux-gnu/libunwind.so.8"``
+    - Save the template and rebuild.
+  - mimalloc (update the library location and name if they differ from the defaults):
+    - Install mimalloc using the steps from https://github.com/microsoft/mimalloc
+    - Add the following lines to the manifest template:
+      - ``loader.env.LD_PRELOAD = "/usr/local/lib/mimalloc-1.7/libmimalloc.so.1.7"``
+      - ``sgx.trusted_files.libmimalloc = "file:/usr/local/lib/mimalloc-1.7/libmimalloc.so.1.7"``
+    - Save the template and rebuild.
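+
+Taken together, a manifest-template fragment that enables the preheat option and the TCMalloc
+allocator (using the default Ubuntu 18.04 paths listed above; verify the library paths on your
+system) would look like this:
+```
+sgx.preheat_enclave = 1
+loader.env.LD_PRELOAD = "/usr/lib/x86_64-linux-gnu/libtcmalloc.so.4"
+sgx.trusted_files.libtcmalloc = "file:/usr/lib/x86_64-linux-gnu/libtcmalloc.so.4"
+sgx.trusted_files.libunwind = "file:/usr/lib/x86_64-linux-gnu/libunwind.so.8"
+```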
\ No newline at end of file
diff --git a/Tools/gsc/test/ubuntu18.04-tensorflow-bert.dockerfile b/Tools/gsc/test/ubuntu18.04-tensorflow-bert.dockerfile
new file mode 100755
index 0000000000..8c7a00768e
--- /dev/null
+++ b/Tools/gsc/test/ubuntu18.04-tensorflow-bert.dockerfile
@@ -0,0 +1,25 @@
+From ubuntu:18.04
+
+# Install prerequisites
+RUN apt-get update \
+    && apt-get install -y git wget \
+    && apt-get install -y python3.6 python3-pip unzip \
+    && pip3 install --upgrade pip
+
+# Install tensorflow
+RUN pip3 install intel-tensorflow-avx512==2.4.0
+
+# Download models
+RUN git clone https://github.com/IntelAI/models.git /models/
+
+# Download data
+RUN mkdir -p data \
+    && cd data \
+    && wget https://storage.googleapis.com/bert_models/2019_05_30/wwm_uncased_L-24_H-1024_A-16.zip \
+    && unzip wwm_uncased_L-24_H-1024_A-16.zip \
+    && wget https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v1.1.json -P wwm_uncased_L-24_H-1024_A-16 \
+    && wget https://storage.googleapis.com/intel-optimized-tensorflow/models/v1_8/bert_large_checkpoints.zip \
+    && unzip bert_large_checkpoints.zip \
+    && wget https://storage.googleapis.com/intel-optimized-tensorflow/models/r2.5-icx-b631821f/asymmetric_per_channel_bert_int8.pb
+
+ENTRYPOINT ["python3.6"]
diff --git a/Tools/gsc/test/ubuntu18.04-tensorflow-resnet50.dockerfile b/Tools/gsc/test/ubuntu18.04-tensorflow-resnet50.dockerfile
new file mode 100644
index 0000000000..da0c7077bf
--- /dev/null
+++ b/Tools/gsc/test/ubuntu18.04-tensorflow-resnet50.dockerfile
@@ -0,0 +1,19 @@
+From ubuntu:18.04
+
+# Install prerequisites
+RUN apt-get update \
+    && apt-get install -y git wget \
+    && apt-get install -y python3.6 python3-pip
+
+RUN pip3 install --upgrade pip
+
+# Install tensorflow
+RUN pip3 install intel-tensorflow-avx512==2.4.0
+
+# Download input graph file
+RUN wget https://storage.googleapis.com/intel-optimized-tensorflow/models/v1_8/resnet50v1_5_int8_pretrained_model.pb
+
+# Download model
+RUN git clone https://github.com/IntelAI/models.git /models/
+
+ENTRYPOINT ["python3.6"]
diff --git a/Tools/gsc/test/ubuntu18.04-tensorflow.manifest b/Tools/gsc/test/ubuntu18.04-tensorflow.manifest
new file mode 100755
index 0000000000..ee93a2000d
--- /dev/null
+++ b/Tools/gsc/test/ubuntu18.04-tensorflow.manifest
@@ -0,0 +1,6 @@
+sgx.enclave_size = "32G"
+sgx.thread_num = 300
+loader.pal_internal_mem_size = "64M"
+loader.insecure__use_host_env = 1
+sgx.allowed_files.tmp = "file:/tmp"
+sgx.preheat_enclave = 1

From 54e4d4d9f9142f93d731a32ac0faaeb8d0f62251 Mon Sep 17 00:00:00 2001
From: Sonali Saha
Date: Fri, 13 Aug 2021 06:49:15 -0400
Subject: [PATCH 2/2] fixup!
 [GSC] Add dockerfile and manifest file for tensorflow ResNet50 and BERT models gsc image

---
 Tools/gsc/Examples/tensorflow/README.md       |  92 +++++++++
 .../ubuntu18.04-tensorflow-bert.dockerfile    |   9 +-
 ...ubuntu18.04-tensorflow-resnet50.dockerfile |   8 +-
 .../ubuntu18.04-tensorflow.manifest           |   0
 Tools/gsc/test/tensorflow/README.md           | 190 ------------------
 5 files changed, 99 insertions(+), 200 deletions(-)
 create mode 100644 Tools/gsc/Examples/tensorflow/README.md
 rename Tools/gsc/{test => Examples/tensorflow}/ubuntu18.04-tensorflow-bert.dockerfile (78%)
 mode change 100755 => 100644
 rename Tools/gsc/{test => Examples/tensorflow}/ubuntu18.04-tensorflow-resnet50.dockerfile (71%)
 rename Tools/gsc/{test => Examples/tensorflow}/ubuntu18.04-tensorflow.manifest (100%)
 mode change 100755 => 100644
 delete mode 100644 Tools/gsc/test/tensorflow/README.md

diff --git a/Tools/gsc/Examples/tensorflow/README.md b/Tools/gsc/Examples/tensorflow/README.md
new file mode 100644
index 0000000000..48ac97b24d
--- /dev/null
+++ b/Tools/gsc/Examples/tensorflow/README.md
@@ -0,0 +1,92 @@
+# Inference on TensorFlow BERT and ResNet50 models:
+For additional information on how to install, run and optimize TensorFlow, please see
+https://github.com/Satya1493/graphene/blob/tensorflow/Examples/tensorflow/README.md.
+
+## Build a graphenized Docker image and run BERT inference:
+1. Build the Docker image:
+```
+docker build --rm -t ubuntu18.04-tensorflow-bert -f ubuntu18.04-tensorflow-bert.dockerfile .
+```
+
+2. Graphenize the Docker image using ``gsc build``:
+```
+cd ../..
+./gsc build --insecure-args ubuntu18.04-tensorflow-bert Examples/tensorflow/ubuntu18.04-tensorflow.manifest
+```
+
+3. Sign the graphenized Docker image using ``gsc sign-image``:
+```
+./gsc sign-image ubuntu18.04-tensorflow-bert enclave-key.pem
+```
+
+4. To run fp32 inference on GSC:
+```
+docker run --device=/dev/sgx_enclave --cpuset-cpus="0-35" --env OMP_NUM_THREADS=36 \
+--env KMP_AFFINITY=granularity=fine,noverbose,compact,1,0 \
+gsc-ubuntu18.04-tensorflow-bert \
+models/models/language_modeling/tensorflow/bert_large/inference/run_squad.py \
+--init_checkpoint=data/bert_large_checkpoints/model.ckpt-3649 \
+--vocab_file=data/wwm_uncased_L-24_H-1024_A-16/vocab.txt \
+--bert_config_file=data/wwm_uncased_L-24_H-1024_A-16/bert_config.json \
+--predict_file=data/wwm_uncased_L-24_H-1024_A-16/dev-v1.1.json \
+--precision=fp32 \
+--predict_batch_size=32 \
+--experimental_gelu=True \
+--optimized_softmax=True \
+--input_graph=data/fp32_bert_squad.pb \
+--do_predict=True \
+--mode=benchmark \
+--inter_op_parallelism_threads=1 \
+--intra_op_parallelism_threads=36 \
+--output_dir=output/bert-squad-output
+```
+
+5. To run fp32 inference in a native container (outside Graphene), remove
+``--device=/dev/sgx_enclave`` and replace ``gsc-ubuntu18.04-tensorflow-bert`` with
+``ubuntu18.04-tensorflow-bert`` in the above command.
+
+6. The above commands are for a 36-core system. See
+https://github.com/Satya1493/graphene/blob/tensorflow/Examples/tensorflow/README.md for guidance
+on setting these options for optimal performance.
+
+## Build a graphenized Docker image and run ResNet50 inference:
+1. Build the Docker image:
+```
+docker build --rm -t ubuntu18.04-tensorflow-resnet50 -f ubuntu18.04-tensorflow-resnet50.dockerfile .
+```
+
+2. Graphenize the Docker image using ``gsc build``:
+```
+cd ../..
+./gsc build --insecure-args ubuntu18.04-tensorflow-resnet50 Examples/tensorflow/ubuntu18.04-tensorflow.manifest
+```
+
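+At this point the graphenized image exists but is not yet signed. Assuming gsc follows its usual
+naming convention and tags the intermediate image ``gsc-<image>-unsigned``, you can confirm that
+it was created:
+```
+# List the graphenized (still unsigned) image produced by gsc build
+docker images | grep gsc-ubuntu18.04-tensorflow-resnet50
+```
+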
+3. Sign the graphenized Docker image using ``gsc sign-image``:
+```
+./gsc sign-image ubuntu18.04-tensorflow-resnet50 enclave-key.pem
+```
+
+4. To run int8 inference on GSC:
+```
+docker run --device=/dev/sgx_enclave --cpuset-cpus="0-35" --env OMP_NUM_THREADS=36 \
+--env KMP_AFFINITY=granularity=fine,noverbose,compact,1,0 \
+gsc-ubuntu18.04-tensorflow-resnet50 \
+models/models/image_recognition/tensorflow/resnet50v1_5/inference/eval_image_classifier_inference.py \
+--input-graph=resnet50v1_5_int8_pretrained_model.pb \
+--num-inter-threads=1 \
+--num-intra-threads=36 \
+--batch-size=32 \
+--warmup-steps=50 \
+--steps=500
+```
+**NOTE**: If an out-of-memory (OOM) error occurs, set the environment variable
+``TF_MKL_ALLOC_MAX_BYTES`` to an upper bound on memory allocation. For example, on a machine
+with 32 GB of memory, pass ``--env TF_MKL_ALLOC_MAX_BYTES=17179869184`` (16 GB) to the
+``docker run`` command.
+
+5. To run int8 inference in a native container (outside Graphene), remove
+``--device=/dev/sgx_enclave`` and replace ``gsc-ubuntu18.04-tensorflow-resnet50`` with
+``ubuntu18.04-tensorflow-resnet50`` in the above command.
+
+6. The above commands are for a 36-core system. See
+https://github.com/Satya1493/graphene/blob/tensorflow/Examples/tensorflow/README.md for guidance
+on setting these options for optimal performance.
diff --git a/Tools/gsc/test/ubuntu18.04-tensorflow-bert.dockerfile b/Tools/gsc/Examples/tensorflow/ubuntu18.04-tensorflow-bert.dockerfile
old mode 100755
new mode 100644
similarity index 78%
rename from Tools/gsc/test/ubuntu18.04-tensorflow-bert.dockerfile
rename to Tools/gsc/Examples/tensorflow/ubuntu18.04-tensorflow-bert.dockerfile
index 8c7a00768e..d4c5e097c5
--- a/Tools/gsc/test/ubuntu18.04-tensorflow-bert.dockerfile
+++ b/Tools/gsc/Examples/tensorflow/ubuntu18.04-tensorflow-bert.dockerfile
@@ -2,8 +2,7 @@ From ubuntu:18.04
 
 # Install prerequisites
 RUN apt-get update \
-    && apt-get install -y git wget \
-    && apt-get install -y python3.6 python3-pip unzip \
+    && apt-get install -y git wget python3 python3-pip unzip \
     && pip3 install --upgrade pip
 
 # Install tensorflow
@@ -13,13 +12,13 @@ RUN pip3 install intel-tensorflow-avx512==2.4.0
 RUN git clone https://github.com/IntelAI/models.git /models/
 
 # Download data
-RUN mkdir -p data \
+RUN mkdir -p data \
     && cd data \
     && wget https://storage.googleapis.com/bert_models/2019_05_30/wwm_uncased_L-24_H-1024_A-16.zip \
     && unzip wwm_uncased_L-24_H-1024_A-16.zip \
     && wget https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v1.1.json -P wwm_uncased_L-24_H-1024_A-16 \
     && wget https://storage.googleapis.com/intel-optimized-tensorflow/models/v1_8/bert_large_checkpoints.zip \
     && unzip bert_large_checkpoints.zip \
-    && wget https://storage.googleapis.com/intel-optimized-tensorflow/models/r2.5-icx-b631821f/asymmetric_per_channel_bert_int8.pb
+    && wget https://storage.googleapis.com/intel-optimized-tensorflow/models/v2_4_0/fp32_bert_squad.pb
 
-ENTRYPOINT ["python3.6"]
+ENTRYPOINT ["python3"]
diff --git a/Tools/gsc/test/ubuntu18.04-tensorflow-resnet50.dockerfile b/Tools/gsc/Examples/tensorflow/ubuntu18.04-tensorflow-resnet50.dockerfile
similarity index 71%
rename from Tools/gsc/test/ubuntu18.04-tensorflow-resnet50.dockerfile
rename to Tools/gsc/Examples/tensorflow/ubuntu18.04-tensorflow-resnet50.dockerfile
index da0c7077bf..0345987a29 100644
--- a/Tools/gsc/test/ubuntu18.04-tensorflow-resnet50.dockerfile
+++ b/Tools/gsc/Examples/tensorflow/ubuntu18.04-tensorflow-resnet50.dockerfile
@@ -2,10 +2,8 @@ From ubuntu:18.04
 
 # Install prerequisites
 RUN apt-get update \
-    && apt-get install -y git wget \
-    && apt-get install -y python3.6 python3-pip
-
-RUN pip3 install --upgrade pip
+    && apt-get install -y git wget python3 python3-pip \
+    && pip3 install --upgrade pip
 
 # Install tensorflow
 RUN pip3 install intel-tensorflow-avx512==2.4.0
@@ -16,4 +14,4 @@ RUN wget https://storage.googleapis.com/intel-optimized-tensorflow/models/v1_8/resnet50v1_5_int8_pretrained_model.pb
 
 # Download model
 RUN git clone https://github.com/IntelAI/models.git /models/
 
-ENTRYPOINT ["python3.6"]
+ENTRYPOINT ["python3"]
diff --git a/Tools/gsc/test/ubuntu18.04-tensorflow.manifest b/Tools/gsc/Examples/tensorflow/ubuntu18.04-tensorflow.manifest
old mode 100755
new mode 100644
similarity index 100%
rename from Tools/gsc/test/ubuntu18.04-tensorflow.manifest
rename to Tools/gsc/Examples/tensorflow/ubuntu18.04-tensorflow.manifest
diff --git a/Tools/gsc/test/tensorflow/README.md b/Tools/gsc/test/tensorflow/README.md
deleted file mode 100644
index 03f480c123..0000000000
--- a/Tools/gsc/test/tensorflow/README.md
+++ /dev/null
@@ -1,190 +0,0 @@
-# Inference on TensorFlow BERT and ResNet50 models:
-The ``../test`` directory contains the dockerfiles and the manifest file needed to run inference
-with the TensorFlow BERT and ResNet50 sample workloads on GSC. Both examples use pre-trained
-models to run inference. These examples were tested on Ubuntu 18.04 with Python 3.6.
-
-## Bidirectional Encoder Representations from Transformers (BERT):
-BERT is a method of pre-training language representations and then using the trained model for
-downstream NLP tasks such as question answering. BERT is an unsupervised, deeply bidirectional
-system for pre-training NLP. In this BERT sample, we use the 'BERT-Large, Uncased (Whole Word
-Masking)' model and perform int8 inference. More details about BERT can be found at
-https://github.com/google-research/bert.
-
-## Residual Network (ResNet):
-ResNet50 is a convolutional neural network that is 50 layers deep. In this ResNet50 (v1.5)
-sample, we use a pre-trained model and perform int8 inference. More details about ResNet50 can
-be found at
-https://github.com/IntelAI/models/tree/icx-launch-public/benchmarks/image_recognition/tensorflow/resnet50v1_5.
-
-## System settings:
-Linux systems use a CPU frequency scaling governor that scales the CPU frequency either for best
-performance or for power savings, depending on the selected policy. To achieve the best
-performance, set the CPU frequency scaling governor to performance mode (as root):
-
-```
-for ((i=0; i<$(nproc); i++)); \
-do echo 'performance' > /sys/devices/system/cpu/cpu$i/cpufreq/scaling_governor; done
-```
-
-## Common build steps:
-1. ``cd $(GRAPHENE_DIR)/Tools/gsc``
-
-2. Create a configuration file: ``cp config.yaml.template config.yaml``.
-Manually adapt config.yaml to the installed Intel SGX driver and the desired Graphene
-repository/version.
-
-3. Generate the signing key: ``openssl genrsa -3 -out enclave-key.pem 3072``
-
-## Build a graphenized Docker image and run BERT inference:
-1. Build the Docker image:
-```
-cd test
-docker build --rm -t ubuntu18.04-tensorflow-bert -f ubuntu18.04-tensorflow-bert.dockerfile \
-../../../Examples
-```
-
-2. Graphenize the Docker image using ``gsc build``:
-```
-cd ..
-./gsc build --insecure-args ubuntu18.04-tensorflow-bert test/ubuntu18.04-tensorflow.manifest
-```
-
-3. Sign the graphenized Docker image using ``gsc sign-image``:
-```
-./gsc sign-image ubuntu18.04-tensorflow-bert enclave-key.pem
-```
-
-4. To run int8 inference on GSC:
-```
-docker run --device=/dev/sgx_enclave --cpuset-cpus="0-35" --env OMP_NUM_THREADS=36 \
---env KMP_AFFINITY=granularity=fine,noverbose,compact,1,0 \
-gsc-ubuntu18.04-tensorflow-bert \
-models/models/language_modeling/tensorflow/bert_large/inference/run_squad.py \
---init_checkpoint=data/bert_large_checkpoints/model.ckpt-3649 \
---vocab_file=data/wwm_uncased_L-24_H-1024_A-16/vocab.txt \
---bert_config_file=data/wwm_uncased_L-24_H-1024_A-16/bert_config.json \
---predict_file=data/wwm_uncased_L-24_H-1024_A-16/dev-v1.1.json \
---precision=int8 \
---predict_batch_size=32 \
---experimental_gelu=True \
---optimized_softmax=True \
---input_graph=data/asymmetric_per_channel_bert_int8.pb \
---do_predict=True \
---mode=benchmark \
---inter_op_parallelism_threads=1 \
---intra_op_parallelism_threads=36 \
---output_dir=output/bert-squad-output
-```
-
-5. To run int8 inference in a native container:
-```
-docker run --cpuset-cpus="0-35" --env OMP_NUM_THREADS=36 \
---env KMP_AFFINITY=granularity=fine,noverbose,compact,1,0 \
-ubuntu18.04-tensorflow-bert \
-models/models/language_modeling/tensorflow/bert_large/inference/run_squad.py \
---init_checkpoint=data/bert_large_checkpoints/model.ckpt-3649 \
---vocab_file=data/wwm_uncased_L-24_H-1024_A-16/vocab.txt \
---bert_config_file=data/wwm_uncased_L-24_H-1024_A-16/bert_config.json \
---predict_file=data/wwm_uncased_L-24_H-1024_A-16/dev-v1.1.json \
---precision=int8 \
---predict_batch_size=32 \
---experimental_gelu=True \
---optimized_softmax=True \
---input_graph=data/asymmetric_per_channel_bert_int8.pb \
---do_predict=True \
---mode=benchmark \
---inter_op_parallelism_threads=1 \
---intra_op_parallelism_threads=36 \
---output_dir=output/bert-squad-output
-```
-
-6. The above commands are for a 36-core system. For optimal performance, set the following
-options according to your system.
-   - OMP_NUM_THREADS='Core(s) per socket'
-   - --cpuset-cpus to 'Core(s) per socket'
-   - intra_op_parallelism_threads='Core(s) per socket'
-   - If hyperthreading is enabled: use ``KMP_AFFINITY=granularity=fine,verbose,compact,1,0``
-   - If hyperthreading is disabled: use ``KMP_AFFINITY=granularity=fine,verbose,compact``
-   - **NOTE** To get 'Core(s) per socket', run ``lscpu | grep 'Core(s) per socket'``. \
-   OMP_NUM_THREADS sets the maximum number of threads to use for OpenMP parallel regions. \
-   KMP_AFFINITY binds OpenMP threads to physical processing units.
-
-## Build a graphenized Docker image and run ResNet50 inference:
-1. Build the Docker image:
-```
-cd test
-docker build --rm -t ubuntu18.04-tensorflow-resnet50 -f ubuntu18.04-tensorflow-resnet50.dockerfile \
-../../../Examples
-```
-
-2. Graphenize the Docker image using ``gsc build``:
-```
-cd ..
-./gsc build --insecure-args ubuntu18.04-tensorflow-resnet50 test/ubuntu18.04-tensorflow.manifest
-```
-
-3. Sign the graphenized Docker image using ``gsc sign-image``:
-```
-./gsc sign-image ubuntu18.04-tensorflow-resnet50 enclave-key.pem
-```
-
-4. To run inference on GSC:
-```
-docker run --device=/dev/sgx_enclave --cpuset-cpus="0-35" --env OMP_NUM_THREADS=36 \
---env KMP_AFFINITY=granularity=fine,noverbose,compact,1,0 \
-gsc-ubuntu18.04-tensorflow-resnet50 \
-models/models/image_recognition/tensorflow/resnet50v1_5/inference/eval_image_classifier_inference.py \
---input-graph=resnet50v1_5_int8_pretrained_model.pb \
---num-inter-threads=1 \
---num-intra-threads=36 \
---batch-size=32 \
---warmup-steps=50 \
---steps=500
-```
-**NOTE**: If an out-of-memory (OOM) error occurs, set the environment variable
-``TF_MKL_ALLOC_MAX_BYTES`` to an upper bound on memory allocation. For example, on a machine
-with 32 GB of memory, pass ``--env TF_MKL_ALLOC_MAX_BYTES=17179869184`` (16 GB) to the
-``docker run`` command.
-
-5. To run inference in a native container:
-```
-docker run --cpuset-cpus="0-35" --env OMP_NUM_THREADS=36 \
---env KMP_AFFINITY=granularity=fine,noverbose,compact,1,0 \
-ubuntu18.04-tensorflow-resnet50 \
-models/models/image_recognition/tensorflow/resnet50v1_5/inference/eval_image_classifier_inference.py \
---input-graph=resnet50v1_5_int8_pretrained_model.pb \
---num-inter-threads=1 \
---num-intra-threads=36 \
---batch-size=32 \
---warmup-steps=50 \
---steps=500
-```
-
-6. The above commands are for a 36-core system. For optimal performance, set the following
-options according to your system.
-   - OMP_NUM_THREADS='Core(s) per socket'
-   - --cpuset-cpus to 'Core(s) per socket'
-   - num-intra-threads='Core(s) per socket'
-   - If hyperthreading is enabled: use ``KMP_AFFINITY=granularity=fine,verbose,compact,1,0``
-   - If hyperthreading is disabled: use ``KMP_AFFINITY=granularity=fine,verbose,compact``
-   - The ``--batch-size``, ``--warmup-steps`` and ``--steps`` options can be varied.
-   - **NOTE** To get 'Core(s) per socket', run ``lscpu | grep 'Core(s) per socket'``. \
-   OMP_NUM_THREADS sets the maximum number of threads to use for OpenMP parallel regions. \
-   KMP_AFFINITY binds OpenMP threads to physical processing units.
-
-## Performance considerations:
-- The preheat manifest option pre-faults the enclave memory and moves the performance penalty to
-graphene-sgx invocation (before the workload starts executing). To use the preheat option, add
-``sgx.preheat_enclave = 1`` to the manifest template.
-- TCMalloc and mimalloc are memory allocator libraries from Google and Microsoft, respectively,
-that can significantly improve performance for some workloads. Only one of these allocators can
-be used at a time.
-  - TCMalloc (update the library location and name if they differ from the defaults):
-    - Install TCMalloc: ``sudo apt-get install google-perftools``
-    - Add the following lines to the manifest template:
-      - ``loader.env.LD_PRELOAD = "/usr/lib/x86_64-linux-gnu/libtcmalloc.so.4"``
-      - ``sgx.trusted_files.libtcmalloc = "file:/usr/lib/x86_64-linux-gnu/libtcmalloc.so.4"``
-      - ``sgx.trusted_files.libunwind = "file:/usr/lib/x86_64-linux-gnu/libunwind.so.8"``
-    - Save the template and rebuild.
-  - mimalloc (update the library location and name if they differ from the defaults):
-    - Install mimalloc using the steps from https://github.com/microsoft/mimalloc
-    - Add the following lines to the manifest template:
-      - ``loader.env.LD_PRELOAD = "/usr/local/lib/mimalloc-1.7/libmimalloc.so.1.7"``
-      - ``sgx.trusted_files.libmimalloc = "file:/usr/local/lib/mimalloc-1.7/libmimalloc.so.1.7"``
-    - Save the template and rebuild.
\ No newline at end of file