Tests for Multi-GPU Clusters

This repo contains everything needed to build a Docker image that runs sanity checks using nvidia-smi, followed by 4 real-world performance tests.

Sanity Tests

Creates logs of the output of:

-   nvidia-smi

-   nvidia-smi topo -m

-   nvidia-smi -a
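As an illustration only, the logging step above could be sketched as follows. The log file names, the `write_sanity_logs` helper, and the injectable `runner` argument are assumptions made for this example and are not taken from the repo; only the three nvidia-smi invocations come from the list above.

```python
import subprocess
from pathlib import Path

# The three commands listed above. The log file names are hypothetical.
SANITY_COMMANDS = {
    "nvidia-smi.log": ["nvidia-smi"],
    "nvidia-smi-topo.log": ["nvidia-smi", "topo", "-m"],
    "nvidia-smi-a.log": ["nvidia-smi", "-a"],
}


def run_command(cmd):
    """Run a command and return its combined stdout/stderr as text."""
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
            check=False,  # keep going so that failures also end up in the log
        )
    except FileNotFoundError:
        # nvidia-smi is not installed or not on PATH
        return f"command not found: {cmd[0]}\n"
    return proc.stdout


def write_sanity_logs(log_dir, runner=run_command):
    """Write one log file per sanity command into log_dir and return the paths."""
    log_dir = Path(log_dir)
    log_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name, cmd in SANITY_COMMANDS.items():
        path = log_dir / name
        path.write_text(runner(cmd))
        written.append(path)
    return written
```

Calling `write_sanity_logs("/pvc/results/gpu_perftests/")` would place the three logs in the results directory mentioned under Usage. The `runner` parameter exists only so the sketch can be exercised on a machine without a GPU.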

Performance Tests

4 different settings, each run with 9 combinations of batch size and input size:

-   linear: runs vectors through a small MLP

-   lstm: runs sequences through a multi-layer BiLSTM

-   bert: runs bAbI sequences through a pretrained BERT model with a classification head

-   bert_fp16: same as bert, but in fp16 mode
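The 9 combinations per setting can be pictured as a grid over batch sizes and input sizes. The sketch below assumes a 3 x 3 grid; the concrete values, the `benchmark_grid` helper, and the CPU-only `dummy_workload` are hypothetical stand-ins, since this README does not list the actual sizes or the timing code.

```python
import itertools
import time

# Hypothetical values: the README only states that there are 9 combinations.
BATCH_SIZES = [8, 32, 128]
INPUT_SIZES = [64, 256, 1024]


def benchmark_grid(workload, batch_sizes=BATCH_SIZES, input_sizes=INPUT_SIZES, repeats=3):
    """Time workload(batch_size, input_size) for every combination in the grid."""
    results = []
    for batch_size, input_size in itertools.product(batch_sizes, input_sizes):
        start = time.perf_counter()
        for _ in range(repeats):
            workload(batch_size, input_size)
        # Average wall-clock time per call for this combination
        elapsed = (time.perf_counter() - start) / repeats
        results.append(
            {"batch_size": batch_size, "input_size": input_size, "seconds": elapsed}
        )
    return results


def dummy_workload(batch_size, input_size):
    """Stand-in for a model pass: sums a batch of zero vectors on the CPU."""
    return sum(sum([0.0] * input_size) for _ in range(batch_size))
```

For example, `benchmark_grid(dummy_workload)` returns one timing record per combination. A real GPU workload would replace `dummy_workload`, and because GPU kernels launch asynchronously, the device would need to be synchronized before the clock is read for the timings to be meaningful.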

Usage

For Kubernetes, edit gputestv100.yaml to match your GPU requirements and PVC. Logs are saved under /pvc/results/gpu_perftests/
