Merge pull request #315 from flexaihq/main
various updates for CM tools for Windows and better debugging
gfursin authored Oct 1, 2024
2 parents db60dad + 1051735 commit 2593657
Showing 38 changed files with 493 additions and 376 deletions.
5 changes: 5 additions & 0 deletions CHANGES.md
@@ -1,3 +1,8 @@
### 20240927
* added "test dummy" script to test Docker containers
* added more standard Nvidia Docker configuration for PyTorch
* added better support to select Docker configurations via UID

### 20240916
* fixed "cm add script"

2 changes: 1 addition & 1 deletion COPYRIGHT.txt
@@ -1,5 +1,5 @@
Copyright (c) 2021-2024 MLCommons

The cTuning foundation and OctoML donated this project to MLCommons to benefit everyone.
Grigori Fursin, the cTuning foundation and OctoML donated this project to MLCommons to benefit everyone.

Copyright (c) 2014-2021 cTuning foundation
5 changes: 5 additions & 0 deletions README.md
@@ -139,6 +139,11 @@ cm run script \

[Apache 2.0](LICENSE.md)

## CM concepts

* https://doi.org/10.5281/zenodo.8105339
* https://arxiv.org/abs/2406.16791

## Authors

[Grigori Fursin](https://cKnowledge.org/gfursin) and [Arjun Suresh](https://www.linkedin.com/in/arjunsuresh)
52 changes: 48 additions & 4 deletions automation/script/module.py
@@ -4028,14 +4028,58 @@ def docker(self, i):
(out) (str): if 'con', output to console
parsed_artifact (list): prepared in CM CLI or CM access function
[ (artifact alias, artifact UID) ] or
[ (artifact alias, artifact UID), (artifact repo alias, artifact repo UID) ]
(repos) (str): list of repositories to search for automations
(output_dir) (str): output directory (./ by default)
(docker) (dict): convert keys into docker_{key} strings for CM >= 2.3.8.1
(docker_skip_build) (bool): do not generate Dockerfiles and do not recreate Docker image (must exist)
(docker_noregenerate) (bool): do not generate Dockerfiles
(docker_norecreate) (bool): do not recreate Docker image
(docker_cfg) (str): if True, show all available basic docker configurations, otherwise pre-select one
(docker_cfg_uid) (str): if not empty, select the docker configuration with this UID
(docker_path) (str): where to create or find Dockerfile
(docker_gh_token) (str): GitHub token for private repositories
(docker_save_script) (str): if !='' name of script to save docker command
(docker_interactive) (bool): if True, run in interactive mode
(docker_it) (bool): the same as `docker_interactive`
(docker_detached) (bool): detach Docker
(docker_dt) (bool): the same as `docker_detached`
(docker_base_image) (str): force base image
(docker_os) (str): force docker OS (default: ubuntu)
(docker_os_version) (str): force docker OS version (default: 22.04)
(docker_image_tag_extra) (str): add extra tag (default: -latest)
(docker_cm_repo) (str): force CM automation repository when building Docker (default: cm4mlops)
(docker_cm_repos)
(docker_cm_repo_flags)
(dockerfile_env)
(docker_skip_cm_sys_upgrade) (bool): if True, do not install CM sys deps
(docker_extra_sys_deps)
(fake_run_deps)
(docker_run_final_cmds)
(all_gpus)
(num_gpus)
(docker_device)
(docker_port_maps)
(docker_shm_size)
(docker_extra_run_args)
Returns:
(CM return dict):
57 changes: 30 additions & 27 deletions automation/script/module_misc.py
@@ -1335,15 +1335,9 @@ def dockerfile(i):
Args:
(CM input dict):
(out) (str): if 'con', output to console
parsed_artifact (list): prepared in CM CLI or CM access function
[ (artifact alias, artifact UID) ] or
[ (artifact alias, artifact UID), (artifact repo alias, artifact repo UID) ]
(repos) (str): list of repositories to search for automations
(output_dir) (str): output directory (./ by default)
(out) (str): if 'con', output to console
(repos) (str): list of repositories to search for automations
(output_dir) (str): output directory (./ by default)
Returns:
(CM return dict):
@@ -1632,15 +1626,6 @@ def docker(i):
(out) (str): if 'con', output to console
(docker_skip_build) (bool): do not generate Dockerfiles and do not recreate Docker image (must exist)
(docker_noregenerate) (bool): do not generate Dockerfiles
(docker_norecreate) (bool): do not recreate Docker image
(docker_path) (str): where to create or find Dockerfile
(docker_gh_token) (str): GitHub token for private repositories
(docker_save_script) (str): if !='' name of script to save docker command
(docker_interactive) (bool): if True, run in interactive mode
(docker_cfg) (str): if True, show all available basic docker configurations, otherwise pre-select one
Returns:
(CM return dict):
@@ -1653,6 +1638,20 @@ def docker(i):
import copy
import re

from cmind import __version__ as current_cm_version

self_module = i['self_module']

if type(i.get('docker', None)) == dict:
# Grigori started cleaning and refactoring this code on 20240929
#
# 1. use --docker dictionary instead of --docker_{keys}

if utils.compare_versions(current_cm_version, '2.3.8.1') >= 0:
docker_params = utils.convert_dictionary(i['docker'], 'docker')
i.update(docker_params)
del(i['docker'])

quiet = i.get('quiet', False)

detached = i.get('docker_detached', '')
@@ -1670,13 +1669,12 @@ def docker(i):

# Check simplified CMD: cm docker script "python app image-classification onnx"
# If artifact has spaces, treat them as tags!
self_module = i['self_module']
self_module.cmind.access({'action':'detect_tags_in_artifact', 'automation':'utils', 'input':i})

# CAREFUL -> artifacts and parsed_artifacts are not supported in input (and should not be?)
if 'artifacts' in i: del(i['artifacts'])
if 'parsed_artifacts' in i: del(i['parsed_artifacts'])

# Prepare "clean" input to replicate command
r = self_module.cmind.access({'action':'prune_input', 'automation':'utils', 'input':i, 'extra_keys_starts_with':['docker_']})
i_run_cmd_arc = r['new_input']
@@ -1693,13 +1691,19 @@ def docker(i):

# Check available configurations
docker_cfg = i.get('docker_cfg', '')
if docker_cfg != '':
docker_cfg_uid = i.get('docker_cfg_uid', '')

if docker_cfg != '' or docker_cfg_uid != '':
# Check if docker_cfg is turned on but not selected
if type(docker_cfg) == bool or str(docker_cfg).lower() in ['true','yes']:
docker_cfg= ''

r = self_module.cmind.access({'action':'select_cfg', 'automation':'utils,dc2743f8450541e3',
'tags':'basic,docker,configurations', 'title':'docker', 'alias':docker_cfg})

r = self_module.cmind.access({'action':'select_cfg',
'automation':'utils,dc2743f8450541e3',
'tags':'basic,docker,configurations',
'title':'docker',
'alias':docker_cfg,
'uid':docker_cfg_uid})
if r['return'] > 0:
if r['return'] == 16:
return {'return':1, 'error':'Docker configuration {} was not found'.format(docker_cfg)}
@@ -1708,10 +1712,9 @@ def docker(i):
selection = r['selection']

docker_input_update = selection['meta']['input']

i.update(docker_input_update)


########################################################################################
# Run dockerfile
if not noregenerate_docker_file:
@@ -1722,7 +1725,7 @@ def docker(i):
cur_dir = os.getcwd()

console = i.get('out') == 'con'

# Search for script(s)
r = aux_search({'self_module': self_module, 'input': i})
if r['return']>0: return r
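
The `--docker` dictionary handling added in the hunk above (expand a `docker` dict into `docker_{key}` entries when CM is at least 2.3.8.1) can be sketched roughly as follows. This is a minimal stand-alone illustration, not the CM implementation: `compare_versions`, `prefix_keys` and `expand_docker_dict` are hypothetical helpers written for this example, and the real `utils.compare_versions` and `utils.convert_dictionary` may behave differently. It also assumes that the `docker` key is removed only when the version check passes.

```python
def compare_versions(a: str, b: str) -> int:
    """Compare dotted numeric versions; return -1, 0 or 1 (hypothetical helper)."""
    ta = tuple(int(x) for x in a.split('.'))
    tb = tuple(int(x) for x in b.split('.'))
    # Pad the shorter tuple with zeros so '2.3.8' compares as '2.3.8.0'
    n = max(len(ta), len(tb))
    ta += (0,) * (n - len(ta))
    tb += (0,) * (n - len(tb))
    return (ta > tb) - (ta < tb)


def prefix_keys(d: dict, prefix: str) -> dict:
    """Turn {'os': 'ubuntu'} into {'docker_os': 'ubuntu'} (hypothetical helper)."""
    return {f'{prefix}_{k}': v for k, v in d.items()}


def expand_docker_dict(i: dict, cm_version: str) -> dict:
    """Return a copy of the input with the 'docker' dict flattened into docker_* keys."""
    i = dict(i)
    if isinstance(i.get('docker'), dict) and compare_versions(cm_version, '2.3.8.1') >= 0:
        i.update(prefix_keys(i.pop('docker'), 'docker'))
    return i


expanded = expand_docker_dict({'tags': 'x', 'docker': {'os': 'ubuntu', 'it': True}}, '2.3.9')
```

Working on a copy keeps the caller's dictionary untouched; the code in the commit updates `i` in place instead.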
62 changes: 36 additions & 26 deletions automation/utils/module_cfg.py
@@ -230,16 +230,18 @@ def select_cfg(i):
self_module = i['self_module']
tags = i['tags']
alias = i.get('alias', '')
uid = i.get('uid', '')
title = i.get('title', '')

# Check if alias is not provided
r = self_module.cmind.access({'action':'find', 'automation':'cfg', 'tags':'basic,docker,configurations'})
if r['return'] > 0: return r

lst = r['list']

selector = []

# Do coarse-grain search for CM artifacts
for l in lst:
p = l.path

@@ -257,45 +259,53 @@ def select_cfg(i):
if not f.startswith('_cm') and (f.endswith('.json') or f.endswith('.yaml')):
selector.append({'path':os.path.join(p, f), 'alias':f[:-5]})

if len(selector) == 0:
return {'return':16, 'error':'configuration was not found'}

select = 0
if len(selector) > 1:
xtitle = ' ' + title if title!='' else ''
print ('')
print ('Available{} configurations:'.format(xtitle))

print ('')
# Load meta for name and UID
selector_with_meta = []
for s in range(0, len(selector)):
ss = selector[s]

for s in range(0, len(selector)):
ss = selector[s]
path = ss['path']

path = ss['path']
full_path_without_ext = path[:-5]

full_path_without_ext = path[:-5]
r = cmind.utils.load_yaml_and_json(full_path_without_ext)
if r['return']>0:
print ('Warning: problem loading configuration file {}'.format(path))

r = cmind.utils.load_yaml_and_json(full_path_without_ext)
if r['return']>0:
print ('Warning: problem loading configuration file {}'.format(path))
meta = r['meta']

meta = r['meta']
if uid == '' or meta.get('uid', '') == uid:
ss['meta'] = meta
selector_with_meta.append(ss)

# Quit if no configurations found
if len(selector_with_meta) == 0:
return {'return':16, 'error':'configuration was not found'}

selector = sorted(selector, key = lambda x: x['meta'].get('name',''))
select = 0
if len(selector_with_meta) > 1:
xtitle = ' ' + title if title!='' else ''
print ('')
print ('Available{} configurations:'.format(xtitle))

print ('')

selector_with_meta = sorted(selector_with_meta, key = lambda x: x['meta'].get('name',''))
s = 0
for ss in selector:
for ss in selector_with_meta:
alias = ss['alias']
name = ss['meta'].get('name','')
uid = ss['meta'].get('uid', '')
name = ss['meta'].get('name', '')

x = name
if x!='': x+=' '
x += '('+alias+')'
print ('{}) {}'.format(s, x))
x += '(' + uid + ')'

print (f'{s}) {x}')

s+=1

print ('')
select = input ('Enter configuration number or press Enter for 0: ')

@@ -306,6 +316,6 @@ def select_cfg(i):
if select<0 or select>=len(selector_with_meta):
return {'return':1, 'error':'selection is out of range'}

ss = selector[select]
ss = selector_with_meta[select]

return {'return':0, 'selection':ss}
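
The UID filtering that this change adds to `select_cfg` can be summarized with a small in-memory sketch. It is an assumption-laden illustration rather than the CM code: `filter_configs` and `pick` are hypothetical names, the interactive prompt is replaced by a `choice` argument, and the second alias (`nvidia-example`) is made up for the example. The UIDs and the first alias come from the configuration files in this commit.

```python
def filter_configs(selector, uid=''):
    """Keep entries whose meta UID equals `uid`; keep everything when `uid` is empty."""
    matched = [s for s in selector if uid == '' or s['meta'].get('uid', '') == uid]
    # Sort by the human-readable name so the menu order is stable
    return sorted(matched, key=lambda s: s['meta'].get('name', ''))


def pick(selector, uid='', choice=0):
    """Return a CM-style dict with the chosen configuration or an error code."""
    candidates = filter_configs(selector, uid)
    if not candidates:
        return {'return': 16, 'error': 'configuration was not found'}
    # Range-check against the filtered list, which is the one being indexed
    if choice < 0 or choice >= len(candidates):
        return {'return': 1, 'error': 'selection is out of range'}
    return {'return': 0, 'selection': candidates[choice]}


configs = [
    {'alias': 'nvidia-example',  # hypothetical alias
     'meta': {'uid': '854e65fb31584d63', 'name': 'Nvidia Ubuntu 20.04'}},
    {'alias': 'basic-ubuntu-24.04',
     'meta': {'uid': '12e86eb386314866', 'name': 'Basic Ubuntu 24.04'}},
]

by_uid = pick(configs, uid='854e65fb31584d63')
```

With a UID that matches exactly one file, the candidate list has a single entry, so the selection is unambiguous and no menu is needed.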
39 changes: 39 additions & 0 deletions cfg/benchmark-run-mlperf-inference-v4.1/_cm.yaml
@@ -0,0 +1,39 @@
alias: benchmark-run-mlperf-inference-v4.1
uid: b7e89771987d4168

automation_alias: cfg
automation_uid: 88dce9c160324c5d

tags:
- benchmark
- run
- mlperf
- inference
- v4.1

name: "MLPerf inference - v4.1"

supported_compute:
- ee8c568e0ac44f2b
- fe379ecd1e054a00
- d8f06040f7294319

bench_uid: 39877bb63fb54725

view_dimensions:
- - input.device
- "MLPerf device"
- - input.implementation
- "MLPerf implementation"
- - input.backend
- "MLPerf backend"
- - input.model
- "MLPerf model"
- - input.scenario
- "MLPerf scenario"
- - input.host_os
- "Host OS"
- - output.state.cm-mlperf-inference-results-last.performance
- "Got performance"
- - output.state.cm-mlperf-inference-results-last.accuracy
- "Got accuracy"
9 changes: 9 additions & 0 deletions cfg/docker-basic-configurations/basic-ubuntu-24.04.yaml
@@ -0,0 +1,9 @@
uid: 12e86eb386314866

name: "Basic Ubuntu 24.04"

input:
docker_base_image: 'ubuntu:24.04'
docker_os: ubuntu
docker_os_version: '24.04'
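
Once a configuration such as the one above is selected, `docker()` merges its `input` section into the input dictionary with `i.update(...)`. The sketch below mirrors that step with plain dictionaries; `apply_config` is a hypothetical helper, the YAML is written out as a Python literal to avoid a parser dependency, and the `docker_os_version: '22.04'` user value is invented for the example.

```python
# Contents of basic-ubuntu-24.04.yaml expressed as a Python dict
config_meta = {
    'uid': '12e86eb386314866',
    'name': 'Basic Ubuntu 24.04',
    'input': {
        'docker_base_image': 'ubuntu:24.04',
        'docker_os': 'ubuntu',
        'docker_os_version': '24.04',
    },
}


def apply_config(i: dict, meta: dict) -> dict:
    """Return a copy of `i` with the configuration's input keys merged in."""
    merged = dict(i)
    # dict.update lets the configuration override keys already present in the input
    merged.update(meta.get('input', {}))
    return merged


user_input = {'tags': 'python,app', 'docker_os_version': '22.04'}
final_input = apply_config(user_input, config_meta)
```

Because the merge uses `update`, values from the selected configuration take precedence over the same keys passed by the user.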

@@ -1,6 +1,8 @@
uid: 854e65fb31584d63

name: "Nvidia Ubuntu 20.04 CUDA 11.8 cuDNN 8.6.0 PyTorch 1.13.0"
name: "Nvidia Ubuntu 20.04 CUDA 11.8 cuDNN 8.6.0 PyTorch 1.13.0 (pytorch:22.10)"

ref_url: https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/rel-22-10.html

input:
docker_base_image: 'nvcr.io/nvidia/pytorch:22.10-py3'
@@ -1,6 +1,8 @@
uid: e0e7167139a74e36

name: "Nvidia Ubuntu 22.04 CUDA 12.1 cuDNN 8.9.1 PyTorch 2.0.0"
name: "Nvidia Ubuntu 22.04 CUDA 12.1 cuDNN 8.9.1 PyTorch 2.0.0 (pytorch:23.05)"

ref_url: https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/rel-23-05.html

input:
docker_base_image: 'nvcr.io/nvidia/pytorch:23.05-py3'
@@ -1,6 +1,8 @@
uid: 49fc51f2999b4545

name: "Nvidia Ubuntu 22.04 CUDA 12.4 cuDNN 9.0.0 PyTorch 2.3.0"
name: "Nvidia Ubuntu 22.04 CUDA 12.4 cuDNN 9.0.0 PyTorch 2.3.0 (pytorch:24.03)"

ref_url: https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/rel-24-03.html

input:
docker_base_image: 'nvcr.io/nvidia/pytorch:24.03-py3'