Libraries update for Stable Diffusion (#335) (#336)
* Libraries update for Stable Diffusion (#335)

* compliance checker update for StableDiffusion

* RCP checker update for StableDiffusion

* renamed the stable diffusion rules files to match the benchmark name

* added "stable_diffusion" to relevant configs, renamed rcp file

* renamed AT_LEAST_N_TIMES to AT_LEAST (#337)

* Remove stable_diffusion_v2 references

---------

Co-authored-by: Ahmad Kiswani <[email protected]>
pgmpablo157321 and ahmadki authored Sep 29, 2023
1 parent c27d838 commit 7fe011c
Showing 10 changed files with 239 additions and 26 deletions.
1 change: 1 addition & 0 deletions mlperf_logging/benchmark_meta.py
@@ -10,6 +10,7 @@
'minigo': 10,
'resnet': 5,
'ssd': 5,
'stable_diffusion': 10,
'transformer': 10,
'ncf': 10,
'rnnt': 10,
21 changes: 12 additions & 9 deletions mlperf_logging/compliance_checker/README.md
@@ -23,7 +23,7 @@ As log examples use [NVIDIA's training logs](https://github.com/mlperf/training_
### Existing config files for training submissions

3.1.0/common.yaml - currently the default config file; checks compliance of the common fields and enqueues the benchmark-specific config file
3.1.0/closed_common.yaml - the common rules file for closed submissions. These rules apply to all benchmarks
3.1.0/open_common.yaml - the common rules file for open submissions. These rules apply to all benchmarks
3.1.0/closed_resnet.yaml - Per-benchmark rules, closed submissions.
3.1.0/closed_ssd.yaml
@@ -33,6 +33,7 @@ As log examples use [NVIDIA's training logs](https://github.com/mlperf/training_
3.1.0/closed_bert.yaml
3.1.0/closed_dlrm_dcnv2.yaml
3.1.0/closed_gpt3.yaml
3.1.0/closed_stable_diffusion.yaml
3.1.0/open_resnet.yaml - Per-benchmark rules, open submissions.
3.1.0/open_ssd.yaml
3.1.0/open_maskrcnn.yaml
@@ -41,6 +42,7 @@ As log examples use [NVIDIA's training logs](https://github.com/mlperf/training_
3.1.0/open_bert.yaml
3.1.0/open_dlrm_dcnv2.yaml
3.1.0/open_gpt3.yaml
3.1.0/open_stable_diffusion.yaml

### Existing config files for HPC submissions

@@ -64,15 +66,15 @@ Compliance checking follows the algorithm below.
2. If present, evaluate the `CHECK` section, and raise an exception if the result is false
7. Print all warning messages

Possible side effects of executing yaml sections include [printing output](#other-operations) or [enqueueing
additional yaml files to be verified](#enqueuing-additional-config-files).
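
To make the flow concrete, below is a minimal, self-contained sketch of a config-driven checker in the same spirit. It supports only the `EXACTLY_ONE` and `AT_LEAST(n)` requirements on toy log lines, and is an illustration rather than the actual `mlp_compliance` implementation:

    import re
    import yaml  # pyyaml

    RULES_YAML = """
    - KEY:
        NAME: global_batch_size
        REQ: AT_LEAST(1)
        CHECK: " v['value'] >= 0 "
    - KEY:
        NAME: opt_name
        REQ: EXACTLY_ONE
    """

    def required_range(req):
        # Map a REQ string to (min, max) occurrence bounds; max of None means unbounded
        if req == 'EXACTLY_ONE':
            return 1, 1
        m = re.fullmatch(r'AT_LEAST\((\d+)\)', req)
        return (int(m.group(1)), None) if m else (0, None)

    def check(loglines):
        rules = [r['KEY'] for r in yaml.safe_load(RULES_YAML)]
        for rule in rules:
            matches = [l for l in loglines if l['key'] == rule['NAME']]
            lo, hi = required_range(rule.get('REQ', ''))
            if len(matches) < lo or (hi is not None and len(matches) > hi):
                print(f"REQ violation for '{rule['NAME']}': found {len(matches)}")
            for v in matches:
                # CHECK strings are evaluated with the log line bound to v,
                # mirroring the eval() mechanism described under "Other operations"
                if 'CHECK' in rule and not eval(rule['CHECK'], {'v': v}):
                    print(f"CHECK failed for '{rule['NAME']}': {v}")

    check([{'key': 'global_batch_size', 'value': 512}])
    # prints: REQ violation for 'opt_name': found 0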

### Config file syntax
Rules to be checked are provided in yaml (config) file. A config file contains the following records:

#### `BEGIN` record
Defines `CODE` to be executed before any other rules defined in the current file. This record is optional,
and there can be at most one `BEGIN` record per config file.

Example:

@@ -87,6 +89,7 @@ The following fields are optional:
- `REQ` - specifies the occurrence requirement. Possible values:
  - `EXACTLY_ONE` - the current key has to appear exactly once
  - `AT_LEAST_ONE` - the current key has to appear at least once
  - `AT_LEAST(n)` - the current key has to appear at least n times
  - `AT_LEAST_ONE_OR(alternatives)` - the current key or one of the alternatives has to appear at least once;
    alternatives is a comma-separated list of keys
- `PRE` - code to be executed before performing checks
@@ -112,14 +115,14 @@

#### Global and local state access

During processing of the records, a global state `s` is maintained and is accessible from
code provided in the yaml. In addition, rules can access the information fields (values) `v`
of the record, as well as the timestamp and the original line string as part of the record `ll`.

Global state `s` can be used to enforce cross-key rules by updating the global state
in the `POST` (or `PRE`) of one `KEY` and using that information in the `CHECK` of another `KEY`.
For each config file, `s` starts as an empty dictionary, so tracking global state
requires adding an entry to `s`.

Example:

@@ -152,7 +155,7 @@ Config files in the queue are processed independently, meaning that they do not

Each config file may define its own `BEGIN` and `END` records, as well as any other `KEY` rules.

Example:

- KEY:
    NAME: submission_benchmark
@@ -164,7 +167,7 @@ Example:
#### Other operations

`CODE`, `REQ`, and `POST` fields are executed using python's `exec` function. `CHECK` is performed
using an `eval` call. As such, any legal python code is suitable for use.

For instance, one can define rules that print out information, as shown in the [example above](#global-and-local-state-access).

13 changes: 12 additions & 1 deletion mlperf_logging/compliance_checker/mlp_compliance.py
@@ -147,6 +147,11 @@ def parse_alternatives(self, string):
alternatives = in_pharentises.split(',')
return [s.strip() for s in alternatives]

def parse_at_least(self, string):
    # Extract the integer n from a requirement of the form 'AT_LEAST(n)'
    n_string = string[len('AT_LEAST(') : -1]
    n = int(n_string)
    return n

def configured_checks(self, loglines, config_file):
with open(config_file) as f:
checks = yaml.load(f, Loader=yaml.BaseLoader)
@@ -164,7 +169,7 @@ def configured_checks(self, loglines, config_file):
begin_blocks = [x for x in checks if list(x)[0]=='BEGIN']
assert(len(begin_blocks)<=1) # up to one begin block
if len(begin_blocks)==1:
    exec(begin_blocks[0]['BEGIN']['CODE'].strip(), state, locals())

key_records = {}
for k in checks:
@@ -231,6 +236,12 @@ def configured_checks(self, loglines, config_file):
self.put_message(f"Required AT_LEAST_ONE occurrence of '{k}' but found {len(reported_values[k])}",
key=k)

if v['REQ'].startswith('AT_LEAST('):
    n = self.parse_at_least(v['REQ'])
    if len(reported_values[k])<n:
        self.put_message(f"Required AT_LEAST({n}) occurrences of '{k}' but found {len(reported_values[k])}",
                         key=k)

if v['REQ'].startswith('AT_LEAST_ONE_OR'):
alternatives.add(tuple({k, *self.parse_alternatives(v['REQ'])}))
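
Note that the prefix dispatch here is why the new branch matches on the full `AT_LEAST(` prefix: the `AT_LEAST_ONE` and `AT_LEAST_ONE_OR(...)` requirement strings also start with `AT_LEAST`, and would otherwise be routed into `parse_at_least`. A quick illustration with invented REQ values:

    # Distinguishing the three AT_LEAST* requirement forms by prefix;
    # matching on 'AT_LEAST(' avoids also matching 'AT_LEAST_ONE*'.
    for req in ['AT_LEAST_ONE', 'AT_LEAST_ONE_OR(train_samples)', 'AT_LEAST(2)']:
        print(req, '->', req.startswith('AT_LEAST('))
    # AT_LEAST_ONE -> False
    # AT_LEAST_ONE_OR(train_samples) -> False
    # AT_LEAST(2) -> True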

17 changes: 9 additions & 8 deletions mlperf_logging/compliance_checker/mlp_parser/ruleset_310.py
@@ -8,17 +8,18 @@
import json
import re
import sys
from dataclasses import dataclass

from io import open

@dataclass
class LogLine:
    """A single parsed MLPerf log line."""
    full_string: str   # the complete line as a string
    timestamp: float   # seconds as a float, e.g. 1234.567
    key: str           # the string key
    value: str         # the parsed value associated with the tag, or None if no value
    lineno: int        # the line number in the file

TOKEN = ':::MLLOG '
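
The switch from a namedtuple to a dataclass makes `LogLine` instances mutable, which the new stable diffusion rule files rely on: their `BEGIN` blocks copy and then modify log lines. A small illustration, assuming the `LogLine` dataclass above and invented field values:

    # dataclasses.replace() copies the instance, and the copy's fields can
    # then be reassigned; a namedtuple would raise AttributeError below.
    from dataclasses import replace

    line = LogLine(full_string=':::MLLOG {...}', timestamp=1234.567,
                   key='eval_accuracy', value=None, lineno=42)
    copy = replace(line)                   # shallow copy of the dataclass
    copy.key = 'aggregated_eval_accuracy'  # legal on a mutable dataclass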

mlperf_logging/compliance_checker/training_3.1.0/closed_common.yaml
@@ -2,7 +2,7 @@
- KEY:
    NAME: submission_benchmark
    REQ: EXACTLY_ONE
    CHECK: " v['value'] in ['resnet', 'ssd', 'stable_diffusion', 'maskrcnn', 'gpt3', 'dlrm_dcnv2', 'bert', 'rnnt', 'unet3d'] "
    POST: " enqueue_config('training_3.1.0/closed_{}.yaml'.format(v['value'])) "

- KEY:
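For illustration, the `POST` above dispatches on the logged benchmark name, which is why `stable_diffusion` must be in the `CHECK` list; a sketch of the string formatting involved, with the value assumed:

    # The submission_benchmark value selects the per-benchmark rules file
    # that gets enqueued after the common checks.
    v = {'value': 'stable_diffusion'}  # as parsed from the log line
    print('training_3.1.0/closed_{}.yaml'.format(v['value']))
    # -> training_3.1.0/closed_stable_diffusion.yaml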
mlperf_logging/compliance_checker/training_3.1.0/closed_stable_diffusion.yaml
@@ -0,0 +1,75 @@
# Stable diffusion uses two metrics, FID and CLIP.
# These metrics can be calculated offline, using different scripts,
# and logged separately. Therefore, we create a virtual key
# called aggregated_eval_accuracy, which aggregates
# both metrics into a single log line

- BEGIN:
    CODE: |
        from dataclasses import replace
        agg_eval_lines = {}
        for line in loglines:
            if line.key == "eval_accuracy":
                step_num = line.value['metadata']['step_num']
                if step_num not in agg_eval_lines:
                    new_line = replace(line) # Make a copy
                    new_line.key = "aggregated_eval_accuracy"
                    new_line.full_string = "" # Not needed
                    new_line.lineno = -1 # Not needed
                    new_line.value = {'value': {'step_num': step_num}, 'metadata':{}}
                    agg_eval_lines[step_num] = new_line
                agg_eval_lines[step_num].timestamp = max(line.timestamp, agg_eval_lines[step_num].timestamp)
                agg_eval_lines[step_num].value['value'][line.value['metadata']['metric']] = line.value['value']
        loglines.extend(agg_eval_lines.values())
- KEY:
    NAME: global_batch_size
    REQ: AT_LEAST_ONE
    CHECK: " v['value'] >= 0 "

- KEY:
    NAME: opt_name
    REQ: EXACTLY_ONE
    CHECK: " v['value'] == 'adamw' "

- KEY:
    NAME: opt_adamw_beta_1
    REQ: EXACTLY_ONE
    CHECK: " v['value'] == 0.9 "

- KEY:
    NAME: opt_adamw_beta_2
    REQ: EXACTLY_ONE
    CHECK: " v['value'] == 0.999 "

- KEY:
    NAME: opt_adamw_epsilon
    REQ: EXACTLY_ONE
    CHECK: " v['value'] == 1e-08 "

- KEY:
    NAME: opt_adamw_weight_decay
    REQ: EXACTLY_ONE
    CHECK: " v['value'] == 0.01 "

- KEY:
    NAME: opt_base_learning_rate
    REQ: EXACTLY_ONE
    CHECK: " v['value'] >= 0.0 "

- KEY:
    NAME: opt_learning_rate_warmup_steps
    REQ: EXACTLY_ONE
    CHECK: " v['value'] >= 0 "

- KEY:
    NAME: aggregated_eval_accuracy
    REQ: AT_LEAST(2)
    CHECK:
        - "'FID' in v['value']"
        - "'CLIP' in v['value']"
        - "'step_num' in v['value']"
    ATLEAST_ONE_CHECK: "(0.0 <= v['value']['FID'] <= 90.0) and (0.15 <= v['value']['CLIP'] <= 1.0)"
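
The `BEGIN` block at the top of this file synthesizes one `aggregated_eval_accuracy` line per evaluation step; the `ATLEAST_ONE_CHECK` then, as its name suggests, requires at least one aggregated line to satisfy the convergence bounds. A toy, self-contained rendering of that aggregation, with invented log values:

    from dataclasses import dataclass, replace

    @dataclass
    class LogLine:                      # mirrors the dataclass in ruleset_310.py
        full_string: str
        timestamp: float
        key: str
        value: dict
        lineno: int

    loglines = [                        # invented FID/CLIP evals for one step
        LogLine('', 100.0, 'eval_accuracy',
                {'value': 55.3, 'metadata': {'step_num': 1000, 'metric': 'FID'}}, 10),
        LogLine('', 160.0, 'eval_accuracy',
                {'value': 0.21, 'metadata': {'step_num': 1000, 'metric': 'CLIP'}}, 11),
    ]

    agg_eval_lines = {}
    for line in loglines:
        if line.key == 'eval_accuracy':
            step_num = line.value['metadata']['step_num']
            if step_num not in agg_eval_lines:
                new_line = replace(line)                   # make a copy
                new_line.key = 'aggregated_eval_accuracy'
                new_line.value = {'value': {'step_num': step_num}, 'metadata': {}}
                agg_eval_lines[step_num] = new_line
            agg = agg_eval_lines[step_num]
            agg.timestamp = max(line.timestamp, agg.timestamp)
            agg.value['value'][line.value['metadata']['metric']] = line.value['value']

    agg = agg_eval_lines[1000].value['value']
    print(agg)  # {'step_num': 1000, 'FID': 55.3, 'CLIP': 0.21}
    # The ATLEAST_ONE_CHECK above passes if at least one aggregated line satisfies:
    print(0.0 <= agg['FID'] <= 90.0 and 0.15 <= agg['CLIP'] <= 1.0)  # True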

mlperf_logging/compliance_checker/training_3.1.0/open_common.yaml
@@ -2,6 +2,6 @@
- KEY:
    NAME: submission_benchmark
    REQ: EXACTLY_ONE
    CHECK: " v['value'] in ['resnet', 'ssd', 'stable_diffusion', 'maskrcnn', 'gpt3', 'dlrm_dcnv2', 'bert', 'rnnt', 'unet3d'] "
    POST: " enqueue_config('training_3.1.0/open_{}.yaml'.format(v['value'])) "

mlperf_logging/compliance_checker/training_3.1.0/open_stable_diffusion.yaml
@@ -0,0 +1,34 @@
# Stable diffusion uses two metrics, FID and CLIP.
# These metrics can be calculated offline, using different scripts,
# and logged separately. Therefore, we create a virtual key
# called aggregated_eval_accuracy, which aggregates
# both metrics into a single log line

- BEGIN:
    CODE: |
        from dataclasses import replace
        agg_eval_lines = {}
        for line in loglines:
            if line.key == "eval_accuracy":
                step_num = line.value['metadata']['step_num']
                if step_num not in agg_eval_lines:
                    new_line = replace(line) # Make a copy
                    new_line.key = "aggregated_eval_accuracy"
                    new_line.full_string = "" # Not needed
                    new_line.lineno = -1 # Not needed
                    new_line.value = {'value': {'step_num': step_num}, 'metadata':{}}
                    agg_eval_lines[step_num] = new_line
                agg_eval_lines[step_num].timestamp = max(line.timestamp, agg_eval_lines[step_num].timestamp)
                agg_eval_lines[step_num].value['value'][line.value['metadata']['metric']] = line.value['value']
        loglines.extend(agg_eval_lines.values())
- KEY:
    NAME: aggregated_eval_accuracy
    REQ: AT_LEAST(2)
    CHECK:
        - "'FID' in v['value']"
        - "'CLIP' in v['value']"
        - "'step_num' in v['value']"
    ATLEAST_ONE_CHECK: "v['value']['FID'] >= 0.0 and v['value']['CLIP'] <= 1.0"

34 changes: 28 additions & 6 deletions mlperf_logging/rcp_checker/rcp_checker.py
@@ -3,6 +3,7 @@
'''

import argparse
from collections import defaultdict
import glob
import json
import logging
@@ -26,6 +27,7 @@
'ssd' : 5,
'unet3d' : 40,
'rnnt': 10,
'stable_diffusion': 10,
},
"hpc": {
'cosmoflow': 10,
@@ -48,6 +50,10 @@ def read_submission_file(result_file, use_train_samples):
bs = -1
benchmark = None

# FID and CLIP metrics for stable diffusion are logged asynchronously
# and independently of each other. We track the eval results
# so we can get the first eval step that passes the convergence criteria
stable_diffusion_eval_results = defaultdict(dict)
with open(result_file, 'r', encoding='latin-1') as f:
# TODO: use mlperf_logging.compliance_checker.mlp_parser instead
file_contents = f.readlines()
@@ -63,23 +69,39 @@
benchmark = json.loads(str)["value"]
if benchmark != "bert" and use_train_samples:
use_train_samples = False

if benchmark == "stable_diffusion" and ("eval_error" in str or "eval_accuracy" in str):
    eval_accuracy_str = str
    eval_step = json.loads(eval_accuracy_str)["metadata"]["step_num"]
    eval_metric = json.loads(eval_accuracy_str)["metadata"]["metric"]
    eval_score = json.loads(eval_accuracy_str)["value"]
    stable_diffusion_eval_results[eval_step][eval_metric] = eval_score
elif not use_train_samples and ("eval_error" in str or "eval_accuracy" in str):
    eval_accuracy_str = str
    conv_epoch = json.loads(eval_accuracy_str)["metadata"]["epoch_num"]
    conv_epoch = round(conv_epoch, 3)
elif use_train_samples and "train_samples" in str:
    eval_accuracy_str = str
    conv_epoch = json.loads(eval_accuracy_str)["value"]

if "run_stop" in str:
# Epochs to converge is the the last epochs value on
# eval_accuracy line before run_stop
conv_result = json.loads(str)["metadata"]["status"]
if conv_result == "success":
subm_epochs = conv_epoch
not_converged = 0
# Epochs to converge is the the last epochs value on
# eval_accuracy line before run_stop. Except for Stable Diffusion
# where we use the first eval step that passes the convergence criteria
if benchmark == "stable_diffusion":
passing_epochs = []
for eval_step, eval_result in stable_diffusion_eval_results.items():
# TODO: we shouldn't hardcode the convergence criteria here !
if eval_result["FID"] <= 90.0 and eval_result["CLIP"] >= 0.15:
passing_epochs.append(eval_step)
conv_epoch = min(passing_epochs)
subm_epochs = conv_epoch
else:
subm_epochs = 1e9
not_converged = 1
subm_epochs = 1e9

if not_converged:
logging.warning(' Run incomplete or did not converge. Marking as infinite.')
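
A toy rendering of the stable diffusion convergence-step selection above, with invented eval results:

    # Invented eval results keyed by step: pick the first step whose FID/CLIP
    # pair meets the (hardcoded) criteria, as read_submission_file does above.
    from collections import defaultdict

    stable_diffusion_eval_results = defaultdict(dict)
    stable_diffusion_eval_results[1000].update({'FID': 120.0, 'CLIP': 0.12})  # fails both
    stable_diffusion_eval_results[2000].update({'FID': 85.0, 'CLIP': 0.16})   # passes
    stable_diffusion_eval_results[3000].update({'FID': 70.0, 'CLIP': 0.20})   # passes

    passing_epochs = [step for step, r in stable_diffusion_eval_results.items()
                      if r['FID'] <= 90.0 and r['CLIP'] >= 0.15]
    print(min(passing_epochs))  # -> 2000: the earliest step meeting both criteria
    # Note: if no step passes, min([]) raises ValueError; the checker reaches
    # this code only when run_stop reports success.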
