HPTQ Changes, Unload Support for Multi Output Layers
oguzhanbsolak committed Aug 29, 2024
1 parent 14790f8 commit d46edef
Showing 14 changed files with 232 additions and 73 deletions.
21 changes: 15 additions & 6 deletions README.md
@@ -1,6 +1,6 @@
# ADI MAX78000/MAX78002 Model Training and Synthesis

-July 22, 2024
+August 27, 2024

**Note: This branch requires PyTorch 2. Please see the archive-1.8 branch for PyTorch 1.8 support. [KNOWN_ISSUES](KNOWN_ISSUES.txt) contains a list of known issues.**

@@ -1620,13 +1620,15 @@ When using the `-8` command line switch, all module outputs are quantized to 8-b
The last layer can optionally use 32-bit output for increased precision. This is simulated by adding the parameter `wide=True` to the module function call.
-##### Weights: Quantization-Aware Training (QAT)
+##### Weights and Activations: Quantization-Aware Training (QAT)
Quantization-aware training (QAT) is enabled by default. QAT is controlled by a policy file, specified by `--qat-policy`.
-* After `start_epoch` epochs, training will learn an additional parameter that corresponds to a shift of the final sum of products.
+* After `start_epoch` epochs, an intermediate epoch with no backpropagation is run to collect activation statistics. Each layer's activation range is then determined based on the range/resolution trade-off from the collected activations. QAT then begins, and an additional parameter (`output_shift`) is learned to shift activations, compensating for the scaling down of weights and biases.
* `weight_bits` describes the number of bits available for weights.
* `overrides` allows specifying the `weight_bits` on a per-layer basis.
+* `outlier_removal_z_score` defines the z-score threshold for outlier removal during activation range calculation. (default: 8.0)
+* `shift_quantile` defines the quantile of the parameter distribution to be used for the `output_shift` parameter. (default: 1.0)

By default, weights are quantized to 8-bits after 30 epochs as specified in `policies/qat_policy.yaml`. A more refined example that specifies weight sizes for individual layers can be seen in `policies/qat_policy_cifar100.yaml`.
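For orientation, a minimal policy sketch using the keys described above; the layer name `fc` and all values are illustrative, not copied from the shipped policies:

```yaml
start_epoch: 30               # epochs of floating-point training before QAT
weight_bits: 8                # default weight width
outlier_removal_z_score: 8.0  # z-score cutoff during activation range calculation
shift_quantile: 0.985         # quantile of the parameter distribution for output_shift
overrides:                    # per-layer weight_bits (structure assumed)
  fc:
    weight_bits: 4
```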

@@ -1745,7 +1747,7 @@ For both approaches, the `quantize.py` software quantizes an existing PyTorch ch
#### Quantization-Aware Training (QAT)
-Quantization-aware training is the better performing approach. It is enabled by default. QAT learns additional parameters during training that help with quantization (see [Weights: Quantization-Aware Training (QAT)](#weights-quantization-aware-training-qat). No additional arguments (other than input, output, and device) are needed for `quantize.py`.
+Quantization-aware training is the better performing approach. It is enabled by default. QAT learns additional parameters during training that help with quantization (see [Weights and Activations: Quantization-Aware Training (QAT)](#weights-and-activations-quantization-aware-training-qat)). No additional arguments (other than input, output, and device) are needed for `quantize.py`.
The input checkpoint to `quantize.py` is either `qat_best.pth.tar`, the best QAT epoch’s checkpoint, or `qat_checkpoint.pth.tar`, the final QAT epoch’s checkpoint.
@@ -2004,7 +2006,7 @@ The behavior of a training session might change when Quantization Aware Training
While there can be multiple reasons for this, check two important settings that can influence the training behavior:
* The initial learning rate may be set too high. Reduce LR by a factor of 10 or 100 by specifying a smaller initial `--lr` on the command line, and possibly by reducing the epoch `milestones` for further reduction of the learning rate in the scheduler file specified by `--compress`. Note that the selected optimizer and the batch size both affect the learning rate.
-* The epoch when QAT is engaged may be set too low. Increase `start_epoch` in the QAT scheduler file specified by `--qat-policy`, and increase the total number of training epochs by increasing the value specified by the `--epochs` command line argument and by editing the `ending_epoch` in the scheduler file specified by `--compress`. *See also the rule of thumb discussed in the section [Weights: Quantization-Aware Training (QAT)](#weights:-auantization-aware-training \(qat\)).*
+* The epoch when QAT is engaged may be set too low. Increase `start_epoch` in the QAT scheduler file specified by `--qat-policy`, and increase the total number of training epochs by increasing the value specified by the `--epochs` command line argument and by editing the `ending_epoch` in the scheduler file specified by `--compress`. *See also the rule of thumb discussed in the section [Weights and Activations: Quantization-Aware Training (QAT)](#weights-and-activations-quantization-aware-training-qat).*
@@ -2209,6 +2211,7 @@ The following table describes the most important command line arguments for `ai8
| `--no-unload` | Do not create the `cnn_unload()` function | |
| `--no-kat` | Do not generate the `check_output()` function (disable known-answer test) | |
| `--no-deduplicate-weights` | Do not deduplicate weights and bias values | |
+| `--scale-output` | Use scales from the checkpoint to recover output range while generating `cnn_unload()` function | |
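As an illustration, the new switch slots into an existing demo command line as follows; this exact combination is hypothetical, with `$TARGET` and `$COMMON_ARGS` as defined in the gen-demos scripts:

```sh
python ai8xize.py --test-dir $TARGET --prefix cifar-100-residual \
  --checkpoint-file trained/ai85-cifar100-residual-qat8-q.pth.tar \
  --config-file networks/cifar100-ressimplenet.yaml \
  --softmax --scale-output $COMMON_ARGS "$@"
```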
### YAML Network Description
@@ -2330,6 +2333,12 @@ The following keywords are required for each `unload` list item:
`width`: Data width (optional, defaults to 8) — either 8 or 32
`write_gap`: Gap between data words (optional, defaults to 0)
+When `--scale-output` is specified, scales from the checkpoint file are used to recover the output range. If an 8-bit output has a non-zero scale, the output is scaled and kept in 16 bits; if the scale is zero, the output remains 8 bits. 32-bit outputs are always kept in 32 bits.
+Example:
+![Unload Array](docs/unload_example.png)
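The width selection reduces to a simple rule; a minimal sketch (hypothetical helper, not generator code):

```python
def unload_width(output_width: int, final_scale: int) -> int:
    """Storage width used by cnn_unload() when --scale-output is given.

    An 8-bit output with a non-zero checkpoint scale is scaled and kept
    in 16 bits; with a zero scale it stays 8 bits. 32-bit outputs are
    always kept in 32 bits.
    """
    if output_width == 32:
        return 32
    return 16 if final_scale != 0 else 8
```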
##### `layers` (Mandatory)
`layers` is a list that defines the per-layer description, as shown below:
@@ -2654,7 +2663,7 @@ Example:
By default, the final layer is used as the output layer. Output layers are checked using the known-answer test, and they are copied from hardware memory when `cnn_unload()` is called. The tool also checks that output layer data isn’t overwritten by any later layers.
When specifying `output: true`, any layer (or a combination of layers) can be used as an output layer.
-*Note:* When `unload:` is used, output layers are not used for generating `cnn_unload()`.
+*Note:* When `--no-unload` is used, output layers are not used for generating `cnn_unload()`.
Example:
`output: true`
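A sketch of a two-output network excerpt; layer names and addresses are hypothetical and most fields are elided:

```yaml
layers:
  # ... earlier layers ...
  - name: detection_head        # hypothetical output layer 1
    operation: conv2d
    out_offset: 0x4000
    processors: 0xffffffffffffffff
    output: true                # checked by the KAT and copied by cnn_unload()
  - name: classification_head   # hypothetical output layer 2
    operation: conv2d
    out_offset: 0x6000
    processors: 0xffffffffffffffff
    output: true
```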
Binary file added docs/unload_example.png
3 changes: 2 additions & 1 deletion gen-demos-max78000.sh
@@ -12,7 +12,8 @@ python ai8xize.py --test-dir $TARGET --prefix cifar-100-simplewide2x-mixed --che
python ai8xize.py --test-dir $TARGET --prefix cifar-100-residual --checkpoint-file trained/ai85-cifar100-residual-qat8-q.pth.tar --config-file networks/cifar100-ressimplenet.yaml --softmax $COMMON_ARGS --boost 2.5 "$@"
python ai8xize.py --test-dir $TARGET --prefix kws20_v3 --checkpoint-file trained/ai85-kws20_v3-qat8-q.pth.tar --config-file networks/kws20-v3-hwc.yaml --softmax $COMMON_ARGS "$@"
python ai8xize.py --test-dir $TARGET --prefix kws20_nas --checkpoint-file trained/ai85-kws20_nas-qat8-q.pth.tar --config-file networks/kws20-nas-hwc.yaml --softmax $COMMON_ARGS "$@"
-python ai8xize.py --test-dir $TARGET --prefix faceid --checkpoint-file trained/ai85-faceid-qat8-q.pth.tar --config-file networks/faceid.yaml --fifo $COMMON_ARGS "$@"
+python izer/add_fake_passthrough.py --input-checkpoint-path trained/ai85-faceid_112-qat-q.pth.tar --output-checkpoint-path trained/ai85-fakepass-faceid_112-qat-q.pth.tar --layer-name fakepass --layer-depth 128 --layer-name-after-pt linear --low-memory-footprint "$@"
+python ai8xize.py --test-dir $TARGET --prefix faceid_112 --checkpoint-file trained/ai85-fakepass-faceid_112-qat-q.pth.tar --config-file networks/ai85-faceid_112.yaml --fifo $COMMON_ARGS "$@"
python ai8xize.py --test-dir $TARGET --prefix cats-dogs --checkpoint-file trained/ai85-catsdogs-qat8-q.pth.tar --config-file networks/cats-dogs-hwc.yaml --fifo --softmax $COMMON_ARGS "$@"
python ai8xize.py --test-dir $TARGET --prefix camvid_unet --checkpoint-file trained/ai85-camvid-unet-large-fakept-q.pth.tar --config-file networks/camvid-unet-large-fakept.yaml $COMMON_ARGS --overlap-data --mlator --no-unload --max-checklines 8192 --new-kernel-loader "$@"
python ai8xize.py --test-dir $TARGET --prefix aisegment_unet --checkpoint-file trained/ai85-aisegment-unet-large-fakept-q.pth.tar --config-file networks/aisegment-unet-large-fakept.yaml $COMMON_ARGS --overlap-data --mlator --no-unload --max-checklines 8192 --new-kernel-loader "$@"
4 changes: 2 additions & 2 deletions gen-demos-max78002.sh
@@ -12,7 +12,7 @@ python ai8xize.py --test-dir $TARGET --prefix cifar-100-simplewide2x-mixed --che
python ai8xize.py --test-dir $TARGET --prefix cifar-100-residual --checkpoint-file trained/ai85-cifar100-residual-qat8-q.pth.tar --config-file networks/cifar100-ressimplenet.yaml --softmax $COMMON_ARGS "$@"
python ai8xize.py --test-dir $TARGET --prefix kws20_v3_1 --checkpoint-file trained/ai87-kws20_v3-qat8-q.pth.tar --config-file networks/ai87-kws20-v3-hwc.yaml --softmax $COMMON_ARGS "$@"
python ai8xize.py --test-dir $TARGET --prefix kws20_v2_1 --checkpoint-file trained/ai87-kws20_v2-qat8-q.pth.tar --config-file networks/ai87-kws20-v2-hwc.yaml --softmax $COMMON_ARGS "$@"
-python ai8xize.py --test-dir $TARGET --prefix faceid --checkpoint-file trained/ai85-faceid-qat8-q.pth.tar --config-file networks/faceid.yaml --fifo $COMMON_ARGS "$@"
+python ai8xize.py --test-dir $TARGET --prefix mobilefacenet-112 --checkpoint-file trained/ai87-mobilefacenet-112-qat-q.pth.tar --config-file networks/ai87-mobilefacenet-112.yaml --fifo $COMMON_ARGS "$@"
python ai8xize.py --test-dir $TARGET --prefix cats-dogs --checkpoint-file trained/ai85-catsdogs-qat8-q.pth.tar --config-file networks/cats-dogs-hwc-no-fifo.yaml --softmax $COMMON_ARGS "$@"
python ai8xize.py --test-dir $TARGET --prefix camvid_unet --checkpoint-file trained/ai85-camvid-unet-large-fakept-q.pth.tar --config-file networks/camvid-unet-large-fakept.yaml $COMMON_ARGS --overlap-data --mlator --no-unload --max-checklines 8192 "$@"
python ai8xize.py --test-dir $TARGET --prefix aisegment_unet --checkpoint-file trained/ai85-aisegment-unet-large-fakept-q.pth.tar --config-file networks/aisegment-unet-large-fakept.yaml $COMMON_ARGS --overlap-data --mlator --no-unload --max-checklines 8192 "$@"
@@ -21,5 +21,5 @@ python ai8xize.py --test-dir $TARGET --prefix cifar-100-effnet2 --checkpoint-fil
python ai8xize.py --test-dir $TARGET --prefix cifar-100-mobilenet-v2-0.75 --checkpoint-file trained/ai87-cifar100-mobilenet-v2-0.75-qat8-q.pth.tar --config-file networks/ai87-cifar100-mobilenet-v2-0.75.yaml --softmax $COMMON_ARGS "$@"
python ai8xize.py --test-dir $TARGET --prefix imagenet --checkpoint-file trained/ai87-imagenet-effnet2-q.pth.tar --config-file networks/ai87-imagenet-effnet2.yaml $COMMON_ARGS "$@"
python ai8xize.py --test-dir $TARGET --prefix facedet_tinierssd --checkpoint-file trained/ai87-facedet-tinierssd-qat8-q.pth.tar --config-file networks/ai87-facedet-tinierssd.yaml --sample-input tests/sample_vggface2_facedetection.npy $COMMON_ARGS "$@"
-python ai8xize.py --test-dir $TARGET --prefix pascalvoc_fpndetector --checkpoint-file trained/ai87-pascalvoc-fpndetector-qat8-q.pth.tar --config-file networks/ai87-pascalvoc-fpndetector.yaml --fifo --sample-input tests/sample_pascalvoc_256_320.npy --overwrite --no-unload $COMMON_ARGS "$@"
+python ai8xize.py --test-dir $TARGET --prefix pascalvoc_fpndetector --checkpoint-file trained/ai87-pascalvoc-fpndetector-qat8-q.pth.tar --config-file networks/ai87-pascalvoc-fpndetector.yaml --fifo --sample-input tests/sample_pascalvoc_256_320.npy --no-unload $COMMON_ARGS "$@"
python ai8xize.py --test-dir $TARGET --prefix kinetics --checkpoint-file trained/ai85-kinetics-qat8-q.pth.tar --config-file networks/ai85-kinetics-actiontcn.yaml --overlap-data --softmax --zero-sram $COMMON_ARGS "$@"
23 changes: 19 additions & 4 deletions izer/backend/max7800x.py
@@ -1,5 +1,5 @@
###################################################################################################
-# Copyright (C) 2019-2023 Maxim Integrated Products, Inc. All Rights Reserved.
+# Copyright (C) 2019-2024 Maxim Integrated Products, Inc. All Rights Reserved.
#
# Maxim Integrated Products, Inc. Default Copyright Notice:
# https://www.maximintegrated.com/en/aboutus/legal/copyrights.html
@@ -69,6 +69,7 @@ def create_net(self) -> str: # pylint: disable=too-many-locals,too-many-branche
fast_fifo_quad = state.fast_fifo_quad
fifo = state.fifo
final_layer = state.final_layer
+final_scale = state.final_scale
first_layer_used = state.first_layer_used
flatten = state.flatten
forever = state.forever
@@ -136,6 +137,7 @@ def create_net(self) -> str: # pylint: disable=too-many-locals,too-many-branche
riscv = state.riscv
riscv_cache = state.riscv_cache
riscv_flash = state.riscv_flash
scale_output = state.scale_output
simple1b = state.simple1b
simulated_sequence = state.simulated_sequence
snoop = state.snoop
@@ -1152,7 +1154,8 @@ def create_net(self) -> str: # pylint: disable=too-many-locals,too-many-branche
conv_str = ', no convolution, '
apb.output(conv_str +
           f'{output_chan[ll]}x{output_dim_str[ll]} output\n', embedded_code)
-
+apb.output('\n', embedded_code)
+apb.output(f'// Final Scales: {final_scale}\n', embedded_code)
apb.output('\n', embedded_code)

apb.header()
@@ -3553,8 +3556,20 @@ def run_eltwise(
elif block_mode:
assets.copy('assets', 'blocklevel-ai' + str(device), base_directory, test_name)
elif embedded_code:
-output_count = output_chan[terminating_layer] \
-    * output_dim[terminating_layer][0] * output_dim[terminating_layer][1]
+output_count = 0
+for i in range(terminating_layer + 1):
+    if output_layer[i]:
+        if output_width[i] != 32:
+            if scale_output:
+                output_count += (output_chan[i] * output_dim[i][0] * output_dim[i][1]
+                                 + (32 // (2 * output_width[i]) - 1)) \
+                                // (32 // (2 * output_width[i]))
+            else:
+                output_count += (output_chan[i] * output_dim[i][0] * output_dim[i][1]
+                                 + (32 // output_width[i] - 1)) \
+                                // (32 // output_width[i])
+        else:
+            output_count += output_chan[i] * output_dim[i][0] * output_dim[i][1]
insert = summary_stats + \
'\n/* Number of outputs for this network */\n' \
f'#define CNN_NUM_OUTPUTS {output_count}'
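The replacement sums per-layer word counts instead of counting only the terminating layer: 8-bit outputs pack four values per 32-bit word (two when `--scale-output` widens them to 16 bits), while 32-bit outputs take one word per value. A standalone sketch of the arithmetic with made-up shapes:

```python
# Hypothetical output layers: (width_bits, channels, height, width)
outputs = [
    (8, 100, 1, 1),  # e.g. a classifier head
    (32, 4, 8, 8),   # e.g. a wide 32-bit head
]
scale_output = True  # --scale-output stores scaled 8-bit values as 16-bit

output_count = 0
for width, chan, h, w in outputs:
    values = chan * h * w
    if width != 32:
        stored_bits = 2 * width if scale_output else width
        per_word = 32 // stored_bits
        output_count += (values + per_word - 1) // per_word  # ceiling division
    else:
        output_count += values  # one 32-bit word per value

print(output_count)  # 50 + 256 = 306 words
```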
11 changes: 9 additions & 2 deletions izer/checkpoint.py
@@ -1,5 +1,5 @@
###################################################################################################
-# Copyright (C) 2019-2023 Maxim Integrated Products, Inc. All Rights Reserved.
+# Copyright (C) 2019-2024 Maxim Integrated Products, Inc. All Rights Reserved.
#
# Maxim Integrated Products, Inc. Default Copyright Notice:
# https://www.maximintegrated.com/en/aboutus/legal/copyrights.html
@@ -56,6 +56,7 @@ def load(
bias_min = []
bias_max = []
bias_size = []
+final_scale = {}

checkpoint = torch.load(checkpoint_file, map_location='cpu')
print(f'Reading {checkpoint_file} to configure network weights...')
@@ -251,6 +252,12 @@ def load(
# Add implicit shift based on quantization
output_shift[seq] += 8 - abs(quantization[seq])

+final_scale_name = '.'.join([layer, 'final_scale'])
+if final_scale_name in checkpoint_state:
+    w = checkpoint_state[final_scale_name].numpy().astype(np.int64)
+    final_scale[seq] = w.item()
+else:
+    final_scale[seq] = 0
layers += 1
seq += 1
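For context, a sketch of the state-dict entry this loop consumes; the layer names and the stored value are hypothetical:

```python
import torch

# An HPTQ-aware QAT checkpoint may carry one scalar scale per layer:
checkpoint_state = {'conv1.final_scale': torch.tensor(3)}

final_scale = {}
for seq, layer in enumerate(['conv1', 'conv2']):
    name = '.'.join([layer, 'final_scale'])
    if name in checkpoint_state:
        final_scale[seq] = int(checkpoint_state[name])
    else:
        final_scale[seq] = 0  # layers without a stored scale default to 0

print(final_scale)  # {0: 3, 1: 0}
```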

@@ -286,4 +293,4 @@ def load(
sys.exit(1)

return layers, weights, bias, output_shift, \
-    input_channels, output_channels
+    input_channels, output_channels, final_scale
3 changes: 3 additions & 0 deletions izer/commandline.py
@@ -464,6 +464,8 @@ def get_parser() -> argparse.Namespace:
help='GitHub repository name for update checking')
group.add_argument('--yamllint', metavar='S', default='yamllint',
help='name of linter for YAML files (default: yamllint)')
+group.add_argument('--scale-output', action='store_true', default=False,
+                   help="scale output with final layer scale factor (default: false)")

args = parser.parse_args()

@@ -691,6 +693,7 @@ def set_state(args: argparse.Namespace) -> None:
state.rtl_preload_weights = args.rtl_preload_weights
state.runtest_filename = args.runtest_filename
state.sample_filename = args.sample_filename
+state.scale_output = args.scale_output
state.simple1b = args.simple1b
state.sleep = args.deepsleep
state.slow_load = args.slow_load
8 changes: 6 additions & 2 deletions izer/izer.py
@@ -1,5 +1,5 @@
###################################################################################################
-# Copyright (C) 2019-2023 Maxim Integrated Products, Inc. All Rights Reserved.
+# Copyright (C) 2019-2024 Maxim Integrated Products, Inc. All Rights Reserved.
#
# Maxim Integrated Products, Inc. Default Copyright Notice:
# https://www.maximintegrated.com/en/aboutus/legal/copyrights.html
@@ -74,6 +74,7 @@ def main():

# If not using test data, load weights and biases
# This also configures the network's output channels
+final_scale = None
if cfg['arch'] != 'test':
if not args.checkpoint_file:
eprint('--checkpoint-file is a required argument.')
@@ -96,7 +97,7 @@ def main():
else:
# PyTorch checkpoint file selected
layers, weights, bias, output_shift, \
-    input_channels, output_channels = \
+    input_channels, output_channels, final_scale = \
checkpoint.load(
args.checkpoint_file,
cfg['arch'],
@@ -134,6 +135,8 @@ def main():
params['bypass'],
filename=args.bias_input,
)
+if final_scale is None:
+    final_scale = {ll: 0 for ll in range(cfg_layers)}
if cfg_layers > layers:
# Add empty weights/biases and channel counts for layers not in checkpoint file.
# The checkpoint file does not contain weights for non-convolution operations.
@@ -630,6 +633,7 @@ def main():
state.eltwise = eltwise
state.final_layer = final_layer
state.first_layer_used = min_layer
+state.final_scale = final_scale
state.flatten = flatten
state.in_offset = input_offset
state.in_sequences = in_sequences
7 changes: 6 additions & 1 deletion izer/quantize.py
@@ -1,5 +1,5 @@
###################################################################################################
-# Copyright (C) 2019-2023 Maxim Integrated Products, Inc. All Rights Reserved.
+# Copyright (C) 2019-2024 Maxim Integrated Products, Inc. All Rights Reserved.
#
# Maxim Integrated Products, Inc. Default Copyright Notice:
# https://www.maximintegrated.com/en/aboutus/legal/copyrights.html
@@ -241,6 +241,11 @@ def get_max_bit_shift(t, clamp_bits, shift_quantile, return_bit_shift=False):
out_shift_name = '.'.join([layer, 'output_shift'])
out_shift = torch.Tensor([-1 * get_max_bit_shift(params_r, clamp_bits,
shift_quantile, True)])
+threshold_name = '.'.join([layer, 'threshold'])
+if threshold_name in checkpoint_state:
+    threshold = checkpoint_state[threshold_name]
+    out_shift = (out_shift - threshold).clamp(min=-7.-clamp_bits,
+                                              max=23.-clamp_bits)
new_checkpoint_state[out_shift_name] = out_shift
if new_masks_dict is not None:
new_masks_dict[out_shift_name] = out_shift
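With HPTQ, a layer's learned activation `threshold` is folded into the stored shift and clamped to the representable range; a minimal numeric sketch with made-up values and `clamp_bits = 8` assumed:

```python
import torch

clamp_bits = 8                    # assumed weight clamp width
out_shift = torch.tensor([-2.0])  # hypothetical value from get_max_bit_shift()
threshold = torch.tensor([3.0])   # hypothetical learned activation threshold

# Subtract the threshold, then clamp to [-7 - clamp_bits, 23 - clamp_bits]
out_shift = (out_shift - threshold).clamp(min=-7. - clamp_bits,
                                          max=23. - clamp_bits)
print(out_shift)  # tensor([-5.]), inside [-15.0, 15.0]
```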