[Wait for #2663][Mixed Precision] Fix gradient clipping logic #2749

DonghakPark · 2024-10-08T07:48:49Z

[Mixed Precision] Fix gradient clipping logic

Update the mixed precision gradient clipping logic.

When clipping gradients, the gradient should be unscaled before the L2 norm is calculated.
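To make the ordering concrete, here is a minimal sketch in plain Python (not NNTrainer code). The function name `clip_scaled_gradients` and its parameters are hypothetical and chosen only for illustration. It assumes clipping by the global L2 norm, and it divides the gradients by the loss scale before the norm is measured.

```python
import math

def clip_scaled_gradients(scaled_grads, loss_scale, max_norm):
    """Unscale loss-scaled gradients, then clip them by their global L2 norm."""
    # Undo the loss scaling first so the norm reflects the true gradients.
    grads = [[g / loss_scale for g in tensor] for tensor in scaled_grads]
    global_norm = math.sqrt(sum(g * g for tensor in grads for g in tensor))
    if global_norm > max_norm:
        factor = max_norm / global_norm
        grads = [[g * factor for g in tensor] for tensor in grads]
    return grads, global_norm

# True gradient (3, 4) multiplied by a loss scale of 128.
scaled = [[384.0, 512.0]]
clipped, norm = clip_scaled_gradients(scaled, loss_scale=128.0, max_norm=10.0)
print(norm)     # 5.0
print(clipped)  # [[3.0, 4.0]]
```

If the norm were taken on the scaled values instead, it would be 640 rather than 5, and a gradient that is well inside the threshold would be clipped.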

Resolves:

[Mixed Precision] Apply Gradient Clipping on Mixed Precision #2746

Self evaluation:

Build test: [X]Passed [ ]Failed [ ]Skipped
Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Donghak PARK <[email protected]>

Signed-off-by: Jiho Chu <[email protected]>

This patch adds an inference mode for the swap device. It re-enables the mmap feature, but the write timing is controlled manually because of how inference mode is handled.

Resolves:

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Jiho Chu <[email protected]>

- According to the current paper, accumulating up to 64 ~ 128 w.r.t. the K-direction is fine.
- Since the conventional error metric and the newly introduced metric (max component relative error) are both fine as well, introduce an experimental kernel.
- The build option -Dhgemm-precision-level=low enables this kernel for Android builds.

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: skykongkong8 <[email protected]>

We add a Var32 tensor if the variable weight is not full precision (FP32). This enables the weight update to run in full precision, and only the apply-gradient process uses this tensor. Therefore, the lifespan of this tensor should be "ApplyGradient".

- Modify TensorPool to generate Weight with mixed precision in mind.

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: jijoong.moon <[email protected]>
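The motivation for a full-precision copy of the weight can be illustrated with a small sketch in plain Python (not NNTrainer code). The helper `to_fp16` is hypothetical and uses the standard `struct` module's half-precision format to round a value to FP16. A Python float (double precision) stands in for the FP32 tensor here, which is an assumption made only to keep the example dependency-free.

```python
import struct

def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]

lr, grad = 0.01, 0.01   # each step changes the weight by 1e-4
w16 = to_fp16(1.0)      # weight stored and updated only in FP16
w32 = 1.0               # stand-in for the full-precision copy ("var32")

for _ in range(100):
    # Near 1.0 the FP16 spacing is about 4.9e-4, so a 1e-4 step rounds away.
    w16 = to_fp16(w16 - lr * grad)
    # The full-precision copy accumulates every small step.
    w32 = w32 - lr * grad

print(w16)            # 1.0
print(round(w32, 6))  # 0.99
```

After 100 steps the FP16-only weight has not moved, while the full-precision copy has absorbed all of the updates and can be cast back to FP16 for the next forward pass.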

This PR creates the variable FP32 tensor when we create the Weight and Optimizer Weight.

- Update the manager to create Weight with the var32 tensor, which is requested from the weight pool.
- Update the weight requests with the Weight Spec and the var, grad and var32 tensors that have already been created.
- Add a clone Tensor with a specific type in tensor.h.

Resolves:

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: jijoong.moon <[email protected]>

This PR enables FP16 support for the layers below:

- input layer
- mse loss layer

Resolves:

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: jijoong.moon <[email protected]>

This PR includes the mixed precision test case.

- Input - FC - MSE: "batch_size=2", "model_tensor_type=FP16-FP16", "loss_scale=128"

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: jijoong.moon <[email protected]>

This commit modifies apply gradient in the optimizer. We do not need to save the optimizer variables in the weight type. Only the optimizer needs the optimizer variables, and we should update the weight in full precision to maintain accuracy. Therefore, remove the var32 tensors for the optimizer variables.

Resolves:

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: jijoong.moon <[email protected]>

This PR adds an is_NaN function to check whether a tensor has a NaN value. It is used to check for NaN during mixed precision training.

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: jijoong.moon <[email protected]>

This PR adds a loss scale parameter to the run context and uses it to update the MSE loss.

- Add a loss scale parameter to the RunLayerContext constructor.
- Add an applyLossScale function to update the return derivative in the loss layer.
- Change the MSE loss layer to apply the loss scale to the return derivative.

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: jijoong.moon <[email protected]>
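As an illustration of why the loss layer multiplies its return derivative by the loss scale, here is a minimal sketch in plain Python (not NNTrainer code). The function `mse_derivative` and the helper `to_fp16` are hypothetical, and the normalization `2 * (pred - target) / n` is an assumption about the MSE derivative rather than the exact NNTrainer formula.

```python
import struct

def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]

def mse_derivative(pred, target, loss_scale=1.0):
    """Derivative of the mean squared error, multiplied by the loss scale."""
    n = len(pred)
    return [loss_scale * 2.0 * (p - t) / n for p, t in zip(pred, target)]

pred, target = [1.00000002, 0.5], [1.0, 0.5]

plain = mse_derivative(pred, target)                     # about 2e-8
scaled = mse_derivative(pred, target, loss_scale=128.0)  # about 2.56e-6

# The unscaled derivative is below the smallest FP16 subnormal and flushes to 0.
print(to_fp16(plain[0]) == 0.0)   # True
# The scaled derivative survives the conversion to FP16.
print(to_fp16(scaled[0]) > 0.0)   # True
```

The scaled derivative has to be divided by the same loss scale before the weights are updated, otherwise the effective learning rate grows by that factor.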

This PR enables mixed precision training. For now only FP16-FP32 is considered. Additional test cases will be added.

- Add getSortedLayerIdx to set the graph order for forwarding.
- Change clip_weights to lazy_apply_weights to cover both cases.
- Add forwarding_op to run forwarding from the layer that has a gradient with NaN.
- Add a while loop to re-run backwarding after resetting the loss scale.
- Add setLossScale to RunLayerContext.
- Add a gradient check when mixed precision is enabled.

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: jijoong.moon <[email protected]>
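The retry loop described above can be sketched in plain Python as follows. This is not the NNTrainer implementation: `scaled_backward` and `train_step` are hypothetical names, the backward pass is reduced to a single scalar gradient, and halving the loss scale on overflow is an assumed reset policy.

```python
import math

FP16_MAX = 65504.0  # largest finite half-precision value

def scaled_backward(true_grad, loss_scale):
    """Hypothetical backward pass whose scaled gradient overflows to inf in FP16."""
    g = true_grad * loss_scale
    return g if abs(g) <= FP16_MAX else math.copysign(math.inf, g)

def train_step(true_grad, loss_scale, max_retries=10):
    """Re-run the backward pass with a smaller loss scale until it is finite."""
    for _ in range(max_retries):
        g = scaled_backward(true_grad, loss_scale)
        if math.isfinite(g):
            # Unscale before the gradient is applied to the weights.
            return g / loss_scale, loss_scale
        loss_scale /= 2.0  # assumed reset policy: halve and try again
    raise RuntimeError("no finite gradient found")

grad, scale = train_step(100.0, 1024.0)
print(grad, scale)  # 100.0 512.0
```

With a loss scale of 1024 the scaled gradient (102400) exceeds the FP16 range, so the first attempt is discarded and the second attempt with 512 succeeds.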

This PR adds an infinity check for Tensor data.

- Rename hasNaN to isValid.
- Add an infinity check to the isValid function, which now checks for both NaN and Inf.
- Modify blas_avx and blas_neon for the check.
- Modify graph and model to check is_valid rather than has_nan.
- Add a unit test for the isValid function.

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: jijoong.moon <[email protected]>
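The semantics of the combined check can be shown with a short sketch in plain Python. The function name `is_valid` mirrors the commit message, but the body is an illustrative assumption and not the NNTrainer code, which operates on tensor buffers through the AVX and NEON paths.

```python
import math

def is_valid(values):
    """Return True only if no element is NaN or +/-Inf."""
    return all(math.isfinite(v) for v in values)

print(is_valid([0.5, -2.0, 0.0]))    # True
print(is_valid([0.5, math.nan]))     # False
print(is_valid([0.5, -math.inf]))    # False
```

A NaN-only check would accept the last input, which is why the infinity test was added to the same function.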

This PR changes the loss computation to use full precision rather than half precision to maintain accuracy.

Resolves:

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: jijoong.moon <[email protected]>

This PR enables the mixed precision unit test with a Torch model.

Resolves:

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: jijoong.moon <[email protected]>

This PR adds Torch mixed precision golden data generation, along with the input and output for the test.

- Some fixes to the test.

Resolves:

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: jijoong.moon <[email protected]>

This PR includes more unit tests and fixes for mixed precision.

- Model unit test.
- 2 FC layers that generate a NaN or Inf gradient from Torch.
- MSE loss, checking the whole procedure of mixed precision training.
- Even though the FC model has only one weight, it is good enough to validate mixed precision.
- The Torch model also works in a similar way to NNTrainer.
- Some fixes to the execution order of apply gradient when mixed precision is on.
- Update SGD to support mixed precision training.

Resolves:

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: jijoong.moon <[email protected]>

This PR updates the conv2D layer to support mixed precision (FP16). It is based on PR nnstreamer#2579.

Resolves:

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: jijoong.moon <[email protected]>

This commit enables mixed precision support for the LSTM layer.

Resolves:

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: jijoong.moon <[email protected]>

This PR adds an execution mode parameter to compile. The default is ml::train::ExecutionMode::TRAIN. Currently we do not support compiler optimizations for inference mode, such as batch normalization fusing, but we will add more optimizations depending on the execution mode.

Resolves:

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: jijoong.moon <[email protected]>

This PR includes mixed precision support for the batch normalization layer. During training, the BN layer should run in full precision with FP16 weight data. Therefore, reading the FP16 data and converting the current weight and activation are required. For inference, we need compiler optimizations such as BN fusing, so this PR also includes the execution mode parameter for compile. Because of the complicated data conversion in the BN layer, test case generation also needs to be updated so that it takes the FP16 input and output tensors and weights and converts the weights to FP32 for computation. For verification, we need to convert FP32 to FP16.

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: jijoong.moon <[email protected]>

Enable mixed precision on the reshape layer.

- The reshape layer only changes the dimensions, so change the dimensions and check the data type.

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Donghak PARK <[email protected]>

Enable mixed precision on the Pooling 2D layer.

- Cast properly in the FP16 case so that mixed precision can be activated on the existing pooling 2D layer.

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Donghak PARK <[email protected]>

In this PR, when we compute the L2 norm of a gradient tensor, the tensor is converted to full precision and the L2 norm used for gradient clipping is computed from that.

Resolves:

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: jijoong.moon <[email protected]>

This PR adds the mu and var backup tensors (mu_b, var_b) to restore the previous moving mean and moving variance for mixed precision training.

Resolves:

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: jijoong.moon <[email protected]>
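A minimal sketch of the backup-and-restore idea in plain Python is shown below. The class `MovingStats`, its `momentum` parameter and the exponential moving average update are assumptions made for illustration. Only the names `mu_b` and `var_b` come from the commit message.

```python
import math

class MovingStats:
    """Moving mean/variance with a one-step backup (mu_b, var_b)."""

    def __init__(self, mu, var, momentum=0.9):
        self.mu, self.var, self.momentum = mu, var, momentum
        self.mu_b, self.var_b = mu, var

    def update(self, batch_mu, batch_var):
        # Keep the previous values so a bad iteration can be undone.
        self.mu_b, self.var_b = self.mu, self.var
        m = self.momentum
        self.mu = m * self.mu + (1.0 - m) * batch_mu
        self.var = m * self.var + (1.0 - m) * batch_var

    def restore(self):
        # Roll back to the values saved before the last update.
        self.mu, self.var = self.mu_b, self.var_b

stats = MovingStats(mu=0.0, var=1.0)
stats.update(math.nan, math.inf)   # an overflowed iteration poisons the stats
stats.restore()                    # the retry starts from the saved values
print(stats.mu, stats.var)         # 0.0 1.0
```

Without the backup, a single iteration with NaN or Inf statistics would contaminate the moving averages for every later step, even after the loss scale is lowered.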

In order to restore the previous iteration's data, this PR disables randomization of the mask when the previous data needs to be restored.

Resolves:

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: jijoong.moon <[email protected]>

This PR enables the check for whether the previous data needs to be restored. By doing this, we can remove NaN or Inf data from the Tensor during mixed precision training.

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: jijoong.moon <[email protected]>

We need to remove NaN or Inf values from a Tensor by calling setZero(). However, if we use sscal, the NaN or Inf values still remain. This PR changes sscal to memset.

Resolves:

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: jijoong.moon <[email protected]>
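The reason scaling by zero does not clear such values follows from IEEE 754 arithmetic, where `0 * NaN` and `0 * Inf` are both NaN. The sketch below shows this in plain Python. The functions `scale_by_zero` and `fill_zero` are hypothetical stand-ins for an sscal call with a factor of 0 and for memset, and the sketch assumes the scaling multiplies every element.

```python
import math

def scale_by_zero(buf):
    """Stand-in for sscal with alpha = 0: multiply every element by 0."""
    return [0.0 * v for v in buf]

def fill_zero(buf):
    """Stand-in for memset: overwrite every element regardless of its value."""
    return [0.0] * len(buf)

data = [1.0, math.nan, math.inf]

scaled = scale_by_zero(data)
print(scaled[0], math.isnan(scaled[1]), math.isnan(scaled[2]))  # 0.0 True True
print(fill_zero(data))                                          # [0.0, 0.0, 0.0]
```

Overwriting the buffer never reads the old contents, so it is the safer way to reset a tensor that may hold invalid values.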

This PR fixes some bugs that occur when running mixed precision training.

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: jijoong.moon <[email protected]>

Add the is_mixed variable to check whether mixed precision training is in use. It means that the weight type of the model is not full precision.

Resolves:

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: jijoong.moon <[email protected]>

Default Blas kernel registration during cl_context initialization

Remove RunLayerContext dependency from unit tests

Signed-off-by: Debadri Samaddar <[email protected]>

(cherry picked from commit 79a7c25)

- This commit is related to issue nnstreamer#2660.
- When using multi-inputs, users must feed the data in reverse order due to a known bug that needs fixing. In the current version, the input must be provided in reverse order, which was not shown in the previous example, where random data with the same dimensions were used.
- To provide a more accurate example to NNTrainer users, I have temporarily updated this example.
- Once the issue is handled, further updates will be necessary.

Signed-off-by: Eunju Yang <[email protected]>

(cherry picked from commit 2807f69)

- This commit updates the model summary print of the layer with multiple inputs.

[ASIS] concat0 concat 1:1:14:2 input0 1:1:4:2 input1 1:1:8:2 input2

[TOBE] concat0 concat 1:1:14:2 input0 input1 input2

Signed-off-by: Eunju Yang <[email protected]>

(cherry picked from commit f222ecf)

Update the mixed precision gradient clipping logic.

- When clipping gradients, the gradient should be unscaled before the L2 norm is calculated.

**Self evaluation:**
1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: Donghak PARK <[email protected]>

taos-ci · 2024-10-08T07:48:53Z

📝 TAOS-CI Version: 1.5.20200925. Thank you for submitting PR #2749. Please follow the 1commit/1PR (one commit per PR) policy to get comments quickly from reviewers. Your PR must pass all verification processes of cibot before reviewers start the review process. If you are a new member of this project, please read the manuals in the documentation folder and on the wiki page. To monitor the progress of your PR in more detail, visit http://ci.nnstreamer.ai/.

taos-ci · 2024-10-08T08:13:30Z

cibot: @DonghakPark, A builder checker could not be completed because one of the checkers is not completed. In order to find out a reason, please go to http://ci.nnstreamer.ai/nntrainer/ci/repo-workers/pr-checker/2749-202410081648530.3276469707489-2d4e34793f8bee4d6bcada6961a4d3345e7e8d56/.

jihochu and others added 30 commits August 29, 2024 14:05

[SWAP] Add swap mode property

a5d16a4

Signed-off-by: Jiho Chu <[email protected]>

[SWAP] Add inference mode

23e40da

Signed-off-by: Jiho Chu <[email protected]>

[ TEST ] Add Torch Mixed Precision Model Test

d9242f1

[ Layer ] enable Mixed Precision in LSTM Layer

f38b831

[ Mixed ] set initialize gradient in layers and bugfixes

16e3a55

s-debadri and others added 4 commits October 7, 2024 10:54

[gpu/enhance] Utility for registering Blas kernels during initialization

c43937c

DonghakPark self-assigned this Oct 8, 2024

DonghakPark requested review from myungjoo, jijoongmoon, again4you, jaeyun-jung, leemgs, wooksong, helloahn, kparichay, gichan-jang, anyj0527, zhoonit, lhs8928, songgot, jihochu, SeoHyungjun, baek2sm, skykongkong8, djeong20, EunjuYang and a team as code owners October 8, 2024 07:48

DonghakPark changed the title ~~[Mixed Precision] Fix gradient clipping logic~~ [Wait for #2663][Mixed Precision] Fix gradient clipping logic Oct 8, 2024

DonghakPark added the DO NOT MERGE label Oct 8, 2024

DonghakPark mentioned this pull request Oct 30, 2024

[Wait for #2615] Enable Mixed Precision Training in NNTrainer @open sesame 11/09 15:18 #2663

Merged
