Implementation of FakeQuantization in LLTFI
Implementation milestones for this week (7 August - 14 August 2024)
Figure out the reason behind the outliers in the cnn-fmnist model ✅
Iterate over the output tensor to dequantize its elements (see the sketch after this list) ✅
Double-check that the fmul instruction is the correct point for collecting the W and X inputs of the conv and matmul layers ✅
Implement a solution for the bias vector in both the conv and matmul layers ✅
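Below is a minimal sketch of the per-element dequantization loop referenced above. The flat float buffer, its length, and the single per-tensor scale are assumptions made for illustration; this is not LLTFI's actual runtime interface.

```cpp
#include <cstddef>

// Hypothetical illustration: walk a flattened output tensor and map each
// quantized value q back to the real domain via r = S * q.
// The buffer layout and the single per-tensor scale are assumed here.
void dequantizeOutputTensor(float *out, std::size_t numElems, float scale) {
  for (std::size_t i = 0; i < numElems; ++i) {
    // Each element currently holds a quantized (integer-valued) number
    // stored in a float slot; scale it back to the original range.
    out[i] = scale * out[i];
  }
}
```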
Old updates are below:
For the week of 31 July - 7 August 2024:
Implement a working model that computes correct scaling factors by stripping outliers with the percentile approach, as described in the research paper (see the sketch after this list) ✅
Add support for the other layers inside the model ✅
Work with the Profiling and Fault Injection stages to divide the work and call different custom-built API calls to calibrate the model and use the calibration data within the ML layers ✅
Look for a solution for the bias vector ✅
The maximum supported integer width is 32 bits, so ensure the quantization respects this limit ✅
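A sketch of the percentile-based outlier stripping and the 32-bit clamp mentioned above, assuming the calibration values are available as a std::vector<float>. The 0.1% / 99.9% cut-offs and the function names are illustrative choices, not the exact values used in LLTFI.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <utility>
#include <vector>

// Illustrative percentile clipping: sort the observed values and drop the
// extreme tails before picking r_min / r_max, so a handful of outliers does
// not inflate the scaling factor. Assumes vals is non-empty.
std::pair<float, float> clippedRange(std::vector<float> vals) {
  std::sort(vals.begin(), vals.end());
  const std::size_t n = vals.size();
  const std::size_t lo = static_cast<std::size_t>(0.001 * (n - 1));
  const std::size_t hi = static_cast<std::size_t>(0.999 * (n - 1));
  return {vals[lo], vals[hi]}; // (r_min, r_max) after stripping outliers
}

// Quantized values must stay representable in 32 bits, so clamp the rounded
// result to the int32_t range before using it.
int32_t quantizeClamped(float r, float scale) {
  double q = std::nearbyint(r / scale);
  q = std::min<double>(std::max<double>(q, INT32_MIN), INT32_MAX);
  return static_cast<int32_t>(q);
}
```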
For the week of 24-31 July 2024:
Prepare the presentation slides to present the work at the 25 July meeting ✅
Implementation milestones for this week
Implement the runtime library that maintains the FSM tracking the weights and scaling factors ✅
Implement the Fault Injection stage to import the calibration data from the previous steps ✅
Implement the quantization formula described below within LLTFI ✅
Furthermore, simulate the injection and Profiling stages to execute the custom LLVM IR function pass (this helps divide the work across different API calls) ✅
Research how the bias vector can be handled within the conv layers of ML programs, since it deviates from the standard quantization scheme
For the week of 17-24 July 2024:
Read Research Paper
Implementation milestones for this week
Gather the feature matrix and kernel matrix within the calibration phase for the Convolution and Matrix Multiplication layers ✅
Gather the matrix elements from the runtime library (getWandX) - First Milestone ✅
In the next phase, compute the integer matrix multiplication according to the output matrix shape (see the sketch after this list) ✅
Furthermore, replace the result of the initial fmul instruction with this resultant matrix ✅
Implement the DeQuantization within the InjectFault Layer
After calibration, get the resultant tensor matrix from the LLTFInjectFault function call ✅
Write the dequantized matrix back to the ML program, overwriting the resultant matrix
As part of the first iteration of Fake Quantization, we are aiming for a naive conversion: in the Quantization phase we convert float -> int, and in the DeQuantization phase we convert int -> float ✅
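A rough sketch of the quantize -> integer matmul -> dequantize round trip described in the items above. The function name, the row-major layout, the per-tensor scales, and the use of scaleW * scaleX as the combined output scale are assumptions for illustration, not LLTFI's actual API.

```cpp
#include <cmath>
#include <cstdint>
#include <vector>

// W is (M x K), X is (K x N), both row-major. The result has the output
// matrix shape (M x N) and is returned as floats so it can overwrite the
// value produced by the original fmul-based computation.
std::vector<float> fakeQuantMatMul(const std::vector<float> &W,
                                   const std::vector<float> &X,
                                   int M, int K, int N,
                                   float scaleW, float scaleX) {
  auto quantize = [](float r, float s) {
    return static_cast<int64_t>(std::nearbyint(r / s)); // Q(r) = Int(r / S)
  };

  std::vector<float> out(static_cast<std::size_t>(M) * N, 0.0f);
  for (int i = 0; i < M; ++i) {
    for (int j = 0; j < N; ++j) {
      int64_t acc = 0; // accumulate in the integer domain
      for (int k = 0; k < K; ++k)
        acc += quantize(W[i * K + k], scaleW) * quantize(X[k * N + j], scaleX);
      // Dequantize: every product of quantized values carries a factor of
      // scaleW * scaleX, so one multiply recovers the real-valued result.
      out[i * N + j] = static_cast<float>(acc) * scaleW * scaleX;
    }
  }
  return out;
}
```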
New update after reading the research papers:
Quantization
Q(r) = Int(r / S)
* Where r is a floating-point number (real value),
* S is the scaling factor,
* Int is a rounding function that maps the real number to the nearest integer value,
* and Q(r) is the quantization function
Scaling Factor
S = 2 x max(|r_min|, |r_max|) / (2^(b - 1) - 1)
* Where r_min is the minimum value found in the input array,
* r_max is the maximum value found in the input array,
* b is the bit width of the quantized outputs, which determines the range of quantized values
Dequantization
r = S x Q(r)
* Where Q(r) is the quantized input produced by the first function above,
* S is the scaling factor,
* and r is the real number output
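Put together, the three formulas above translate into only a few lines. This is an illustrative C++ rendering; the function names are chosen here and are not taken from LLTFI.

```cpp
#include <cmath>
#include <cstdint>

// S = 2 x max(|r_min|, |r_max|) / (2^(b - 1) - 1)
float scalingFactor(float rMin, float rMax, int b) {
  return 2.0f * std::fmax(std::fabs(rMin), std::fabs(rMax)) /
         static_cast<float>((1u << (b - 1)) - 1);
}

// Q(r) = Int(r / S): round the scaled real value to the nearest integer.
int32_t quantize(float r, float S) {
  return static_cast<int32_t>(std::nearbyint(r / S));
}

// r = S x Q(r): map the quantized value back to the real domain.
float dequantize(int32_t q, float S) {
  return S * static_cast<float>(q);
}
```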