Coefficient Reduction #1
Comments
As I remember, I keep the coefficients "as is" in this code, but they overflow 7 bits above the 1.0 point. One note: the quantization method used in this project is not really optimal. It's better to use "Symmetric Fine-Grained Quantization", which can be found in the NVIDIA docs.
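For reference, per-channel symmetric quantization (the "fine-grained" scheme mentioned above) boils down to picking one scale per output channel from that channel's largest absolute weight. Below is a minimal NumPy sketch of the general technique; the function name and shapes are illustrative and not taken from this repo:

```python
import numpy as np

def quantize_per_channel_symmetric(weights, n_bits=8):
    """Sketch of symmetric per-output-channel ("fine-grained") quantization.

    weights: float array of shape (out_channels, ...), e.g. (O, I, kh, kw).
    Returns (q, scales) such that weights ~= q * scales (broadcast per channel).
    Illustrative only; not the code from this repository.
    """
    qmax = 2 ** (n_bits - 1) - 1  # e.g. 127 for 8-bit
    # One scale per output channel, from that channel's max |w|.
    absmax = np.abs(weights.reshape(weights.shape[0], -1)).max(axis=1)
    scales = np.where(absmax > 0, absmax / qmax, 1.0)  # avoid div-by-zero
    q = np.round(weights / scales.reshape(-1, *([1] * (weights.ndim - 1))))
    return q.astype(np.int8), scales
```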
Thanks a lot for your reply! However, I'm afraid this fine-grained granularity may complicate the hardware. The Verilog computation uses 8 conv modules, each corresponding to one output channel of the d-conv/conv. Since the scale factor differs per channel, it becomes hard to represent the multiply-accumulate (MAC) results in a uniform manner, and it may also require more control signals and more complicated control logic. So I'm wondering how to balance the tradeoff between the shorter bit width and the more complex control logic.
Actually, you use the same conv operations. The only difference is that you need to requantize to the new scale after the layer calculation completes, and that is just a single multiplication and shift. With the current quantization method we weren't able to run the model at 8 bits, but with SFGQ it's possible almost without loss of accuracy. The current method uses 12-13 bits for activations and 19-20 bits for weights, which is rather expensive.
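To make the "single multiplication and shift" concrete, here is a hedged sketch of the standard fixed-point requantization trick, assuming the per-channel rescale factor is folded into an integer multiplier offline; the function names and the 16-bit shift are illustrative, not from this repo:

```python
def precompute_multiplier(scale_in, scale_w, scale_out, shift=16):
    """Fold the input, weight, and output scales into one integer multiplier.

    The accumulator is in units of scale_in * scale_w; the next layer expects
    units of scale_out, so the real rescale factor
    (scale_in * scale_w) / scale_out is approximated as multiplier / 2**shift.
    Illustrative sketch; parameters are assumptions, not this repo's API.
    """
    return round(scale_in * scale_w / scale_out * (1 << shift))

def requantize(acc, multiplier, shift=16):
    """Requantize one accumulator value: one multiply and one shift,
    with round-to-nearest and saturation to the signed 8-bit range."""
    x = (acc * multiplier + (1 << (shift - 1))) >> shift
    return max(-128, min(127, x))
```

Since each conv module handles a single output channel, the per-channel multiplier is just one constant per module, so the MAC datapath itself stays uniform; only the final requantize constant differs.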
Hi there! Since every conv operation is followed by a Relu1 function, which already guarantees that the input values to the next layer lie in the interval [0, 1], I wonder whether the coefficient reduction process is necessary. Hoping for your reply!