
Coefficient Reduction #1

Open
rainyBJ opened this issue Jan 27, 2021 · 3 comments

rainyBJ commented Jan 27, 2021

[screenshot attached]
Hi there! Since every conv operation is followed by a ReLU1 function, which already guarantees that the input values to the next layer lie in the interval [0, 1], I wonder whether the coefficient reduction step is necessary. Hoping to have your reply!
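
For reference, by ReLU1 I mean the clipped ReLU; a minimal sketch of the bound it gives (my own illustration, not code from this repo):

```python
import numpy as np

def relu1(x):
    """Clipped ReLU: every activation is forced into [0, 1]."""
    return np.clip(x, 0.0, 1.0)
```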

ZFTurbo (Owner) commented Jan 27, 2021

As I remember, I keep the coefficients "as is" in this code, but they overflow by 7 bits above the 1.0 point.

A note: the quantization method used in this project is not really optimal. It is better to use "Symmetric Fine-Grained Quantization", which is described in the NVIDIA slides:
https://developer.download.nvidia.com/video/gputechconf/gtc/2019/presentation/s9659-inference-at-reduced-precision-on-gpus.pdf
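
Roughly, the per-channel (fine-grained) part of that scheme looks like this; a sketch of the idea from the slides, not code from this project:

```python
import numpy as np

def quantize_weights_per_channel(w, n_bits=8):
    """Symmetric per-channel weight quantization (SFGQ style).

    w: float conv weights, shape (out_channels, in_channels, kh, kw).
    Returns integer weights plus one scale per output channel, so that
    w ~= w_q * scale[c] within channel c. The zero-point is fixed at 0
    (symmetric), which is what keeps the integer math simple.
    """
    qmax = 2 ** (n_bits - 1) - 1                    # 127 for int8
    absmax = np.abs(w).reshape(w.shape[0], -1).max(axis=1)
    scale = np.maximum(absmax, 1e-12) / qmax        # per-channel step size
    w_q = np.round(w / scale[:, None, None, None]).astype(np.int8)
    return w_q, scale
```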

rainyBJ (Author) commented Jan 31, 2021

Thanks a lot for your reply!
I've read the NVIDIA doc you mentioned. Symmetric Fine-Grained Quantization seems to apply two different granularities (sketched below):

- for activations (the feature maps, i.e. the output of each layer), one scale factor per tensor;
- for weights, one scale factor per channel.
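
A minimal sketch of the per-tensor side, assuming a set of calibration feature maps (my own illustration, not the project's code):

```python
import numpy as np

def activation_scale_per_tensor(calib_maps, n_bits=8):
    """One scale for the whole activation tensor (coarse granularity).

    calib_maps: list of float feature maps collected on calibration data.
    With ReLU1 in front, max |x| is at most 1.0, so the scale is bounded.
    """
    qmax = 2 ** (n_bits - 1) - 1
    absmax = max(float(np.abs(m).max()) for m in calib_maps)
    return absmax / qmax
```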

However, I'm afraid that this fine-grained granularity may make the hardware design harder. We use 8 conv modules in the Verilog implementation, each one corresponding to one output channel of the d-conv/conv. Since the scale factor differs from channel to channel, it becomes hard to represent the multiply-and-accumulate (MAC) results in a uniform manner, and it may also require more control signals and more complicated control logic.

So I'm wondering how to balance the tradeoff between a shorter bit length and more complex control logic.
Hoping to have your reply, best regards!

ZFTurbo (Owner) commented Jan 31, 2021

Actually, you use the same conv operations. The only difference is that you need to requantize to the new scale after the layer calculation completes, and that is just a single multiplication and a shift. With the current quantization method we weren't able to run the model at 8 bits, but with SFGQ it is possible almost without loss of accuracy. The current method uses 12-13 bits for activations and 19-20 bits for weights, which is rather expensive.
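
A minimal sketch of that requantization step, with the float rescale factor folded into a hypothetical fixed-point (multiplier, shift) pair; the rounding details are implementation choices, not this project's actual code:

```python
def fixed_point_params(rescale, shift=15):
    """Approximate a float rescale factor (s_in * s_w / s_out)
    as multiplier / 2**shift, so the hardware needs no float math."""
    return int(round(rescale * (1 << shift))), shift

def requantize(acc, multiplier, shift, qmin=-128, qmax=127):
    """Rescale a wide MAC accumulator to the next layer's narrow range:
    one integer multiply, one right shift, then saturate."""
    y = (acc * multiplier) >> shift
    return max(qmin, min(qmax, y))
```

With per-channel weight scales, only the (multiplier, shift) pair changes per output channel; the MAC datapath itself stays identical, so the extra control logic is essentially one small per-channel constant table.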
