Coefficient Reduction #1
Comments
As I remember, I keep the coefficients "as is" in this code, but they overflow 7 bits above the 1.0 point. One note: the quantization method used in this project is not really optimal. It's better to use "Symmetric Fine-Grained Quantization", which can be found in the NVIDIA docs.
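For reference, per-channel symmetric quantization (the "fine-grained" scheme mentioned above) boils down to picking one scale per output channel from that channel's largest absolute weight. Below is a minimal NumPy sketch of the general technique; the function name and shapes are illustrative and not taken from this repo:

```python
import numpy as np

def quantize_per_channel_symmetric(weights, n_bits=8):
    """Sketch of symmetric per-output-channel ("fine-grained") quantization.

    weights: float array of shape (out_channels, ...), e.g. (O, I, kh, kw).
    Returns (q, scales) such that weights ~= q * scales (broadcast per channel).
    Illustrative only; not the code from this repository.
    """
    qmax = 2 ** (n_bits - 1) - 1  # e.g. 127 for 8-bit
    # One scale per output channel, from that channel's max |w|.
    absmax = np.abs(weights.reshape(weights.shape[0], -1)).max(axis=1)
    scales = np.where(absmax > 0, absmax / qmax, 1.0)  # avoid div-by-zero
    q = np.round(weights / scales.reshape(-1, *([1] * (weights.ndim - 1))))
    return q.astype(np.int8), scales
```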
Thanks a lot for your reply! However, I'm afraid this fine-grained granularity may complicate the hardware. The Verilog computation uses 8 conv modules, each corresponding to one output channel of the d-conv/conv. Since the scale factor differs per channel, it becomes hard to represent the multiply-accumulate (MAC) results in a uniform manner, and it may also require more control signals and more complicated control logic. So I'm wondering how to balance the tradeoff between the shorter bit width and the more complex control logic.
Actually, you use the same conv operations. The only difference is that you need to requantize to the new scale after the layer calculation completes, and that is just a single multiplication and shift. With the current quantization method we weren't able to run the model at 8 bits, but with SFGQ it's possible almost without loss of accuracy. The current method uses 12-13 bits for activations and 19-20 bits for weights, which is rather expensive.
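To make the "single multiplication and shift" concrete, here is a hedged sketch of the standard fixed-point requantization trick, assuming the per-channel rescale factor is folded into an integer multiplier offline; the function names and the 16-bit shift are illustrative, not from this repo:

```python
def precompute_multiplier(scale_in, scale_w, scale_out, shift=16):
    """Fold the input, weight, and output scales into one integer multiplier.

    The accumulator is in units of scale_in * scale_w; the next layer expects
    units of scale_out, so the real rescale factor
    (scale_in * scale_w) / scale_out is approximated as multiplier / 2**shift.
    Illustrative sketch; parameters are assumptions, not this repo's API.
    """
    return round(scale_in * scale_w / scale_out * (1 << shift))

def requantize(acc, multiplier, shift=16):
    """Requantize one accumulator value: one multiply and one shift,
    with round-to-nearest and saturation to the signed 8-bit range."""
    x = (acc * multiplier + (1 << (shift - 1))) >> shift
    return max(-128, min(127, x))
```

Since each conv module handles a single output channel, the per-channel multiplier is just one constant per module, so the MAC datapath itself stays uniform; only the final requantize constant differs.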
Hi there! Since every conv operation is followed by a Relu1 function, which already guarantees that the input values to the next layer lie in the interval [0, 1], I wonder whether the coefficient reduction process is necessary. Hoping for your reply!