Weight initialization in deep learning depends on the activation function and on the size of the model (network).
Note:
- The code to calculate the parameters of the weight initialization formulas, i.e. the fan-in $n_j$ and fan-out $n_{j+1}$ (a usage sketch follows this list):
```python
import numpy as np

# shape is the shape of the weight tensor W.
# For a dense layer, shape is (layer_i nodes, layer_i+1 nodes);
# for a CNN layer, shape is (num_filters, c_input, h_kernel, w_kernel).
n_j = shape[0] if len(shape) == 2 else np.prod(shape[1:])   # fan-in
n_jPlus = shape[1] if len(shape) == 2 else shape[0]          # fan-out
```
- Normally, the input image is normalized, which helps keep training more stable.
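As a usage sketch of the fan-in/fan-out computation above (a minimal example; the helper name `fan_in_fan_out` and the concrete shapes are only illustrative):
```python
import numpy as np

def fan_in_fan_out(shape):
    """Return (n_j, n_jPlus) for a dense (2-D) or conv (4-D) weight shape."""
    n_j = shape[0] if len(shape) == 2 else int(np.prod(shape[1:]))  # fan-in
    n_jPlus = shape[1] if len(shape) == 2 else shape[0]             # fan-out
    return n_j, n_jPlus

print(fan_in_fan_out((784, 256)))     # dense 784 -> 256: fan-in 784, fan-out 256
print(fan_in_fan_out((64, 3, 3, 3)))  # conv, 64 filters of 3x3x3: fan-in 27, fan-out 64
```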
If the weight values are too small, the outputs of the activation function shrink layer by layer toward 0, so the signal (and its gradient) vanishes.
If the weight values are too large, the outputs of the activation function saturate (near $\pm 1$ for tanh, near 0/1 for sigmoid), where the gradient is again close to 0.
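Both failure modes are easy to reproduce. The sketch below (the layer width, depth, and weight scales are arbitrary assumptions) pushes random data through a stack of tanh layers with weights drawn at two different scales:
```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((512, 256))          # a batch of random inputs

for scale in (0.01, 1.0):                    # too small vs. too large
    h = x
    for _ in range(10):                      # 10 tanh layers, all 256 -> 256
        W = rng.standard_normal((256, 256)) * scale
        h = np.tanh(h @ W)
    print(f"scale={scale}: activation std after 10 layers = {h.std():.4f}")
# scale=0.01: the std collapses toward 0 (vanishing signal);
# scale=1.0:  the activations pile up near +/-1 (saturation).
```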
Use Xavier initialization, with the formula
$W \sim U\!\left[-\frac{\sqrt{6}}{\sqrt{n_j+n_{j+1}}},\ \frac{\sqrt{6}}{\sqrt{n_j+n_{j+1}}}\right]$
If we use a normal distribution to initialize the parameters instead, we get
$W \sim \mathcal{N}\!\left(0,\ \frac{2}{n_j+n_{j+1}}\right)$
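A minimal NumPy sketch of both Xavier variants (the helper name `xavier_init` is illustrative), reusing the fan-in/fan-out computation from above:
```python
import numpy as np

def xavier_init(shape, distribution="uniform", rng=None):
    """Xavier/Glorot initialization for a dense (2-D) or conv (4-D) weight shape."""
    rng = rng or np.random.default_rng()
    n_j = shape[0] if len(shape) == 2 else int(np.prod(shape[1:]))  # fan-in
    n_jPlus = shape[1] if len(shape) == 2 else shape[0]             # fan-out
    if distribution == "uniform":
        limit = np.sqrt(6.0 / (n_j + n_jPlus))
        return rng.uniform(-limit, limit, size=shape)
    std = np.sqrt(2.0 / (n_j + n_jPlus))
    return rng.normal(0.0, std, size=shape)

W_dense = xavier_init((784, 256))               # uniform variant
W_conv  = xavier_init((64, 3, 3, 3), "normal")  # normal (Gaussian) variant
```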
However, the Xavier derivation assumes a roughly linear (or symmetric, tanh-like) activation, while ReLU zeroes half of its inputs and so halves the variance of the signal at every layer. Therefore, He et al. introduced a ReLU variant called PReLU, together with an initialization adapted to that activation. PReLU is defined as
$f(y_i)=\begin{cases} y_i &\text{if } y_i>0 \\ a_i y_i &\text{if } y_i\leq 0 \end{cases}$
where $a_i$ is a learnable coefficient that controls the slope on the negative side ($a_i = 0$ recovers plain ReLU). For this activation, the corresponding initialization is $W \sim \mathcal{N}\!\left(0,\ \frac{2}{(1+a_i^2)\,n_j}\right)$, which reduces to the usual He initialization $\mathcal{N}\!\left(0,\ \frac{2}{n_j}\right)$ in the plain ReLU case.
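A minimal sketch of PReLU and this initialization in NumPy (the names `prelu` and `he_init` are illustrative, and `a` is the initial value of the slope coefficient):
```python
import numpy as np

def prelu(y, a):
    """PReLU: returns y where y > 0, and a * y elsewhere (a is learned during training)."""
    return np.where(y > 0, y, a * y)

def he_init(shape, a=0.0, rng=None):
    """He initialization generalized for PReLU: Var[W] = 2 / ((1 + a^2) * n_j)."""
    rng = rng or np.random.default_rng()
    n_j = shape[0] if len(shape) == 2 else int(np.prod(shape[1:]))  # fan-in
    std = np.sqrt(2.0 / ((1.0 + a ** 2) * n_j))
    return rng.normal(0.0, std, size=shape)

W = he_init((784, 256), a=0.25)   # 0.25 is a common initial slope for PReLU
h = prelu(np.random.default_rng(0).standard_normal((32, 784)) @ W, a=0.25)
```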