Table of Contents

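  1. [Batchnorm Forward](#Batchnorm Forward)
  2. [Batchnorm Finalize](#Batchnorm Finalize)
  3. [Batchnorm Backward](#Batchnorm Backward)
  4. [Generate Stats](#Generate Stats)
  5. [Layernorm Forward](#Layernorm Forward)
  6. [Layernorm Backward](#Layernorm Backward)
  7. [Instancenorm Forward](#Instancenorm Forward)
  8. [Instancenorm Backward](#Instancenorm Backward)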

Batchnorm Forward

The batchnorm operation computes:

$$ output = scale*{input - mean \over \sqrt{variance + epsilon}} + bias $$
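To make the formula concrete, here is a minimal CPU reference for a single channel. It is a sketch for checking intuition, not the cuDNN implementation: the names `ChannelNormResult` and `batchnorm_channel_reference` are hypothetical, and it assumes the caller has already gathered all values of one channel into a flat vector and that the biased (population) variance is used.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper type: normalized values plus the batch statistics
// that a backward pass would typically reuse.
struct ChannelNormResult {
    std::vector<float> output;
    float mean;
    float inv_variance;  // 1 / sqrt(variance + epsilon)
};

// Reference batchnorm over all values belonging to one channel.
inline ChannelNormResult batchnorm_channel_reference(const std::vector<float>& input,
                                                     float scale,
                                                     float bias,
                                                     float epsilon) {
    assert(!input.empty());
    const float count = static_cast<float>(input.size());

    float mean = 0.0f;
    for (float v : input) mean += v;
    mean /= count;

    // Biased (population) variance; this choice is an assumption of the sketch.
    float variance = 0.0f;
    for (float v : input) variance += (v - mean) * (v - mean);
    variance /= count;

    const float inv_variance = 1.0f / std::sqrt(variance + epsilon);

    ChannelNormResult result{{}, mean, inv_variance};
    result.output.reserve(input.size());
    for (float v : input) {
        // output = scale * (input - mean) / sqrt(variance + epsilon) + bias
        result.output.push_back(scale * (v - mean) * inv_variance + bias);
    }
    return result;
}
```

With `scale = 1` and `bias = 0` the output of each channel has zero mean and (up to `epsilon`) unit variance.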

Optionally the operation also computes:

$$next\_running\_mean = (1 - momentum)*previous\_running\_mean + momentum*current\_running\_mean$$

$$next\_running\_variance = (1 - momentum)*previous\_running\_variance + momentum*current\_running\_variance$$
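Both running statistics follow the same exponential moving average, so a single helper is enough to illustrate them. The function name `update_running_stat` is hypothetical and the sketch works on one scalar per channel.

```cpp
#include <cassert>

// next = (1 - momentum) * previous + momentum * current
// Applies to the running mean and the running variance alike.
inline float update_running_stat(float previous, float current, float momentum) {
    assert(momentum >= 0.0f && momentum <= 1.0f);
    return (1.0f - momentum) * previous + momentum * current;
}
```

A momentum of 0 keeps the previous statistic unchanged, and a momentum of 1 replaces it with the statistic of the current batch.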

The API to achieve the above equations is:

std::array<std::shared_ptr<Tensor_attributes>, 5> batchnorm(std::shared_ptr<Tensor_attributes>& input,
                                                            std::shared_ptr<Tensor_attributes>& scale,
                                                            std::shared_ptr<Tensor_attributes>& bias,
                                                            Batchnorm_attributes attributes); 

where the output array has tensors in order of: [output, saved_mean, saved_inv_variance, next_running_mean, next_running_variance]

Batchnorm_attributes is a lightweight structure with setters for providing optional input tensors and other operation attributes:

Batchnorm_attributes&
set_previous_running_stats(std::shared_ptr<Tensor_attributes>& previous_running_mean,
                            std::shared_ptr<Tensor_attributes>& previous_running_variance,
                            std::shared_ptr<Tensor_attributes>& momentum)

Batchnorm_attributes&
set_name(std::string const&)

Batchnorm_attributes&
set_compute_data_type(DataType_t value)

Python API:

  • batchnorm
    • input
    • scale
    • bias
    • in_running_mean
    • in_running_var
    • epsilon
    • momentum
    • compute_data_type
    • name

Batchnorm Finalize

bn_finalize calculates the statistics required for the next iteration from the statistics generated by the genstats operation.

    std::array<std::shared_ptr<Tensor_attributes>, 6> bn_finalize(std::shared_ptr<Tensor_attributes>,
                                                                  std::shared_ptr<Tensor_attributes>,
                                                                  std::shared_ptr<Tensor_attributes>,
                                                                  std::shared_ptr<Tensor_attributes>,
                                                                  std::shared_ptr<Tensor_attributes>,
                                                                  std::shared_ptr<Tensor_attributes>,
                                                                  BN_finalize_attributes);

where the output array has tensors in order of: [EQ_SCALE, EQ_BIAS, MEAN, INV_VARIANCE, NEXT_RUNNING_MEAN, NEXT_RUNNING_VAR]
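The sketch below shows one plausible way the first four outputs can be derived for a single channel from the genstats sums. It assumes the usual identities `variance = E[x^2] - E[x]^2`, `EQ_SCALE = scale * INV_VARIANCE` and `EQ_BIAS = bias - MEAN * EQ_SCALE`. The names `FinalizedStats` and `bn_finalize_channel_reference`, and the parameter order, are hypothetical and not taken from the library. The running statistics are omitted; they would follow the momentum update given under Batchnorm Forward.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical per-channel result mirroring [EQ_SCALE, EQ_BIAS, MEAN, INV_VARIANCE].
struct FinalizedStats {
    float eq_scale;
    float eq_bias;
    float mean;
    float inv_variance;
};

// sum and square_sum are the per-channel genstats outputs;
// count is the number of elements accumulated into them.
inline FinalizedStats bn_finalize_channel_reference(float sum,
                                                    float square_sum,
                                                    float scale,
                                                    float bias,
                                                    float epsilon,
                                                    float count) {
    assert(count > 0.0f);
    const float mean = sum / count;
    const float variance = square_sum / count - mean * mean;  // E[x^2] - E[x]^2
    const float inv_variance = 1.0f / std::sqrt(variance + epsilon);

    // Folding the normalization into one multiply-add:
    // scale * (x - mean) * inv_variance + bias == eq_scale * x + eq_bias
    const float eq_scale = scale * inv_variance;
    const float eq_bias = bias - mean * eq_scale;
    return {eq_scale, eq_bias, mean, inv_variance};
}
```

Under these assumptions, applying `eq_scale * x + eq_bias` elementwise gives the same result as the batchnorm formula.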

Batchnorm Backward (DBN)

The DBN operation computes the data gradient, scale gradient, and bias gradient during backpropagation of the batchnorm forward operation.

The API to achieve the above is:

std::array<std::shared_ptr<Tensor_attributes>, 3> batchnorm_backward(std::shared_ptr<Tensor_attributes> loss,
                                                                     std::shared_ptr<Tensor_attributes> input,
                                                                     std::shared_ptr<Tensor_attributes> scale,
                                                                     Batchnorm_backward_attributes);

where the output array has tensors in order of: [input gradient, scale gradient, bias gradient].

Batchnorm_backward_attributes is a lightweight structure with setters:

Batchnorm_backward_attributes&
set_saved_mean_and_inv_variance(std::shared_ptr<Tensor_attributes> saved_mean,
                                std::shared_ptr<Tensor_attributes> saved_inverse_variance)
                                
Batchnorm_backward_attributes&
set_epsilon(std::shared_ptr<Tensor_attributes> epsilon)

Batchnorm_backward_attributes&
set_name(std::string const&)

Batchnorm_backward_attributes&
set_compute_data_type(DataType_t value)

Only one of the two needs to be set: either (saved_mean and saved_inverse_variance) or (epsilon).
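For reference, the three gradients can be written out for a single channel using the standard batchnorm backward formulas. This is a CPU sketch under stated assumptions, not the cuDNN kernel: `ChannelGradients` and `batchnorm_backward_channel_reference` are hypothetical names, and the saved mean and inverse variance are passed in directly. In the epsilon case they would presumably be recomputed from the input first.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result type, ordered as [input gradient, scale gradient, bias gradient].
struct ChannelGradients {
    std::vector<float> input_gradient;
    float scale_gradient;
    float bias_gradient;
};

// loss holds the incoming gradient for every element of the channel.
inline ChannelGradients batchnorm_backward_channel_reference(const std::vector<float>& loss,
                                                             const std::vector<float>& input,
                                                             float scale,
                                                             float saved_mean,
                                                             float saved_inv_variance) {
    assert(!input.empty() && input.size() == loss.size());
    const float count = static_cast<float>(input.size());

    // Reductions over the channel: dscale = sum(dy * x_hat), dbias = sum(dy).
    float scale_gradient = 0.0f;
    float bias_gradient = 0.0f;
    for (std::size_t i = 0; i < input.size(); ++i) {
        const float x_hat = (input[i] - saved_mean) * saved_inv_variance;
        scale_gradient += loss[i] * x_hat;
        bias_gradient += loss[i];
    }

    // dx = scale * inv_variance / N * (N * dy - dbias - x_hat * dscale)
    ChannelGradients grads{{}, scale_gradient, bias_gradient};
    grads.input_gradient.reserve(input.size());
    const float factor = scale * saved_inv_variance / count;
    for (std::size_t i = 0; i < input.size(); ++i) {
        const float x_hat = (input[i] - saved_mean) * saved_inv_variance;
        grads.input_gradient.push_back(
            factor * (count * loss[i] - bias_gradient - x_hat * scale_gradient));
    }
    return grads;
}
```

A quick sanity check on this formula is that the input gradients of a channel always sum to zero, because shifting every input by the same constant does not change the normalized output.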

Generate Stats

The genstats operation computes the sum and the sum of squares for each channel.

The API to achieve the above is:

std::array<std::shared_ptr<Tensor_attributes>, 2>
cudnn_frontend::graph::genstats(std::shared_ptr<Tensor_attributes>, Genstats_attributes);

where the output array has tensors in order of: [sum, square_sum]
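A small CPU reference makes the two outputs explicit. It is only a sketch: `ChannelStats` and `genstats_reference` are hypothetical names, and the input is assumed to be stored channel-last, so that channel `c` of element `p` lives at index `p * channels + c`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result type mirroring [sum, square_sum], one entry per channel.
struct ChannelStats {
    std::vector<float> sum;
    std::vector<float> square_sum;
};

// Accumulates per-channel sum and sum of squares over a channel-last buffer.
inline ChannelStats genstats_reference(const std::vector<float>& input, std::size_t channels) {
    assert(channels > 0 && input.size() % channels == 0);
    ChannelStats stats{std::vector<float>(channels, 0.0f), std::vector<float>(channels, 0.0f)};
    for (std::size_t i = 0; i < input.size(); ++i) {
        const std::size_t c = i % channels;  // channel index under the assumed layout
        stats.sum[c] += input[i];
        stats.square_sum[c] += input[i] * input[i];
    }
    return stats;
}
```

These two sums are enough to recover the mean and variance of each channel later, which is what allows the statistics to be finalized in a separate step.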

Genstats_attributes is a lightweight structure with setters:

Genstats_attributes&
set_name(std::string const&)

Genstats_attributes&
set_compute_data_type(DataType_t value)

Python API:

  • genstats
    • input
    • compute_data_type
    • name

Layernorm Forward

Layer norm computes

$$ output = scale*{input - mean \over \sqrt{variance + epsilon}} + bias $$

where normalization happens across features, independently for each sample.
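The difference from batchnorm is only in which elements share a mean and variance. The CPU sketch below illustrates that, assuming a row-major `[samples, features]` buffer with one `scale` and `bias` value per feature. The function name `layernorm_reference` is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Normalizes each sample (row) independently across its features.
inline std::vector<float> layernorm_reference(const std::vector<float>& input,
                                              const std::vector<float>& scale,
                                              const std::vector<float>& bias,
                                              float epsilon) {
    const std::size_t features = scale.size();
    assert(features > 0 && bias.size() == features && input.size() % features == 0);
    const float count = static_cast<float>(features);

    std::vector<float> output(input.size());
    for (std::size_t start = 0; start < input.size(); start += features) {
        // Statistics are computed per sample, not per batch.
        float mean = 0.0f;
        for (std::size_t f = 0; f < features; ++f) mean += input[start + f];
        mean /= count;

        float variance = 0.0f;
        for (std::size_t f = 0; f < features; ++f) {
            const float centered = input[start + f] - mean;
            variance += centered * centered;
        }
        variance /= count;

        const float inv_std = 1.0f / std::sqrt(variance + epsilon);
        for (std::size_t f = 0; f < features; ++f) {
            output[start + f] = scale[f] * (input[start + f] - mean) * inv_std + bias[f];
        }
    }
    return output;
}
```

Because every row is normalized on its own, two rows that differ only by a positive scale factor produce the same output when `epsilon` is zero.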

The API to achieve the above equations is:

std::array<std::shared_ptr<Tensor_attributes>, 3> layernorm(std::shared_ptr<Tensor_attributes>& input,
                                                            std::shared_ptr<Tensor_attributes>& scale,
                                                            std::shared_ptr<Tensor_attributes>& bias,
                                                            Layernorm_attributes attributes); 

where the output array has tensors in order of: [output, mean, variance]

Layernorm_attributes is a lightweight structure with setters for providing optional input tensors and other operation attributes:

Layernorm_attributes&
set_name(std::string const&)

Layernorm_attributes&
set_compute_data_type(DataType_t value)

Python API:

  • layernorm
    • norm_forward_phase
    • input
    • scale
    • bias
    • epsilon
    • compute_data_type
    • name

Layernorm Backward

The DLN operation computes the data gradient, scale gradient, and bias gradient during backpropagation of the layernorm forward operation.

The API to achieve the above is:

std::array<std::shared_ptr<Tensor_attributes>, 3>
            layernorm_backward(std::shared_ptr<Tensor_attributes> dy,
                          std::shared_ptr<Tensor_attributes> x,
                          std::shared_ptr<Tensor_attributes> scale,
                          Layernorm_backward_attributes options);

where the output array has tensors in order of: [input gradient, scale gradient, bias gradient].
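As an illustration of how these three gradients relate, here is a CPU sketch of the standard layernorm backward formulas. It assumes a row-major `[samples, features]` layout and recomputes the per-sample statistics from `x` and an `epsilon` argument. The names `LayernormGradients` and `layernorm_backward_reference` are hypothetical and the signature does not mirror the library's.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical result type, ordered as [input gradient, scale gradient, bias gradient].
struct LayernormGradients {
    std::vector<float> input_gradient;  // same shape as x
    std::vector<float> scale_gradient;  // one value per feature
    std::vector<float> bias_gradient;   // one value per feature
};

inline LayernormGradients layernorm_backward_reference(const std::vector<float>& dy,
                                                       const std::vector<float>& x,
                                                       const std::vector<float>& scale,
                                                       float epsilon) {
    const std::size_t features = scale.size();
    assert(features > 0 && x.size() == dy.size() && x.size() % features == 0);
    const float count = static_cast<float>(features);

    LayernormGradients grads{std::vector<float>(x.size(), 0.0f),
                             std::vector<float>(features, 0.0f),
                             std::vector<float>(features, 0.0f)};

    for (std::size_t start = 0; start < x.size(); start += features) {
        // Recompute the per-sample statistics used by the forward pass.
        float mean = 0.0f;
        for (std::size_t f = 0; f < features; ++f) mean += x[start + f];
        mean /= count;
        float variance = 0.0f;
        for (std::size_t f = 0; f < features; ++f) {
            variance += (x[start + f] - mean) * (x[start + f] - mean);
        }
        variance /= count;
        const float inv_std = 1.0f / std::sqrt(variance + epsilon);

        // Scale and bias gradients reduce over samples; g = dy * scale feeds dx.
        float sum_g = 0.0f;
        float sum_g_x_hat = 0.0f;
        for (std::size_t f = 0; f < features; ++f) {
            const float x_hat = (x[start + f] - mean) * inv_std;
            const float g = dy[start + f] * scale[f];
            sum_g += g;
            sum_g_x_hat += g * x_hat;
            grads.scale_gradient[f] += dy[start + f] * x_hat;
            grads.bias_gradient[f] += dy[start + f];
        }

        // dx = inv_std / F * (F * g - sum(g) - x_hat * sum(g * x_hat))
        for (std::size_t f = 0; f < features; ++f) {
            const float x_hat = (x[start + f] - mean) * inv_std;
            const float g = dy[start + f] * scale[f];
            grads.input_gradient[start + f] =
                inv_std / count * (count * g - sum_g - x_hat * sum_g_x_hat);
        }
    }
    return grads;
}
```

Note the asymmetry with batchnorm: the input gradient reduces within each sample, while the scale and bias gradients accumulate across samples.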

Layernorm_backward_attributes is a lightweight structure with setters for providing optional input tensors and other operation attributes:

Layernorm_backward_attributes&
set_name(std::string const&)

Layernorm_backward_attributes&
set_compute_data_type(DataType_t value)

Python API:

  • layernorm_backward
    • input
    • scale
    • loss
    • compute_data_type
    • name

Instancenorm Forward

Instance norm computes

$$ output = scale*{input - mean \over \sqrt{variance + epsilon}} + bias $$

where normalization happens across the spatial dimensions, independently for each sample and each channel.
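The grouping can be sketched on the CPU as follows. The example assumes a contiguous `[batch, channels, spatial]` layout (NCHW with the spatial dimensions flattened) and one `scale` and `bias` value per channel. The function name `instancenorm_reference` is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Normalizes every (sample, channel) slice over its spatial elements.
inline std::vector<float> instancenorm_reference(const std::vector<float>& input,
                                                 std::size_t batch,
                                                 const std::vector<float>& scale,
                                                 const std::vector<float>& bias,
                                                 float epsilon) {
    const std::size_t channels = scale.size();
    assert(batch > 0 && channels > 0 && bias.size() == channels);
    assert(!input.empty() && input.size() % (batch * channels) == 0);
    const std::size_t spatial = input.size() / (batch * channels);
    const float count = static_cast<float>(spatial);

    std::vector<float> output(input.size());
    for (std::size_t group = 0; group < batch * channels; ++group) {
        const std::size_t start = group * spatial;  // first element of this slice
        const std::size_t c = group % channels;     // channel index of this slice

        float mean = 0.0f;
        for (std::size_t s = 0; s < spatial; ++s) mean += input[start + s];
        mean /= count;

        float variance = 0.0f;
        for (std::size_t s = 0; s < spatial; ++s) {
            const float centered = input[start + s] - mean;
            variance += centered * centered;
        }
        variance /= count;

        const float inv_std = 1.0f / std::sqrt(variance + epsilon);
        for (std::size_t s = 0; s < spatial; ++s) {
            output[start + s] = scale[c] * (input[start + s] - mean) * inv_std + bias[c];
        }
    }
    return output;
}
```

Compared with batchnorm, no statistics are shared between samples, so the result for one sample does not depend on the rest of the batch.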

The API to achieve the above equations is:

std::array<std::shared_ptr<Tensor_attributes>, 3> instancenorm(std::shared_ptr<Tensor_attributes>& input,
                                                               std::shared_ptr<Tensor_attributes>& scale,
                                                               std::shared_ptr<Tensor_attributes>& bias,
                                                               Instancenorm_attributes attributes);

where the output array has tensors in order of: [output, mean, variance]

Instancenorm_attributes is a lightweight structure with setters for providing optional input tensors and other operation attributes:

Instancenorm_attributes&
set_name(std::string const&)

Instancenorm_attributes&
set_compute_data_type(DataType_t value)

Python API:

  • instancenorm
    • norm_forward_phase
    • input
    • scale
    • bias
    • epsilon
    • compute_data_type
    • name

Instancenorm Backward

The DIN operation computes the data gradient, scale gradient, and bias gradient during backpropagation of the instancenorm forward operation.

The API to achieve the above is:

std::array<std::shared_ptr<Tensor_attributes>, 3>
            instancenorm_backward(std::shared_ptr<Tensor_attributes> dy,
                          std::shared_ptr<Tensor_attributes> x,
                          std::shared_ptr<Tensor_attributes> scale,
                          Instancenorm_backward_attributes options);

where the output array has tensors in order of: [input gradient, scale gradient, bias gradient].

Instancenorm_backward_attributes is a lightweight structure with setters for providing optional input tensors and other operation attributes:

Instancenorm_backward_attributes&
set_name(std::string const&)

Instancenorm_backward_attributes&
set_compute_data_type(DataType_t value)

Python API:

  • instancenorm_backward
    • input
    • scale
    • loss
    • compute_data_type
    • name