ActTensor: Activation Functions for TensorFlow


What is it?

ActTensor is a Python package that provides state-of-the-art activation functions and makes them quick and easy to use in deep learning projects.

Why not use tf.keras.activations?

As you may know, TensorFlow ships with only a limited set of built-in activation functions and, most importantly, does not include many newly introduced ones. Writing your own takes time and effort; this package provides most of the widely used activation functions, including state-of-the-art ones, ready to use in your models.

Requirements

Install the required dependencies by running the following command:

  • conda env create -f environment.yml

Where to get it?

The source code is currently hosted on GitHub at: https://github.com/pouyaardehkhani/ActTensor

Binary installers for the latest released version are available from the Python Package Index (PyPI):

# PyPI
pip install ActTensor-tf

License

MIT

How to use?

import tensorflow as tf
import numpy as np
from ActTensor_tf import ReLU # name of the layer

Functional API

inputs = tf.keras.layers.Input(shape=(28, 28))
x = tf.keras.layers.Flatten()(inputs)
x = tf.keras.layers.Dense(128)(x)
# apply the imported activation layer class
x = ReLU()(x)
output = tf.keras.layers.Dense(10, activation='softmax')(x)

model = tf.keras.models.Model(inputs=inputs, outputs=output)

Sequential API

model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128),
    # use the imported activation layer class
    ReLU(),
    tf.keras.layers.Dense(10, activation=tf.nn.softmax),
])

NOTE:

The core functions behind the activation layers are also available, but they may be defined under different names. See the Activations table below for the function name that corresponds to each class.

from ActTensor_tf import relu

Activations

Classes and Functions are available in ActTensor_tf

| Activation Name | Class Name | Function Name |
| --- | --- | --- |
| SoftShrink | SoftShrink | softSHRINK |
| HardShrink | HardShrink | hard_shrink |
| GLU | GLU | - |
| Bilinear | Bilinear | - |
| ReGLU | ReGLU | - |
| GeGLU | GeGLU | - |
| SwiGLU | SwiGLU | - |
| SeGLU | SeGLU | - |
| ReLU | ReLU | relu |
| Identity | Identity | identity |
| Step | Step | step |
| Sigmoid | Sigmoid | sigmoid |
| HardSigmoid | HardSigmoid | hard_sigmoid |
| LogSigmoid | LogSigmoid | log_sigmoid |
| SiLU | SiLU | silu |
| PLinear | ParametricLinear | parametric_linear |
| Piecewise-Linear | PiecewiseLinear | piecewise_linear |
| Complementary Log-Log | CLL | cll |
| Bipolar | Bipolar | bipolar |
| Bipolar-Sigmoid | BipolarSigmoid | bipolar_sigmoid |
| Tanh | Tanh | tanh |
| TanhShrink | TanhShrink | tanhshrink |
| LeCun's Tanh | LeCunTanh | leCun_tanh |
| HardTanh | HardTanh | hard_tanh |
| TanhExp | TanhExp | tanh_exp |
| Absolute | ABS | Abs |
| Squared-ReLU | SquaredReLU | squared_relu |
| P-ReLU | ParametricReLU | Parametric_ReLU |
| R-ReLU | RandomizedReLU | Randomized_ReLU |
| LeakyReLU | LeakyReLU | leaky_ReLU |
| ReLU6 | ReLU6 | relu6 |
| Mod-ReLU | ModReLU | Mod_ReLU |
| Cosine-ReLU | CosReLU | Cos_ReLU |
| Sin-ReLU | SinReLU | Sin_ReLU |
| Probit | Probit | probit |
| Cos | Cos | Cosine |
| Gaussian | Gaussian | gaussian |
| Multiquadratic | Multiquadratic | Multi_quadratic |
| Inverse-Multiquadratic | InvMultiquadratic | Inv_Multi_quadratic |
| SoftPlus | SoftPlus | softPlus |
| Mish | Mish | mish |
| SMish | Smish | smish |
| P-SMish | ParametricSmish | Parametric_Smish |
| Swish | Swish | swish |
| ESwish | ESwish | eswish |
| HardSwish | HardSwish | hardSwish |
| GCU | GCU | gcu |
| CoLU | CoLU | colu |
| PELU | PELU | pelu |
| SELU | SELU | selu |
| CELU | CELU | celu |
| ArcTan | ArcTan | arcTan |
| Shifted-SoftPlus | ShiftedSoftPlus | Shifted_SoftPlus |
| Softmax | Softmax | softmax |
| Logit | Logit | logit |
| GELU | GELU | gelu |
| Softsign | Softsign | softsign |
| ELiSH | ELiSH | elish |
| HardELiSH | HardELiSH | hardELiSH |
| Serf | Serf | serf |
| ELU | ELU | elu |
| Phish | Phish | phish |
| QReLU | QReLU | qrelu |
| m-QReLU | MQReLU | mqrelu |
| FReLU | FReLU | frelu |

| Activation Name | Use Case | Pros | Cons | Example Usage in Known Network |
| --- | --- | --- | --- | --- |
| SoftShrink | Denoising autoencoders | Good for noise reduction | Limited usage scenarios | Used in image denoising autoencoders |
| HardShrink | Denoising autoencoders | Effective noise removal | Limited usage scenarios | Used in image denoising autoencoders |
| GLU | Gated networks | Helps with learning complex functions | Requires additional gating mechanism | Gated Linear Units in NLP models like ELMo |
| Bilinear | Bilinear interpolation | Efficient image processing | Not used for non-image data | Bilinear interpolation in super-resolution networks |
| ReGLU | Transformer models | Enhanced gating mechanism | Computationally expensive | Enhanced transformer models |
| GeGLU | Transformer models | Enhanced gating mechanism | Computationally expensive | Enhanced transformer models |
| SwiGLU | Transformer models | Enhanced gating mechanism | Computationally expensive | Enhanced transformer models |
| SeGLU | Transformer models | Enhanced gating mechanism | Computationally expensive | Enhanced transformer models |
| ReLU | General purpose | Simple, efficient, avoids vanishing gradients | Dying ReLU problem | Used in almost all CNN architectures like VGG, ResNet |
| Identity | Linear networks | Retains input values | No non-linearity | Identity mapping in residual networks |
| Step | Binary classification | Simple thresholding | Non-differentiable | Used in simple binary classifiers |
| Sigmoid | Binary classification, output layers | Smooth gradient, probabilistic interpretation | Vanishing gradient problem | Output layer in binary classification networks |
| HardSigmoid | Low-power devices | Simple and efficient | Non-differentiable | Mobile networks for power efficiency |
| LogSigmoid | Binary classification, probabilistic outputs | Stabilizes training | Vanishing gradient problem | Binary classification in networks |
| SiLU | Advanced networks | Combines ReLU and Sigmoid benefits | Computationally expensive | Used in Swish-activated networks |
| PLinear | Customizable linear transformation | Flexibility | Requires parameter tuning | Custom layers in experimental networks |
| Piecewise-Linear | Customizable piecewise transformations | Flexibility | Requires parameter tuning | Custom layers in experimental networks |
| Complementary Log-Log | Probabilistic outputs | Useful for binary classification | Limited use in deep networks | Output layers in certain probabilistic models |
| Bipolar | Binary classification | Simple bipolar output | Non-differentiable | Binary classification networks |
| Bipolar-Sigmoid | Binary classification | Combines benefits of Sigmoid and Bipolar | Vanishing gradient problem | Binary classification networks |
| Tanh | Hidden layers | Zero-centered output, smooth gradient | Vanishing gradient problem | RNNs and LSTMs like in original LSTM paper |
| TanhShrink | Denoising autoencoders | Combines Tanh with shrinkage | Limited usage scenarios | Used in denoising autoencoders |
| LeCun's Tanh | Hidden layers | Scaled Tanh for better performance | Vanishing gradient problem | Applied in LeNet-5 network |
| HardTanh | Low-power devices | Simple and efficient | Non-differentiable | Efficient models for mobile devices |
| TanhExp | Advanced networks | Combines Tanh and exponential benefits | Computationally expensive | Experimental deep networks |
| Absolute | Simple tasks | Easy to implement | Non-differentiable | Simple experimental networks |
| Squared-ReLU | Advanced networks | Combines ReLU and squaring benefits | Computationally expensive | Experimental networks with custom activations |
| P-ReLU | Customizable ReLU variant | Learnable parameters | Requires parameter tuning | Variants of ResNet |
| R-ReLU | Regularization | Reduces overfitting | Computationally expensive | Applied in CNNs for added regularization |
| LeakyReLU | General purpose | Prevents dying ReLU problem | Slightly more computationally expensive than ReLU | LeakyReLU in networks like YOLO |
| ReLU6 | Mobile networks | Bounded output | Dying ReLU problem | EfficientNet and MobileNet |
| Mod-ReLU | Advanced networks | Combines ReLU and modulation | Computationally expensive | Custom experimental networks |
| Cosine-ReLU | Advanced networks | Combines ReLU and cosine benefits | Computationally expensive | Custom experimental networks |
| Sin-ReLU | Advanced networks | Combines ReLU and sine benefits | Computationally expensive | Custom experimental networks |
| Probit | Probabilistic outputs | Useful for binary classification | Limited use in deep networks | Certain probabilistic models |
| Cos | Periodic tasks | Handles periodicity well | Non-differentiable | Networks dealing with periodic signals |
| Gaussian | Radial basis functions | Smooth gradient, radial basis function | Computationally expensive | Radial basis function networks |
| Multiquadratic | Radial basis functions | Smooth gradient, radial basis function | Computationally expensive | Radial basis function networks |
| Inverse-Multiquadratic | Radial basis functions | Smooth gradient, radial basis function | Computationally expensive | Radial basis function networks |
| SoftPlus | Advanced networks | Smooth approximation to ReLU | Computationally expensive | Experimental networks |
| Mish | Advanced networks | Smooth gradient, non-monotonic | Computationally expensive | Experimental networks |
| SMish | Advanced networks | Smooth gradient, non-monotonic | Computationally expensive | Experimental networks |
| P-SMish | Customizable Mish variant | Learnable parameters | Requires parameter tuning | Experimental networks |
| Swish | Advanced networks | Smooth gradient, non-monotonic | Computationally expensive | EfficientNet |
| ESwish | Advanced networks | Smooth gradient, non-monotonic | Computationally expensive | Experimental networks |
| HardSwish | Low-power devices | Simple and efficient | Non-differentiable | MobileNetV3 |
| GCU | Advanced networks | Gradient-controlled units | Computationally expensive | Experimental networks |
| CoLU | Advanced networks | Combines linear and unit step benefits | Computationally expensive | Experimental networks |
| PELU | Customizable ELU variant | Learnable parameters | Requires parameter tuning | Custom experimental networks |
| SELU | Self-normalizing networks | Maintains mean and variance | Requires careful initialization and architecture choices | Self-normalizing networks like in self-normalizing neural networks paper |
| CELU | Advanced networks | Continuously differentiable ELU | Computationally expensive | Experimental networks |
| ArcTan | Periodic tasks | Handles periodicity well | Non-differentiable | Networks dealing with periodic signals |
| Shifted-SoftPlus | Advanced networks | Smooth gradient | Computationally expensive | Experimental networks |
| Softmax | Output layer for multi-class classification | Converts logits to probabilities | Not suitable for hidden layers | Output layer in classification networks like AlexNet |
| Logit | Probabilistic outputs | Useful for binary classification | Limited use in deep networks | Certain probabilistic models |
| GELU | Advanced networks | Combines Gaussian and ReLU benefits | Computationally expensive | Transformer networks like BERT |
| Softsign | General purpose | Smooth approximation to sign function | Slower convergence | Applied in some RNN architectures |
| ELiSH | Advanced networks | Combines ELU and Swish benefits | Computationally expensive | Experimental networks |
| HardELiSH | Low-power devices | Simple and efficient | Non-differentiable | Efficient models for mobile devices |
| Serf | Advanced networks | Combines several benefits of other functions | Computationally expensive | Experimental networks |
| ELU | Deep networks | Smooth gradient, avoids dying ReLU problem | Computationally expensive | Deep CNNs like in ELU paper |
| Phish | Advanced networks | Combines several benefits of other functions | Computationally expensive | Experimental networks |
| QReLU | Quantized networks | Efficient in low-bit precision | Less flexible than regular ReLU | Efficient quantized networks |
| MQReLU | Quantized networks | Efficient in low-bit precision | Less flexible than regular ReLU | Efficient quantized networks |
| FReLU | Advanced networks | Combines ReLU and filter benefits | Computationally expensive | Experimental networks |
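
The "Vanishing gradient problem" and "Dying ReLU problem" entries in the Cons column can be made concrete by looking at a few derivative values. The sketch below is plain Python using the textbook definitions; it is not ActTensor code, and the helper names and the 0.01 leaky slope are assumptions chosen only for illustration.

```python
import math


def sigmoid_grad(x: float) -> float:
    """Derivative of the logistic sigmoid: s(x) * (1 - s(x))."""
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)


def relu_grad(x: float) -> float:
    """Derivative of ReLU (taken as 0 at x == 0 by convention)."""
    return 1.0 if x > 0 else 0.0


def leaky_relu_grad(x: float, slope: float = 0.01) -> float:
    """Derivative of LeakyReLU; `slope` is an illustrative default."""
    return 1.0 if x > 0 else slope


# Compare the gradients far from zero and at zero.
for x in (-10.0, 0.0, 10.0):
    print(
        f"x={x:+.0f}  sigmoid'={sigmoid_grad(x):.2e}  "
        f"relu'={relu_grad(x):.0f}  leaky'={leaky_relu_grad(x):.2f}"
    )
```

The sigmoid gradient peaks at 0.25 and shrinks toward zero for large |x|, which is the vanishing-gradient behaviour. ReLU passes no gradient at all for negative inputs (the "dying" case), while the leaky variant keeps a small non-zero slope there.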

Which activation functions does it support?

  1. Soft Shrink:

  1. Hard Shrink:

  1. GLU:

  1. Bilinear:

  1. ReGLU:

    ReGLU is a variant of GLU that uses ReLU as the gate activation instead of sigmoid.

  1. GeGLU:

    GeGLU is a variant of GLU that uses GELU as the gate activation instead of sigmoid.

  1. SwiGLU:

    SwiGLU is a variant of GLU that uses Swish as the gate activation instead of sigmoid.

  1. SeGLU:

    SeGLU is a variant of GLU.

  2. ReLU:

  1. Identity:

    $f(x) = x$

  1. Step:

  1. Sigmoid:

  1. Hard Sigmoid:

  1. Log Sigmoid:

  1. SiLU:

  1. ParametricLinear:

    $f(x) = a*x$

  2. PiecewiseLinear:

    Choose some xmin and xmax, which define the "range". Every input below this range maps to 0, and every input above it maps to 1. Inputs inside the range are linearly interpolated between 0 and 1.
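
The piecewise-linear rule described above can be sketched in plain Python for a single scalar. This is a reference sketch, not the package's implementation: the name `piecewise_linear_ref` and its argument order are assumptions, and the library's own `PiecewiseLinear` layer may be parameterized differently.

```python
def piecewise_linear_ref(x: float, xmin: float, xmax: float) -> float:
    """Scalar piecewise-linear activation over the range [xmin, xmax].

    Hypothetical helper for illustration only.
    """
    if xmax <= xmin:
        raise ValueError("xmax must be greater than xmin")
    if x <= xmin:
        return 0.0  # below the range
    if x >= xmax:
        return 1.0  # above the range
    # Inside the range: linear interpolation between 0 and 1.
    return (x - xmin) / (xmax - xmin)


for value in (-2.0, 0.0, 0.5, 2.0):
    print(value, piecewise_linear_ref(value, xmin=-1.0, xmax=1.0))
```

With xmin = -1 and xmax = 1, the midpoint 0 maps to 0.5 and anything outside the range saturates at 0 or 1.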

  1. Complementary Log-Log (CLL):

  1. Bipolar:

  1. Bipolar Sigmoid:

  1. Tanh:

  1. Tanh Shrink:

  1. LeCunTanh:

  1. Hard Tanh:

  1. TanhExp:

  1. ABS:

  1. SquaredReLU:

  1. ParametricReLU (PReLU):

  1. RandomizedReLU (RReLU):

  1. LeakyReLU:

  1. ReLU6:

  1. ModReLU:

  1. CosReLU:

  1. SinReLU:

  1. Probit:

  1. Cosine:

  1. Gaussian:

  1. Multiquadratic:

    Choose some point (x,y).

  1. InvMultiquadratic:

  1. SoftPlus:

  1. Mish:

  1. Smish:

  1. ParametricSmish (PSmish):

  1. Swish:

  1. ESwish:

  1. Hard Swish:

  1. GCU:

  1. CoLU:

  1. PELU:

  1. SELU:

    $f(x) = \lambda x$ if $x > 0$, and $f(x) = \lambda \alpha (e^{x} - 1)$ otherwise, where $\alpha \approx 1.6733$ and $\lambda \approx 1.0507$
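
To show how the two SELU constants are used, here is a scalar sketch in plain Python based on the standard SELU definition. The name `selu_ref` is an assumption for illustration and is not the package's `selu` function; the constants are the commonly published full-precision values, which round to the figures quoted above.

```python
import math

# Standard SELU constants; they round to 1.6733 and 1.0507.
ALPHA = 1.6732632423543772
LAMBDA = 1.0507009873554805


def selu_ref(x: float) -> float:
    """Scalar SELU: lambda * x for x > 0, else lambda * alpha * (exp(x) - 1)."""
    if x > 0:
        return LAMBDA * x
    return LAMBDA * ALPHA * (math.exp(x) - 1.0)


for value in (-100.0, -1.0, 0.0, 1.0):
    print(value, selu_ref(value))
```

For large negative inputs the output saturates near -(lambda * alpha), roughly -1.7581, while positive inputs are simply scaled by lambda.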

  1. CELU:

  1. ArcTan:

  1. ShiftedSoftPlus:

  1. Softmax:

  1. Logit:

  1. GELU:

  1. Softsign:

  1. ELiSH:

  1. Hard ELiSH:

  1. Serf:

  1. ELU:

  1. Phish:

  1. QReLU:

  1. modified QReLU (m-QReLU):

  1. FReLU:
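
Several entries in the list above (GLU, Bilinear, ReGLU, GeGLU, SwiGLU) belong to the gated linear unit family, in which one half of the input is multiplied element-wise by an activation of the other half. The plain-Python sketch below illustrates that idea under stated assumptions: splitting the input into a first (value) half and a second (gate) half is a convention chosen here, all helper names are hypothetical, and ActTensor's own layers may split or project the input differently. SeGLU is left out because its gate activation is not stated above.

```python
import math
from typing import Callable, List


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def relu(x: float) -> float:
    return max(0.0, x)


def swish(x: float) -> float:
    return x * sigmoid(x)


def gelu(x: float) -> float:
    # Exact GELU using the Gaussian error function.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gated_unit(values: List[float], gate_fn: Callable[[float], float]) -> List[float]:
    """Multiply the first half of `values` by gate_fn applied to the second half.

    The half/half split is an assumption made for this sketch.
    """
    if len(values) % 2 != 0:
        raise ValueError("input length must be even")
    half = len(values) // 2
    a, b = values[:half], values[half:]
    return [ai * gate_fn(bi) for ai, bi in zip(a, b)]


def glu(v: List[float]) -> List[float]:
    return gated_unit(v, sigmoid)


def bilinear(v: List[float]) -> List[float]:
    return gated_unit(v, lambda t: t)  # no gate activation


def reglu(v: List[float]) -> List[float]:
    return gated_unit(v, relu)


def geglu(v: List[float]) -> List[float]:
    return gated_unit(v, gelu)


def swiglu(v: List[float]) -> List[float]:
    return gated_unit(v, swish)


sample = [2.0, -1.0, 0.0, 3.0]  # values: [2.0, -1.0], gates: [0.0, 3.0]
for fn in (glu, bilinear, reglu, geglu, swiglu):
    print(fn.__name__, fn(sample))
```

Because every variant halves the feature dimension, a gated layer is usually preceded by a projection that doubles it; that projection is outside the scope of this sketch.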

Cite this repository

@software{Pouya_ActTensor_2022,
author = {Pouya, Ardehkhani and Pegah, Ardehkhani},
license = {MIT},
month = {7},
title = {{ActTensor}},
url = {https://github.com/pouyaardehkhani/ActTensor},
version = {1.0.0},
year = {2022}
}