Categories: Uncategorized

# Activation functions

## Why use activation functions?

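Activation functions introduce non-linearity, which lets a neural network fit non-linear (more complex) patterns. Without them, a stack of linear layers is equivalent to a single linear layer.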
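As a minimal sketch of why non-linearity matters, the snippet below composes two one-dimensional linear layers and shows that, without an activation in between, they reduce to one linear layer. The weights `W1`, `B1`, `W2`, `B2` and the helper names are hypothetical, chosen only for illustration.

```python
def linear(w, b, x):
    """A one-dimensional linear layer: w * x + b."""
    return w * x + b


def relu(x):
    """ReLU activation, used here only as an example non-linearity."""
    return max(0.0, x)


# Hypothetical weights and biases, chosen only for illustration.
W1, B1 = 2.0, -1.0
W2, B2 = -3.0, 0.5


def two_linear_layers(x):
    # Two layers with no activation in between.
    return linear(W2, B2, linear(W1, B1, x))


def collapsed_layer(x):
    # The same function written as a single linear layer:
    # W2 * (W1 * x + B1) + B2 = (W2 * W1) * x + (W2 * B1 + B2)
    return linear(W2 * W1, W2 * B1 + B2, x)


def two_layers_with_relu(x):
    # Inserting a non-linearity prevents the collapse.
    return linear(W2, B2, relu(linear(W1, B1, x)))


for x in (-2.0, 0.0, 1.0, 3.0):
    print(x, two_linear_layers(x), collapsed_layer(x), two_layers_with_relu(x))
```

The first two columns of output agree for every input, while the ReLU version differs wherever the first layer's output is negative.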

## Different activation functions

• sigmoid (logistic function)

$$\frac{1}{1+e^{-x}}$$

The output is constrained between 0 and 1. The function is strongly sensitive to its input only when x is near 0; for large |x| it saturates and the output barely changes.
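The saturation described above can be checked numerically. The following is a small sketch using only Python's `math` module; the function names are illustrative, and the branch on the sign of `x` is one common way to avoid overflow in `math.exp`, not the only one.

```python
import math


def sigmoid(x):
    """Logistic function 1 / (1 + e^{-x}), written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, e^{-x} could overflow, so use the equivalent form
    # e^{x} / (1 + e^{x}).
    z = math.exp(x)
    return z / (1.0 + z)


def sigmoid_grad(x):
    """Derivative of the sigmoid: s(x) * (1 - s(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)


for x in (-10.0, -2.0, 0.0, 2.0, 10.0):
    print(f"x={x:6.1f}  sigmoid={sigmoid(x):.5f}  gradient={sigmoid_grad(x):.5f}")
```

The gradient peaks at 0.25 when x is 0 and is close to zero at x = ±10, which is the region where the function is insensitive to its input.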

• tanh (hyperbolic tangent)

$$\frac{e^x-e^{-x}}{e^x+e^{-x}}$$

The output is constrained between -1 and 1.

tanh typically performs better than sigmoid because its output is zero-centered. This helps the next layer satisfy the condition that “the average of each input variable over the training set is close to zero”, which makes backpropagation converge faster.
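To illustrate the zero-centering point, the sketch below compares the mean output of sigmoid and tanh on a symmetric set of inputs, and checks the identity tanh(x) = 2·sigmoid(2x) − 1. The input values are hypothetical and chosen only to be symmetric around zero.

```python
import math


def sigmoid(x):
    """Logistic function, written to avoid overflow for negative x."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def tanh_via_sigmoid(x):
    """tanh expressed as a rescaled, shifted sigmoid."""
    return 2.0 * sigmoid(2.0 * x) - 1.0


def tanh_grad(x):
    """Derivative of tanh: 1 - tanh(x)^2."""
    return 1.0 - math.tanh(x) ** 2


# Hypothetical inputs, symmetric around zero.
inputs = [-3.0, -1.0, 0.0, 1.0, 3.0]

mean_sigmoid = sum(sigmoid(x) for x in inputs) / len(inputs)
mean_tanh = sum(math.tanh(x) for x in inputs) / len(inputs)

print(f"mean sigmoid output: {mean_sigmoid:.3f}")  # centered near 0.5
print(f"mean tanh output:    {mean_tanh:.3f}")     # centered near 0
print(f"tanh gradient at 0:  {tanh_grad(0.0):.2f}")
```

On these inputs the sigmoid outputs average about 0.5, while the tanh outputs average about 0. The tanh gradient at 0 is 1, compared with 0.25 for sigmoid.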

• ReLU

$$\max(0, x)$$

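Sigmoid and tanh are hard to train in deep neural networks because they saturate, so gradients vanish as they are propagated back through many layers. ReLU works well with gradient descent because it is linear for positive inputs, where its gradient is a constant 1.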
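A rough sketch of the vanishing-gradient effect follows. It makes a strong simplifying assumption: every layer sees the same pre-activation value and all weights are 1, so the backpropagated gradient is just the local derivative raised to the depth. Real networks also multiply by weights, so this only illustrates the trend. The helper `chained_gradient` is hypothetical.

```python
import math


def sigmoid_grad(x):
    """Derivative of the sigmoid; its maximum value is 0.25 at x = 0."""
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)


def relu(x):
    """ReLU activation: max(0, x)."""
    return max(0.0, x)


def relu_grad(x):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0


def chained_gradient(grad_fn, pre_activation, depth):
    """Product of the local derivative over `depth` layers.

    Simplification: the same pre-activation at every layer, weights of 1.
    """
    return grad_fn(pre_activation) ** depth


for depth in (1, 5, 10, 20):
    sig = chained_gradient(sigmoid_grad, 0.0, depth)  # best case for sigmoid
    rel = chained_gradient(relu_grad, 1.0, depth)     # positive input for ReLU
    print(f"depth={depth:2d}  sigmoid={sig:.3e}  relu={rel:.1f}")
```

Even at sigmoid's best case (x = 0), the product shrinks by a factor of 4 per layer, while the ReLU product stays at 1 for positive inputs. For negative inputs the ReLU gradient is 0, so those units pass no gradient back.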

## Reference

1. https://machinelearningmastery.com/rectified-linear-activation-function-for-deep-learning-neural-networks/
2. https://stats.stackexchange.com/questions/330559/why-is-tanh-almost-always-better-than-sigmoid-as-an-activation-function