Activation functions


Why use activation functions?

They let the neural network adapt to non-linear (more complex) patterns.
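To make the point concrete, here is a minimal sketch of why a non-linearity is needed. It assumes two scalar "layers" with made-up weights (`layer1`, `layer2` and their numbers are purely illustrative). Without an activation in between, the two layers collapse into one linear function; with ReLU in between, they do not.

```python
def relu(x):
    # Rectified linear unit: max(0, x).
    return max(0.0, x)

def layer1(x):
    # Hypothetical first layer: w = 2, b = 1.
    return 2.0 * x + 1.0

def layer2(x):
    # Hypothetical second layer: w = -3, b = 0.5.
    return -3.0 * x + 0.5

def stacked_linear(x):
    # Two layers with no activation in between.
    return layer2(layer1(x))

def collapsed(x):
    # The same function written as ONE layer:
    # w = -3 * 2 = -6, b = -3 * 1 + 0.5 = -2.5.
    return -6.0 * x - 2.5

def stacked_relu(x):
    # Same two layers, but with ReLU in between.
    return layer2(relu(layer1(x)))

for x in (-1.0, 0.0, 1.0):
    print(x, stacked_linear(x), collapsed(x), stacked_relu(x))
```

The first two printed columns always agree, so the extra layer adds no expressive power. The ReLU version bends at x = -0.5, which no single linear layer can reproduce.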

Different activation functions

  • sigmoid (logistic function)

$$\frac{1}{1+e^{-x}}$$

Output is constrained between 0 and 1. It is only strongly sensitive to its input when x is near 0; for large |x| it saturates and the output barely changes.
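A small sketch of the sigmoid and its derivative, using only Python's `math` module. The function names and the branch on the sign of x (a common trick to avoid overflow in `math.exp`) are my own choices, not from any particular library.

```python
import math

def sigmoid(x):
    # 1 / (1 + e^{-x}), written so that exp() never gets a large positive argument.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def sigmoid_grad(x):
    # Derivative of the sigmoid: s(x) * (1 - s(x)).
    s = sigmoid(x)
    return s * (1.0 - s)

for x in (-10.0, -1.0, 0.0, 1.0, 10.0):
    print(x, sigmoid(x), sigmoid_grad(x))
```

The derivative peaks at 0.25 when x = 0 and is close to zero at x = ±10, which is what "only sensitive near 0" means in practice.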

  • tanh (hyperbolic tangent)

$$\frac{e^x-e^{-x}}{e^x+e^{-x}}$$

Output is constrained between -1 and 1.

Typically performs better than sigmoid because its output is zero-centered. This helps the next layer satisfy the condition that “the average of each input variable over the training set is close to zero”, which speeds up training with backpropagation.
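A quick way to see the zero-centering, as a sketch: feed the same symmetric inputs (an arbitrary toy list, assumed here for illustration) through both functions and compare the mean of the outputs. The last check uses the identity tanh(x) = 2·sigmoid(2x) − 1, which shows that tanh is a rescaled, shifted sigmoid.

```python
import math

def sigmoid(x):
    # Logistic function; inputs here are small, so the plain formula is safe.
    return 1.0 / (1.0 + math.exp(-x))

def mean(values):
    return sum(values) / len(values)

# Toy inputs, symmetric around zero (illustrative only).
inputs = [-2.0, -1.0, 0.0, 1.0, 2.0]

sigmoid_mean = mean([sigmoid(x) for x in inputs])
tanh_mean = mean([math.tanh(x) for x in inputs])

print("mean sigmoid output:", sigmoid_mean)  # around 0.5
print("mean tanh output:", tanh_mean)        # around 0.0

# tanh written in terms of the sigmoid.
def tanh_via_sigmoid(x):
    return 2.0 * sigmoid(2.0 * x) - 1.0
```

Sigmoid outputs are all positive, so the next layer sees inputs with a mean near 0.5; tanh outputs average out near zero.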

  • ReLU

$$\max(0, x)$$

Sigmoid and tanh are hard to train in deep neural networks because they saturate, so their gradients vanish. ReLU is good for gradient descent because it is linear for positive values, where its gradient is always 1.
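The vanishing gradient can be sketched with a heavily simplified model: a chain of scalar layers whose weights are all fixed to 1 (an assumption made only to isolate the effect of the activation). The hypothetical helper `chain_gradient` multiplies the local derivatives along the chain, as the chain rule does in backpropagation. The ReLU derivative at exactly 0 is set to 0 here, which is a convention rather than a mathematical fact.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # never larger than 0.25

def relu(x):
    return max(0.0, x)

def relu_grad(x):
    # 1 for positive inputs, 0 otherwise (value at 0 chosen by convention).
    return 1.0 if x > 0 else 0.0

def chain_gradient(act, act_grad, x, depth):
    # Gradient of `depth` stacked activations with respect to the input,
    # assuming every weight in between is 1.
    grad = 1.0
    for _ in range(depth):
        grad *= act_grad(x)
        x = act(x)
    return grad

print("sigmoid, 10 layers:", chain_gradient(sigmoid, sigmoid_grad, 0.5, 10))
print("relu,    10 layers:", chain_gradient(relu, relu_grad, 0.5, 10))
```

Each sigmoid layer scales the gradient by at most 0.25, so after 10 layers it is below 0.25^10 (about 1e-6). With ReLU and a positive input the factor is 1 at every layer, so the gradient passes through unchanged.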

References

  1. https://machinelearningmastery.com/rectified-linear-activation-function-for-deep-learning-neural-networks/
  2. https://stats.stackexchange.com/questions/330559/why-is-tanh-almost-always-better-than-sigmoid-as-an-activation-function