Activation functions
Why use activation functions?
They let the neural network model non-linear (more complex) patterns; without them, stacked linear layers collapse into a single linear map (see the sketch below).
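A minimal sketch of this point (NumPy; the layer sizes and random weights are made up for illustration): two linear layers with no activation in between are exactly equivalent to one linear layer, while inserting a ReLU breaks that collapse.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 2))   # first "layer": 2 inputs -> 4 hidden units
W2 = rng.normal(size=(1, 4))   # second "layer": 4 hidden units -> 1 output
x = rng.normal(size=(2, 5))    # a batch of 5 two-dimensional inputs

no_activation = W2 @ (W1 @ x)           # two linear layers, no activation...
collapsed     = (W2 @ W1) @ x           # ...equal a single linear layer W2 @ W1
print(np.allclose(no_activation, collapsed))   # True

with_relu = W2 @ np.maximum(0, W1 @ x)  # a ReLU in between breaks the collapse
print(np.allclose(with_relu, collapsed))       # False (in general)
```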
Different activation functions
- sigmoid (logistic function)
$$\frac{1}{1+e^{-x}}$$
Output is constrained between 0 and 1. The function is only strongly sensitive to its input when x is near 0; for large |x| it saturates and the gradient approaches zero.
- tanh (hyperbolic tangent)
$$\frac{e^x-e^{-x}}{e^x+e^{-x}}$$
Output is constrained between -1 and 1.
Typically performs better than sigmoid because its outputs are zero-centered, so "the average of each input variable over the training set is close to zero", which speeds up backpropagation.
- ReLU
$$\max(0, x)$$
Sigmoid and tanh are hard to train in deep neural networks because their gradients vanish where they saturate. ReLU avoids this for positive inputs: it is linear there, with a constant gradient of 1, which works well with gradient descent (a numerical sketch of all three functions follows this list).
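A minimal sketch comparing the three activations and their gradients (NumPy; the sample points are arbitrary). Sigmoid and tanh gradients shrink toward 0 for large |x| (saturation), while ReLU's gradient stays exactly 1 for any positive input.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def d_sigmoid(x):                      # derivative: sigmoid(x) * (1 - sigmoid(x))
    s = sigmoid(x)
    return s * (1.0 - s)

def d_tanh(x):                         # derivative: 1 - tanh(x)^2
    return 1.0 - np.tanh(x) ** 2

def relu(x):
    return np.maximum(0.0, x)

def d_relu(x):                         # derivative: 0 for x < 0, 1 for x > 0
    return (x > 0).astype(float)

x = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
print("sigmoid'(x):", d_sigmoid(x))    # ~0.0066 at |x| = 5 -> vanishing gradient
print("tanh'(x):   ", d_tanh(x))       # ~0.0002 at |x| = 5 -> vanishes even faster
print("relu'(x):   ", d_relu(x))       # exactly 1 for every positive x
```

Note that tanh still has the zero-centered output mentioned above, which sigmoid and ReLU lack; the trade-off is its saturating gradient in deep stacks.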
References
- https://machinelearningmastery.com/rectified-linear-activation-function-for-deep-learning-neural-networks/
- https://stats.stackexchange.com/questions/330559/why-is-tanh-almost-always-better-than-sigmoid-as-an-activation-function