- Activation Functions
- Sigmoid
- Usually used in the output layer for binary classification, where the target is either 0 or 1. Because the sigmoid output always lies between 0 and 1, it can be read as a probability: predict 1 if the value is greater than 0.5 and 0 otherwise.
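The thresholding rule can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the names `sigmoid` and `predict_label` and the `threshold` parameter are assumptions made for the example.

```python
import math

def sigmoid(x: float) -> float:
    """Logistic sigmoid, split into two branches to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # x < 0, so exp(x) cannot overflow
    return z / (1.0 + z)

def predict_label(x: float, threshold: float = 0.5) -> int:
    """Return 1 if sigmoid(x) is greater than the threshold, else 0 (hypothetical helper)."""
    return 1 if sigmoid(x) > threshold else 0

print(predict_label(2.0))   # 1
print(predict_label(-2.0))  # 0
```

Since sigmoid(0) is exactly 0.5, comparing the output with 0.5 is the same as checking whether the input is positive.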
- Tanh
- The tanh (hyperbolic tangent) function often works better than the sigmoid function in hidden layers. It is a scaled and shifted version of the sigmoid function, so each can be derived from the other.
- Equation :- f(x) = tanh(x) = 2 / (1 + e^(-2x)) - 1, or equivalently tanh(x) = 2 * sigmoid(2x) - 1
- Value Range : -1 to +1
- Nature : non-linear
- Uses : Usually used in the hidden layers of a neural network. Because its values lie between -1 and 1, the mean activation of a hidden layer tends to be 0 or very close to it, which centers the data and makes learning easier for the next layer.
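The relationship between tanh and sigmoid can be checked numerically. The snippet below is a small sketch; `tanh_via_sigmoid` is a name chosen for this example, and the comparison uses Python's built-in `math.tanh`.

```python
import math

def sigmoid(x: float) -> float:
    """Plain logistic sigmoid (fine for the moderate inputs used here)."""
    return 1.0 / (1.0 + math.exp(-x))

def tanh_via_sigmoid(x: float) -> float:
    """tanh expressed through sigmoid: 2 * sigmoid(2x) - 1."""
    return 2.0 * sigmoid(2.0 * x) - 1.0

# The identity should agree with math.tanh up to floating-point rounding.
for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
    assert math.isclose(tanh_via_sigmoid(x), math.tanh(x), abs_tol=1e-12)
```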
- ReLU
- Stands for rectified linear unit. It is among the most widely used activation functions, chiefly in the hidden layers of neural networks.
- Equation : A(x) = max(0,x). It gives an output x if x is positive and 0 otherwise.
- Value Range :- [0, inf)
- Nature :- non-linear, so stacking multiple layers of ReLU-activated neurons still adds expressive power. Errors are easy to backpropagate because the gradient is 1 for positive inputs and 0 for negative inputs.
- Uses :- ReLU is less computationally expensive than tanh and sigmoid because it involves simpler mathematical operations. Only some of the neurons are active (non-zero) at any one time, which makes the network sparse and cheaper to compute.
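A short sketch of ReLU and the sparsity it produces. The pre-activation values are made up for illustration, and `active_fraction` is a hypothetical name for the share of non-zero outputs.

```python
def relu(x: float) -> float:
    """Rectified linear unit: max(0, x)."""
    return max(0.0, x)

# Made-up pre-activation values for five neurons in one layer.
pre_activations = [-1.5, 0.3, -0.2, 2.0, -4.0]
activations = [relu(v) for v in pre_activations]

# Fraction of neurons that are "active" (non-zero output).
active_fraction = sum(1 for a in activations if a > 0) / len(activations)

print(activations)      # [0.0, 0.3, 0.0, 2.0, 0.0]
print(active_fraction)  # 0.4
```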
- Leaky ReLU
- Equation : f(x) = x if x is positive and a * x otherwise. The small slope a (the leak) extends the range of the ReLU function to negative values. Usually, the value of a is 0.01 or so.
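As a minimal sketch, assuming the usual definition in which negative inputs are multiplied by a small slope `a` (the function name `leaky_relu` is chosen for this example):

```python
def leaky_relu(x: float, a: float = 0.01) -> float:
    """Leaky ReLU: pass positive inputs through, scale the rest by the slope a."""
    return x if x > 0 else a * x

print(leaky_relu(3.0))  # 3.0
```

Unlike plain ReLU, a negative input still yields a small non-zero output, so the gradient for that neuron does not vanish entirely.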
- Linear Function
- The output of the function is not confined to any range.
- Softmax Function
- The softmax function generalizes the sigmoid function to more than two classes, which makes it handy for multi-class classification problems.
- Nature :- non-linear
- Uses :- Usually used when handling multiple classes. The softmax function exponentiates the output for each class and divides by the sum over all classes, so each value lies between 0 and 1 and the values sum to 1.
- Output :- The softmax function is ideally used in the output layer of a classifier, where we want probabilities that define the class of each input.
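A small sketch of softmax over raw class scores. The scores are made up, and subtracting the maximum score before exponentiating is a common numerical-stability trick added here, not something stated in the notes.

```python
import math

def softmax(scores: list[float]) -> list[float]:
    """Turn raw class scores into probabilities that sum to 1."""
    m = max(scores)                            # shift for numerical stability
    exps = [math.exp(s - m) for s in scores]   # every term is in (0, 1]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up scores for three classes.
probs = softmax([2.0, 1.0, 0.1])

# The predicted class is the index with the highest probability.
predicted_class = max(range(len(probs)), key=probs.__getitem__)
print(predicted_class)  # 0
```

Shifting all scores by the same constant does not change the result, because the constant cancels in the ratio.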
- Regularization
- Ridge Regression: β = (X^T X + λI)^(-1) X^T Y
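The closed-form ridge solution can be sketched without external libraries by solving the linear system (X^T X + λI) β = X^T Y directly. This is an illustration only: `ridge_fit` and `lam` are names chosen for this example, and a real project would more likely rely on a linear-algebra library.

```python
def ridge_fit(X, Y, lam):
    """Solve (X^T X + lam * I) beta = X^T Y by Gauss-Jordan elimination.

    X is a list of rows, Y a list of targets. With lam > 0 the system is
    always solvable; with lam = 0 it requires X^T X to be invertible.
    """
    n = len(X[0])
    # Build A = X^T X + lam * I and b = X^T Y.
    A = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    b = [sum(row[i] * y for row, y in zip(X, Y)) for i in range(n)]
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [x_r - factor * x_c for x_r, x_c in zip(A[r], A[col])]
                b[r] -= factor * b[col]
    return [b[i] / A[i][i] for i in range(n)]

X = [[1.0, 0.0], [0.0, 1.0]]
Y = [2.0, 4.0]
print(ridge_fit(X, Y, 0.0))  # [2.0, 4.0]  (ordinary least squares)
print(ridge_fit(X, Y, 1.0))  # [1.0, 2.0]  (coefficients shrunk toward zero)
```

With λ = 0 the formula reduces to ordinary least squares. Larger values of λ shrink the coefficients toward zero.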
- Lasso
- Loss Functions
- Classification Loss
- log loss
- Focal loss
- Relative Entropy
- Exponential loss
- Hinge loss
- Regression Loss
- Mean Squared Error (L2 loss)
- Mean Absolute Error (L1 loss)
- Huber Loss, Smooth Mean Absolute Error
- Log-Cosh Loss
- Quantile Loss
- Optimization Functions
- Gradient Descent
- Stochastic gradient descent
- Nesterov accelerated gradient
- Adagrad
- AdaDelta
- Adam (often a strong default choice)
- RMSprop