Activation Functions: The Gates of Neural Networks
The Mathematics of Decision Making
In a neural network, an activation function decides whether a neuron should "fire" or not. But more importantly, it introduces non-linearity. Without it, no matter how many layers you add, the whole network would collapse into a single linear transformation—incapable of learning complex patterns like faces or speech.
1. The Sigmoid Function
Sigmoid maps any value to a range between 0 and 1. Historically, it was the gold standard because it mimics the firing rate of biological neurons.
Equation: σ(x) = 1 / (1 + e^(-x))
Downside: It suffers from the "Vanishing Gradient" problem. The gradient is at most 0.25 (at x = 0), and for inputs of large magnitude it becomes almost zero, causing early layers of the network to stop learning.
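A minimal NumPy sketch (function names are my own) showing both the sigmoid and why its gradient vanishes: the derivative σ(x)·(1 − σ(x)) peaks at 0.25 and collapses toward zero for large |x|.

```python
import numpy as np

def sigmoid(x):
    # Squashes any real input into the open interval (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # Derivative of sigmoid: s * (1 - s)
    s = sigmoid(x)
    return s * (1.0 - s)

print(sigmoid_grad(0.0))   # 0.25 — the maximum possible gradient
print(sigmoid_grad(10.0))  # ~4.5e-05 — effectively zero: learning stalls
```

During backpropagation these small factors multiply across layers, which is why deep sigmoid networks train so slowly.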
2. The Tanh (Hyperbolic Tangent)
Tanh is similar to Sigmoid but maps values to the range (-1, 1). Being "zero-centered" makes it generally superior to Sigmoid for hidden layers: its outputs are not all positive, so the gradients flowing to the next layer are less biased in one direction, which makes optimization more stable. It still saturates at the extremes, however, so it does not escape the vanishing gradient problem.
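A quick illustration, assuming NumPy's built-in `np.tanh`: for a symmetric batch of inputs, tanh outputs are centered around zero, while sigmoid outputs all sit above 0.5 on average.

```python
import numpy as np

x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])

tanh_out = np.tanh(x)                 # values in (-1, 1), mean ≈ 0
sig_out = 1.0 / (1.0 + np.exp(-x))   # values in (0, 1), mean > 0

print(tanh_out.mean())  # 0.0 for symmetric input — "zero-centered"
print(sig_out.mean())   # 0.5 for symmetric input — always positive
```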
3. ReLU (Rectified Linear Unit)
ReLU is the current industry standard. It’s incredibly simple: if the input is negative, the output is 0. If the input is positive, the output is the same as the input.
Equation: f(x) = max(0, x)
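The equation above is one line of NumPy. A sketch (function names are my own) of ReLU and the Leaky ReLU variant mentioned below, which keeps a small slope for negative inputs so neurons can't get permanently stuck at zero:

```python
import numpy as np

def relu(x):
    # f(x) = max(0, x): zero for negatives, identity for positives
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # Small negative slope (alpha) avoids "dead" neurons
    return np.where(x > 0, x, alpha * x)

x = np.array([-3.0, 0.0, 3.0])
print(relu(x))        # [0. 0. 3.]
print(leaky_relu(x))  # [-0.03  0.    3.  ]
```

Note the gradient of ReLU is exactly 1 for all positive inputs, which is why it sidesteps the vanishing gradient problem that plagues Sigmoid and Tanh.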
Which to use?
- Hidden Layers: Almost always use ReLU (or variants like Leaky ReLU).
- Output Layer (Binary): Use Sigmoid to get a probability between 0 and 1.
- Output Layer (Multi-class): Use Softmax to get a probability distribution across categories.
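Softmax deserves a concrete sketch since it's the only function above that operates on a whole vector rather than element-wise. This minimal version (names are my own) uses the standard max-subtraction trick so large logits don't overflow `exp`:

```python
import numpy as np

def softmax(z):
    # Subtracting the max is a standard numerical-stability trick;
    # it doesn't change the result because it cancels in the ratio.
    e = np.exp(z - np.max(z))
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])  # raw scores for 3 classes
probs = softmax(logits)

print(probs)        # a valid probability distribution
print(probs.sum())  # 1.0
```

The largest logit always gets the largest probability, so softmax preserves the network's ranking of classes while turning raw scores into something you can threshold or sample from.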