A History of Vision: Evolution of CNN Architectures
The Milestones of Vision
The journey of Convolutional Neural Networks (CNNs) is a story of increasing depth and architectural innovation. Here are the key models that changed the field forever.
1. LeNet-5 (1998)
Created by Yann LeCun for handwritten digit recognition, LeNet-5 is where modern CNNs began. It introduced the concepts of alternating convolution and pooling layers that we still use today.
Impact: It powered the system that automated the processing of 10% of all US bank checks.
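The alternating convolution-and-pooling pattern is easy to see in miniature. Below is an illustrative numpy sketch of the pooling half: downsampling a feature map block by block. (LeNet-5 itself used average-style subsampling; the max pooling shown here is the variant that became standard later. Names are ours, not LeCun's.)

```python
import numpy as np

def max_pool_2x2(x: np.ndarray) -> np.ndarray:
    """Downsample an (H, W) feature map by taking the max of each 2x2 block."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

feature_map = np.array([
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 2, 7, 8],
])
print(max_pool_2x2(feature_map))  # [[4 2]
                                  #  [2 8]]
```

Each pooling stage halves the spatial resolution, which is what lets the subsequent convolution layers see progressively larger portions of the input digit.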
2. AlexNet (2012)
AlexNet smashed records at the 2012 ImageNet competition, cutting the top-5 error rate from roughly 26% to about 15%. It was far deeper than previous winners and demonstrated that training networks of this size on GPUs was practical at massive scale.
Key features: Popularized the ReLU activation over Tanh and used Dropout to prevent overfitting.
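Both tricks fit in a few lines of numpy. The sketch below is illustrative, not AlexNet's actual code: ReLU as an elementwise max, and inverted dropout, which zeroes a random fraction of activations during training and rescales the survivors.

```python
import numpy as np

def relu(x: np.ndarray) -> np.ndarray:
    """ReLU: zero out negatives. Its gradient is 1 for positive inputs,
    avoiding the saturation that slows Tanh-based training."""
    return np.maximum(0.0, x)

def dropout(x: np.ndarray, p: float, rng: np.random.Generator) -> np.ndarray:
    """Inverted dropout: drop a fraction p of units, then rescale the rest
    so the expected activation is unchanged at test time."""
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)

acts = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
print(relu(acts))  # [0.  0.  0.  1.5 3. ]
```

At inference time dropout is simply switched off; the rescaling during training is what makes that valid without further adjustment.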
3. VGGNet (2014)
VGG proved that architectural simplicity can lead to great performance. It stacks multiple layers of very small 3x3 convolution filters to build deep representations.
Impact: Its "VGG-16" and "VGG-19" variants are still widely used as base models for transfer learning today.
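The payoff of stacking small filters can be checked with simple arithmetic: two 3x3 layers cover the same 5x5 receptive field as a single 5x5 layer, but with fewer weights and an extra nonlinearity in between. A quick sketch (the channel width is chosen for illustration):

```python
def conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weight count of a k x k convolution layer (biases ignored)."""
    return k * k * c_in * c_out

C = 256  # illustrative channel width, not a specific VGG layer
one_5x5 = conv_params(5, C, C)      # 25 * C * C weights
two_3x3 = 2 * conv_params(3, C, C)  # 18 * C * C weights
print(one_5x5, two_3x3)  # 1638400 1179648 -> ~28% fewer parameters
```

The same argument scales: three 3x3 layers match a 7x7 receptive field at roughly half the weight count.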
4. ResNet (Residual Network, 2015)
Before ResNet, training very deep networks (100+ layers) was notoriously difficult due to the vanishing gradient problem. Microsoft researchers introduced **Skip Connections** (residual blocks) that allow the gradient to flow directly through the network.
Impact: It took 1st place in five tracks across the ILSVRC and COCO 2015 competitions and redefined what "deep" learning actually meant.
5. InceptionNet / GoogLeNet (2014)
While VGG went deep, Google's team went wide. Inception uses a "network within a network" approach, applying convolutions of various sizes (1x1, 3x3, 5x5) in parallel at each layer to capture features at different scales.
Key features: Introduced 1x1 convolutions as an efficient tool for feature dimensionality reduction.
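The 1x1 trick is again a matter of counting weights: squeezing the channels down before an expensive 3x3 convolution shrinks the layer dramatically. A sketch with illustrative channel counts (not GoogLeNet's exact figures):

```python
def conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weight count of a k x k convolution layer (biases ignored)."""
    return k * k * c_in * c_out

# 3x3 conv applied directly to 256 input channels:
direct = conv_params(3, 256, 256)
# 1x1 "bottleneck" to 64 channels first, then the 3x3:
bottleneck = conv_params(1, 256, 64) + conv_params(3, 64, 256)
print(direct, bottleneck)  # 589824 163840 -> ~3.6x fewer parameters
```

Because the 1x1 layer mixes channels at every spatial position independently, it reduces dimensionality without shrinking the feature map, which is what makes the parallel branches of an Inception module affordable.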
Summary Comparison Table
| Model | Year | Highlight |
|---|---|---|
| LeNet | 1998 | First standard CNN |
| AlexNet | 2012 | GPU usage & ReLU |
| VGG | 2014 | Only 3x3 Filters |
| Inception | 2014 | Parallel operations |
| ResNet | 2015 | Skip connections (100+ layers) |