[ 2024.04.02 / 10 min read ]
Deep Learning

A History of Vision: Evolution of CNN Architectures

The Milestones of Vision

The journey of Convolutional Neural Networks (CNNs) is a story of increasing depth and architectural innovation. Here are the key models that changed the field forever.

1. LeNet-5 (1998)

Created by Yann LeCun for handwritten digit recognition, LeNet-5 is where modern CNNs began. It introduced the concepts of alternating convolution and pooling layers that we still use today.
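The alternating conv/pool pattern can be seen in how the spatial size shrinks through LeNet-5's layers. A minimal sketch of that shape arithmetic (using the standard LeNet-5 32x32 input and 5x5 kernels; the helper names are ours, not LeCun's):

```python
def conv_out(size, kernel, stride=1, padding=0):
    # Spatial output size of a convolution ("valid" by default).
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, window=2, stride=2):
    # Spatial output size of a 2x2 pooling layer.
    return (size - window) // stride + 1

# LeNet-5 on a 32x32 input: convolution and pooling layers alternate.
s = 32
s = conv_out(s, kernel=5)   # C1: 5x5 conv -> 28
s = pool_out(s)             # S2: 2x2 pool -> 14
s = conv_out(s, kernel=5)   # C3: 5x5 conv -> 10
s = pool_out(s)             # S4: 2x2 pool -> 5
print(s)  # -> 5
```

Each conv extracts local features; each pool halves the resolution, so the 5x5 maps feeding the classifier summarize the whole digit.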

Impact: It powered the system that automated the processing of 10% of all US bank checks.

2. AlexNet (2012)

AlexNet smashed records at the 2012 ImageNet competition (ILSVRC). It was much deeper than previous winners and demonstrated that GPU training made networks of that scale practical.

Key features: Introduced the use of ReLU instead of Tanh and used Dropout to prevent overfitting.

3. VGGNet (2014)

VGG proved that architectural simplicity can lead to great performance. It stacks multiple layers of very small 3x3 convolution filters to build deep representations.

Impact: Its "VGG-16" and "VGG-19" variants are still widely used as base models for transfer learning today.

ARCHITECTURE TRUTH: VGG showed that deep networks (16+ layers) with small filters outperform shallow ones with large filters.
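The small-filter argument is easy to verify with parameter counting: two stacked 3x3 convolutions see the same 5x5 receptive field as one 5x5 convolution, but with fewer weights and an extra nonlinearity between them. A quick sketch (the channel width of 256 is just an example value):

```python
def conv_params(k, c_in, c_out):
    # Weight count of a single k x k convolution layer (biases ignored).
    return k * k * c_in * c_out

C = 256  # example channel width, same in and out
stacked = 2 * conv_params(3, C, C)   # two 3x3 layers: 18 * C^2 weights
single = conv_params(5, C, C)        # one 5x5 layer:  25 * C^2 weights
print(stacked, single)  # the 3x3 stack uses 28% fewer weights
```

Same effective receptive field, fewer parameters, more nonlinearity: that trade is the core of VGG's design.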

4. ResNet (Residual Network, 2015)

Before ResNet, training very deep networks (100+ layers) was largely infeasible due to the vanishing gradient problem. Microsoft researchers introduced **Skip Connections** (Residual blocks) that allow the gradient to flow directly through the network.

Impact: It took 1st place across the 2015 ImageNet and COCO competition tracks (classification, detection, localization, and segmentation) and redefined what "deep" learning actually meant.
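The residual idea fits in a few lines: the block computes `y = relu(F(x) + x)`, so if the learned transform `F` contributes nothing, the input still passes through unchanged. A toy NumPy sketch, using dense layers as stand-ins for the block's convolutions (our simplification, not the paper's exact block):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    # y = relu(F(x) + x): the skip connection adds the input back,
    # giving gradients an identity path around the learned transform F.
    h = relu(x @ w1)         # stand-in for the block's first conv + ReLU
    return relu(h @ w2 + x)  # add the shortcut before the final ReLU

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
# With zero weights, F(x) = 0 and the block reduces to the identity (+ ReLU):
y = residual_block(x, np.zeros((8, 8)), np.zeros((8, 8)))
print(np.allclose(y, relu(x)))  # True: the skip path carries x straight through
```

That identity fallback is exactly why 100+ layer networks became trainable: a layer can do no harm by defaulting to a pass-through.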

5. InceptionNet / GoogLeNet (2014)

While VGG went deep, Google's team went wide. Inception uses a "network within a network" approach, applying convolutions of various sizes (1x1, 3x3, 5x5) in parallel at each layer to capture features at different scales.

Key features: Introduced 1x1 convolutions as an efficient tool for feature dimensionality reduction.
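The savings from a 1x1 bottleneck are pure arithmetic. A sketch with hypothetical channel counts (256 in, 64 out, chosen for illustration; GoogLeNet's actual branch widths vary per module):

```python
def conv_params(k, c_in, c_out):
    # Weight count of a single k x k convolution layer (biases ignored).
    return k * k * c_in * c_out

# Direct branch: 5x5 conv straight from 256 to 64 channels.
direct = conv_params(5, 256, 64)
# Same branch with a 1x1 bottleneck squeezing 256 -> 64 channels first.
bottleneck = conv_params(1, 256, 64) + conv_params(5, 64, 64)
print(direct, bottleneck)  # 409600 vs 118784: roughly 3.4x fewer weights
```

Because every parallel branch in an Inception module pays this cost, the 1x1 squeeze is what makes the "wide" design affordable.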

Summary Comparison Table

| Model     | Year | Highlight                        |
|-----------|------|----------------------------------|
| LeNet-5   | 1998 | First standard CNN               |
| AlexNet   | 2012 | GPU training & ReLU              |
| VGG       | 2014 | Only 3x3 filters                 |
| Inception | 2014 | Parallel multi-scale convolutions|
| ResNet    | 2015 | Skip connections (100+ layers)   |