Convolutional Neural Networks


Motivation

Fully connected layers don’t exploit spatial structure: every input pixel connects to every unit, so parameter counts grow quickly with image size and nearby-pixel relationships are ignored. CNNs use local connectivity and weight sharing to process images efficiently.

Convolution Operation

A 2D convolution slides a kernel over the input:

(f * g)(i, j) = \sum_m \sum_n f(m, n) \cdot g(i - m, j - n)
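The sum above can be evaluated directly. A minimal NumPy sketch (my own illustration, restricted to the "valid" region where the kernel fits entirely inside the input):

```python
import numpy as np

def conv2d(f, g):
    """Direct evaluation of (f * g)(i, j) = sum_m sum_n f(m, n) g(i-m, j-n),
    restricted to output positions where the kernel fully overlaps f."""
    kh, kw = g.shape
    out_h = f.shape[0] - kh + 1
    out_w = f.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # True convolution flips the kernel (the i-m, j-n indexing);
            # deep learning libraries actually compute cross-correlation,
            # i.e. the same sum without the flip.
            out[i, j] = np.sum(f[i:i + kh, j:j + kw] * g[::-1, ::-1])
    return out
```

With a kernel that is 1 at a single position and 0 elsewhere, the output is just a shifted copy of the input, which is a quick sanity check.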

Key Components

Convolutional Layer

Applies learnable filters to extract features (edges, textures, shapes).

Pooling Layer

Reduces spatial dimensions. Max pooling takes the maximum value in each window.
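A small worked example: a 2x2 max pool over a 4x4 input keeps the largest value in each non-overlapping window, halving each spatial dimension.

```python
import torch
import torch.nn as nn

# Single image, single channel, 4x4 spatial grid.
x = torch.tensor([[1.,  2.,  5.,  6.],
                  [3.,  4.,  7.,  8.],
                  [9., 10., 13., 14.],
                  [11., 12., 15., 16.]]).reshape(1, 1, 4, 4)

pool = nn.MaxPool2d(2)  # 2x2 windows, stride 2
y = pool(x)
# y has shape (1, 1, 2, 2): the max of each 2x2 window,
# i.e. [[4, 8], [12, 16]]
```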

Batch Normalization

Normalizes activations to stabilize training.
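Concretely, batch norm standardizes each feature over the batch, then applies a learnable scale and shift. A minimal sketch (the `gamma`/`beta` defaults and `eps` value here are illustrative):

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Per-feature batch normalization over axis 0 (the batch dimension):
    subtract the batch mean, divide by the batch std, then scale and shift."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

x = np.array([[1.0, 10.0],
              [3.0, 30.0],
              [5.0, 50.0]])
y = batch_norm(x)
# Each column of y now has mean ~0 and unit variance,
# regardless of the original scale of the feature.
```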

Classic Architectures

  • LeNet (1998): pioneered CNNs for digit recognition
  • AlexNet (2012): deeper, used ReLU and dropout
  • VGG (2014): uniform 3x3 convolutions, very deep
  • ResNet (2015): skip connections, enabled 100+ layer networks
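ResNet's skip connection can be sketched as a basic block that adds its input back to the output of two convolutions, so the layers only need to learn a residual. This is a simplified version (same channel count in and out, no stride) of the block described in the ResNet paper:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Minimal ResNet-style basic block: conv-bn-relu, conv-bn,
    then add the input (skip connection) and apply a final ReLU."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)  # skip connection

block = ResidualBlock(16)
y = block(torch.randn(1, 16, 8, 8))  # spatial shape and channels preserved
```

Because the identity path passes gradients straight through, stacks of these blocks stay trainable at depths where plain networks degrade.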

PyTorch Example

import torch.nn as nn

# Two conv/pool blocks followed by a linear classifier. The 64 * 7 * 7
# flattened size assumes 28x28 single-channel inputs (e.g. MNIST).
model = nn.Sequential(
    nn.Conv2d(1, 32, kernel_size=3, padding=1),   # 1 -> 32 channels
    nn.ReLU(),
    nn.MaxPool2d(2),                              # 28x28 -> 14x14
    nn.Conv2d(32, 64, kernel_size=3, padding=1),  # 32 -> 64 channels
    nn.ReLU(),
    nn.MaxPool2d(2),                              # 14x14 -> 7x7
    nn.Flatten(),
    nn.Linear(64 * 7 * 7, 10),                    # 10 class scores
)
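Running a batch through the model confirms the shapes (the model is restated here so the snippet runs on its own; the 28x28 input size is an assumption implied by the `64 * 7 * 7` linear layer, since two `MaxPool2d(2)` layers halve 28 to 14 to 7):

```python
import torch
import torch.nn as nn

# Same model as above, restated for a self-contained check.
model = nn.Sequential(
    nn.Conv2d(1, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(32, 64, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(64 * 7 * 7, 10),
)

x = torch.randn(8, 1, 28, 28)  # batch of 8 single-channel 28x28 images
logits = model(x)
# logits has shape (8, 10): one score per class for each image
```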
