Convolutional Neural Networks (CNNs)
====================================

.. figure:: ./figures/cnn.png
   :alt: Principles of CNNs
   :width: 15cm

   Principles of CNNs

Sources:

- `3Blue1Brown video: Convolutions in Image Processing `__
- `far1din video: Convolutional Neural Networks from Scratch `__
- `What is a Convolutional Neural Network? `__
- CNN `Stanford cs231n `__
- Deep learning `Stanford cs231n `__
- Pytorch

  - `WWW tutorials `__
  - `github tutorials `__
  - `github examples `__

- MNIST and pytorch:

  - `MNIST nextjournal.com/gkoehler/pytorch-mnist `__
  - `MNIST github/pytorch/examples `__
  - `MNIST kaggle `__

Introduction to CNNs
--------------------

CNNs are deep learning architectures designed for processing grid-like data such as images. Inspired by the biological visual cortex, they learn hierarchical feature representations, making them effective for tasks like image classification, object detection, and segmentation.

Key Principles of CNNs (illustrated by the sketch after this list):

- **Convolutional Layers** are the core building block of a CNN: they apply a convolution operation to the input and pass the result to the next layer, performing feature extraction with learnable filters (kernels). This allows CNNs to detect local patterns such as edges and textures.

- **Activation Functions** introduce non-linearity into the model, enabling the network to learn complex patterns. Possible functions are Tanh or Sigmoid, but the most commonly used is the **ReLU (Rectified Linear Unit)**. ReLU accelerates training: the derivative of the sigmoid becomes very small in its saturating regions, so the updates to the weights almost vanish. This is called the **vanishing gradient problem**, which ReLU mitigates.

- **Pooling Layers** reduce the spatial dimensions (height and width) of the feature maps by downsampling them, summarizing the presence of features in patches of the feature map. Max pooling and average pooling are the most common functions.

- **Fully Connected (FC) Layers** flatten the extracted features and connect them to a classifier, typically a softmax layer for classification tasks.

- **Dropout** reduces over-fitting, typically by inserting a dropout layer after every FC layer. A dropout layer has an associated probability *p* and is applied to every neuron of the response map separately: it randomly switches off each activation with probability *p*.

- **Batch Normalization** normalizes the inputs of each layer to have zero mean and unit variance, which improves network stability. This normalization is performed on each mini-batch during training.
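The sketch below composes these building blocks in PyTorch. It is a minimal, hypothetical toy model (not one of the architectures studied later in this section), sized for 1x28x28 inputs such as MNIST digits:

.. code:: ipython3

    import torch
    import torch.nn as nn

    # Toy CNN illustrating the building blocks listed above
    toy_cnn = nn.Sequential(
        nn.Conv2d(1, 8, kernel_size=3, padding=1),  # convolution: 8 learnable 3x3 filters
        nn.BatchNorm2d(8),                          # batch normalization
        nn.ReLU(),                                  # non-linear activation
        nn.MaxPool2d(2),                            # max pooling: 28x28 -> 14x14
        nn.Flatten(),                               # flatten features for the classifier
        nn.Dropout(p=0.5),                          # dropout regularization
        nn.Linear(8 * 14 * 14, 10),                 # fully connected classifier (10 classes)
    )

    x = torch.randn(4, 1, 28, 28)  # a mini-batch of 4 fake images
    print(toy_cnn(x).shape)        # torch.Size([4, 10])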
CNN Architectures: Evolution from LeNet to ResNet
-------------------------------------------------

LeNet-5 (1998)
~~~~~~~~~~~~~~

First successful CNN for handwritten digit recognition.

.. figure:: ./figures/LeNet_Original_Image.jpg
   :alt: LeNet

   LeNet

AlexNet (2012)
~~~~~~~~~~~~~~

Revolutionized deep learning by winning the ImageNet competition. Introduced ReLU activation, dropout, and GPU acceleration. Featured convolutional layers stacked on top of each other (previously it was common to have only a single CONV layer, always immediately followed by a POOL layer).

|AlexNet| |AlexNet architecture|

VGG (2014)
~~~~~~~~~~

Introduced a simple yet deep architecture built from 3×3 convolutions.

|VGGNet| |VGGNet architecture|

GoogLeNet (Inception) (2014)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Introduced the **Inception module**, using multiple kernel sizes in parallel.

ResNet (2015)
~~~~~~~~~~~~~

Introduced **skip connections**, allowing the training of very deep networks.

.. figure:: ./figures/resnets_modelvariants.png
   :alt: ResNet block
   :width: 10cm

   ResNet block

|ResNet 18| |ResNet 18 architecture|

Architectures general guidelines
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- ConvNets stack CONV, POOL, and FC layers.
- Trend towards smaller filters and deeper architectures: stack 3x3 convolutions instead of 5x5.
- Trend towards getting rid of POOL/FC layers (just CONV).
- Historically, architectures looked like ``[(CONV-RELU) x N POOL?] x M (FC-RELU) x K, SOFTMAX``, where N is usually up to ~5, M is large, and 0 <= K <= 2.
- But recent advances such as ResNet/GoogLeNet have challenged this paradigm.

Conclusion and Further Topics
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- **Recent architectures:** EfficientNet, Vision Transformers (ViTs), MobileNet for edge devices.
- **Advanced topics:** Transfer learning, object detection (YOLO, Faster R-CNN), segmentation (U-Net).
- **Hands-on implementation:** Implement CNNs using TensorFlow/PyTorch for real-world applications.

.. |AlexNet| image:: ./figures/alexnet.png
   :width: 7cm

.. |AlexNet architecture| image:: ./figures/alexnet_param_tab.png
   :width: 7cm

.. |VGGNet| image:: ./figures/vgg.png
   :width: 7cm

.. |VGGNet architecture| image:: ./figures/vgg_param_tab.png
   :width: 7cm

.. |ResNet 18| image:: ./figures/resnet18.png
   :width: 7cm

.. |ResNet 18 architecture| image:: ./figures/resnet_param_tab.png
   :width: 10cm

Training function
-----------------

.. code:: ipython3

    %matplotlib inline

    import os
    import numpy as np
    from pathlib import Path

    # ML
    import torch
    import torch.nn as nn
    import torch.optim as optim
    from torch.optim import lr_scheduler
    import torchvision
    import torchvision.transforms as transforms
    from torchvision import models

    # Device configuration
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    # device = 'cpu' # Force CPU
    # print(device)

    # Plot
    import matplotlib.pyplot as plt
    import seaborn as sns

    # Plot parameters
    plt.style.use('seaborn-v0_8-whitegrid')
    fig_w, fig_h = plt.rcParams.get('figure.figsize')
    plt.rcParams['figure.figsize'] = (fig_w, fig_h * .5)

See `train_val_model `__ function.

.. code:: ipython3

    from pystatsml.dl_utils import train_val_model

CNN in PyTorch
--------------

LeNet-5
~~~~~~~

Here we implement LeNet-5 with ReLU activations. Sources:

- `Github: LeNet-5-PyTorch `__,
- `Kaggle: lenet with pytorch `__.

.. code:: ipython3

    import torch.nn as nn
    import torch.nn.functional as F

    class LeNet5(nn.Module):
        """
        layers: (nb channels in input image,
                 nb channels of 1st conv,
                 nb channels of 2nd conv,
                 nb input neurons of 1st FC: TO BE TUNED to the flattened
                     size of the last conv output,
                 nb neurons of 2nd FC,
                 nb neurons of 3rd FC,
                 nb neurons of output FC: TO BE TUNED to the nb of classes)
        """
        def __init__(self, layers=(1, 6, 16, 1024, 120, 84, 10), debug=False):
            super(LeNet5, self).__init__()
            self.layers = layers
            self.debug = debug
            self.conv1 = nn.Conv2d(layers[0], layers[1], 5, padding=2)
            self.conv2 = nn.Conv2d(layers[1], layers[2], 5)
            self.fc1 = nn.Linear(layers[3], layers[4])
            self.fc2 = nn.Linear(layers[4], layers[5])
            self.fc3 = nn.Linear(layers[5], layers[6])

        def forward(self, x):
            # padding=2 keeps the spatial size, max pooling halves it
            x = F.max_pool2d(F.relu(self.conv1(x)), 2)  # same shape / 2
            # no padding: the 5x5 kernel removes 4 pixels, then halve
            x = F.max_pool2d(F.relu(self.conv2(x)), 2)  # -4 / 2
            if self.debug:
                print("### DEBUG: Shape of last convnet=", x.shape[1:],
                      ". FC size=", np.prod(x.shape[1:]))
            x = x.view(-1, self.layers[3])
            x = F.relu(self.fc1(x))
            x = F.relu(self.fc2(x))
            x = self.fc3(x)
            return F.log_softmax(x, dim=1)
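The fourth entry of ``layers`` (the input size of ``fc1``, marked "TO BE TUNED") must equal the flattened size of the last pooling output, which depends on the input image size. It can be obtained with a dry run in debug mode (as done below) or computed analytically with the standard output-size formula ``out = (in + 2*padding - kernel) // stride + 1``. A minimal sketch for 28x28 MNIST inputs (the helper ``conv2d_out`` is introduced here for illustration only):

.. code:: ipython3

    def conv2d_out(size, kernel, stride=1, padding=0):
        # Spatial output size of a convolution or pooling layer
        return (size + 2 * padding - kernel) // stride + 1

    size = 28                                     # MNIST images are 28x28
    size = conv2d_out(size, kernel=5, padding=2)  # conv1: 28 (padding keeps the size)
    size = conv2d_out(size, kernel=2, stride=2)   # max pool: 14
    size = conv2d_out(size, kernel=5)             # conv2: 10
    size = conv2d_out(size, kernel=2, stride=2)   # max pool: 5
    print(16 * size * size)                       # 16 channels x 5 x 5 = 400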
FC size=", np.prod(x.shape[1:])) x = x.view(-1, self.layers[3]) x = F.relu(self.fc1(x)) x = F.relu(self.fc2(x)) x = self.fc3(x) return F.log_softmax(x, dim=1) VGGNet like: conv-relu blocks ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. code:: ipython3 # Defining the network (LeNet-5) import torch.nn as nn import torch.nn.functional as F class MiniVGGNet(torch.nn.Module): def __init__(self, layers=(1, 16, 32, 1024, 120, 84, 10), debug=False): super(MiniVGGNet, self).__init__() self.layers = layers self.debug = debug # Conv block 1 self.conv11 = nn.Conv2d(in_channels=layers[0],out_channels=layers[1], kernel_size=3, stride=1, padding=0, bias=True) self.conv12 = nn.Conv2d(in_channels=layers[1], out_channels=layers[1], kernel_size=3, stride=1, padding=0, bias=True) # Conv block 2 self.conv21 = nn.Conv2d(in_channels=layers[1], out_channels=layers[2], kernel_size=3, stride=1, padding=0, bias=True) self.conv22 = nn.Conv2d(in_channels=layers[2], out_channels=layers[2], kernel_size=3, stride=1, padding=1, bias=True) # Fully connected layer self.fc1 = nn.Linear(layers[3], layers[4]) self.fc2 = nn.Linear(layers[4], layers[5]) self.fc3 = nn.Linear(layers[5], layers[6]) def forward(self, x): x = F.relu(self.conv11(x)) x = F.relu(self.conv12(x)) x = F.max_pool2d(x, 2) x = F.relu(self.conv21(x)) x = F.relu(self.conv22(x)) x = F.max_pool2d(x, 2) if self.debug: print("### DEBUG: Shape of last convnet=", x.shape[1:], ". FC size=", np.prod(x.shape[1:])) x = x.view(-1, self.layers[3]) x = F.relu(self.fc1(x)) x = F.relu(self.fc2(x)) x = self.fc3(x) return F.log_softmax(x, dim=1) ResNet-like Model ~~~~~~~~~~~~~~~~~ Stack multiple resnet blocks .. code:: ipython3 # ---------------------------------------------------------------------------- # # An implementation of https://arxiv.org/pdf/1512.03385.pdf # # See section 4.2 for the model architecture on CIFAR-10 # # Some part of the code was referenced from below # # https://github.com/pytorch/vision/blob/master/torchvision/models/resnet.py # # ---------------------------------------------------------------------------- # import torch.nn as nn # 3x3 convolution def conv3x3(in_channels, out_channels, stride=1): return nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=stride, padding=1, bias=False) # Residual block class ResidualBlock(nn.Module): def __init__(self, in_channels, out_channels, stride=1, downsample=None): super(ResidualBlock, self).__init__() self.conv1 = conv3x3(in_channels, out_channels, stride) self.bn1 = nn.BatchNorm2d(out_channels) self.relu = nn.ReLU(inplace=True) self.conv2 = conv3x3(out_channels, out_channels) self.bn2 = nn.BatchNorm2d(out_channels) self.downsample = downsample def forward(self, x): residual = x out = self.conv1(x) out = self.bn1(out) out = self.relu(out) out = self.conv2(out) out = self.bn2(out) if self.downsample: residual = self.downsample(x) out += residual out = self.relu(out) return out # ResNet class ResNet(nn.Module): def __init__(self, block, layers, num_classes=10): super(ResNet, self).__init__() self.in_channels = 16 self.conv = conv3x3(3, 16) self.bn = nn.BatchNorm2d(16) self.relu = nn.ReLU(inplace=True) self.layer1 = self.make_layer(block, 16, layers[0]) self.layer2 = self.make_layer(block, 32, layers[1], 2) self.layer3 = self.make_layer(block, 64, layers[2], 2) self.avg_pool = nn.AvgPool2d(8) self.fc = nn.Linear(64, num_classes) def make_layer(self, block, out_channels, blocks, stride=1): downsample = None if (stride != 1) or (self.in_channels != out_channels): downsample = nn.Sequential( conv3x3(self.in_channels, 
ResNet9
~~~~~~~

Sources:

- `DAWNBench on cifar10 `__
- `ResNet9: train to 94% CIFAR10 accuracy in 100 seconds `__

Classification: MNIST digits
----------------------------

`MNIST Loader `__

.. code:: ipython3

    from pystatsml.datasets import load_mnist_pytorch

    dataloaders, WD = load_mnist_pytorch(
        batch_size_train=64, batch_size_test=10000)
    os.makedirs(os.path.join(WD, "models"), exist_ok=True)

    # Info about the dataset
    D_in = np.prod(dataloaders["train"].dataset.data.shape[1:])
    D_out = len(dataloaders["train"].dataset.targets.unique())
    print("Datasets shapes:", {
        x: dataloaders[x].dataset.data.shape for x in ['train', 'test']})
    print("N input features:", D_in, "Output classes:", D_out)

.. parsed-literal::

    /home/ed203246/data/pystatml/dl_mnist_pytorch

LeNet
~~~~~

Dry run in debug mode to get the shape of the last convnet layer.

.. code:: ipython3

    model = LeNet5((1, 6, 16, 1, 120, 84, 10), debug=True)
    batch_idx, (data_example, target_example) = next(
        enumerate(dataloaders["train"]))
    print(model)
    _ = model(data_example)

.. parsed-literal::

    LeNet5(
      (conv1): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
      (conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
      (fc1): Linear(in_features=1, out_features=120, bias=True)
      (fc2): Linear(in_features=120, out_features=84, bias=True)
      (fc3): Linear(in_features=84, out_features=10, bias=True)
    )
    ### DEBUG: Shape of last convnet= torch.Size([16, 5, 5]) . FC size= 400

Set the first FC layer size to 400.
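Note that all the models above return log-probabilities (``F.log_softmax``), so the training cells in this section pair them with the negative log-likelihood loss ``nn.NLLLoss``. This combination is equivalent to ``nn.CrossEntropyLoss`` applied to raw logits, as this small sketch shows:

.. code:: ipython3

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    logits = torch.randn(4, 10)          # raw scores: 4 samples, 10 classes
    target = torch.tensor([0, 3, 9, 1])  # class labels

    nll = nn.NLLLoss()(F.log_softmax(logits, dim=1), target)
    ce = nn.CrossEntropyLoss()(logits, target)
    print(torch.allclose(nll, ce))       # True: the two formulations match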
.. code:: ipython3

    model = LeNet5((1, 6, 16, 400, 120, 84, 10)).to(device)
    optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.5)
    criterion = nn.NLLLoss()

    # Explore the model
    for parameter in model.parameters():
        print(parameter.shape)

    print("Total number of parameters =",
          np.sum([np.prod(parameter.shape) for parameter in model.parameters()]))

    model, losses, accuracies = train_val_model(model, criterion, optimizer,
                                                dataloaders,
                                                num_epochs=5, log_interval=2)

    _ = plt.plot(losses['train'], '-b', losses['val'], '--r')

.. parsed-literal::

    torch.Size([6, 1, 5, 5])
    torch.Size([6])
    torch.Size([16, 6, 5, 5])
    torch.Size([16])
    torch.Size([120, 400])
    torch.Size([120])
    torch.Size([84, 120])
    torch.Size([84])
    torch.Size([10, 84])
    torch.Size([10])
    Total number of parameters = 61706
    Epoch 0/4
    ----------
    train Loss: 0.8882 Acc: 72.55%
    val Loss: 0.1889 Acc: 94.00%
    Epoch 2/4
    ----------
    train Loss: 0.0865 Acc: 97.30%
    val Loss: 0.0592 Acc: 98.07%
    Epoch 4/4
    ----------
    train Loss: 0.0578 Acc: 98.22%
    val Loss: 0.0496 Acc: 98.45%
    Training complete in 0m 55s
    Best val Acc: 98.45%

.. image:: dl_cnn_cifar10_pytorch_files/dl_cnn_cifar10_pytorch_19_1.png

MiniVGGNet
~~~~~~~~~~

.. code:: ipython3

    model = MiniVGGNet(layers=(1, 16, 32, 1, 120, 84, 10), debug=True)
    print(model)
    _ = model(data_example)

.. parsed-literal::

    MiniVGGNet(
      (conv11): Conv2d(1, 16, kernel_size=(3, 3), stride=(1, 1))
      (conv12): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1))
      (conv21): Conv2d(16, 32, kernel_size=(3, 3), stride=(1, 1))
      (conv22): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (fc1): Linear(in_features=1, out_features=120, bias=True)
      (fc2): Linear(in_features=120, out_features=84, bias=True)
      (fc3): Linear(in_features=84, out_features=10, bias=True)
    )
    ### DEBUG: Shape of last convnet= torch.Size([32, 5, 5]) . FC size= 800

Set the first FC layer size to 800.

.. code:: ipython3

    model = MiniVGGNet((1, 16, 32, 800, 120, 84, 10)).to(device)
    optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.5)
    criterion = nn.NLLLoss()

    # Explore the model
    for parameter in model.parameters():
        print(parameter.shape)

    print("Total number of parameters =",
          np.sum([np.prod(parameter.shape) for parameter in model.parameters()]))

    model, losses, accuracies = train_val_model(model, criterion, optimizer,
                                                dataloaders,
                                                num_epochs=5, log_interval=2)

    _ = plt.plot(losses['train'], '-b', losses['val'], '--r')

.. parsed-literal::

    torch.Size([16, 1, 3, 3])
    torch.Size([16])
    torch.Size([16, 16, 3, 3])
    torch.Size([16])
    torch.Size([32, 16, 3, 3])
    torch.Size([32])
    torch.Size([32, 32, 3, 3])
    torch.Size([32])
    torch.Size([120, 800])
    torch.Size([120])
    torch.Size([84, 120])
    torch.Size([84])
    torch.Size([10, 84])
    torch.Size([10])
    Total number of parameters = 123502
    Epoch 0/4
    ----------
    train Loss: 1.2111 Acc: 58.85%
    val Loss: 0.1599 Acc: 94.67%
    Epoch 2/4
    ----------
    train Loss: 0.0781 Acc: 97.58%
    val Loss: 0.0696 Acc: 97.75%
    Epoch 4/4
    ----------
    train Loss: 0.0493 Acc: 98.48%
    val Loss: 0.0420 Acc: 98.62%
    Training complete in 2m 9s
    Best val Acc: 98.62%

.. image:: dl_cnn_cifar10_pytorch_files/dl_cnn_cifar10_pytorch_23_1.png
Reduce the size of training dataset
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Reduce the size of the training dataset by considering only ``10`` mini-batches of size ``16``.

.. code:: ipython3

    train_loader, val_loader = dataloaders["train"], dataloaders["test"]

    train_size = 10 * 16

    # Stratified sub-sampling
    targets = train_loader.dataset.targets.numpy()
    nclasses = len(set(targets))

    indices = np.concatenate([np.random.choice(np.where(targets == lab)[0],
                                               int(train_size / nclasses),
                                               replace=False)
                              for lab in set(targets)])
    np.random.shuffle(indices)

    train_loader = torch.utils.data.DataLoader(
        train_loader.dataset, batch_size=16,
        sampler=torch.utils.data.SubsetRandomSampler(indices))

    # Check train subsampling
    train_labels = np.concatenate([labels.numpy()
                                   for inputs, labels in train_loader])
    print("Train size=", len(train_labels), " Train label count=",
          {lab: np.sum(train_labels == lab) for lab in set(train_labels)})
    print("Batch sizes=", [inputs.size(0) for inputs, labels in train_loader])

    # Put together train and val
    dataloaders = dict(train=train_loader, val=val_loader)

    # Info about the dataset
    data_shape = dataloaders["train"].dataset.data.shape[1:]
    D_in = np.prod(data_shape)
    D_out = len(dataloaders["train"].dataset.targets.unique())
    print("Datasets shape", {x: dataloaders[x].dataset.data.shape
                             for x in ['train', 'val']})
    print("N input features", D_in, "N output", D_out)

.. parsed-literal::

    Train size= 160  Train label count= {np.int64(0): np.int64(16), np.int64(1): np.int64(16), np.int64(2): np.int64(16), np.int64(3): np.int64(16), np.int64(4): np.int64(16), np.int64(5): np.int64(16), np.int64(6): np.int64(16), np.int64(7): np.int64(16), np.int64(8): np.int64(16), np.int64(9): np.int64(16)}
    Batch sizes= [16, 16, 16, 16, 16, 16, 16, 16, 16, 16]
    Datasets shape {'train': torch.Size([60000, 28, 28]), 'val': torch.Size([10000, 28, 28])}
    N input features 784 N output 10

LeNet5

.. code:: ipython3

    model = LeNet5((1, 6, 16, 400, 120, 84, D_out)).to(device)
    optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.5)
    criterion = nn.NLLLoss()

    model, losses, accuracies = train_val_model(model, criterion, optimizer,
                                                dataloaders,
                                                num_epochs=100, log_interval=20)

    _ = plt.plot(losses['train'], '-b', losses['val'], '--r')

.. parsed-literal::

    Epoch 0/99
    ----------
    train Loss: 2.3072 Acc: 7.50%
    val Loss: 2.3001 Acc: 8.89%
    Epoch 20/99
    ----------
    train Loss: 0.4810 Acc: 83.75%
    val Loss: 0.7552 Acc: 72.66%
    Epoch 40/99
    ----------
    train Loss: 0.1285 Acc: 95.62%
    val Loss: 0.6663 Acc: 81.72%
    Epoch 60/99
    ----------
    train Loss: 0.0065 Acc: 100.00%
    val Loss: 0.6982 Acc: 84.26%
    Epoch 80/99
    ----------
    train Loss: 0.0032 Acc: 100.00%
    val Loss: 0.7571 Acc: 84.26%
    Training complete in 1m 37s
    Best val Acc: 84.34%

.. image:: dl_cnn_cifar10_pytorch_files/dl_cnn_cifar10_pytorch_27_1.png

MiniVGGNet

.. code:: ipython3

    model = MiniVGGNet((1, 16, 32, 800, 120, 84, 10)).to(device)
    optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.5)
    criterion = nn.NLLLoss()

    model, losses, accuracies = train_val_model(model, criterion, optimizer,
                                                dataloaders,
                                                num_epochs=100, log_interval=20)

    _ = plt.plot(losses['train'], '-b', losses['val'], '--r')

.. parsed-literal::

    Epoch 0/99
    ----------
    train Loss: 2.3048 Acc: 10.00%
    val Loss: 2.3026 Acc: 10.28%
    Epoch 20/99
    ----------
    train Loss: 2.2865 Acc: 26.25%
    val Loss: 2.2861 Acc: 23.22%
    Epoch 40/99
    ----------
    train Loss: 0.3847 Acc: 85.00%
    val Loss: 0.8042 Acc: 75.76%
    Epoch 60/99
    ----------
    train Loss: 0.0047 Acc: 100.00%
    val Loss: 0.8659 Acc: 83.57%
    Epoch 80/99
    ----------
    train Loss: 0.0013 Acc: 100.00%
    val Loss: 1.0183 Acc: 83.39%
    Training complete in 4m 39s
    Best val Acc: 84.01%

.. image:: dl_cnn_cifar10_pytorch_files/dl_cnn_cifar10_pytorch_29_1.png
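With only 160 training images, both networks reach 100% training accuracy while validation accuracy plateaus around 84%: a clear sign of over-fitting. A standard remedy, not used in the experiments above, is data augmentation; a minimal sketch with ``torchvision.transforms`` (the pipeline below is hypothetical and would have to be passed as the training dataset's ``transform``):

.. code:: ipython3

    import torchvision.transforms as transforms

    # Hypothetical augmentation pipeline: random rotations and shifts
    # generate new training variants of each digit on the fly.
    augment = transforms.Compose([
        transforms.RandomRotation(10),                     # rotate +/- 10 degrees
        transforms.RandomAffine(0, translate=(0.1, 0.1)),  # shift up to 10%
        transforms.ToTensor(),
        transforms.Normalize((0.1307,), (0.3081,)),        # MNIST mean / std
    ])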
Classification: CIFAR-10 dataset with 10 classes
------------------------------------------------

The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. `Source Yunjey Choi Github pytorch tutorial `__

Load the CIFAR-10 dataset: `CIFAR-10 Loader `__

.. code:: ipython3

    from pystatsml.datasets import load_cifar10_pytorch

    dataloaders, _ = load_cifar10_pytorch(
        batch_size_train=100, batch_size_test=100)

    # Info about the dataset
    D_in = np.prod(dataloaders["train"].dataset.data.shape[1:])
    D_out = len(set(dataloaders["train"].dataset.targets))
    print("Datasets shape:", {
        x: dataloaders[x].dataset.data.shape for x in dataloaders.keys()})
    print("N input features:", D_in, "N output:", D_out)

LeNet
~~~~~

.. code:: ipython3

    model = LeNet5((3, 6, 16, 1, 120, 84, D_out), debug=True)
    batch_idx, (data_example, target_example) = next(
        enumerate(dataloaders["train"]))
    print(model)
    _ = model(data_example)

.. parsed-literal::

    LeNet5(
      (conv1): Conv2d(3, 6, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
      (conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
      (fc1): Linear(in_features=1, out_features=120, bias=True)
      (fc2): Linear(in_features=120, out_features=84, bias=True)
      (fc3): Linear(in_features=84, out_features=10, bias=True)
    )
    ### DEBUG: Shape of last convnet= torch.Size([16, 6, 6]) . FC size= 576

Set the first FC layer size to 576.

SGD with momentum ``lr=0.001, momentum=0.5``.

.. code:: ipython3

    model = LeNet5((3, 6, 16, 576, 120, 84, D_out)).to(device)
    optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.5)
    criterion = nn.NLLLoss()

    # Explore the model
    for parameter in model.parameters():
        print(parameter.shape)

    print("Total number of parameters =",
          np.sum([np.prod(parameter.shape) for parameter in model.parameters()]))

    model, losses, accuracies = train_val_model(model, criterion, optimizer,
                                                dataloaders,
                                                num_epochs=25, log_interval=5)

    _ = plt.plot(losses['train'], '-b', losses['val'], '--r')

.. parsed-literal::

    torch.Size([6, 3, 5, 5])
    torch.Size([6])
    torch.Size([16, 6, 5, 5])
    torch.Size([16])
    torch.Size([120, 576])
    torch.Size([120])
    torch.Size([84, 120])
    torch.Size([84])
    torch.Size([10, 84])
    torch.Size([10])
    Total number of parameters = 83126
    Epoch 0/24
    ----------
    train Loss: 2.3037 Acc: 10.06%
    val Loss: 2.3032 Acc: 10.05%
    Epoch 5/24
    ----------
    train Loss: 2.3005 Acc: 10.72%
    val Loss: 2.2998 Acc: 10.61%
    Epoch 10/24
    ----------
    train Loss: 2.2931 Acc: 11.90%
    val Loss: 2.2903 Acc: 11.27%
    Epoch 15/24
    ----------
    train Loss: 2.2355 Acc: 16.46%
    val Loss: 2.2134 Acc: 17.75%
    Epoch 20/24
    ----------
    train Loss: 2.1804 Acc: 19.07%
    val Loss: 2.1579 Acc: 20.26%
    Training complete in 5m 13s
    Best val Acc: 23.19%

.. image:: dl_cnn_cifar10_pytorch_files/dl_cnn_cifar10_pytorch_37_1.png
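With ``lr=0.001`` the network above barely learns: the learning rate is a critical hyper-parameter. Besides raising it manually, as in the next cell, it can also be decayed during training with the ``lr_scheduler`` module imported in the setup cell. A minimal sketch (not wired into ``train_val_model``, which manages its own training loop):

.. code:: ipython3

    model = LeNet5((3, 6, 16, 576, 120, 84, D_out)).to(device)
    optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
    # Divide the learning rate by 10 every 10 epochs
    scheduler = lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)

    for epoch in range(25):
        # ... one full training epoch over the dataloader would go here ...
        scheduler.step()  # update the learning rate after each epoch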
Increase learning rate and momentum: ``lr=0.01, momentum=0.9``.

.. code:: ipython3

    model = LeNet5((3, 6, 16, 576, 120, 84, D_out)).to(device)
    optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
    criterion = nn.NLLLoss()

    model, losses, accuracies = train_val_model(model, criterion, optimizer,
                                                dataloaders,
                                                num_epochs=25, log_interval=5)

    _ = plt.plot(losses['train'], '-b', losses['val'], '--r')

.. parsed-literal::

    Epoch 0/24
    ----------
    train Loss: 2.1798 Acc: 17.53%
    val Loss: 1.9141 Acc: 31.27%
    Epoch 5/24
    ----------
    train Loss: 1.3804 Acc: 49.93%
    val Loss: 1.3098 Acc: 53.23%
    Epoch 10/24
    ----------
    train Loss: 1.2019 Acc: 56.79%
    val Loss: 1.0886 Acc: 60.91%
    Epoch 15/24
    ----------
    train Loss: 1.1043 Acc: 60.61%
    val Loss: 1.0321 Acc: 63.26%
    Epoch 20/24
    ----------
    train Loss: 1.0569 Acc: 62.31%
    val Loss: 0.9942 Acc: 65.55%
    Training complete in 5m 15s
    Best val Acc: 67.18%

.. image:: dl_cnn_cifar10_pytorch_files/dl_cnn_cifar10_pytorch_39_1.png

Adaptive learning rate: Adam

.. code:: ipython3

    model = LeNet5((3, 6, 16, 576, 120, 84, D_out)).to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
    criterion = nn.NLLLoss()

    model, losses, accuracies = train_val_model(model, criterion, optimizer,
                                                dataloaders,
                                                num_epochs=25, log_interval=5)

    _ = plt.plot(losses['train'], '-b', losses['val'], '--r')

.. parsed-literal::

    Epoch 0/24
    ----------
    train Loss: 1.8866 Acc: 29.71%
    val Loss: 1.6111 Acc: 40.21%
    Epoch 5/24
    ----------
    train Loss: 1.3877 Acc: 49.62%
    val Loss: 1.3016 Acc: 53.23%
    Epoch 10/24
    ----------
    train Loss: 1.2274 Acc: 55.93%
    val Loss: 1.1575 Acc: 58.78%
    Epoch 15/24
    ----------
    train Loss: 1.1399 Acc: 59.28%
    val Loss: 1.0712 Acc: 61.84%
    Epoch 20/24
    ----------
    train Loss: 1.0806 Acc: 61.62%
    val Loss: 1.0334 Acc: 62.69%
    Training complete in 5m 25s
    Best val Acc: 65.14%

.. image:: dl_cnn_cifar10_pytorch_files/dl_cnn_cifar10_pytorch_41_1.png

MiniVGGNet
~~~~~~~~~~

.. code:: ipython3

    model = MiniVGGNet(layers=(3, 16, 32, 1, 120, 84, D_out), debug=True)
    print(model)
    _ = model(data_example)

.. parsed-literal::

    MiniVGGNet(
      (conv11): Conv2d(3, 16, kernel_size=(3, 3), stride=(1, 1))
      (conv12): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1))
      (conv21): Conv2d(16, 32, kernel_size=(3, 3), stride=(1, 1))
      (conv22): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (fc1): Linear(in_features=1, out_features=120, bias=True)
      (fc2): Linear(in_features=120, out_features=84, bias=True)
      (fc3): Linear(in_features=84, out_features=10, bias=True)
    )
    ### DEBUG: Shape of last convnet= torch.Size([32, 6, 6]) . FC size= 1152

Set the first FC layer size to 1152.

SGD with large momentum and learning rate.

.. code:: ipython3

    model = MiniVGGNet((3, 16, 32, 1152, 120, 84, D_out)).to(device)
    optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
    criterion = nn.NLLLoss()

    model, losses, accuracies = train_val_model(model, criterion, optimizer,
                                                dataloaders,
                                                num_epochs=25, log_interval=5)

    _ = plt.plot(losses['train'], '-b', losses['val'], '--r')

.. parsed-literal::

    Epoch 0/24
    ----------
    train Loss: 2.2581 Acc: 13.96%
    val Loss: 2.0322 Acc: 25.49%
    Epoch 5/24
    ----------
    train Loss: 1.4107 Acc: 48.84%
    val Loss: 1.3065 Acc: 52.92%
    Epoch 10/24
    ----------
    train Loss: 1.0621 Acc: 62.12%
    val Loss: 1.0013 Acc: 64.64%
    Epoch 15/24
    ----------
    train Loss: 0.8828 Acc: 68.70%
    val Loss: 0.8078 Acc: 72.08%
    Epoch 20/24
    ----------
    train Loss: 0.7830 Acc: 72.52%
    val Loss: 0.7273 Acc: 74.83%
    Training complete in 11m 44s
    Best val Acc: 75.50%

.. image:: dl_cnn_cifar10_pytorch_files/dl_cnn_cifar10_pytorch_46_1.png

Adam

.. code:: ipython3

    model = MiniVGGNet((3, 16, 32, 1152, 120, 84, D_out)).to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
    criterion = nn.NLLLoss()

    model, losses, accuracies = \
        train_val_model(model, criterion, optimizer, dataloaders,
                        num_epochs=25, log_interval=5)

    _ = plt.plot(losses['train'], '-b', losses['val'], '--r')
.. parsed-literal::

    Epoch 0/24
    ----------
    train Loss: 1.8556 Acc: 30.40%
    val Loss: 1.5847 Acc: 40.66%
    Epoch 5/24
    ----------
    train Loss: 1.2417 Acc: 55.39%
    val Loss: 1.0908 Acc: 61.45%
    Epoch 10/24
    ----------
    train Loss: 1.0203 Acc: 63.66%
    val Loss: 0.9503 Acc: 66.19%
    Epoch 15/24
    ----------
    train Loss: 0.9051 Acc: 67.98%
    val Loss: 0.8536 Acc: 70.10%
    Epoch 20/24
    ----------
    train Loss: 0.8273 Acc: 70.74%
    val Loss: 0.7942 Acc: 72.55%
    Training complete in 11m 60s
    Best val Acc: 74.00%

.. image:: dl_cnn_cifar10_pytorch_files/dl_cnn_cifar10_pytorch_48_1.png

ResNet
~~~~~~

.. code:: ipython3

    model = ResNet(ResidualBlock, [2, 2, 2],
                   num_classes=D_out).to(device)  # 195738 parameters
    optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
    criterion = nn.NLLLoss()

    model, losses, accuracies = \
        train_val_model(model, criterion, optimizer, dataloaders,
                        num_epochs=25, log_interval=5)

    _ = plt.plot(losses['train'], '-b', losses['val'], '--r')

.. parsed-literal::

    Epoch 0/24
    ----------
    train Loss: 1.4107 Acc: 48.21%
    val Loss: 1.2645 Acc: 54.80%
    Epoch 5/24
    ----------
    train Loss: 0.6440 Acc: 77.60%
    val Loss: 0.8178 Acc: 72.40%
    Epoch 10/24
    ----------
    train Loss: 0.4914 Acc: 82.89%
    val Loss: 0.6432 Acc: 78.16%
    Epoch 15/24
    ----------
    train Loss: 0.4024 Acc: 86.27%
    val Loss: 0.5026 Acc: 83.43%
    Epoch 20/24
    ----------
    train Loss: 0.3496 Acc: 87.86%
    val Loss: 0.5282 Acc: 82.18%
    Training complete in 58m 9s
    Best val Acc: 85.61%

.. image:: dl_cnn_cifar10_pytorch_files/dl_cnn_cifar10_pytorch_50_1.png

Segmentation with U-Net
-----------------------

Source `Segmentation Models `__:

U-Net is a fully convolutional neural network architecture designed for semantic image segmentation. It consists of two main parts:

- An encoder (downsampling path) that extracts increasingly abstract features
- A decoder (upsampling path) that gradually recovers spatial details

The key is the use of skip connections between corresponding encoder and decoder layers. These connections allow the decoder to access fine-grained details from earlier encoder layers, which helps produce more precise segmentation masks.

The skip connections work by concatenating feature maps from the encoder directly into the decoder at corresponding resolutions. This helps preserve important spatial information that would otherwise be lost during the encoding process.
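The sketch below (a toy illustration, independent of the full model that follows) shows the core decoder step: upsample the deeper feature map, concatenate the encoder feature map of the same resolution along the channel dimension, then fuse with a convolution:

.. code:: ipython3

    import torch
    import torch.nn as nn

    enc = torch.randn(1, 64, 32, 32)   # encoder feature map (fine details)
    dec = torch.randn(1, 128, 16, 16)  # deeper decoder feature map

    up = nn.ConvTranspose2d(128, 64, kernel_size=2, stride=2)
    fuse = nn.Conv2d(64 + 64, 64, kernel_size=3, padding=1)

    x = up(dec)                     # upsample: 1 x 64 x 32 x 32
    x = torch.cat((x, enc), dim=1)  # skip connection: 1 x 128 x 32 x 32
    x = fuse(x)                     # fused: 1 x 64 x 32 x 32
    print(x.shape)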
Example: Image Segmentation with U-Net using PyTorch
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Below is an example of how to implement image segmentation using the U-Net architecture with PyTorch on a real dataset. We will use the Oxford-IIIT Pet Dataset for this example.

Step 1: Load the Dataset

We will use the Oxford-IIIT Pet Dataset, which can be downloaded from `here `__. For simplicity, we will assume the dataset is already downloaded and organized into an ``images/`` directory (input images) and an ``annotations/`` directory (the corresponding segmentation masks).

Step 2: Define the U-Net Model

Here is the implementation of the U-Net model in PyTorch:

.. code:: ipython3

    import torch
    import torch.nn as nn

    class UNet(nn.Module):
        def __init__(self, in_channels, out_channels):
            super(UNet, self).__init__()

            def conv_block(in_channels, out_channels):
                return nn.Sequential(
                    nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
                    nn.ReLU(inplace=True),
                    nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1),
                    nn.ReLU(inplace=True)
                )

            def up_conv(in_channels, out_channels):
                return nn.ConvTranspose2d(in_channels, out_channels,
                                          kernel_size=2, stride=2)

            # Encoder (downsampling path)
            self.enc1 = conv_block(in_channels, 64)
            self.enc2 = conv_block(64, 128)
            self.enc3 = conv_block(128, 256)
            self.enc4 = conv_block(256, 512)
            self.pool = nn.MaxPool2d(kernel_size=2, stride=2)

            self.bottleneck = conv_block(512, 1024)

            # Decoder (upsampling path); each dec block takes the
            # concatenation of the upsampled and skip feature maps
            self.upconv4 = up_conv(1024, 512)
            self.dec4 = conv_block(1024, 512)
            self.upconv3 = up_conv(512, 256)
            self.dec3 = conv_block(512, 256)
            self.upconv2 = up_conv(256, 128)
            self.dec2 = conv_block(256, 128)
            self.upconv1 = up_conv(128, 64)
            self.dec1 = conv_block(128, 64)

            self.conv_final = nn.Conv2d(64, out_channels, kernel_size=1)

        def forward(self, x):
            enc1 = self.enc1(x)
            enc2 = self.enc2(self.pool(enc1))
            enc3 = self.enc3(self.pool(enc2))
            enc4 = self.enc4(self.pool(enc3))

            bottleneck = self.bottleneck(self.pool(enc4))

            dec4 = self.upconv4(bottleneck)
            dec4 = torch.cat((dec4, enc4), dim=1)  # skip connection
            dec4 = self.dec4(dec4)
            dec3 = self.upconv3(dec4)
            dec3 = torch.cat((dec3, enc3), dim=1)  # skip connection
            dec3 = self.dec3(dec3)
            dec2 = self.upconv2(dec3)
            dec2 = torch.cat((dec2, enc2), dim=1)  # skip connection
            dec2 = self.dec2(dec2)
            dec1 = self.upconv1(dec2)
            dec1 = torch.cat((dec1, enc1), dim=1)  # skip connection
            dec1 = self.dec1(dec1)

            return self.conv_final(dec1)

Step 3: Load and Preprocess the Dataset

We use the torchvision library to load and preprocess the dataset:

.. code:: ipython3

    from torchvision import transforms
    from torch.utils.data import DataLoader, Dataset
    from PIL import Image
    import os
    import os.path
    from pathlib import Path

    # Directory
    DIR = os.path.join(Path.home(), "data", "pystatml", "dl_Oxford-IIITPet")
    # /images: input images
    # /annotations: corresponding masks

    class PetDataset(Dataset):
        def __init__(self, image_dir, mask_dir, transform=None):
            self.image_dir = image_dir
            self.mask_dir = mask_dir
            self.transform = transform
            self.image_filenames = os.listdir(image_dir)

        def __len__(self):
            return len(self.image_filenames)

        def __getitem__(self, idx):
            img_path = os.path.join(self.image_dir, self.image_filenames[idx])
            mask_path = os.path.join(
                self.mask_dir, self.image_filenames[idx].replace('.jpg', '.png'))
            image = Image.open(img_path).convert('RGB')
            mask = Image.open(mask_path).convert('L')

            if self.transform:
                image = self.transform(image)
                mask = self.transform(mask)

            return image, mask

    transform = transforms.Compose([
        transforms.Resize((128, 128)),
        transforms.ToTensor()
    ])

    dataset = PetDataset(os.path.join(DIR, 'images'),
                         os.path.join(DIR, 'annotations'),
                         transform=transform)
    dataloader = DataLoader(dataset, batch_size=4, shuffle=True)
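As a quick sanity check (a sketch, assuming the dataset files above are in place), one batch from this ``dataloader`` should contain 4 RGB images and 4 single-channel masks at the 128x128 resolution imposed by the transform:

.. code:: ipython3

    # Inspect one batch (assumes the Oxford-IIIT Pet files are downloaded)
    images, masks = next(iter(dataloader))
    print(images.shape)  # expected: torch.Size([4, 3, 128, 128])
    print(masks.shape)   # expected: torch.Size([4, 1, 128, 128])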
Step 4: Train the U-Net Model

Finally, we will train the U-Net model:

.. code:: ipython3

    import torch.optim as optim

    model = UNet(in_channels=3, out_channels=1)
    criterion = nn.BCEWithLogitsLoss()
    optimizer = optim.Adam(model.parameters(), lr=0.001)

    def train(model, dataloader, criterion, optimizer, num_epochs=1):
        model.train()
        for epoch in range(num_epochs):
            for images, masks in dataloader:
                optimizer.zero_grad()
                outputs = model(images)
                loss = criterion(outputs, masks)
                loss.backward()
                optimizer.step()
            print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}')

    # Do not execute (takes ~2 hours); the output below was produced
    # with num_epochs=10:
    # train(model, dataloader, criterion, optimizer, num_epochs=10)

.. parsed-literal::

    Epoch [1/10], Loss: 0.0414
    Epoch [2/10], Loss: 0.0438
    Epoch [3/10], Loss: 0.0430
    Epoch [4/10], Loss: 0.0402
    Epoch [5/10], Loss: 0.0449
    Epoch [6/10], Loss: 0.0430
    Epoch [7/10], Loss: 0.0438
    Epoch [8/10], Loss: 0.0433
    Epoch [9/10], Loss: 0.0440
    Epoch [10/10], Loss: 0.0453

`Save the model and reload the model `__:

.. code:: ipython3

    model_dirname = os.path.join(DIR, "models")
    model_filename = os.path.join(model_dirname, "unet.pt")
    os.makedirs(model_dirname, exist_ok=True)
    torch.save(model.state_dict(), model_filename)

    model_ = UNet(in_channels=3, out_channels=1)
    model_.load_state_dict(torch.load(model_filename, weights_only=True))
    _ = model_.eval()

Visualize the results:

.. code:: ipython3

    # Visualize input images, ground-truth masks and predicted masks
    def visualize_results(model, dataloader, num_images=3):
        model.eval()
        images, masks = next(iter(dataloader))
        with torch.no_grad():
            outputs = model(images)
            outputs = torch.sigmoid(outputs)

        outputs = outputs.cpu().numpy()
        images = images.cpu().numpy()
        masks = masks.cpu().numpy()

        fig, axes = plt.subplots(num_images, 3, figsize=(10, 10))
        for i in range(num_images):
            axes[i, 0].imshow(np.transpose(images[i], (1, 2, 0)))
            axes[i, 0].set_title('Input Image')
            axes[i, 0].axis('off')
            axes[i, 1].imshow(masks[i].squeeze(), cmap='gray')
            axes[i, 1].set_title('Ground Truth Mask')
            axes[i, 1].axis('off')
            axes[i, 2].imshow(outputs[i].squeeze(), cmap='gray')
            axes[i, 2].set_title('Predicted Mask')
            axes[i, 2].axis('off')
        plt.tight_layout()
        plt.show()

    visualize_results(model, dataloader)

.. image:: dl_cnn_cifar10_pytorch_files/dl_cnn_cifar10_pytorch_61_0.png

U-Net: Training Image Segmentation Models in PyTorch
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

`A simple pytorch implementation of U-net `__
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

- `The model `__
- `Train with predefined dataset and dataloader `__

`PyTorch - Lung Segmentation using pretrained U-net `__
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

- UNet architecture with a pre-trained ResNet34 backbone from the `segmentation_models.pytorch `__ library, which has many built-in segmentation architectures with different backbones.
- Identify "Pneumothorax" (a collapsed lung) from chest X-rays.
- Data augmentation
- Train-val Dataset and DataLoader
- User-defined loss