Convolutional neural network
============================
Outline
-------
1. Architectures
2. Train and test functions
3. CNN models
4. MNIST
5. CIFAR-10
Sources:
Deep learning - `cs231n.stanford.edu `__
CNN - `Stanford
cs231n `__
Pytorch - `WWW tutorials `__ - `github
tutorials `__ - `github
examples `__
MNIST and pytorch: - `MNIST
nextjournal.com/gkoehler/pytorch-mnist `__
- `MNIST
github/pytorch/examples `__
- `MNIST
kaggle `__
Architectures
-------------
Sources:
- `cv-tricks.com `__
- `zhenye-na.github.io <https://zhenye-na.github.io/2018/12/01/cnn-deep-leearning-ai-week2.html>`__
LeNet
~~~~~
The first convolutional networks were developed by Yann LeCun in the 1990s.
.. figure:: ./figures/LeNet_Original_Image.jpg
:alt: LeNet
LeNet
AlexNet
~~~~~~~
(2012, Alex Krizhevsky, Ilya Sutskever and Geoff Hinton)
.. figure:: ./figures/alexnet.png
:alt: AlexNet
AlexNet
.. figure:: ./figures/alexnet_param_tab.png
:alt: AlexNet architecture
AlexNet architecture
- Deeper and bigger than LeNet.
- Featured convolutional layers stacked directly on top of each other
  (previously it was common to have a single CONV layer always
  immediately followed by a POOL layer).
- **ReLU (Rectified Linear Unit)** for the non-linear part, instead of
  tanh or sigmoid.

The advantage of ReLU over sigmoid is that it trains much faster: the
derivative of the sigmoid becomes very small in the saturating region,
so the updates to the weights almost vanish. This is called the
**vanishing gradient problem**.

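
The saturation argument can be checked numerically. The sketch below is only an illustration in plain Python (the helper names and the depth of 10 are assumptions, not part of the original notebook): it compares the derivative of the sigmoid with that of ReLU at a few inputs, then multiplies the local derivatives along a chain of layers, as backpropagation does.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    # d/dx sigmoid(x) = sigmoid(x) * (1 - sigmoid(x)), at most 0.25 (at x = 0)
    s = sigmoid(x)
    return s * (1.0 - s)


def relu_grad(x):
    # ReLU has derivative 1 for positive inputs and 0 otherwise
    return 1.0 if x > 0 else 0.0


for x in (0.0, 2.0, 5.0, 10.0):
    print(f"x={x:5.1f}  sigmoid'={sigmoid_grad(x):.6f}  relu'={relu_grad(x):.1f}")

# Backpropagation multiplies the local derivatives of successive layers.
depth = 10  # hypothetical number of layers
chain_sigmoid = sigmoid_grad(0.0) ** depth  # best case for sigmoid: 0.25 per layer
chain_relu = relu_grad(1.0) ** depth        # active ReLU units pass the gradient unchanged
print("sigmoid chain:", chain_sigmoid, " ReLU chain:", chain_relu)
```

Even in the sigmoid's best case the product shrinks geometrically with depth, whereas active ReLU units leave it unchanged.
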
- **Dropout**: reduces over-fitting by adding a dropout layer after
  every FC layer. A dropout layer has an associated probability p and
  is applied to every neuron of the response map separately: it
  randomly switches off the activation with probability p.
.. figure:: ./figures/dropout.png
:alt: Dropout
Dropout
Why does dropout work?

The idea behind dropout is similar to model ensembling. Each set of
switched-off neurons defines a different architecture, and all these
architectures are trained in parallel, each with a weight, the weights
summing to one. For n neurons attached to a dropout layer, the number of
sub-architectures is 2^n, so the prediction amounts to an average over
this ensemble of models. This provides a structured model regularization
that helps to avoid over-fitting. Another view is that, since neurons
are switched off at random, they avoid developing co-adaptations and
instead learn meaningful features independently of the others.

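
A minimal sketch of the mechanism, in plain Python rather than PyTorch. It assumes the "inverted" variant of dropout, in which surviving activations are rescaled by ``1 / (1 - p)`` during training so that nothing needs to change at test time; the text above does not specify this detail, and the function name and sizes are hypothetical.

```python
import random


def dropout(activations, p, training=True, rng=random):
    """Inverted dropout: zero each activation with probability p and rescale
    the survivors by 1 / (1 - p) so that the expected value is unchanged."""
    if not training or p == 0.0:
        return list(activations)  # identity at test time
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(0)  # fixed seed, for reproducibility only
x = [1.0] * 10000
y = dropout(x, p=0.5, rng=rng)

dropped_fraction = y.count(0.0) / len(y)  # close to p
mean_after = sum(y) / len(y)              # close to the original mean of 1.0
n_subnetworks = 2 ** 8                    # 8 droppable neurons give 2^8 sub-architectures
print(dropped_fraction, mean_after, n_subnetworks)
```
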
- **Data augmentation** is carried out to reduce over-fitting. It
  includes mirroring and cropping the images to increase the variation
  in the training dataset.
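
Mirroring and cropping can be illustrated on a tiny image stored as a list of rows. This is only a sketch with hypothetical helper names; in practice a library such as ``torchvision.transforms`` provides these operations.

```python
import random


def horizontal_flip(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]


def random_crop(image, size, padding=0, rng=random):
    """Zero-pad the image on every side, then cut a random size x size window."""
    width = len(image[0])
    padded = [[0] * (width + 2 * padding) for _ in range(padding)]
    padded += [[0] * padding + list(row) + [0] * padding for row in image]
    padded += [[0] * (width + 2 * padding) for _ in range(padding)]
    top = rng.randint(0, len(padded) - size)      # randint is inclusive on both ends
    left = rng.randint(0, len(padded[0]) - size)
    return [row[left:left + size] for row in padded[top:top + size]]


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
flipped = horizontal_flip(image)
cropped = random_crop(image, size=3, padding=1, rng=random.Random(0))
print(flipped)
print(cropped)
```

Each epoch then sees a slightly different version of every training image, while the label stays the same.
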
GoogLeNet
~~~~~~~~~

(Szegedy et al. from Google, 2014)

- Its main contribution was the **Inception Module**, which dramatically
  reduced the number of parameters in the network (4M, compared to
  AlexNet with 60M).

.. figure:: ./figures/inception_block.png
:alt: Inception Module
:width: 15cm
Inception Module
- There are also several follow-up versions of GoogLeNet, most
  recently Inception-v4.

VGGNet
~~~~~~

(Karen Simonyan and Andrew Zisserman, 2014)

.. figure:: ./figures/vgg.png
:alt: VGGNet
:width: 15cm
VGGNet
.. figure:: ./figures/vgg_param_tab.png
:alt: VGGNet architecture
:width: 15cm
VGGNet architecture
- 16 CONV/FC layers with an extremely homogeneous architecture.
- Only performs 3x3 convolutions and 2x2 pooling from the beginning to
  the end. It replaces the large kernels of AlexNet (11 and 5 in the
  first and second convolutional layers, respectively) with multiple
  3x3 kernels stacked one after another.

For a given receptive field (the area of the input image on which an
output depends), a stack of small kernels is better than a single large
kernel: the additional non-linear layers increase the depth of the
network, which lets it learn more complex features, and at a lower
cost. For example, three 3x3 filters stacked with stride 1 have a
receptive field of 7x7, but with C input and output channels per layer
they use 3 * (3^2 * C^2) = 27 C^2 weights, compared with
7^2 * C^2 = 49 C^2 weights for a single 7x7 layer.

- Many more parameters (140M) and much more memory.

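
The receptive-field and parameter arithmetic can be verified with a few lines of Python. The sketch assumes stride-1 convolutions with the same number of channels ``C`` at the input and output of every layer and ignores biases; ``C = 64`` is an arbitrary illustrative value.

```python
def stacked_receptive_field(kernel_size, n_layers):
    """Receptive field of n stacked stride-1 convolutions with the same kernel size."""
    return 1 + n_layers * (kernel_size - 1)


def conv_weights(kernel_size, channels, n_layers=1):
    """Weights (biases ignored) of n conv layers with `channels` inputs and outputs."""
    return n_layers * kernel_size ** 2 * channels ** 2


C = 64  # hypothetical channel count
rf = stacked_receptive_field(kernel_size=3, n_layers=3)
stacked = conv_weights(kernel_size=3, channels=C, n_layers=3)  # 27 * C^2
single = conv_weights(kernel_size=7, channels=C)               # 49 * C^2
print("receptive field:", rf, " stacked 3x3:", stacked, " single 7x7:", single)
```
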
ResNet
~~~~~~

(Kaiming He et al., 2015)

Resnet block variants
(`Source `__):
.. figure:: ./figures/resnets_modelvariants.png
:alt: ResNet block
:width: 15cm
ResNet block
.. figure:: ./figures/resnet18.png
:alt: ResNet 18
:width: 15cm
ResNet 18
.. figure:: ./figures/resnet_param_tab.png
:alt: ResNet 18 architecture
:width: 15cm
ResNet 18 architecture
- Skip connections.
- Batch normalization.
- State-of-the-art CNN models and the default choice (as of May 10,
  2016).
- See also the more recent developments that tweak the original
  architecture: Kaiming He et al., Identity Mappings in Deep Residual
  Networks (published March 2016).

`Models in
pytorch `__
Architectures general guidelines
--------------------------------
- ConvNets stack CONV,POOL,FC layers
- Trend towards smaller filters and deeper architectures: stack 3x3,
instead of 5x5
- Trend towards getting rid of POOL/FC layers (just CONV)
- Historically, architectures looked like
  ``[(CONV-RELU) x N, POOL?] x M, (FC-RELU) x K, SOFTMAX``, where N is
  usually up to ~5, M is large, and 0 <= K <= 2.
- Recent advances such as ResNet and GoogLeNet have challenged this
  paradigm.

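
To make the historical pattern concrete, the following sketch expands it into a flat list of layer names. The function and its arguments are hypothetical and only mirror the notation used in the list above.

```python
def layer_pattern(n, m, k, pool=True):
    """Expand [(CONV-RELU) x N, POOL?] x M, (FC-RELU) x K, SOFTMAX into a flat list."""
    layers = []
    for _ in range(m):
        layers += ["CONV", "RELU"] * n
        if pool:
            layers.append("POOL")
    layers += ["FC", "RELU"] * k
    layers.append("SOFTMAX")
    return layers


# Two blocks of two CONV-RELU pairs, each followed by a POOL, then two FC-RELU pairs.
pattern = layer_pattern(n=2, m=2, k=2)
print(" -> ".join(pattern))
```
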
Train function
--------------
.. code:: ipython3
%matplotlib inline
import os
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim import lr_scheduler
import torchvision
import torchvision.transforms as transforms
from torchvision import models
#
from pathlib import Path
import matplotlib.pyplot as plt
# Device configuration
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
device = 'cpu' # Force CPU
.. code:: ipython3

    # %load train_val_model.py
    import numpy as np
    import torch
    import time
    import copy


    def train_val_model(model, criterion, optimizer, dataloaders, num_epochs=25,
                        scheduler=None, log_interval=None):
        # NOTE: relies on the global `device` defined in the previous cell.
        since = time.time()

        best_model_wts = copy.deepcopy(model.state_dict())
        best_acc = 0.0

        # Store losses and accuracies across epochs
        losses, accuracies = dict(train=[], val=[]), dict(train=[], val=[])

        for epoch in range(num_epochs):
            if log_interval is not None and epoch % log_interval == 0:
                print('Epoch {}/{}'.format(epoch, num_epochs - 1))
                print('-' * 10)

            # Each epoch has a training and validation phase
            for phase in ['train', 'val']:
                if phase == 'train':
                    model.train()  # Set model to training mode
                else:
                    model.eval()   # Set model to evaluate mode

                running_loss = 0.0
                running_corrects = 0

                # Iterate over data.
                nsamples = 0
                for inputs, labels in dataloaders[phase]:
                    inputs = inputs.to(device)
                    labels = labels.to(device)
                    nsamples += inputs.shape[0]

                    # zero the parameter gradients
                    optimizer.zero_grad()

                    # forward
                    # track history only in the training phase
                    with torch.set_grad_enabled(phase == 'train'):
                        outputs = model(inputs)
                        _, preds = torch.max(outputs, 1)
                        loss = criterion(outputs, labels)

                        # backward + optimize only if in training phase
                        if phase == 'train':
                            loss.backward()
                            optimizer.step()

                    # statistics: the loss is a batch mean, so weight it by the batch size
                    running_loss += loss.item() * inputs.size(0)
                    running_corrects += torch.sum(preds == labels).item()

                # the scheduler is stepped once per epoch
                if scheduler is not None and phase == 'train':
                    scheduler.step()

                #nsamples = dataloaders[phase].dataset.data.shape[0]
                epoch_loss = running_loss / nsamples
                epoch_acc = running_corrects / nsamples  # plain Python float

                losses[phase].append(epoch_loss)
                accuracies[phase].append(epoch_acc)
                if log_interval is not None and epoch % log_interval == 0:
                    print('{} Loss: {:.4f} Acc: {:.2f}%'.format(
                        phase, epoch_loss, 100 * epoch_acc))

                # deep copy the model
                if phase == 'val' and epoch_acc > best_acc:
                    best_acc = epoch_acc
                    best_model_wts = copy.deepcopy(model.state_dict())

            if log_interval is not None and epoch % log_interval == 0:
                print()

        time_elapsed = time.time() - since
        print('Training complete in {:.0f}m {:.0f}s'.format(
            time_elapsed // 60, time_elapsed % 60))
        print('Best val Acc: {:.2f}%'.format(100 * best_acc))

        # load best model weights
        model.load_state_dict(best_model_wts)

        return model, losses, accuracies

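
The training loop accumulates ``loss.item() * inputs.size(0)`` and divides by the number of samples, rather than averaging the per-batch losses directly. The sketch below, with made-up numbers, suggests why: when the last mini-batch is smaller, a plain mean of batch means gives that batch too much weight. It assumes the criterion returns the mean loss over the batch.

```python
def epoch_average(batch_means, batch_sizes):
    """Per-sample mean over an epoch, computed from per-batch mean losses."""
    total = sum(mean * size for mean, size in zip(batch_means, batch_sizes))
    return total / sum(batch_sizes)


batch_means = [0.50, 0.50, 2.00]  # hypothetical per-batch mean losses
batch_sizes = [64, 64, 8]         # the last batch is incomplete

weighted = epoch_average(batch_means, batch_sizes)  # what the loop computes
naive = sum(batch_means) / len(batch_means)         # unweighted mean of batch means
print("weighted:", round(weighted, 4), " naive:", round(naive, 4))
```
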
CNN models
----------
LeNet-5
~~~~~~~
Here we implement LeNet-5 with relu activation. Sources:
`(1) `__,
`(2) `__.
.. code:: ipython3

    import numpy as np
    import torch.nn as nn
    import torch.nn.functional as F


    class LeNet5(nn.Module):
        """
        layers: (nb channels in input layer,
                 nb channels in 1st conv,
                 nb channels in 2nd conv,
                 nb inputs of 1st FC (flattened conv output): TO BE TUNED,
                 nb neurons for 1st FC,
                 nb neurons for 2nd FC,
                 nb neurons for output FC: TO BE TUNED)
        """
        def __init__(self, layers=(1, 6, 16, 1024, 120, 84, 10), debug=False):
            super(LeNet5, self).__init__()
            self.layers = layers
            self.debug = debug
            self.conv1 = nn.Conv2d(layers[0], layers[1], 5, padding=2)
            self.conv2 = nn.Conv2d(layers[1], layers[2], 5)
            self.fc1 = nn.Linear(layers[3], layers[4])
            self.fc2 = nn.Linear(layers[4], layers[5])
            self.fc3 = nn.Linear(layers[5], layers[6])

        def forward(self, x):
            x = F.max_pool2d(F.relu(self.conv1(x)), 2)  # same shape / 2
            x = F.max_pool2d(F.relu(self.conv2(x)), 2)  # -4 / 2
            if self.debug:
                print("### DEBUG: Shape of last convnet=", x.shape[1:],
                      ". FC size=", np.prod(x.shape[1:]))
            x = x.view(-1, self.layers[3])
            x = F.relu(self.fc1(x))
            x = F.relu(self.fc2(x))
            x = self.fc3(x)
            return F.log_softmax(x, dim=1)

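
The value marked "TO BE TUNED" for the first FC layer can also be derived by hand from the usual output-size formula of a convolution, ``(size + 2 * padding - kernel) // stride + 1``, followed by the halving of each 2x2 max-pooling. The sketch below is a plain-Python check with hypothetical helper names; it reproduces the sizes 400 (28x28 MNIST input) and 576 (32x32 CIFAR-10 input) used for ``fc1`` in the experiments of this document.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel=2):
    """Spatial output size of a max-pooling with stride equal to its kernel."""
    return size // kernel


def lenet5_flat_size(input_size, conv2_channels=16):
    s = conv_out(input_size, kernel=5, padding=2)  # conv1: padding keeps the size
    s = pool_out(s)                                # / 2
    s = conv_out(s, kernel=5)                      # conv2: no padding, -4
    s = pool_out(s)                                # / 2
    return conv2_channels * s * s


print("MNIST 28x28 :", lenet5_flat_size(28))
print("CIFAR 32x32 :", lenet5_flat_size(32))
```
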
VGGNet like: conv-relu blocks
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. code:: ipython3

    # Defining the network (MiniVGGNet)
    import numpy as np
    import torch
    import torch.nn as nn
    import torch.nn.functional as F


    class MiniVGGNet(torch.nn.Module):

        def __init__(self, layers=(1, 16, 32, 1024, 120, 84, 10), debug=False):
            super(MiniVGGNet, self).__init__()
            self.layers = layers
            self.debug = debug

            # Conv block 1
            self.conv11 = nn.Conv2d(in_channels=layers[0], out_channels=layers[1], kernel_size=3,
                                    stride=1, padding=0, bias=True)
            self.conv12 = nn.Conv2d(in_channels=layers[1], out_channels=layers[1], kernel_size=3,
                                    stride=1, padding=0, bias=True)

            # Conv block 2
            self.conv21 = nn.Conv2d(in_channels=layers[1], out_channels=layers[2], kernel_size=3,
                                    stride=1, padding=0, bias=True)
            self.conv22 = nn.Conv2d(in_channels=layers[2], out_channels=layers[2], kernel_size=3,
                                    stride=1, padding=1, bias=True)

            # Fully connected layers
            self.fc1 = nn.Linear(layers[3], layers[4])
            self.fc2 = nn.Linear(layers[4], layers[5])
            self.fc3 = nn.Linear(layers[5], layers[6])

        def forward(self, x):
            x = F.relu(self.conv11(x))
            x = F.relu(self.conv12(x))
            x = F.max_pool2d(x, 2)

            x = F.relu(self.conv21(x))
            x = F.relu(self.conv22(x))
            x = F.max_pool2d(x, 2)

            if self.debug:
                print("### DEBUG: Shape of last convnet=", x.shape[1:],
                      ". FC size=", np.prod(x.shape[1:]))
            x = x.view(-1, self.layers[3])
            x = F.relu(self.fc1(x))
            x = F.relu(self.fc2(x))
            x = self.fc3(x)

            return F.log_softmax(x, dim=1)

ResNet-like Model:
~~~~~~~~~~~~~~~~~~
Stack multiple resnet blocks
.. code:: ipython3

    # ---------------------------------------------------------------------------- #
    # An implementation of https://arxiv.org/pdf/1512.03385.pdf                    #
    # See section 4.2 for the model architecture on CIFAR-10                       #
    # Some part of the code was referenced from below                              #
    # https://github.com/pytorch/vision/blob/master/torchvision/models/resnet.py   #
    # ---------------------------------------------------------------------------- #
    import torch.nn as nn
    import torch.nn.functional as F


    # 3x3 convolution
    def conv3x3(in_channels, out_channels, stride=1):
        return nn.Conv2d(in_channels, out_channels, kernel_size=3,
                         stride=stride, padding=1, bias=False)


    # Residual block
    class ResidualBlock(nn.Module):
        def __init__(self, in_channels, out_channels, stride=1, downsample=None):
            super(ResidualBlock, self).__init__()
            self.conv1 = conv3x3(in_channels, out_channels, stride)
            self.bn1 = nn.BatchNorm2d(out_channels)
            self.relu = nn.ReLU(inplace=True)
            self.conv2 = conv3x3(out_channels, out_channels)
            self.bn2 = nn.BatchNorm2d(out_channels)
            self.downsample = downsample

        def forward(self, x):
            residual = x
            out = self.conv1(x)
            out = self.bn1(out)
            out = self.relu(out)
            out = self.conv2(out)
            out = self.bn2(out)
            # Match the shape of the skip path when the block changes it
            if self.downsample is not None:
                residual = self.downsample(x)
            out += residual  # skip connection
            out = self.relu(out)
            return out


    # ResNet
    class ResNet(nn.Module):
        def __init__(self, block, layers, num_classes=10):
            super(ResNet, self).__init__()
            self.in_channels = 16
            self.conv = conv3x3(3, 16)
            self.bn = nn.BatchNorm2d(16)
            self.relu = nn.ReLU(inplace=True)
            self.layer1 = self.make_layer(block, 16, layers[0])
            self.layer2 = self.make_layer(block, 32, layers[1], 2)
            self.layer3 = self.make_layer(block, 64, layers[2], 2)
            self.avg_pool = nn.AvgPool2d(8)
            self.fc = nn.Linear(64, num_classes)

        def make_layer(self, block, out_channels, blocks, stride=1):
            downsample = None
            if (stride != 1) or (self.in_channels != out_channels):
                downsample = nn.Sequential(
                    conv3x3(self.in_channels, out_channels, stride=stride),
                    nn.BatchNorm2d(out_channels))
            layers = []
            layers.append(block(self.in_channels, out_channels, stride, downsample))
            self.in_channels = out_channels
            for i in range(1, blocks):
                layers.append(block(out_channels, out_channels))
            return nn.Sequential(*layers)

        def forward(self, x):
            out = self.conv(x)
            out = self.bn(out)
            out = self.relu(out)
            out = self.layer1(out)
            out = self.layer2(out)
            out = self.layer3(out)
            out = self.avg_pool(out)
            out = out.view(out.size(0), -1)
            out = self.fc(out)
            return F.log_softmax(out, dim=1)
            #return out

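
The hard-coded ``nn.AvgPool2d(8)`` and ``nn.Linear(64, num_classes)`` follow from the spatial sizes produced by the three stages on a 32x32 input. The plain-Python sketch below traces those sizes under that assumption; the helper names are hypothetical, and only the first block of each stage is traced since the other blocks keep the size unchanged.

```python
def conv3x3_out(size, stride=1):
    """Output size of a 3x3 convolution with padding 1."""
    return (size + 2 * 1 - 3) // stride + 1


def resnet_trace(input_size=32, strides=(1, 2, 2), channels=(16, 32, 64), pool=8):
    size = conv3x3_out(input_size)        # stem convolution, stride 1
    trace = []
    for stride, n_channels in zip(strides, channels):
        size = conv3x3_out(size, stride)  # first block of the stage may downsample
        trace.append((n_channels, size))
    pooled = size // pool                 # average pooling with an 8x8 window
    return trace, channels[-1] * pooled * pooled


trace, n_features = resnet_trace()
print("(channels, size) per stage:", trace)
print("features fed to the FC layer:", n_features)
```

With a 32x32 input the last stage yields 64 maps of size 8x8, so the 8x8 average pooling leaves exactly 64 features for the final linear layer.
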
ResNet9
- `DAWNBench on
cifar10 `__
- `ResNet9: train to 94% CIFAR10 accuracy in 100
seconds `__
MNIST digit classification
--------------------------
.. code:: ipython3

    from pathlib import Path
    from torchvision import datasets, transforms
    import os

    WD = os.path.join(Path.home(), "data", "pystatml", "dl_mnist_pytorch")
    os.makedirs(WD, exist_ok=True)
    os.chdir(WD)
    print("Working dir is:", os.getcwd())
    os.makedirs("data", exist_ok=True)
    os.makedirs("models", exist_ok=True)


    def load_mnist(batch_size_train, batch_size_test):
        # Normalize with the mean and standard deviation of the MNIST training set
        train_loader = torch.utils.data.DataLoader(
            datasets.MNIST('data', train=True, download=True,
                           transform=transforms.Compose([
                               transforms.ToTensor(),
                               transforms.Normalize((0.1307,), (0.3081,))
                           ])),
            batch_size=batch_size_train, shuffle=True)

        test_loader = torch.utils.data.DataLoader(
            datasets.MNIST('data', train=False, transform=transforms.Compose([
                transforms.ToTensor(),
                transforms.Normalize((0.1307,), (0.3081,))
            ])),
            batch_size=batch_size_test, shuffle=True)
        return train_loader, test_loader


    train_loader, val_loader = load_mnist(64, 1000)

    dataloaders = dict(train=train_loader, val=val_loader)

    # Info about the dataset
    data_shape = dataloaders["train"].dataset.data.shape[1:]
    D_in = np.prod(data_shape)
    # Number of classes: count the distinct labels, not the number of samples
    D_out = len(dataloaders["train"].dataset.targets.unique())
    print("Datasets shape", {x: dataloaders[x].dataset.data.shape for x in ['train', 'val']})
    print("N input features", D_in, "N output", D_out)

.. parsed-literal::

    Working dir is: /home/ed203246/data/pystatml/dl_mnist_pytorch
    Datasets shape {'train': torch.Size([60000, 28, 28]), 'val': torch.Size([10000, 28, 28])}
    N input features 784 N output 10

LeNet
~~~~~
Dry run in debug mode to get the shape of the last convnet layer.
.. code:: ipython3
model = LeNet5((1, 6, 16, 1, 120, 84, 10), debug=True)
batch_idx, (data_example, target_example) = next(enumerate(train_loader))
print(model)
_ = model(data_example)
.. parsed-literal::
LeNet5(
(conv1): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
(conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
(fc1): Linear(in_features=1, out_features=120, bias=True)
(fc2): Linear(in_features=120, out_features=84, bias=True)
(fc3): Linear(in_features=84, out_features=10, bias=True)
)
### DEBUG: Shape of last convnet= torch.Size([16, 5, 5]) . FC size= 400
Set the input size of the first FC layer to 400.
.. code:: ipython3
model = LeNet5((1, 6, 16, 400, 120, 84, 10)).to(device)
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.5)
criterion = nn.NLLLoss()
# Explore the model
for parameter in model.parameters():
print(parameter.shape)
print("Total number of parameters =", np.sum([np.prod(parameter.shape) for parameter in model.parameters()]))
model, losses, accuracies = train_val_model(model, criterion, optimizer, dataloaders,
num_epochs=5, log_interval=2)
_ = plt.plot(losses['train'], '-b', losses['val'], '--r')
.. parsed-literal::
torch.Size([6, 1, 5, 5])
torch.Size([6])
torch.Size([16, 6, 5, 5])
torch.Size([16])
torch.Size([120, 400])
torch.Size([120])
torch.Size([84, 120])
torch.Size([84])
torch.Size([10, 84])
torch.Size([10])
Total number of parameters = 61706
Epoch 0/4
----------
train Loss: 0.7807 Acc: 75.65%
val Loss: 0.1586 Acc: 94.96%
Epoch 2/4
----------
train Loss: 0.0875 Acc: 97.33%
val Loss: 0.0776 Acc: 97.47%
Epoch 4/4
----------
train Loss: 0.0592 Acc: 98.16%
val Loss: 0.0533 Acc: 98.30%
Training complete in 1m 29s
Best val Acc: 98.30%
.. image:: dl_cnn_cifar10_pytorch_files/dl_cnn_cifar10_pytorch_17_1.png
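
The models of this document end with ``F.log_softmax`` and are trained with ``nn.NLLLoss``; together the two amount to the usual cross-entropy loss. The sketch below re-implements both steps in plain Python for a single sample, as an illustration only (the function names are hypothetical and the logits are made up).

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax of a list of scores."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]


def nll_loss(log_probs, target):
    """Negative log-likelihood of the target class for one sample."""
    return -log_probs[target]


logits = [2.0, 0.5, -1.0]  # hypothetical network outputs for 3 classes
log_probs = log_softmax(logits)
loss = nll_loss(log_probs, target=0)

# An untrained network that guesses uniformly over 10 classes has loss log(10).
chance_loss = nll_loss(log_softmax([0.0] * 10), target=3)
print("loss:", round(loss, 4), " chance-level loss:", round(chance_loss, 4))
```

The chance-level value, log(10), is about 2.30, which is a useful reference when reading the first epochs of a ten-class training log.
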
MiniVGGNet
~~~~~~~~~~
.. code:: ipython3
model = MiniVGGNet(layers=(1, 16, 32, 1, 120, 84, 10), debug=True)
print(model)
_ = model(data_example)
.. parsed-literal::
MiniVGGNet(
(conv11): Conv2d(1, 16, kernel_size=(3, 3), stride=(1, 1))
(conv12): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1))
(conv21): Conv2d(16, 32, kernel_size=(3, 3), stride=(1, 1))
(conv22): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(fc1): Linear(in_features=1, out_features=120, bias=True)
(fc2): Linear(in_features=120, out_features=84, bias=True)
(fc3): Linear(in_features=84, out_features=10, bias=True)
)
### DEBUG: Shape of last convnet= torch.Size([32, 5, 5]) . FC size= 800
Set the input size of the first FC layer to 800.
.. code:: ipython3
model = MiniVGGNet((1, 16, 32, 800, 120, 84, 10)).to(device)
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.5)
criterion = nn.NLLLoss()
# Explore the model
for parameter in model.parameters():
print(parameter.shape)
print("Total number of parameters =", np.sum([np.prod(parameter.shape) for parameter in model.parameters()]))
model, losses, accuracies = train_val_model(model, criterion, optimizer, dataloaders,
num_epochs=5, log_interval=2)
_ = plt.plot(losses['train'], '-b', losses['val'], '--r')
.. parsed-literal::
torch.Size([16, 1, 3, 3])
torch.Size([16])
torch.Size([16, 16, 3, 3])
torch.Size([16])
torch.Size([32, 16, 3, 3])
torch.Size([32])
torch.Size([32, 32, 3, 3])
torch.Size([32])
torch.Size([120, 800])
torch.Size([120])
torch.Size([84, 120])
torch.Size([84])
torch.Size([10, 84])
torch.Size([10])
Total number of parameters = 123502
Epoch 0/4
----------
train Loss: 1.4180 Acc: 48.27%
val Loss: 0.2277 Acc: 92.68%
Epoch 2/4
----------
train Loss: 0.0838 Acc: 97.41%
val Loss: 0.0587 Acc: 98.14%
Epoch 4/4
----------
train Loss: 0.0495 Acc: 98.43%
val Loss: 0.0407 Acc: 98.63%
Training complete in 3m 10s
Best val Acc: 98.63%
.. image:: dl_cnn_cifar10_pytorch_files/dl_cnn_cifar10_pytorch_21_1.png
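
The parameter totals printed above can be recomputed by hand: a convolution holds ``c_out * c_in * k * k`` weights plus ``c_out`` biases, and a linear layer ``n_in * n_out`` weights plus ``n_out`` biases. The following plain-Python sketch (helper names are hypothetical) reproduces the totals reported for LeNet5 and MiniVGGNet.

```python
def conv_params(c_in, c_out, k, bias=True):
    """Number of parameters of a k x k convolution."""
    return c_out * c_in * k * k + (c_out if bias else 0)


def linear_params(n_in, n_out):
    """Number of parameters of a fully connected layer with bias."""
    return n_in * n_out + n_out


def lenet5_params(layers):
    c0, c1, c2, f1, f2, f3, out = layers
    return (conv_params(c0, c1, 5) + conv_params(c1, c2, 5)
            + linear_params(f1, f2) + linear_params(f2, f3) + linear_params(f3, out))


def minivgg_params(layers):
    c0, c1, c2, f1, f2, f3, out = layers
    convs = (conv_params(c0, c1, 3) + conv_params(c1, c1, 3)
             + conv_params(c1, c2, 3) + conv_params(c2, c2, 3))
    return convs + linear_params(f1, f2) + linear_params(f2, f3) + linear_params(f3, out)


print("LeNet5     :", lenet5_params((1, 6, 16, 400, 120, 84, 10)))
print("MiniVGGNet :", minivgg_params((1, 16, 32, 800, 120, 84, 10)))
```

In both networks most parameters sit in the first fully connected layer, not in the convolutions.
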
Reduce the size of training dataset
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Reduce the size of the training dataset to ``10`` mini-batches of
size ``16``.
.. code:: ipython3
train_loader, val_loader = load_mnist(16, 1000)
train_size = 10 * 16
# Stratified sub-sampling
targets = train_loader.dataset.targets.numpy()
nclasses = len(set(targets))
indices = np.concatenate([np.random.choice(np.where(targets == lab)[0], int(train_size / nclasses),replace=False)
for lab in set(targets)])
np.random.shuffle(indices)
train_loader = torch.utils.data.DataLoader(train_loader.dataset, batch_size=16,
sampler=torch.utils.data.SubsetRandomSampler(indices))
# Check train subsampling
train_labels = np.concatenate([labels.numpy() for inputs, labels in train_loader])
print("Train size=", len(train_labels), " Train label count=", {lab:np.sum(train_labels == lab) for lab in set(train_labels)})
print("Batch sizes=", [inputs.size(0) for inputs, labels in train_loader])
# Put together train and val
dataloaders = dict(train=train_loader, val=val_loader)
# Info about the dataset
data_shape = dataloaders["train"].dataset.data.shape[1:]
D_in = np.prod(data_shape)
D_out = len(dataloaders["train"].dataset.targets.unique())
print("Datasets shape", {x: dataloaders[x].dataset.data.shape for x in ['train', 'val']})
print("N input features", D_in, "N output", D_out)
.. parsed-literal::
Train size= 160 Train label count= {0: 16, 1: 16, 2: 16, 3: 16, 4: 16, 5: 16, 6: 16, 7: 16, 8: 16, 9: 16}
Batch sizes= [16, 16, 16, 16, 16, 16, 16, 16, 16, 16]
Datasets shape {'train': torch.Size([60000, 28, 28]), 'val': torch.Size([10000, 28, 28])}
N input features 784 N output 10
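
The stratified sub-sampling step draws the same number of samples from every class, without replacement. The sketch below shows the same idea with the standard library only, on made-up labels; the function name and the toy label vector are assumptions for illustration, not part of the notebook.

```python
import random
from collections import Counter, defaultdict


def stratified_indices(targets, n_per_class, rng=random):
    """Pick n_per_class sample indices from every class, without replacement."""
    by_class = defaultdict(list)
    for index, label in enumerate(targets):
        by_class[label].append(index)
    chosen = []
    for label in sorted(by_class):
        chosen += rng.sample(by_class[label], n_per_class)
    rng.shuffle(chosen)  # mix the classes
    return chosen


targets = [i % 10 for i in range(1000)]  # hypothetical labels, 100 per class
indices = stratified_indices(targets, n_per_class=16, rng=random.Random(0))
counts = Counter(targets[i] for i in indices)
print("subset size:", len(indices), " per-class counts:", dict(sorted(counts.items())))
```
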
LeNet5
.. code:: ipython3
model = LeNet5((1, 6, 16, 400, 120, 84, D_out)).to(device)
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.5)
criterion = nn.NLLLoss()
model, losses, accuracies = train_val_model(model, criterion, optimizer, dataloaders,
num_epochs=100, log_interval=20)
_ = plt.plot(losses['train'], '-b', losses['val'], '--r')
.. parsed-literal::
Epoch 0/99
----------
train Loss: 2.3086 Acc: 11.88%
val Loss: 2.3068 Acc: 14.12%
Epoch 20/99
----------
train Loss: 0.8060 Acc: 76.25%
val Loss: 0.8522 Acc: 72.84%
Epoch 40/99
----------
train Loss: 0.0596 Acc: 99.38%
val Loss: 0.6188 Acc: 82.67%
Epoch 60/99
----------
train Loss: 0.0072 Acc: 100.00%
val Loss: 0.6888 Acc: 83.08%
Epoch 80/99
----------
train Loss: 0.0033 Acc: 100.00%
val Loss: 0.7546 Acc: 82.96%
Training complete in 3m 10s
Best val Acc: 83.46%
.. image:: dl_cnn_cifar10_pytorch_files/dl_cnn_cifar10_pytorch_25_1.png
MiniVGGNet
.. code:: ipython3
model = MiniVGGNet((1, 16, 32, 800, 120, 84, 10)).to(device)
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.5)
criterion = nn.NLLLoss()
model, losses, accuracies = train_val_model(model, criterion, optimizer, dataloaders,
num_epochs=100, log_interval=20)
_ = plt.plot(losses['train'], '-b', losses['val'], '--r')
.. parsed-literal::
Epoch 0/99
----------
train Loss: 2.3040 Acc: 10.00%
val Loss: 2.3025 Acc: 10.32%
Epoch 20/99
----------
train Loss: 2.2963 Acc: 10.00%
val Loss: 2.2969 Acc: 10.35%
Epoch 40/99
----------
train Loss: 2.1158 Acc: 37.50%
val Loss: 2.0764 Acc: 38.06%
Epoch 60/99
----------
train Loss: 0.0875 Acc: 97.50%
val Loss: 0.7315 Acc: 80.50%
Epoch 80/99
----------
train Loss: 0.0023 Acc: 100.00%
val Loss: 1.0397 Acc: 81.69%
Training complete in 5m 38s
Best val Acc: 82.02%
.. image:: dl_cnn_cifar10_pytorch_files/dl_cnn_cifar10_pytorch_27_1.png
CIFAR-10 dataset
----------------
`Source Yunjey Choi `__
.. code:: ipython3
from pathlib import Path
WD = os.path.join(Path.home(), "data", "pystatml", "dl_cifar10_pytorch")
os.makedirs(WD, exist_ok=True)
os.chdir(WD)
print("Working dir is:", os.getcwd())
os.makedirs("data", exist_ok=True)
os.makedirs("models", exist_ok=True)
import numpy as np
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms
# Device configuration
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
# Hyper-parameters
num_epochs = 5
learning_rate = 0.001
# Image preprocessing modules
transform = transforms.Compose([
transforms.Pad(4),
transforms.RandomHorizontalFlip(),
transforms.RandomCrop(32),
transforms.ToTensor()])
# CIFAR-10 dataset
train_dataset = torchvision.datasets.CIFAR10(root='data/',
train=True,
transform=transform,
download=True)
val_dataset = torchvision.datasets.CIFAR10(root='data/',
train=False,
transform=transforms.ToTensor())
# Data loader
train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
batch_size=100,
shuffle=True)
val_loader = torch.utils.data.DataLoader(dataset=val_dataset,
batch_size=100,
shuffle=False)
# Put together train and val
dataloaders = dict(train=train_loader, val=val_loader)
# Info about the dataset
data_shape = dataloaders["train"].dataset.data.shape[1:]
D_in = np.prod(data_shape)
D_out = len(set(dataloaders["train"].dataset.targets))
print("Datasets shape:", {x: dataloaders[x].dataset.data.shape for x in ['train', 'val']})
print("N input features:", D_in, "N output:", D_out)
.. parsed-literal::
Working dir is: /home/ed203246/data/pystatml/dl_cifar10_pytorch
Files already downloaded and verified
Datasets shape: {'train': (50000, 32, 32, 3), 'val': (10000, 32, 32, 3)}
N input features: 3072 N output: 10
LeNet
~~~~~
.. code:: ipython3
model = LeNet5((3, 6, 16, 1, 120, 84, D_out), debug=True)
batch_idx, (data_example, target_example) = next(enumerate(train_loader))
print(model)
_ = model(data_example)
.. parsed-literal::
LeNet5(
(conv1): Conv2d(3, 6, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
(conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
(fc1): Linear(in_features=1, out_features=120, bias=True)
(fc2): Linear(in_features=120, out_features=84, bias=True)
(fc3): Linear(in_features=84, out_features=10, bias=True)
)
### DEBUG: Shape of last convnet= torch.Size([16, 6, 6]) . FC size= 576
Set the input size of the first FC layer to 576.
SGD with momentum ``lr=0.001, momentum=0.5``
.. code:: ipython3
model = LeNet5((3, 6, 16, 576, 120, 84, D_out)).to(device)
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.5)
criterion = nn.NLLLoss()
# Explore the model
for parameter in model.parameters():
print(parameter.shape)
print("Total number of parameters =", np.sum([np.prod(parameter.shape) for parameter in model.parameters()]))
model, losses, accuracies = train_val_model(model, criterion, optimizer, dataloaders,
num_epochs=25, log_interval=5)
_ = plt.plot(losses['train'], '-b', losses['val'], '--r')
.. parsed-literal::
torch.Size([6, 3, 5, 5])
torch.Size([6])
torch.Size([16, 6, 5, 5])
torch.Size([16])
torch.Size([120, 576])
torch.Size([120])
torch.Size([84, 120])
torch.Size([84])
torch.Size([10, 84])
torch.Size([10])
Total number of parameters = 83126
Epoch 0/24
----------
train Loss: 2.3041 Acc: 10.00%
val Loss: 2.3033 Acc: 10.00%
Epoch 5/24
----------
train Loss: 2.2991 Acc: 11.18%
val Loss: 2.2983 Acc: 11.00%
Epoch 10/24
----------
train Loss: 2.2860 Acc: 10.36%
val Loss: 2.2823 Acc: 10.60%
Epoch 15/24
----------
train Loss: 2.1759 Acc: 18.83%
val Loss: 2.1351 Acc: 20.74%
Epoch 20/24
----------
train Loss: 2.0159 Acc: 25.35%
val Loss: 1.9878 Acc: 26.90%
Training complete in 7m 26s
Best val Acc: 28.98%
.. image:: dl_cnn_cifar10_pytorch_files/dl_cnn_cifar10_pytorch_34_1.png
Increase learning rate and momentum ``lr=0.01, momentum=0.9``
.. code:: ipython3
model = LeNet5((3, 6, 16, 576, 120, 84, D_out)).to(device)
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
criterion = nn.NLLLoss()
model, losses, accuracies = train_val_model(model, criterion, optimizer, dataloaders,
num_epochs=25, log_interval=5)
_ = plt.plot(losses['train'], '-b', losses['val'], '--r')
.. parsed-literal::
Epoch 0/24
----------
train Loss: 2.0963 Acc: 21.65%
val Loss: 1.8211 Acc: 33.49%
Epoch 5/24
----------
train Loss: 1.3500 Acc: 51.34%
val Loss: 1.2278 Acc: 56.40%
Epoch 10/24
----------
train Loss: 1.1569 Acc: 58.79%
val Loss: 1.0933 Acc: 60.95%
Epoch 15/24
----------
train Loss: 1.0724 Acc: 62.12%
val Loss: 0.9863 Acc: 65.34%
Epoch 20/24
----------
train Loss: 1.0131 Acc: 64.41%
val Loss: 0.9720 Acc: 66.14%
Training complete in 7m 17s
Best val Acc: 67.87%
.. image:: dl_cnn_cifar10_pytorch_files/dl_cnn_cifar10_pytorch_36_1.png
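
The effect of the two settings can be explored on a toy problem. The sketch below applies the momentum update in the form used by ``torch.optim.SGD`` with its default zero dampening (``v = momentum * v + grad`` then ``w = w - lr * v``) to the one-dimensional function ``f(w) = w**2 / 2``. The objective, the starting point and the number of steps are arbitrary choices, so this only illustrates the trend, not the CIFAR-10 runs themselves.

```python
def sgd_momentum_step(param, grad, velocity, lr, momentum):
    """One SGD step with momentum (no dampening, no weight decay)."""
    velocity = momentum * velocity + grad
    return param - lr * velocity, velocity


def minimize(lr, momentum, steps=100, start=5.0):
    """Minimize f(w) = w**2 / 2, whose gradient is simply w."""
    w, v = start, 0.0
    for _ in range(steps):
        w, v = sgd_momentum_step(w, w, v, lr, momentum)
    return w


slow = minimize(lr=0.001, momentum=0.5)
fast = minimize(lr=0.01, momentum=0.9)
print("lr=0.001, momentum=0.5 ->", round(slow, 4))
print("lr=0.01,  momentum=0.9 ->", round(fast, 4))
```

After the same number of steps the first setting has barely moved from its starting point, while the second is close to the minimum at 0.
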
Adaptive learning rate: Adam
.. code:: ipython3
model = LeNet5((3, 6, 16, 576, 120, 84, D_out)).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
criterion = nn.NLLLoss()
model, losses, accuracies = train_val_model(model, criterion, optimizer, dataloaders,
num_epochs=25, log_interval=5)
_ = plt.plot(losses['train'], '-b', losses['val'], '--r')
.. parsed-literal::
Epoch 0/24
----------
train Loss: 1.8411 Acc: 30.21%
val Loss: 1.5768 Acc: 41.22%
Epoch 5/24
----------
train Loss: 1.3185 Acc: 52.17%
val Loss: 1.2181 Acc: 55.71%
Epoch 10/24
----------
train Loss: 1.1724 Acc: 57.89%
val Loss: 1.1244 Acc: 59.17%
Epoch 15/24
----------
train Loss: 1.0987 Acc: 60.98%
val Loss: 1.0153 Acc: 63.82%
Epoch 20/24
----------
train Loss: 1.0355 Acc: 63.01%
val Loss: 0.9901 Acc: 64.90%
Training complete in 7m 30s
Best val Acc: 66.88%
.. image:: dl_cnn_cifar10_pytorch_files/dl_cnn_cifar10_pytorch_38_1.png
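
What "adaptive" means can be seen from a single Adam update written out in plain Python. The sketch follows the standard Adam rule with bias correction and the usual default hyper-parameters; the function name and the test gradients are assumptions for illustration. It shows that the size of the first step is close to ``lr`` whatever the scale of the gradient.

```python
import math


def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; t is the step count, starting at 1."""
    m = beta1 * m + (1 - beta1) * grad       # running mean of the gradients
    v = beta2 * v + (1 - beta2) * grad ** 2  # running mean of the squared gradients
    m_hat = m / (1 - beta1 ** t)             # bias corrections
    v_hat = v / (1 - beta2 ** t)
    return param - lr * m_hat / (math.sqrt(v_hat) + eps), m, v


steps = {}
for scale in (1e-3, 1.0, 1e3):  # gradients of very different magnitudes
    new_w, _, _ = adam_step(param=0.0, grad=scale, m=0.0, v=0.0, t=1)
    steps[scale] = -new_w       # size of the step taken
print(steps)
```

Plain SGD would take steps proportional to the gradient, that is, a million times larger for the last gradient than for the first.
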
MiniVGGNet
~~~~~~~~~~
.. code:: ipython3
model = MiniVGGNet(layers=(3, 16, 32, 1, 120, 84, D_out), debug=True)
print(model)
_ = model(data_example)
.. parsed-literal::
MiniVGGNet(
(conv11): Conv2d(3, 16, kernel_size=(3, 3), stride=(1, 1))
(conv12): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1))
(conv21): Conv2d(16, 32, kernel_size=(3, 3), stride=(1, 1))
(conv22): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(fc1): Linear(in_features=1, out_features=120, bias=True)
(fc2): Linear(in_features=120, out_features=84, bias=True)
(fc3): Linear(in_features=84, out_features=10, bias=True)
)
### DEBUG: Shape of last convnet= torch.Size([32, 6, 6]) . FC size= 1152
Set the input size of the first FC layer to 1152.
SGD with large momentum and learning rate
.. code:: ipython3
model = MiniVGGNet((3, 16, 32, 1152, 120, 84, D_out)).to(device)
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
criterion = nn.NLLLoss()
model, losses, accuracies = train_val_model(model, criterion, optimizer, dataloaders,
num_epochs=25, log_interval=5)
_ = plt.plot(losses['train'], '-b', losses['val'], '--r')
.. parsed-literal::
Epoch 0/24
----------
train Loss: 2.3027 Acc: 10.14%
val Loss: 2.3010 Acc: 10.00%
Epoch 5/24
----------
train Loss: 1.4829 Acc: 46.08%
val Loss: 1.3860 Acc: 50.39%
Epoch 10/24
----------
train Loss: 1.0899 Acc: 61.43%
val Loss: 1.0121 Acc: 64.59%
Epoch 15/24
----------
train Loss: 0.8825 Acc: 69.02%
val Loss: 0.7788 Acc: 72.73%
Epoch 20/24
----------
train Loss: 0.7805 Acc: 72.73%
val Loss: 0.7222 Acc: 74.72%
Training complete in 15m 19s
Best val Acc: 76.62%
.. image:: dl_cnn_cifar10_pytorch_files/dl_cnn_cifar10_pytorch_43_1.png
Adam
.. code:: ipython3
model = MiniVGGNet((3, 16, 32, 1152, 120, 84, D_out)).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
criterion = nn.NLLLoss()
model, losses, accuracies = train_val_model(model, criterion, optimizer, dataloaders,
num_epochs=25, log_interval=5)
_ = plt.plot(losses['train'], '-b', losses['val'], '--r')
.. parsed-literal::
Epoch 0/24
----------
train Loss: 1.8591 Acc: 30.74%
val Loss: 1.5424 Acc: 43.46%
Epoch 5/24
----------
train Loss: 1.1562 Acc: 58.46%
val Loss: 1.0811 Acc: 61.87%
Epoch 10/24
----------
train Loss: 0.9630 Acc: 65.69%
val Loss: 0.8669 Acc: 68.94%
Epoch 15/24
----------
train Loss: 0.8634 Acc: 69.38%
val Loss: 0.7933 Acc: 72.33%
Epoch 20/24
----------
train Loss: 0.8033 Acc: 71.75%
val Loss: 0.7737 Acc: 73.57%
Training complete in 15m 37s
Best val Acc: 74.86%
.. image:: dl_cnn_cifar10_pytorch_files/dl_cnn_cifar10_pytorch_45_1.png
ResNet
~~~~~~
.. code:: ipython3
model = ResNet(ResidualBlock, [2, 2, 2], num_classes=D_out).to(device) # 195738 parameters
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
criterion = nn.NLLLoss()
model, losses, accuracies = train_val_model(model, criterion, optimizer, dataloaders,
num_epochs=25, log_interval=5)
_ = plt.plot(losses['train'], '-b', losses['val'], '--r')
.. parsed-literal::
Epoch 0/24
----------
train Loss: 1.4169 Acc: 48.11%
val Loss: 1.5213 Acc: 48.08%
Epoch 5/24
----------
train Loss: 0.6279 Acc: 78.09%
val Loss: 0.6652 Acc: 77.49%
Epoch 10/24
----------
train Loss: 0.4772 Acc: 83.57%
val Loss: 0.5314 Acc: 82.09%
Epoch 15/24
----------
train Loss: 0.4010 Acc: 86.09%
val Loss: 0.6457 Acc: 79.03%
Epoch 20/24
----------
train Loss: 0.3435 Acc: 88.07%
val Loss: 0.4887 Acc: 84.34%
Training complete in 103m 30s
Best val Acc: 85.66%
.. image:: dl_cnn_cifar10_pytorch_files/dl_cnn_cifar10_pytorch_47_1.png