
PyTorch Intro

My notes from the tutorials by sentdex, who is one of the best tech educators I learned from, and from learnpytorch.io.

Don't take this as a reputable source of knowledge: this is my first exposure to PyTorch, and all of the following information is relayed through me from tutorials by people who know this.

The part about neural networks in Harvard CS50’s Artificial Intelligence with Python – Full University Course is also a good general but detailed intro.

1. sentdex

From the series Pytorch - Deep learning w/ Python.

Text-based tutorials are available at https://pythonprogramming.net/introduction-deep-learning-neural-network-pytorch/. A lot of the code in this note comes directly from there, and all of the code in this note is based on the video tutorial (except the ASCII visualisation of the input data).

You can see and download the code here: mnist_example.py


import torch

x = torch.Tensor([5, 3])
y = torch.Tensor([2, 1])

print(x * y)
tensor([10., 3.])

You can think of a tensor as an array.

Initializing a tensor

with zeros
x = torch.zeros([2, 5])

x is a 2 by 5 array of zeros.

print(x.shape)
torch.Size([2, 5])

with random values
y = torch.rand([2, 5])
print(y)
tensor([[0.5230, 0.3101, 0.0583, 0.4859, 0.9638],
        [0.8641, 0.5959, 0.1374, 0.0651, 0.1771]])

Reshaping a tensor

y.view([1, 10])
tensor([[0.5230, 0.3101, 0.0583, 0.4859, 0.9638, 0.8641, 0.5959, 0.1374, 0.0651,
         0.1771]])

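
A minimal sketch of how view behaves, using a made-up tensor t that is not from the tutorial. Passing -1 asks PyTorch to work out that dimension from the number of elements, which is the same trick used later when flattening images.

```python
import torch

# hypothetical example tensor with 10 elements, laid out as 2 rows of 5
t = torch.arange(10).view(2, 5)

# flatten into a single row of 10; the element order is preserved
flat = t.view(1, 10)

# -1 asks PyTorch to infer that dimension from the element count
inferred = t.view(-1, 10)

print(flat.shape)
print(inferred.shape)
print(torch.equal(flat, inferred))
```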

Loading a sample dataset

import torch
import torchvision
from torchvision import transforms, datasets

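train = datasets.MNIST('', train=True, download=True,
                       transform=transforms.Compose([transforms.ToTensor()]))

test = datasets.MNIST('', train=False, download=True,
                      transform=transforms.Compose([transforms.ToTensor()]))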

Splitting and shuffling the datasets

trainset = torch.utils.data.DataLoader(train, batch_size=10, shuffle=True)
testset = torch.utils.data.DataLoader(test, batch_size=10, shuffle=False)

The batch size matters; as a rule of thumb it should be between 8 and 64. A bigger batch trains faster but requires more memory.
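
To see what a batch looks like without downloading anything, here is a small sketch that swaps MNIST for random stand-in data. The names fake_images, fake_labels and fake_train are made up for this illustration; only the shapes mimic MNIST.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# stand-in data (an assumption, not MNIST): 100 fake 1x28x28 images with labels 0-9
fake_images = torch.rand(100, 1, 28, 28)
fake_labels = torch.arange(100) % 10
fake_train = TensorDataset(fake_images, fake_labels)

loader = DataLoader(fake_train, batch_size=10, shuffle=True)

# each iteration yields one batch: a stack of images and a stack of labels
X, y = next(iter(loader))
print(X.shape)      # 10 images per batch
print(y.shape)      # 10 labels per batch
print(len(loader))  # number of batches in one pass over the data
```

With 100 samples and batch_size=10, one pass over the loader takes 10 steps; a bigger batch size means fewer, larger steps.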

Why shuffle?

In this example, MNIST, if the training data were ordered by label (first all nines, then all eights, etc.), the neural network would first learn to predict 9 for everything, then 8 for everything, and so on, until it predicts 0 for everything. By shuffling, we're forcing the network to do the hard work of telling the digits apart.
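
The effect of the shuffle flag can be seen on a toy dataset with sorted labels. This is only a sketch: the two-class data below is invented for the illustration and is not part of the tutorial.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# toy dataset (an assumption): 20 samples with sorted labels, ten 0s then ten 1s
features = torch.zeros(20, 1)
labels = torch.cat([torch.zeros(10), torch.ones(10)]).long()
dataset = TensorDataset(features, labels)

ordered = DataLoader(dataset, batch_size=10, shuffle=False)
shuffled = DataLoader(dataset, batch_size=10, shuffle=True)

# without shuffling, every batch contains a single class
ordered_batches = [y.tolist() for _, y in ordered]
for batch in ordered_batches:
    print("ordered: ", batch)

# with shuffling, batches usually mix both classes
shuffled_batches = [y.tolist() for _, y in shuffled]
for batch in shuffled_batches:
    print("shuffled:", batch)
```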

Example of printing one batch from trainset
for data in trainset:
    print(data)
    break


[tensor([[[[0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.]]],

        [[[0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.]]],

        [[[0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.]]],


        [[[0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.]]],

        [[[0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.]]],

        [[[0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.],
          [0., 0., 0.,  ..., 0., 0., 0.]]]]), tensor([8, 5, 4, 2, 4, 0, 6, 9, 5, 2])]

Here, data is one batch of the training data, so it holds 10 rows of the training set because we set batch_size=10. It's a pair where the first element is a tensor of images (inputs) and the second is a tensor of the digits drawn on those images (outputs). So:

data[0][0] # is the first image 
data[1][0] # is the digit drawn on that image

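Printing data[0][0] as ASCII art

for row in data[0][0][0]:
    for val in row:
        # bright pixels become '##', dark pixels become blanks
        if val > 0.4:
            print('##', end='')
        else:
            print('  ', end='')
    # end the current row of pixels
    print()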
        ##########    ####
      ########        ##          ########
    ########                  ##########
      ############        ##########
          ##########  ##########
          ######  ######
        ######    ####
        ####    ######
        ####  ######

and the label for this image is

>>> data[1][0]
tensor(8)

So the drawing in data[0][0] is the digit 8, because data[1][0] is tensor(8).

Balancing the dataset

If the dataset is unbalanced, for example being mostly 3s, then the network will be much more likely to predict 3. There are many ways to counter that, like balancing the outputs with weights, but according to the video that doesn't work great.

code source

total = 0
counter_dict = {0:0, 1:0, 2:0, 3:0, 4:0, 5:0, 6:0, 7:0, 8:0, 9:0}

for data in trainset:
    Xs, ys = data
    for y in ys:
        counter_dict[int(y)] += 1
        total += 1


for i in counter_dict:
    print(f"{i}: {counter_dict[i]/total*100.0:>3.2f}%")


0: 9.87%
1: 11.24%
2: 9.93%
3: 10.22%
4: 9.74%
5: 9.04%
6: 9.86%
7: 10.44%
8: 9.75%
9: 9.92%

It's not perfectly balanced, but it will do okay with this slight variation.
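
For reference, the weighting idea mentioned earlier could look roughly like the sketch below. It is not from the tutorial: the class counts are made up, and inverse-frequency weighting is just one possible scheme. The only PyTorch feature it relies on is the weight argument of F.nll_loss.

```python
import torch
import torch.nn.functional as F

# hypothetical label counts for an unbalanced 3-class problem
counts = torch.tensor([800.0, 100.0, 100.0])

# inverse-frequency weights: rarer classes get a larger weight
class_weights = counts.sum() / (len(counts) * counts)
print(class_weights)

# fake log-probabilities for 4 samples, and their true labels
log_probs = F.log_softmax(torch.zeros(4, 3), dim=1)
targets = torch.tensor([0, 1, 2, 0])

# F.nll_loss accepts per-class weights through its `weight` argument
loss = F.nll_loss(log_probs, targets, weight=class_weights)
print(loss)
```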

Creating a Neural Network

Defining the shape

import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        # initialize nn.Module first so the layers below get registered
        super().__init__()

        # define the first layer. 
        # 28*28 input neurons because we will flatten each 28 by 28 image
        # 64 output neurons because why not
        self.fc1 = nn.Linear(28*28, 64)

        # next layers
        # in 64 - out 64
        self.fc2 = nn.Linear(64, 64)
        self.fc3 = nn.Linear(64, 64)

        # output layer
        # 64 inputs from the previous layer
        # 10 outputs - 1 for each output label - digits from 0 to 9
        self.fc4 = nn.Linear(64, 10)

net = Net()
print(net)
Net(
  (fc1): Linear(in_features=784, out_features=64, bias=True)
  (fc2): Linear(in_features=64, out_features=64, bias=True)
  (fc3): Linear(in_features=64, out_features=64, bias=True)
  (fc4): Linear(in_features=64, out_features=10, bias=True)
)

Defining feed forward

class Net(nn.Module):
    def __init__(self):
        # ...

    # defines how feed-forward works
    def forward(self, x):
        # apply the relu ( _/ ) activation function to the output of each hidden layer
        # feed the output of each layer to the input of the next
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.relu(self.fc3(x))

        # the output layer
        x = self.fc4(x)

        # turn the 10 output values into log-probabilities
        # (their exponentials sum to 1)
        x = F.log_softmax(x, dim=1)
        # so that, after exponentiating, we will aim for output of:
        # [0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
        # if the image was a drawing of 2

        return x

Creating and running the neural network on a random input:

net = Net()
X = torch.rand((28,28))
X = X.view(-1, 28*28)
output = net(X)

the output is

tensor([[-2.3538, -2.3713, -2.2464, -2.4122, -2.2209, -2.2132, -2.2131, -2.2813,
         -2.3209, -2.4227]], grad_fn=<LogSoftmaxBackward0>)
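
The values above are negative because log_softmax returns logarithms of probabilities, not probabilities. A small sketch with made-up raw scores (logits) shows that exponentiating recovers values that sum to 1:

```python
import torch
import torch.nn.functional as F

# hypothetical raw scores (logits) for one sample and 10 classes
logits = torch.tensor([[0.5, -1.0, 2.0, 0.0, 0.3, -0.7, 1.1, 0.2, -0.4, 0.9]])

log_probs = F.log_softmax(logits, dim=1)
probs = torch.exp(log_probs)

print(log_probs.sum())  # not 1: these are logarithms of probabilities
print(probs.sum())      # approximately 1

# the largest log-probability marks the most likely class
predicted = torch.argmax(log_probs, dim=1)
print(predicted)
```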

Training the network

We need to create an optimizer, which is what adjusts the weights during training. Here we use Adam with the learning rate set to 0.001. (I still need to look up how an optimizer works in detail.)

# [all previous imports here]
import torch.optim as optim

class Net(nn.Module):
    # [...]

# [loading and preparing the training data]

optimizer = optim.Adam(net.parameters(), lr=0.001)

# an epoch is one pass through all the training data
# assumption: the value of EPOCHS is not shown in these notes; 3 is a placeholder
EPOCHS = 3

for epoch in range(EPOCHS):
    for data in trainset:
        # data is a batch (of 10 in our case) 
        # of featuresets (inputs) and labels (expected outputs)
        X, y = data

        # gradients accumulate across backward passes by default,
        # so they need to be zeroed before each batch
        # the optimizer uses those gradients to optimize weights
        net.zero_grad()

        # feed forward
        output = net(X.view(-1, 28*28))

        # negative log-likelihood loss: compares the log-probabilities
        # in output with the correct class indices in y
        loss = F.nll_loss(output, y)

        # back propagate the loss
        loss.backward()

        # adjust the weights
        optimizer.step()

    print("Epoch", epoch, "done")
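
On why the gradient gets zeroed for every batch: PyTorch adds each new gradient to whatever is already stored on the parameter. The sketch below uses a single made-up weight w instead of a network to show the accumulation.

```python
import torch

# a single hypothetical weight; y = 3 * w, so dy/dw is always 3
w = torch.tensor([1.0], requires_grad=True)

(3 * w).sum().backward()
first = w.grad.clone()   # gradient after one backward pass

(3 * w).sum().backward()
second = w.grad.clone()  # the new gradient was added to the old one

# reset the accumulated gradient by hand; on a whole network,
# Module.zero_grad() or optimizer.zero_grad() clears every parameter
w.grad.zero_()
(3 * w).sum().backward()
third = w.grad.clone()   # back to a fresh gradient

print(first, second, third)
```

Without the reset, each batch would update the weights using the sum of all gradients seen so far rather than the gradient of the current batch.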

Running the network

After the training is done, we can run it on some test data.

for data in testset:
    X, y = data
    # stop after the first batch
    break

(remember that X and y are a batch - there are 10 examples in them)

Printing the first input in the batch.

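for row in X[0][0]:
    for val in row:
        # bright pixels become '##', dark pixels become blanks
        if val > 0.4:
            print('##', end='')
        else:
            print('  ', end='')
    # end the current row of pixels
    print()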
              ##  ##########################

It's clearly a 7.

Let's see what the neural network thinks.

print(torch.argmax(net(X[0].view(-1, 28*28))[0]))
tensor(7)


It worked! 🎉

Checking accuracy

Accuracy is the fraction of predictions the network got right out of a set of examples.

code source

correct = 0
total = 0

with torch.no_grad():
    for data in testset:
        X, y = data
        output = net(X.view(-1,784))
        for idx, i in enumerate(output):
            #print(torch.argmax(i), y[idx])
            if torch.argmax(i) == y[idx]:
                correct += 1
            total += 1

print("Accuracy: ", round(correct/total, 3))

This code printed

Accuracy:  0.967
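
The same count can also be done for a whole batch at once instead of looping over each example. This is a sketch on made-up outputs and labels, not the tutorial's code:

```python
import torch

# hypothetical network outputs for 4 samples and 3 classes
output = torch.tensor([[0.1, 2.0, 0.3],
                       [1.5, 0.2, 0.1],
                       [0.3, 0.2, 0.9],
                       [0.8, 0.1, 0.4]])
y = torch.tensor([1, 0, 2, 1])

# argmax over dim=1 gives one predicted class per sample
predictions = output.argmax(dim=1)

# comparing to the labels gives a boolean tensor; summing counts the matches
correct = (predictions == y).sum().item()
accuracy = correct / len(y)
print(accuracy)
```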


Try the code yourself: mnist_example.py
It trains the network, prints one test example, and shows the network's prediction on that example.

2. learnpytorch.io

Source: 00_pytorch_fundamentals


Tensors have a data type (float, float32, float16, int8, ...) and are associated with a device (CPU, GPU). They can be initialized with zeros, ones, or randomly.
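
A short sketch of inspecting and converting the data type, assuming a CPU-only setup and PyTorch's default settings. The variable names are made up for the illustration.

```python
import torch

# the dtype can be chosen at creation time; floating-point tensors default to float32
a = torch.zeros(2, 3)
b = torch.ones(2, 3, dtype=torch.float16)
c = torch.tensor([1, 2, 3], dtype=torch.int8)

print(a.dtype, a.device)
print(b.dtype)
print(c.dtype)

# .to() returns a converted copy; the original keeps its dtype
d = c.to(torch.float32)
print(d.dtype, c.dtype)
```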

Tensor operations

Seeding randomness

To get reproducible results despite using randomness to initialize neural networks, torch's random number generator can be seeded with torch.manual_seed:
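
A minimal sketch of seeding with torch.manual_seed. The seed value 42 is arbitrary and chosen only for illustration.

```python
import torch

SEED = 42  # arbitrary value chosen for illustration

torch.manual_seed(SEED)
first = torch.rand(3)

# re-seeding resets the generator, so the same numbers come out again
torch.manual_seed(SEED)
second = torch.rand(3)

print(torch.equal(first, second))
```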


Running on GPU

I'm skipping that until I have a GPU to run it on.
