PyTorch Intro
My notes from the tutorials by sentdex, who is one of the best tech educators I've learned from, and from learnpytorch.io.
Don't take this as an authoritative source - this is my first exposure to PyTorch, and all of the following information is relayed from tutorials by people who know the subject.
For general neural network background, the part about neural networks in Harvard CS50's Artificial Intelligence with Python (the full university course) is a good general but detailed intro.
1. sentdex
From the series Pytorch - Deep learning w/ Python.
Text-based tutorials are available at https://pythonprogramming.net/introduction-deep-learning-neural-network-pytorch/. A lot of the code in this note comes directly from there, and all of it is based on the video tutorial (except the input-data ASCII visualisation).
You can see and download the code here: mnist_example.py
Intro
import torch
x = torch.Tensor([5, 3])
y = torch.Tensor([2, 1])
print(x * y)
tensor([10., 3.])
You can think of a tensor as an array.
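For example (my own quick check), you can index it much like a nested list:
x = torch.Tensor([[5, 3], [2, 1]])
print(x[0])     # tensor([5., 3.]) - the first row
print(x[0][1])  # tensor(3.) - a single element
print(x.shape)  # torch.Size([2, 2])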
Initializing a tensor
with zeros
x = torch.zeros([2, 5])
x is a 2 by 5 array of zeros.
print(x.shape)
torch.Size([2, 5])
randomly
y = torch.rand([2, 5])
print(y)
tensor([[0.5230, 0.3101, 0.0583, 0.4859, 0.9638],
[0.8641, 0.5959, 0.1374, 0.0651, 0.1771]])
Reshaping a tensor
y.view([1, 10])
tensor([[0.5230, 0.3101, 0.0583, 0.4859, 0.9638, 0.8641, 0.5959, 0.1374, 0.0651,
0.1771]])
Datasets
Loading a sample dataset
import torch
import torchvision
from torchvision import transforms, datasets
train = datasets.MNIST('', train=True, download=True,
                       transform=transforms.Compose([
                           transforms.ToTensor()
                       ]))

test = datasets.MNIST('', train=False, download=True,
                      transform=transforms.Compose([
                          transforms.ToTensor()
                      ]))
Splitting and shuffling the datasets
trainset = torch.utils.data.DataLoader(train, batch_size=10, shuffle=True)
testset = torch.utils.data.DataLoader(test, batch_size=10, shuffle=False)
The batch size matters; it should typically be between 8 and 64. The bigger it is, the faster the training goes, but it requires more memory.
Why shuffle?
In this example (MNIST), if all the training examples are ordered by label (first all nines, then all eights, etc.), the network will first optimize towards predicting that everything is a 9, then that everything is an 8, and so on, until everything is a 0. By shuffling, we force the network to do the actual hard work of generalizing.
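A quick way to see the difference (my own check, not from the tutorial) is to compare the labels of the first batch with and without shuffling:
unshuffled = torch.utils.data.DataLoader(train, batch_size=10, shuffle=False)
print(next(iter(unshuffled))[1])  # always the same first 10 labels, in dataset order
print(next(iter(trainset))[1])    # a different random selection every time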
Example of printing one batch from trainset
for data in trainset:
    print(data)
    break
produces
[tensor([[[[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
...,
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.]]],
[[[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
...,
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.]]],
[[[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
...,
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.]]],
...,
[[[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
...,
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.]]],
[[[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
...,
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.]]],
[[[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
...,
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.]]]]), tensor([8, 5, 4, 2, 4, 0, 6, 9, 5, 2])]
Here, data is one batch of the training data. So it's 10 rows of the training set, because we set batch_size=10. It's a pair where the first element is a tensor of images (inputs) and the second is a tensor of the digits drawn in those images (outputs).
so
data[0][0] # is the first image
data[1][0] # is the digit drawn on that image
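For reference, the shapes of the two tensors in the batch:
print(data[0].shape)  # torch.Size([10, 1, 28, 28]) - 10 images, 1 colour channel, 28x28 pixels
print(data[1].shape)  # torch.Size([10]) - the 10 corresponding labels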
Printing data[0][0] as ASCII art:
for row in data[0][0][0]:
    for val in row:
        if val > 0.4:
            print('##', end='')
        else:
            print('  ', end='')
    print()
######
############
################
########## ####
######## ## ########
######## ##########
############ ##########
########## ##########
############
##########
########
##########
############
###### ######
###### ####
#### ######
#### ######
##########
##########
######
and the output for this is
>>> data[1][0]
tensor(8)
so the drawing in data[0][0] is the digit 8, because data[1][0] is tensor(8).
Balancing the dataset
If the dataset is unbalanced, for example mostly 3s, then the network will be much more likely to predict 3. There are ways to counter that, like weighting the classes in the loss, but according to the video that doesn't work great in practice (a sketch of what such weighting could look like follows the counts below).
total = 0
counter_dict = {0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0, 7: 0, 8: 0, 9: 0}

for data in trainset:
    Xs, ys = data
    for y in ys:
        counter_dict[int(y)] += 1
        total += 1

print(counter_dict)

for i in counter_dict:
    print(f"{i}: {counter_dict[i]/total*100.0:>3.2f}%")
produces
0: 9.87%
1: 11.24%
2: 9.93%
3: 10.22%
4: 9.74%
5: 9.04%
6: 9.86%
7: 10.44%
8: 9.75%
9: 9.92%
It's not perfectly balanced, but the network will do okay with this slight variation.
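For reference, here is a minimal sketch of that class weighting (my own, not from the tutorial), using the counter_dict computed above - F.nll_loss accepts a weight tensor with one entry per class:
# minimal sketch: rarer digits get a weight > 1, common ones < 1
counts = torch.tensor([counter_dict[i] for i in range(10)], dtype=torch.float)
weights = counts.sum() / (len(counts) * counts)
# later, during training:
# loss = F.nll_loss(output, y, weight=weights)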
Creating a Neural Network
Defining the shape
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        # define the first layer.
        # 28*28 input neurons because we will flatten each 28 by 28 image
        # 64 output neurons because why not
        self.fc1 = nn.Linear(28*28, 64)
        # next layers
        # in 64 - out 64
        self.fc2 = nn.Linear(64, 64)
        self.fc3 = nn.Linear(64, 64)
        # output layer
        # 64 inputs from the previous layer
        # 10 outputs - 1 for each output label - digits from 0 to 9
        self.fc4 = nn.Linear(64, 10)
net = Net()
print(net)
Net(
(fc1): Linear(in_features=784, out_features=64, bias=True)
(fc2): Linear(in_features=64, out_features=64, bias=True)
(fc3): Linear(in_features=64, out_features=64, bias=True)
(fc4): Linear(in_features=64, out_features=10, bias=True)
)
Defining feed forward
class Net(nn.Module):
    def __init__(self):
        ...  # [same as above]

    # defines how the feed-forward pass works
    def forward(self, x):
        # apply the relu ( _/ ) activation function to the output of each layer
        # and feed the output of each layer to the input of the next
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.relu(self.fc3(x))
        # the output layer
        x = self.fc4(x)
        # log_softmax turns the 10 output values into log-probabilities
        # (their exponentials sum to 1)
        x = F.log_softmax(x, dim=1)
        # so that we will aim for an output of:
        # [0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
        # if the image was a drawing of 2
        return x
Creating and running the neural network on a random input:
net = Net()
X = torch.rand((28,28))
X = X.view(-1, 28*28)
output = net(X)
the output is
tensor([[-2.3538, -2.3713, -2.2464, -2.4122, -2.2209, -2.2132, -2.2131, -2.2813,
-2.3209, -2.4227]], grad_fn=<LogSoftmaxBackward0>)
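These are log-probabilities: exponentiating them gives probabilities that sum to 1, and the index of the largest one is the network's guess (meaningless here, since it hasn't been trained yet):
print(torch.exp(output).sum())      # close to tensor(1.) - the probabilities sum to 1
print(torch.argmax(output, dim=1))  # index of the most likely digit for each input in the batch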
Training the network
We need to create an optimizer - this is what does the training. In this case we set the learning rate to
0.001
. (I need to look up what an optimizer does)
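The optimizer takes the gradients computed by loss.backward() and nudges every weight against its gradient. Plain gradient descent would look something like this simplified sketch (Adam does the same thing, but adapts the step size per weight using running averages of the gradients):
# simplified sketch of what a plain-SGD optimizer.step() boils down to
with torch.no_grad():
    for param in net.parameters():
        param -= 0.001 * param.grad  # learning rate times the gradient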
# [all previous imports here]
import torch.optim as optim

class Net(nn.Module):
    ...  # [as defined above]

# [loading and preparing the training data]

optimizer = optim.Adam(net.parameters(), lr=0.001)

# an epoch is one pass through all of the training data
EPOCHS = 3

for epoch in range(EPOCHS):
    for data in trainset:
        # data is a batch (of 10 in our case)
        # of featuresets (inputs) and labels (expected outputs)
        X, y = data
        # zero the gradients before each batch - PyTorch accumulates them
        # between backward() calls otherwise (see the check below)
        net.zero_grad()
        # feed forward
        output = net(X.view(-1, 28*28))
        # we're aiming for the output to be a
        # one hot vector (one 1 and the rest 0s)
        loss = F.nll_loss(output, y)
        # back propagate the loss
        loss.backward()
        # the optimizer uses the gradients to adjust the weights
        optimizer.step()
    print("Epoch", epoch, "done, loss:", loss.item())
Running the network
After the training is done, we can run it on some test data.
for data in testset:
    X, y = data
    break
(remember that X and y are a batch - there are 10 examples in them)
Printing the first input in the batch.
for row in X[0][0]:
    for val in row:
        if val > 0.4:
            print('##', end='')
        else:
            print('  ', end='')
    print()
######
##############################
## ##########################
######
####
####
######
####
######
####
######
####
######
######
####
######
######
######
########
######
It's clearly a 7.
Let's see what the neural network thinks.
print(torch.argmax(net(X[0].view(-1, 28*28))[0]))
produces
tensor(7)
It worked! 🎉
Checking accuracy
Accuracy is a metric saying how many predictions the network got right out of a set of examples.
correct = 0
total = 0

with torch.no_grad():
    for data in testset:
        X, y = data
        output = net(X.view(-1, 784))
        # print(output)
        for idx, i in enumerate(output):
            # print(torch.argmax(i), y[idx])
            if torch.argmax(i) == y[idx]:
                correct += 1
            total += 1

print("Accuracy: ", round(correct/total, 3))
This code printed
Accuracy: 0.967
Code
Try the code yourself: mnist_example.py
It trains the network, prints one test example, and shows the network's prediction on that example.
2. learnpytorch.io
Source: 00_pytorch_fundamentals
Tensors
Tensors have a data type (float, float32, float16, int8, ...) and are associated with a device (cpu, gpu).
They can be initialized with zeros, ones, or randomly.
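For example (dtype defaults to float32 and device to cpu if not given):
zeros = torch.zeros(2, 3)                      # 2x3 tensor of zeros
ones = torch.ones(2, 3, dtype=torch.float16)   # 2x3 tensor of ones, half precision
random = torch.rand(2, 3, device="cpu")        # 2x3 tensor of uniform values in [0, 1)
print(zeros.dtype, ones.dtype, random.device)  # torch.float32 torch.float16 cpu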
Tensor operations
- Addition: tensor + 10, tensor + tensor
- Subtraction: tensor - 10, tensor - tensor
- Multiplication (element-wise): tensor * 10, tensor * tensor
- Division: tensor / 10, tensor / tensor
- Matrix multiplication: torch.matmul(tensor1, tensor2)
- Transposition: tensor.T, torch.transpose(tensor, dim1, dim2)
- Aggregation
  - Minimum: tensor.min(), positional: tensor.argmin()
  - Maximum: tensor.max(), positional: tensor.argmax()
  - Mean: tensor.mean() (only works with float datatypes)
  - Sum: tensor.sum()
- Removing all dimensions of size 1: torch.squeeze(tensor) (e.g. a [1, 9] tensor becomes a flat vector of 9)
- Reshaping: torch.arange(1, 10).reshape(1, 3, 3) creates a 1x3x3 tensor with values from 1 to 9
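A few of these in action (my own quick examples):
t = torch.arange(1, 10, dtype=torch.float32).reshape(3, 3)
print(t + 10)                # element-wise addition
print(t * t)                 # element-wise multiplication
print(torch.matmul(t, t.T))  # matrix multiplication with the transpose
print(t.min(), t.argmax(), t.mean(), t.sum())  # tensor(1.) tensor(8) tensor(5.) tensor(45.)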
Seeding randomness
To get reproducible results despite using randomness to initialize neural networks, torch's random number generator can be seeded like this:
torch.manual_seed(seed=RANDOM_SEED)
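For example, setting the same seed twice produces the same "random" values:
RANDOM_SEED = 42
torch.manual_seed(RANDOM_SEED)
a = torch.rand(2, 2)
torch.manual_seed(RANDOM_SEED)
b = torch.rand(2, 2)
print(torch.equal(a, b))  # True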
Running on GPU
I'm skipping that until I have a GPU to run it on.