# PyTorch Intro

My notes from the tutorials by sentdex (one of the best tech educators I've learned from) and from learnpytorch.io.

Don't take this as a reputable source of knowledge. This is my first exposure to PyTorch, and all of the following information is relayed through me from tutorials by people who actually know this.

The neural networks part of Harvard CS50’s Artificial Intelligence with Python – Full University Course is also a good general, yet detailed, intro.

## 1. sentdex

From the series Pytorch - Deep learning w/ Python.

Text-based tutorials are available at https://pythonprogramming.net/introduction-deep-learning-neural-network-pytorch/. A lot of the code in this note comes directly from there, and all of it is based on the video tutorial (except the ASCII visualisation of the input data).

You can see and download the code here: mnist_example.py

### Intro

```
import torch
x = torch.Tensor([5, 3])
y = torch.Tensor([2, 1])
print(x * y)
```

```
tensor([10., 3.])
```

You can think of a `tensor` as an array.
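
The example above uses `torch.Tensor` with a capital T, which is why the result prints as floats (`10.`, `3.`). A small sketch contrasting it with the lowercase `torch.tensor` factory function, which infers the dtype from the Python values (the values are reused from the example above; the PyTorch docs point to `torch.tensor()` for creating a tensor from existing data):

```python
import torch

# torch.Tensor (the class) always creates a tensor with the default float dtype
x_upper = torch.Tensor([5, 3])
# torch.tensor (lowercase, a factory function) infers the dtype from the data
x_lower = torch.tensor([5, 3])

print(x_upper.dtype)  # torch.float32
print(x_lower.dtype)  # torch.int64

# * is element-wise for both: 5*2 and 3*1
product = x_lower * torch.tensor([2, 1])
print(product)
```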

#### Initializing a tensor

##### with zeros

```
x = torch.zeros([2, 5])
```

`x` is a 2 by 5 array of zeros.

```
print(x.shape)
```

```
torch.Size([2, 5])
```

##### randomly

```
y = torch.rand([2, 5])
print(y)
```

```
tensor([[0.5230, 0.3101, 0.0583, 0.4859, 0.9638],
        [0.8641, 0.5959, 0.1374, 0.0651, 0.1771]])
```

#### Reshaping a tensor

```
y.view([1, 10])
```

```
tensor([[0.5230, 0.3101, 0.0583, 0.4859, 0.9638, 0.8641, 0.5959, 0.1374, 0.0651,
         0.1771]])
```
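
A few more reshaping details, as a sketch on a fresh random tensor: `-1` lets PyTorch infer one dimension, `view()` shares memory with the original tensor instead of copying it, and `reshape()` is the safer choice when the memory layout is not contiguous (e.g. after a transpose).

```python
import torch

y = torch.rand([2, 5])

# -1 tells PyTorch to infer that dimension from the number of elements
flat = y.view(-1)        # shape: [10]
row = y.view(1, -1)      # shape: [1, 10]
back = row.view(2, 5)    # back to the original shape
print(flat.shape, row.shape, back.shape)

# view() does not copy: the reshaped tensor shares memory with y
flat[0] = 100.0
print(y[0, 0])           # tensor(100.) because y changed too

# view() needs contiguous memory; a transpose is not contiguous,
# so reshape() (which copies when it has to) is used there
t = y.t()
print(t.is_contiguous()) # False
flat_t = t.reshape(-1)
```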

### Datasets

#### Loading a sample dataset

```
import torch
import torchvision
from torchvision import transforms, datasets

train = datasets.MNIST('', train=True, download=True,
                       transform=transforms.Compose([
                           transforms.ToTensor()
                       ]))
test = datasets.MNIST('', train=False, download=True,
                      transform=transforms.Compose([
                          transforms.ToTensor()
                      ]))
```

#### Batching and shuffling the datasets

```
trainset = torch.utils.data.DataLoader(train, batch_size=10, shuffle=True)
testset = torch.utils.data.DataLoader(test, batch_size=10, shuffle=False)
```

The batch size matters; the tutorial suggests something between 8 and 64. The bigger it is, the faster each epoch runs, but the more memory it requires.

##### Why shuffle?

In this example (MNIST), if the training data were ordered by label (first all nines, then all eights, etc.), the network would first learn that everything is a 9, then that everything is an 8, and so on, until it decides everything is a 0. So by shuffling, we're forcing the network to do the hard work of actually telling the digits apart.
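
To see what `shuffle=True` changes without downloading MNIST, here is a sketch on a small made-up dataset whose labels are sorted. The 20 random samples, the two labels and the seed are assumptions for illustration only.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# hypothetical toy dataset: 20 samples, labels sorted (ten 0s, then ten 1s)
features = torch.rand(20, 4)
labels = torch.tensor([0] * 10 + [1] * 10)
dataset = TensorDataset(features, labels)

ordered = DataLoader(dataset, batch_size=5, shuffle=False)
shuffled = DataLoader(dataset, batch_size=5, shuffle=True,
                      generator=torch.Generator().manual_seed(0))

# without shuffling, every batch contains a single label
ordered_batches = [y.tolist() for _, y in ordered]
print(ordered_batches)  # [[0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [1, 1, 1, 1, 1], [1, 1, 1, 1, 1]]

# with shuffling, labels are mixed across batches (exact order depends on the seed)
shuffled_batches = [y.tolist() for _, y in shuffled]
print(shuffled_batches)
```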

##### Example of printing a row from `trainset`

```
for data in trainset:
    print(data)
    break
```

produces

```
[tensor([[[[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
...,
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.]]],
[[[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
...,
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.]]],
[[[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
...,
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.]]],
...,
[[[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
...,
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.]]],
[[[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
...,
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.]]],
[[[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
...,
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.],
[0., 0., 0., ..., 0., 0., 0.]]]]), tensor([8, 5, 4, 2, 4, 0, 6, 9, 5, 2])]
```

Here, `data` is one batch of the training data, so it holds 10 examples from the training set, because we set `batch_size=10`.
It's a two-element list (note the square brackets in the output): the first element is a tensor of images (inputs) and the second is a tensor of the digits drawn on those images (outputs).
So:

```
data[0][0] # is the first image
data[1][0] # is the digit drawn on that image
```
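
The batch structure can also be checked without the download by substituting a fake dataset shaped like MNIST (1x28x28 images, integer labels 0 to 9). The 60 random samples below are an assumption standing in for the real 60,000 training images.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# hypothetical stand-in for MNIST: 60 random 1x28x28 "images" with labels 0-9
images = torch.rand(60, 1, 28, 28)
digits = torch.randint(0, 10, (60,))
loader = DataLoader(TensorDataset(images, digits), batch_size=10, shuffle=True)

data = next(iter(loader))
print(type(data))        # <class 'list'>: [inputs, labels]
print(data[0].shape)     # torch.Size([10, 1, 28, 28]): 10 images, 1 channel, 28x28 pixels
print(data[1].shape)     # torch.Size([10]): one label per image
print(data[0][0].shape)  # torch.Size([1, 28, 28]): the first image

# number of batches per epoch = ceil(number of samples / batch_size)
print(len(loader))       # 6
```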

Printing `data[0][0]` as ASCII art:

```
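# data[0][0] has shape [1, 28, 28], so the extra [0] selects the 28x28 grid
for row in data[0][0][0]:
    for val in row:
        if val > 0.4:
            print('##', end='')
        else:
            print('  ', end='')  # two spaces, same width as '##'
    print()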
```

```
######
############
################
########## ####
######## ## ########
######## ##########
############ ##########
########## ##########
############
##########
########
##########
############
###### ######
###### ####
#### ######
#### ######
##########
##########
######
```

and the label for this image is

```
>>> data[1][0]
tensor(8)
```

So the drawing in `data[0][0]` is the digit `8`, because `data[1][0]` is `tensor(8)`.

#### Balancing the dataset

If the dataset is unbalanced, for example mostly `3`s, the network will be much more likely to predict `3`. There are ways to counter that, such as weighting the classes differently in the loss, but according to the video that doesn't work great (the reference is in the video).

```
total = 0
counter_dict = {0:0, 1:0, 2:0, 3:0, 4:0, 5:0, 6:0, 7:0, 8:0, 9:0}
for data in trainset:
    Xs, ys = data
    for y in ys:
        counter_dict[int(y)] += 1
        total += 1
print(counter_dict)
for i in counter_dict:
    print(f"{i}: {counter_dict[i]/total*100.0:>3.2f}%")
```

prints the raw counts dictionary (not shown here), followed by these percentages:

```
0: 9.87%
1: 11.24%
2: 9.93%
3: 10.22%
4: 9.74%
5: 9.04%
6: 9.86%
7: 10.44%
8: 9.75%
9: 9.92%
```

It's not perfectly balanced, but the network will do okay with this slight variation.
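
The counting loop above can also be written with `torch.bincount`, and the same counts can be turned into per-class weights for the loss, which is the kind of weighting the video says doesn't work great. A sketch on a small hypothetical 3-class label tensor (not the real MNIST counts); inverse-frequency weighting is my assumption of how such weights would be computed, while the `weight` argument of `F.nll_loss` is a real parameter.

```python
import torch
import torch.nn.functional as F

# hypothetical labels for an unbalanced 3-class dataset (mostly 2s)
ys = torch.tensor([0, 1, 2, 2, 2, 2, 2, 2, 1, 0])

counts = torch.bincount(ys, minlength=3)   # occurrences of each label
print(counts)                              # tensor([2, 2, 6])

percentages = counts.float() / counts.sum() * 100.0
for label, pct in enumerate(percentages.tolist()):
    print(f"{label}: {pct:>3.2f}%")

# inverse-frequency weights: rarer classes get a bigger weight in the loss
weights = counts.sum() / (len(counts) * counts.float())

# nll_loss accepts an optional per-class weight tensor
log_probs = F.log_softmax(torch.randn(len(ys), 3), dim=1)
loss = F.nll_loss(log_probs, ys, weight=weights)
```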

### Creating a Neural Network

#### Defining the shape

```
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        # define the first layer.
        # 28*28 input neurons because we will flatten each 28 by 28 image
        # 64 output neurons because why not
        self.fc1 = nn.Linear(28*28, 64)
        # next layers
        # in 64 - out 64
        self.fc2 = nn.Linear(64, 64)
        self.fc3 = nn.Linear(64, 64)
        # output layer
        # 64 inputs from the previous layer
        # 10 outputs - 1 for each output label - digits from 0 to 9
        self.fc4 = nn.Linear(64, 10)
```

```
net = Net()
print(net)
```

```
Net(
  (fc1): Linear(in_features=784, out_features=64, bias=True)
  (fc2): Linear(in_features=64, out_features=64, bias=True)
  (fc3): Linear(in_features=64, out_features=64, bias=True)
  (fc4): Linear(in_features=64, out_features=10, bias=True)
)
```
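
A quick way to see how big this network is: count its trainable numbers. This sketch re-declares the same four layers so it runs on its own; each `nn.Linear(in, out)` holds an `out x in` weight matrix plus `out` biases.

```python
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(28*28, 64)
        self.fc2 = nn.Linear(64, 64)
        self.fc3 = nn.Linear(64, 64)
        self.fc4 = nn.Linear(64, 10)

net = Net()

# weights are stored as (out_features, in_features)
for name, p in net.named_parameters():
    print(name, tuple(p.shape))

# 784*64+64 + 2*(64*64+64) + 64*10+10
total = sum(p.numel() for p in net.parameters())
print(total)  # 59210
```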

#### Defining feed forward

```
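import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        # same layers as above
        self.fc1 = nn.Linear(28*28, 64)
        self.fc2 = nn.Linear(64, 64)
        self.fc3 = nn.Linear(64, 64)
        self.fc4 = nn.Linear(64, 10)

    # defines how feed-forward works
    def forward(self, x):
        # apply the relu ( _/ ) activation function to the output of each hidden layer
        # and feed the result into the next layer
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.relu(self.fc3(x))
        # the output layer (no relu here)
        x = self.fc4(x)
        # turn the 10 raw outputs into log-probabilities
        # (exponentiating them gives probabilities that sum to 1)
        x = F.log_softmax(x, dim=1)
        # so that for a drawing of 2 we aim for probabilities close to:
        # [0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
        return x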
```
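
To check what `log_softmax` actually returns, here is a sketch with random scores standing in for the output of `fc4` (the scores and target digits are made up). It shows that exponentiating the output gives probabilities summing to 1, that all log-probabilities are negative or zero, and that `log_softmax` followed by `nll_loss` (used in training) matches `F.cross_entropy` on the raw scores.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)            # hypothetical raw fc4 outputs for a batch of 4
targets = torch.tensor([2, 7, 0, 9])   # hypothetical correct digits

log_probs = F.log_softmax(logits, dim=1)
probs = log_probs.exp()

# each row of probabilities sums to 1 (up to floating point error)
print(probs.sum(dim=1))

# every log-probability is <= 0, which is why the network's output is all negative
print((log_probs <= 0).all())          # tensor(True)

# log_softmax + nll_loss is the same as cross_entropy on the raw scores
loss_a = F.nll_loss(log_probs, targets)
loss_b = F.cross_entropy(logits, targets)
print(torch.allclose(loss_a, loss_b))  # True
```

An untrained network gives roughly equal probability to all 10 digits, so its log-probabilities sit near ln(1/10) ≈ -2.30.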

Creating and running the neural network on a random input:

```
net = Net()
X = torch.rand((28,28))
X = X.view(-1, 28*28)
output = net(X)
```

The `output` is:

```
tensor([[-2.3538, -2.3713, -2.2464, -2.4122, -2.2209, -2.2132, -2.2131, -2.2813,
         -2.3209, -2.4227]], grad_fn=<LogSoftmaxBackward0>)
```

#### Training the network

We need to create an optimizer, which is what does the training: after each batch it adjusts the network's weights using the gradients computed by backpropagation. The learning rate (`0.001` here) controls how big each adjustment is.

```
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

# [Net class defined as above]
# [trainset loaded and prepared as above]

net = Net()
optimizer = optim.Adam(net.parameters(), lr=0.001)

# an epoch is one pass through all the training data
EPOCHS = 3

for epoch in range(EPOCHS):
    for data in trainset:
        # data is a batch (of 10 in our case)
        # of featuresets (inputs) and labels (expected outputs)
        X, y = data
        # gradients accumulate across backward() calls,
        # so reset them before processing each batch
        net.zero_grad()
        # feed forward
        output = net(X.view(-1, 28*28))
        # y holds the correct digit (0-9) for each image;
        # nll_loss compares it with the log-probabilities from log_softmax
        loss = F.nll_loss(output, y)
        # back propagate the loss to compute the gradients
        loss.backward()
        # the optimizer uses those gradients to adjust the weights
        optimizer.step()
    print("Epoch", epoch, "done, last batch loss:", loss.item())
```
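
The loop above calls `net.zero_grad()` before every batch because, by default, `backward()` adds new gradients to the ones already stored in `.grad`. A sketch on a tiny hypothetical `nn.Linear` model (not the MNIST network) showing that accumulation and what `optimizer.step()` does. It swaps in plain SGD instead of Adam because SGD's update rule is simply `weight -= lr * grad`; Adam's update is more involved.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(3, 1)          # hypothetical toy model
x = torch.rand(5, 3)
target = torch.rand(5, 1)

def compute_loss():
    # mean squared error between prediction and target
    return ((model(x) - target) ** 2).mean()

# 1. gradients accumulate across backward() calls
model.zero_grad()
compute_loss().backward()
first_grad = model.weight.grad.clone()
compute_loss().backward()        # no zero_grad in between
accumulated = torch.allclose(model.weight.grad, 2 * first_grad)
print(accumulated)               # True: the second gradient was added to the first

# 2. what optimizer.step() does for plain SGD
lr = 0.1
optimizer = torch.optim.SGD(model.parameters(), lr=lr)
optimizer.zero_grad()
compute_loss().backward()
expected = model.weight.detach() - lr * model.weight.grad
optimizer.step()
stepped = torch.allclose(model.weight.detach(), expected)
print(stepped)                   # True
```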

#### Running the network

After the training is done, we can run it on some test data.

```
for data in testset:
    X, y = data
    break
```

(Remember that `X` and `y` are a batch: there are 10 examples in them.)

Printing the first input in the batch as ASCII art:

```
# X[0] has shape [1, 28, 28], so X[0][0] is the 28x28 grid
for row in X[0][0]:
    for val in row:
        if val > 0.4:
            print('##', end='')
        else:
            print('  ', end='')  # two spaces, same width as '##'
    print()
```

```
######
##############################
## ##########################
######
####
####
######
####
######
####
######
####
######
######
####
######
######
######
########
######
```

It's clearly a 7.

Let's see what the neural network thinks.

```
print(torch.argmax(net(X[0].view(-1, 28*28))[0]))
```

produces

```
tensor(7)
```

It worked! 🎉

##### Checking accuracy

Accuracy is the fraction of examples in a set that the network predicts correctly.

```
correct = 0
total = 0
# no need to track gradients when only evaluating
with torch.no_grad():
    for data in testset:
        X, y = data
        output = net(X.view(-1, 784))
        #print(output)
        for idx, i in enumerate(output):
            #print(torch.argmax(i), y[idx])
            if torch.argmax(i) == y[idx]:
                correct += 1
            total += 1
print("Accuracy: ", round(correct/total, 3))
```

This code printed

```
Accuracy: 0.967
```
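
The per-example loop in the accuracy code can also be done on a whole batch at once with `argmax(dim=1)`. A sketch with made-up outputs and labels (4 images, 3 classes, not real network output):

```python
import torch

# hypothetical network outputs (log-probabilities) for a batch of 4 images, 3 classes
output = torch.log(torch.tensor([
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
    [0.6, 0.3, 0.1],
]))
y = torch.tensor([0, 1, 2, 1])   # hypothetical correct labels

predictions = output.argmax(dim=1)          # most likely class for each row
correct = (predictions == y).sum().item()   # number of matches
total = y.size(0)
print(predictions)                          # tensor([0, 1, 2, 0])
print("Accuracy: ", round(correct / total, 3))
```

In the real evaluation loop this would sit inside `with torch.no_grad():`, with `correct` and `total` accumulated across batches.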

### Code

Try the code yourself: mnist_example.py

It trains the network, prints one test example, and shows the network's prediction on that example.

## 2. learnpytorch.io

Source: 00_pytorch_fundamentals

### Tensors

Tensors have a data type (`float32`, `float16`, `int8`, ...; `torch.float` is an alias for `float32`) and are associated with a device (`cpu`, or `cuda`, which is PyTorch's name for an NVIDIA GPU).
They can be initialized with zeros, ones, or randomly.
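
A sketch of checking and changing a tensor's dtype and device. The code falls back to the CPU when no CUDA GPU is available, so it runs either way.

```python
import torch

t = torch.zeros(2, 3)                      # default dtype is float32
print(t.dtype, t.device)                   # torch.float32 cpu

half = t.to(torch.float16)                 # change the dtype (returns a new tensor)
ints = torch.ones(2, 3, dtype=torch.int8)  # or pick the dtype at creation time
rand = torch.rand(2, 3)                    # random values in [0, 1)

# pick the GPU if there is one, otherwise stay on the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
t_on_device = t.to(device)
print(t_on_device.device)
```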

#### Tensor operations

- Addition: `tensor + 10`, `tensor + tensor`
- Subtraction: `tensor - 10`, `tensor - tensor`
- Multiplication (element-wise): `tensor * 10`, `tensor * tensor`
- Division (element-wise): `tensor / 10`, `tensor / tensor`
- Matrix multiplication: `torch.matmul(tensor1, tensor2)` (or `tensor1 @ tensor2`)
- Transposition: `tensor.T`, `torch.transpose(tensor, dim1, dim2)`
- Aggregation
  - Minimum: `tensor.min()`
    - position of the minimum: `tensor.argmin()`
  - Maximum: `tensor.max()`
    - position of the maximum: `tensor.argmax()`
  - Mean: `tensor.mean()` (only works with floating-point dtypes)
  - Sum: `tensor.sum()`
- Squeezing (removing all dimensions of size 1): `torch.squeeze(tensor)`; to flatten to 1 dimension (a vector), use `torch.flatten(tensor)`
- Reshaping: `torch.arange(1, 10).reshape(1, 3, 3)` creates a 1x3x3 tensor (a single 3x3 matrix) with the values 1 to 9
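
A quick sketch exercising several operations from the list above on a small hand-written tensor (the values are arbitrary), including the difference between `squeeze` and `flatten`:

```python
import torch

tensor = torch.tensor([[1., 2., 3.],
                       [4., 5., 6.]])        # shape [2, 3]

added = tensor + 10                           # element-wise, the scalar is broadcast
product = tensor * tensor                     # element-wise, NOT matrix multiplication
divided = tensor / 2

# matrix multiplication: inner dimensions must match ([2, 3] @ [3, 2] -> [2, 2])
matmul = torch.matmul(tensor, tensor.T)
print(matmul.shape)                           # torch.Size([2, 2])

# aggregation and the positional (arg) versions
print(tensor.min(), tensor.argmin())          # tensor(1.) tensor(0)
print(tensor.max(), tensor.argmax())          # tensor(6.) tensor(5), an index into the flattened tensor
print(tensor.mean(), tensor.sum())            # tensor(3.5000) tensor(21.)

# squeeze removes size-1 dimensions; flatten always gives one dimension
boxed = torch.arange(1, 10).reshape(1, 3, 3)  # shape [1, 3, 3], integer dtype
print(torch.squeeze(boxed).shape)             # torch.Size([3, 3])
print(torch.flatten(boxed).shape)             # torch.Size([9])

# mean() on an integer tensor raises an error, so convert to float first
print(boxed.float().mean())                   # tensor(5.)
```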

### Seeding randomness

To get reproducible results despite using randomness to initialize neural networks, torch's random number generator can be seeded like this:

```
import torch

RANDOM_SEED = 42  # any fixed integer works; 42 is just an example value
torch.manual_seed(seed=RANDOM_SEED)
```
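
A seed only fixes the sequence of random numbers from that point on, so to get the same values twice the generator has to be reseeded before each call. A sketch (the seed value 42 is an arbitrary example):

```python
import torch

RANDOM_SEED = 42  # arbitrary example value

torch.manual_seed(seed=RANDOM_SEED)
a = torch.rand(3, 4)

torch.manual_seed(seed=RANDOM_SEED)   # reset the generator before the next call
b = torch.rand(3, 4)
print(torch.equal(a, b))              # True

c = torch.rand(3, 4)                  # no reseed: the generator has moved on
print(torch.equal(a, c))              # False
```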

### Running on GPU

I'm skipping that until I have a GPU to run it on.