# PyTorch


This lesson follows the PyTorch beginner tutorial: https://pytorch.org/tutorials/beginner/basics/intro.html

• When training neural networks, the most frequently used algorithm is backpropagation
• Parameters (model weights) are adjusted according to the gradient of the loss function with respect to the given parameter
• To compute those gradients, PyTorch has a built-in differentiation engine called torch.autograd
• It supports automatic computation of gradients for any computational graph
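
As a minimal standalone illustration (separate from the FashionMNIST setup used in this lesson), autograd can differentiate an arbitrary chain of tensor operations:

import torch

# a tiny computational graph: f(t) = t^2 + 3t
t = torch.tensor(2.0, requires_grad=True)
f = t**2 + 3*t

f.backward()      # backpropagate through the graph
print(t.grad)     # df/dt = 2t + 3 = 7 at t = 2  ->  tensor(7.)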
topic = "pytorch"
lesson = 6

from n import *
home, models_path = get_project_dir("FashionMNIST")
print(home)

/home/naneja/datasets/n/FashionMNIST

import os
import torch
import random
import numpy as np

seed = 0

os.environ['PYTHONHASHSEED'] = str(seed)

# Torch RNG
torch.manual_seed(seed)
torch.cuda.manual_seed(seed)
torch.cuda.manual_seed_all(seed)

torch.use_deterministic_algorithms(True)

# Python RNG
np.random.seed(seed)
random.seed(seed)

x = torch.ones(5)  # input tensor
print_("Input Tensor", x)

y = torch.zeros(3)  # expected output
print_("Expected Output ", y)

torch.manual_seed(seed)
w = torch.randn(5, 3, requires_grad=True)  # weights to optimize
print_("Initial w ", w)

b = torch.randn(3, requires_grad=True)  # bias to optimize
print_("Bias ", b)

z = torch.matmul(x, w)+b
print_("logits ", z)

loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)
print_(f"loss={loss:.4f}")


Input Tensor

tensor([1., 1., 1., 1., 1.])


Expected Output

tensor([0., 0., 0.])


Initial w

tensor([[ 1.5410, -0.2934, -2.1788],
        [ 0.5684, -1.0845, -1.3986],
        [ 0.4033,  0.8380, -0.7193],
        [-0.4033, -0.5966,  0.1820],
        [-0.8567,  1.1006, -1.0712]], requires_grad=True)

Bias

tensor([ 0.1227, -0.5663,  0.3731], requires_grad=True)


logits

tensor([ 1.3755, -0.6023, -4.8127], grad_fn=<AddBackward0>)


loss = 0.6819
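
As a quick sanity check on that number (a side note, not part of the tutorial): every target in y is 0, so binary cross-entropy with logits reduces to the mean softplus of the logits, and the loss can be reproduced directly from z:

# with y = 0, BCE-with-logits equals mean(log(1 + exp(z))) = mean(softplus(z))
manual_loss = torch.nn.functional.softplus(z).mean()
print_(f"manual loss={manual_loss:.4f}")  # 0.6819, matching the value above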

## Tensors, Functions and Computational graph

• w and b are parameters, which we need to optimize

• We need to compute the gradients of the loss function with respect to those variables

• To do that, we set the requires_grad property of those tensors
• The value of requires_grad can be set when creating a tensor, or later (see the sketch after this list)
• A function that we apply to tensors to construct the computational graph is in fact an object of class Function
• This object knows how to compute the function in the forward direction, and also how to compute its derivative during the backward propagation step
• A reference to the backward propagation function is stored in the grad_fn property of a tensor
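
A minimal sketch of both ways to set requires_grad, with illustrative tensors that are not part of the lesson:

import torch

# requires_grad can be set at creation time ...
u = torch.randn(5, 3, requires_grad=True)

# ... or enabled later, in place, on an existing tensor
v = torch.zeros(3)
v.requires_grad_(True)

print(u.requires_grad, v.requires_grad)  # True True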
print_(f"Gradient function for z = {z.grad_fn}")


Gradient function for loss = <BinaryCrossEntropyWithLogitsBackward0 object at 0x7f66869c5c60>

• To optimize the parameters (weights) in the neural network, we need to compute the derivatives of our loss function with respect to those parameters
• namely $\frac{\partial loss}{\partial w}$ and $\frac{\partial loss}{\partial b}$, under some fixed values of x and y
• To compute those derivatives, we call loss.backward(), and then retrieve the values from w.grad and b.grad
loss.backward()
print_(w.grad)
print_(b.grad)

tensor([[0.2661, 0.1179, 0.0027],
        [0.2661, 0.1179, 0.0027],
        [0.2661, 0.1179, 0.0027],
        [0.2661, 0.1179, 0.0027],
        [0.2661, 0.1179, 0.0027]])
tensor([0.2661, 0.1179, 0.0027])
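
As a side check (not in the tutorial), these values can be verified analytically: for binary cross-entropy with logits averaged over the three outputs, $\frac{\partial loss}{\partial z} = \frac{\sigma(z) - y}{3}$, and $\frac{\partial loss}{\partial w}$ is the outer product of x with that vector, so with x all ones every row of w.grad is identical. Reusing x, y and z from above:

# analytic gradient of the loss with respect to the logits
dz = (torch.sigmoid(z) - y) / 3
print(dz)                   # [0.2661, 0.1179, 0.0027], matching b.grad
print(torch.outer(x, dz))   # each of the 5 rows equals dz, matching w.grad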

x = torch.ones(5)  # input tensor
y = torch.zeros(3)  # expected output

torch.manual_seed(seed)
w = torch.randn(5, 3, requires_grad=True)
b = torch.randn(3, requires_grad=True)

z = torch.matmul(x, w)+b

loss = torch.nn.functional.binary_cross_entropy_with_logits(z, y)

loss.backward()

print_(w.grad)
print_(b.grad)



tensor([[0.2661, 0.1179, 0.0027],
        [0.2661, 0.1179, 0.0027],
        [0.2661, 0.1179, 0.0027],
        [0.2661, 0.1179, 0.0027],
        [0.2661, 0.1179, 0.0027]])


tensor([0.2661, 0.1179, 0.0027])

• We can only obtain the grad properties for the leaf nodes of the computational graph, which have requires_grad property set to True.

• For all other nodes in our graph, gradients will not be available.

• We can only perform gradient calculations using backward once on a given graph, for performance reasons.

• If we need to do several backward calls on the same graph, we need to pass retain_graph=True to the backward call.
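
A small sketch illustrating both points (the tensors here are illustrative, not from the lesson): gradients are only stored on leaf tensors, and a second backward pass requires retain_graph=True on the first call:

import torch

a = torch.ones(5, requires_grad=True)   # leaf tensor
h = a * 2                               # intermediate (non-leaf) node
out = h.sum()

out.backward(retain_graph=True)  # keep the graph so backward can run again
out.backward()                   # second call succeeds; gradients accumulate
print(a.grad)   # tensor([4., 4., 4., 4., 4.]) after the two backward passes
print(h.grad)   # None (with a warning): grads of non-leaf nodes are not kept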

• By default, all tensors with requires_grad=True are tracking their computational history and support gradient computation.

• We can stop tracking computations by surrounding the computation code with a torch.no_grad() block

• Reasons to disable gradient tracking:

• To mark some parameters in your neural network as frozen parameters. This is a very common scenario for finetuning a pretrained network (see the sketch at the end of this section)
• To speed up computations when you are only doing the forward pass, because computations on tensors that do not track gradients are more efficient
z = torch.matmul(x, w)+b
print_(z.requires_grad)

with torch.no_grad():
    z = torch.matmul(x, w)+b
print_(z.requires_grad)


# Another way to disable gradient tracking: detach the tensor from the graph
z = torch.matmul(x, w)+b

z_det = z.detach()
print_(z_det.requires_grad)
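
As a sketch of the frozen-parameters scenario mentioned above (the model here is illustrative, not the lesson's FashionMNIST network), freezing simply means switching requires_grad off on the parameters that should not be updated:

import torch
from torch import nn

# a tiny illustrative model: one hidden layer plus a classifier head
model = nn.Sequential(nn.Linear(28*28, 512), nn.ReLU(), nn.Linear(512, 10))

# freeze the first layer; only the head will receive gradients during finetuning
for param in model[0].parameters():
    param.requires_grad = False

trainable = [name for name, p in model.named_parameters() if p.requires_grad]
print(trainable)   # ['2.weight', '2.bias']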