Module 2: PyTorch Autograd Engine

Understand Autograd, PyTorch's automatic differentiation system: computation graphs (DAGs), gradient calculation, and gradient accumulation.

⏱ 18 Min Read • Author: GenAIWallah Team • Updated: May 2026

Day 3

Automatic Differentiation Engine (Autograd)

Why this matters

Autograd: PyTorch tracks tensor operations so gradients are computed automatically, with no hand-written backpropagation for each layer.

Autograd records operations on tensors with requires_grad=True and computes gradients via reverse-mode automatic differentiation.

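import torch

x = torch.tensor(2.0, requires_grad=True)  # leaf tensor tracked by autograd
y = x ** 2
y.backward()   # computes dy/dx and stores it in x.grad
print(x.grad)  # tensor(4.), since dy/dx = 2x = 4 at x = 2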

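Leaf tensors with requires_grad=True (such as model parameters) accumulate gradients in .grad each time .backward() is called.
Wrap inference code in with torch.no_grad(): to skip graph recording, which saves memory and compute.
.detach() returns a tensor that shares the same data but is cut off from the graph, so no gradient flows back through it.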
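
The difference between torch.no_grad() and .detach() is easier to see side by side. The following is a minimal sketch, not a canonical recipe; the variable names (w, y_inference, frozen, out) are made up for illustration.

```python
import torch

w = torch.tensor(3.0, requires_grad=True)

# Inside no_grad, operations are not recorded, so the result carries no graph.
with torch.no_grad():
    y_inference = w * 2
print(y_inference.requires_grad)  # False

# detach() gives a tensor with the same value (6.0) that autograd treats as a constant.
frozen = (w * 2).detach()
out = w * frozen   # gradient flows only through the direct use of w
out.backward()
print(w.grad)      # tensor(6.)
```

Without the detach() call, out would equal 2 * w ** 2 and the gradient would be 4 * w = 12 instead of 6.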

Common mistakes

Forgetting optimizer.zero_grad() so gradients accumulate across batches.
Tensor shape mismatches (especially batch/channel dimensions for CNNs).
Moving the model to the GPU but leaving input tensors on the CPU (or vice versa).
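
The first mistake is easy to reproduce on a single parameter. This sketch assumes a toy "model" consisting of one scalar w and uses torch.optim.SGD only for its zero_grad() method; the names accumulated and fresh are illustrative.

```python
import torch

w = torch.tensor(1.0, requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.1)

# Two backward passes without clearing: the gradients add up in w.grad.
(w * 3).backward()
(w * 3).backward()
accumulated = w.grad.item()
print(accumulated)  # 6.0 (3 + 3)

# Clearing first leaves only the gradient of the current pass.
optimizer.zero_grad()
(w * 3).backward()
fresh = w.grad.item()
print(fresh)  # 3.0
```

In a training loop, calling optimizer.zero_grad() once per batch before loss.backward() keeps each update based on that batch alone.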

Interview checkpoints

Q: Explain autograd in PyTorch. A: One-sentence definition + shape/device note.
Q: Common bug? A: Gradients, shapes, or device mismatch.

Practice

Basic: Define Autograd and sketch a minimal code snippet.
Intermediate: Run a notebook cell demonstrating Autograd.
Advanced: Intentionally break Autograd and interpret the error.

Recap

You can explain autograd clearly.
You know one mistake to avoid.
You see how this connects to the next lesson.

Next: Backward Pass

Day 4

The Computational Graph and Backward Pass

Why this matters

Backward Pass: The backward pass applies the chain rule through the DAG to compute the gradient of the loss with respect to each parameter; the optimizer then uses those gradients to update the weights.

Each forward op adds a node to the computational DAG. loss.backward() traverses it in reverse, applying the chain rule (vector-Jacobian products).
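
The graph can be inspected through each tensor's grad_fn attribute. The sketch below is illustrative only: the tensor names are made up, and the exact node class names (such as AddBackward0) are implementation details that may differ between PyTorch versions.

```python
import torch

x = torch.tensor(2.0, requires_grad=True)
y = x * 3   # forward op: adds a multiply node
z = y + 1   # forward op: adds an add node

# A leaf tensor created by the user has no grad_fn.
print(x.is_leaf, x.grad_fn)  # True None

# Non-leaf tensors point at the node that produced them.
z_node = type(z.grad_fn).__name__   # a name such as AddBackward0
y_node = type(y.grad_fn).__name__   # a name such as MulBackward0
print(z_node, y_node)

# next_functions links a node to the nodes that produced its inputs,
# which is the path backward() follows in reverse.
parent = z.grad_fn.next_functions[0][0]
print(type(parent).__name__)        # the multiply node again
```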

Mini example

For loss = (w * x + b - y)^2, gradients w.r.t. w and b flow from the loss node back through multiply and add nodes.
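
As a rough check of this example, the gradients from autograd can be compared with the chain rule worked by hand: with residual r = w * x + b - y, dloss/dw = 2 * r * x and dloss/db = 2 * r. The numeric values below are arbitrary and chosen only for illustration.

```python
import torch

x, y = torch.tensor(2.0), torch.tensor(7.0)   # fixed data, no gradient needed
w = torch.tensor(1.5, requires_grad=True)
b = torch.tensor(0.5, requires_grad=True)

loss = (w * x + b - y) ** 2
loss.backward()

# Chain rule by hand: r = 1.5 * 2 + 0.5 - 7 = -3.5
r = (w * x + b - y).item()
print(w.grad.item(), 2 * r * x.item())  # -14.0 -14.0
print(b.grad.item(), 2 * r)             # -7.0 -7.0
```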

.backward() can be called without arguments only on a scalar tensor (typically the loss); for a non-scalar output, pass a tensor of the same shape via .backward(gradient=...).
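
A small sketch of the non-scalar case, assuming an element-wise output y = x ** 2. The weighting vector v is an arbitrary choice for illustration; autograd computes the vector-Jacobian product of v with dy/dx.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# Calling backward() with no arguments on a non-scalar output raises an error.
try:
    (x ** 2).backward()
    raised = False
except RuntimeError:
    raised = True
print(raised)  # True

# Passing gradient=v (same shape as y) makes the call well defined.
y = x ** 2
v = torch.tensor([1.0, 0.1, 0.01])
y.backward(gradient=v)
print(x.grad)  # approximately [2.0, 0.4, 0.06], i.e. v * 2x
```

Passing a vector of ones as v gives the same result as calling y.sum().backward().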

Common mistakes

Forgetting optimizer.zero_grad() so gradients accumulate across batches.
Tensor shape mismatches (especially batch/channel dimensions for CNNs).
Moving the model to the GPU but leaving input tensors on the CPU (or vice versa).

Interview checkpoints

Q: Explain backward pass in PyTorch. A: One-sentence definition + shape/device note.
Q: Common bug? A: Gradients, shapes, or device mismatch.

Practice

Basic: Define Backward Pass and sketch a minimal code snippet.
Intermediate: Run a notebook cell demonstrating Backward Pass.
Advanced: Intentionally break Backward Pass and interpret the error.

Recap

You can explain backward pass clearly.
You know one mistake to avoid.
You see how this connects to the next lesson.

Next: torch.nn
