
Module 2: PyTorch Autograd Engine

Understand Autograd, PyTorch's automatic differentiation system: gradient accumulation, computation graphs (DAGs), and gradient calculations.

⏱ 18 Min Read Author: GenAIWallah Team Updated: May 2026
Day 3

Automatic Differentiation Engine (Autograd)

Why this matters

Autograd: PyTorch tracks operations on tensors so gradients are computed automatically, with no manual backprop for each layer.

Autograd records operations on tensors with requires_grad=True and computes gradients via reverse-mode automatic differentiation.

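import torch

x = torch.tensor(2.0, requires_grad=True)  # leaf tensor tracked by autograd
y = x ** 2
y.backward()   # computes dy/dx and stores it in x.grad
print(x.grad)  # tensor(4.), since dy/dx = 2x = 4 at x = 2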
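  • Leaf tensors with requires_grad=True (such as model parameters) accumulate gradients in .grad on every .backward() call.
  • Wrap inference code in with torch.no_grad(): so operations are not recorded, which saves memory and compute.
  • .detach() returns a tensor that shares the same data but is cut off from the graph, so no gradient flows back through it.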
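
The three points above can be checked in a few lines. This is a minimal sketch: the names w, first, second, pred, and frozen are illustrative and do not come from the lesson.

```python
import torch

w = torch.tensor(3.0, requires_grad=True)  # leaf tensor, standing in for a parameter

# Two backward calls without a reset: the gradients add up in w.grad.
(w * 2).backward()
first = w.grad.clone()       # tensor(2.)
(w * 2).backward()
second = w.grad.clone()      # tensor(4.), i.e. 2 + 2

# Inside no_grad, operations are not recorded in the graph.
with torch.no_grad():
    pred = w * 2
print(pred.requires_grad)    # False

# detach() returns a tensor that is cut off from the graph.
frozen = (w * 2).detach()
print(frozen.requires_grad)  # False
```

The doubling from 2 to 4 is the accumulation behaviour that makes resetting gradients between batches necessary.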

Common mistakes

  • Forgetting optimizer.zero_grad() so gradients accumulate across batches.
  • Tensor shape mismatches (especially batch/channel dimensions for CNNs).
  • Moving the model to the GPU but leaving the input tensors on the CPU (or vice versa).
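
To guard against the first mistake, a training loop usually resets gradients at the start of every batch. The sketch below assumes a toy setup: the Linear(2, 1) model, the SGD learning rate, and the random batches are made up for illustration.

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(2, 1)                      # hypothetical tiny model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Made-up data: 3 batches of 4 samples with 2 features each.
batches = [(torch.randn(4, 2), torch.randn(4, 1)) for _ in range(3)]

for inputs, targets in batches:
    optimizer.zero_grad()                          # clear gradients left over from the previous batch
    loss = torch.nn.functional.mse_loss(model(inputs), targets)
    loss.backward()                                # fill .grad for this batch only
    optimizer.step()                               # update weights using the fresh gradients
```

Placing zero_grad() first in the loop body means each step() sees only the current batch's gradients.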

Interview checkpoints

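  • Q: Explain autograd in PyTorch. A: Autograd records operations on tensors with requires_grad=True and computes gradients via reverse-mode automatic differentiation when .backward() is called.
  • Q: Common bug? A: Forgetting optimizer.zero_grad(), so gradients accumulate across batches; shape and device mismatches are also frequent.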

Practice

  1. Basic: Define Autograd and sketch a minimal code snippet.
  2. Intermediate: Run a notebook cell demonstrating Autograd.
  3. Advanced: Intentionally break Autograd and interpret the error.

Recap

  • You can explain autograd clearly.
  • You know one mistake to avoid.
  • You see how this connects to the next lesson.

Next: Backward Pass

Day 4

The Computational Graph and Backward Pass

Why this matters

Backward Pass: The backward pass applies the chain rule through the DAG to compute gradients, which the optimizer then uses to update the weights.

Each forward op adds a node to the computational DAG. loss.backward() traverses it in reverse, applying the chain rule (vector-Jacobian products).
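
The recorded graph can be inspected through the grad_fn attribute. The following is a small sketch with illustrative names (w, x, z, out); the exact printed node names can vary between PyTorch versions, so none are shown.

```python
import torch

w = torch.tensor(1.5, requires_grad=True)  # leaf: created by the user, not by an op
x = torch.tensor(2.0)                      # plain input, not tracked

z = w * x        # forward op: adds a multiply node to the graph
out = z + 1.0    # forward op: adds an add node on top of it

print(w.is_leaf, w.grad_fn)         # leaves have no grad_fn
print(out.grad_fn)                  # the node that produced out
print(out.grad_fn.next_functions)   # edges leading back toward the multiply node
```

Following next_functions from the output back to the leaves walks the same path that backward() takes.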

Mini example

For loss = (w * x + b - y)^2, gradients w.r.t. w and b flow from the loss node back through multiply and add nodes.
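
The mini example can be verified against the chain rule worked out by hand. This is a sketch with arbitrarily chosen scalar values; with r = w*x + b - y, the expected gradients are d(loss)/dw = 2*r*x and d(loss)/db = 2*r.

```python
import torch

# Arbitrary values chosen for illustration.
w = torch.tensor(0.5, requires_grad=True)
b = torch.tensor(0.1, requires_grad=True)
x = torch.tensor(2.0)
y = torch.tensor(3.0)

loss = (w * x + b - y) ** 2
loss.backward()                       # gradients flow back through power, subtract, add, multiply

# Hand-derived gradients for comparison.
r = (w * x + b - y).detach()          # residual, detached so it is treated as a constant
print(torch.allclose(w.grad, 2 * r * x))
print(torch.allclose(b.grad, 2 * r))
```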

Calling .backward() without arguments works only for scalar outputs; for vector outputs, pass a tensor of the same shape as the output via .backward(gradient=...).
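
A short sketch of the vector case, assuming a three-element input. The tensor v is the vector in the vector-Jacobian product; choosing all ones gives the same result as calling backward on y.sum().

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = x ** 2                       # vector output, shape (3,)

# y.backward() on its own would raise a RuntimeError here, because y is not a scalar.
v = torch.ones_like(y)           # one weight per output element
y.backward(gradient=v)
print(x.grad)                    # tensor([2., 4., 6.]), i.e. 2 * x
```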

Common mistakes

  • Forgetting optimizer.zero_grad() so gradients accumulate across batches.
  • Tensor shape mismatches (especially batch/channel dimensions for CNNs).
  • Moving the model to the GPU but leaving the input tensors on the CPU (or vice versa).
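
One common way to avoid the device mismatch is to pick a device once and move both the model and the data to it. The sketch below uses a made-up Linear(3, 1) model and a random batch, and falls back to the CPU when no GPU is available.

```python
import torch

# Pick one device and use it everywhere.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = torch.nn.Linear(3, 1).to(device)   # hypothetical model: 3 features in, 1 out
inputs = torch.randn(4, 3).to(device)      # made-up batch of 4 samples, on the same device

out = model(inputs)
print(out.shape)                           # torch.Size([4, 1])
```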

Interview checkpoints

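  • Q: Explain backward pass in PyTorch. A: loss.backward() traverses the computational DAG in reverse, applying the chain rule (vector-Jacobian products) to fill .grad on the leaf tensors.
  • Q: Common bug? A: Calling .backward() without arguments on a non-scalar output; gradient, shape, and device mismatches are also frequent.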

Practice

  1. Basic: Define Backward Pass and sketch a minimal code snippet.
  2. Intermediate: Run a notebook cell demonstrating Backward Pass.
  3. Advanced: Intentionally break Backward Pass and interpret the error.

Recap

  • You can explain backward pass clearly.
  • You know one mistake to avoid.
  • You see how this connects to the next lesson.

Next: torch.nn
