Module 2: PyTorch Autograd Engine
Understand Autograd, PyTorch's automatic differentiation system: gradient accumulation, computation graphs (DAGs), and gradient calculations.
Automatic Differentiation Engine (Autograd)
Why this matters
Autograd: Autograd tracks operations on tensors so gradients are computed automatically, with no manual backprop for each layer.
Autograd records operations on tensors with requires_grad=True and computes gradients via reverse-mode automatic differentiation.
import torch
x = torch.tensor(2.0, requires_grad=True)
y = x ** 2
y.backward()
print(x.grad)  # tensor(4.), since dy/dx = 2x = 4 at x = 2
- Leaf tensors (such as parameters) accumulate gradients in .grad on every .backward() call.
- Use with torch.no_grad(): for inference; it saves memory and compute.
- .detach() stops gradient flow through a tensor.
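The three points above can be checked in a few lines. This is a minimal sketch, not a canonical recipe; the tensor w and the values used are illustrative assumptions.

```python
import torch

# Leaf tensor: created directly by the user with requires_grad=True.
w = torch.tensor(3.0, requires_grad=True)

# First backward pass: d(w^2)/dw = 2w = 6.
(w ** 2).backward()
first_grad = w.grad.clone()

# Second backward pass without a reset: the new gradient is ADDED to .grad.
(w ** 2).backward()
accumulated_grad = w.grad.clone()

# Inside no_grad, operations are not recorded, so no graph is built.
with torch.no_grad():
    inference_out = w * 2

# detach() returns a tensor that shares data but is cut off from the graph.
w.grad = None                   # reset before the next backward pass
detached = (w ** 2).detach()    # behaves like the constant 9.0
z = detached * w                # only the direct factor w receives gradient
z.backward()
detach_grad = w.grad.clone()

print(first_grad, accumulated_grad, inference_out.requires_grad, detach_grad)
```

Without the detach, dz/dw for z = w^3 would be 3w^2 = 27; with it, the gradient is just the detached value, 9.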
Common mistakes
- Forgetting optimizer.zero_grad() so gradients accumulate across batches.
- Tensor shape mismatches (especially batch/channel dimensions for CNNs).
- Training on GPU but leaving tensors on CPU (or vice versa).
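A small training loop shows where each of these mistakes is usually avoided. This is a sketch under assumptions: the toy data (y = 2x), the single nn.Linear layer, and the learning rate are made up for illustration.

```python
import torch

torch.manual_seed(0)
# Pick one device and put BOTH the model and the data on it.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Toy regression data for y = 2x, shaped (batch, features) = (16, 1).
x = torch.linspace(-1, 1, 16, device=device).unsqueeze(1)
y = 2 * x

model = torch.nn.Linear(1, 1).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

losses = []
for step in range(200):
    optimizer.zero_grad()    # clear gradients left over from the previous step
    pred = model(x)          # shape (16, 1), same as y
    loss = torch.nn.functional.mse_loss(pred, y)
    loss.backward()          # fills .grad on the model parameters
    optimizer.step()         # uses .grad to update the parameters
    losses.append(loss.item())

print(losses[0], losses[-1])
```

Removing the zero_grad() call makes each step use the sum of all previous gradients, which typically shows up as a loss that oscillates or diverges.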
Interview checkpoints
- Q: Explain autograd in PyTorch. A: One-sentence definition + shape/device note.
- Q: Common bug? A: Gradients left un-zeroed between batches, shape mismatches, or device mismatches.
Practice
- Basic: Define Autograd and sketch a minimal code snippet.
- Intermediate: Run a notebook cell demonstrating Autograd.
- Advanced: Intentionally break Autograd and interpret the error.
Recap
- You can explain autograd clearly.
- You know one mistake to avoid.
- You see how this connects to the next lesson.
Next: Backward Pass
The Computational Graph and Backward Pass
Why this matters
Backward Pass: The backward pass applies the chain rule through the DAG to compute gradients, which the optimizer then uses to update the weights.
Each forward op adds a node to the computational DAG. loss.backward() traverses it in reverse, applying the chain rule (vector-Jacobian products).
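To make the vector-Jacobian product concrete, the sketch below calls backward on a vector output. The tensors x and v are illustrative assumptions; v stands in for the upstream gradient that a later node in the graph would normally supply.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = x ** 2    # vector output; its Jacobian is diag(2x)

# A non-leaf result keeps a reference to the graph node that produced it.
has_graph_node = y.grad_fn is not None

# backward computes v^T J, where J is the Jacobian of y with respect to x.
v = torch.tensor([1.0, 0.1, 0.01])
y.backward(gradient=v)

# For a diagonal Jacobian, v^T J reduces to the elementwise product v * 2x.
expected = v * 2 * x.detach()
print(x.grad, expected)
```

Calling y.backward() here without the gradient argument raises a RuntimeError, because an implicit gradient can only be created for scalar outputs.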
Mini example
For loss = (w * x + b - y)^2, gradients w.r.t. w and b flow from the loss node back through multiply and add nodes.
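The loss above can be run directly and compared against the chain rule applied by hand. The numeric values for x, y, w, and b are assumptions chosen to keep the arithmetic easy to follow.

```python
import torch

x = torch.tensor(3.0)
y = torch.tensor(10.0)
w = torch.tensor(2.0, requires_grad=True)
b = torch.tensor(1.0, requires_grad=True)

loss = (w * x + b - y) ** 2    # residual r = 2*3 + 1 - 10 = -3, loss = 9
loss.backward()

# Chain rule by hand: dloss/dr = 2r, dr/dw = x, dr/db = 1.
r = (w * x + b - y).item()
manual_dw = 2 * r * x.item()   # 2 * (-3) * 3 = -18
manual_db = 2 * r              # 2 * (-3)     = -6

print(w.grad, b.grad)          # tensor(-18.) tensor(-6.)
```

x and y have no .grad after the call, since they were created without requires_grad=True and are treated as constants.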
- For scalar outputs, call .backward() without arguments; for vector outputs use .backward(gradient=...).
Common mistakes
- Forgetting optimizer.zero_grad() so gradients accumulate across batches.
- Tensor shape mismatches (especially batch/channel dimensions for CNNs).
- Training on GPU but leaving tensors on CPU (or vice versa).
Interview checkpoints
- Q: Explain backward pass in PyTorch. A: One-sentence definition + shape/device note.
- Q: Common bug? A: Gradients left un-zeroed between batches, shape mismatches, or device mismatches.
Practice
- Basic: Define Backward Pass and sketch a minimal code snippet.
- Intermediate: Run a notebook cell demonstrating Backward Pass.
- Advanced: Intentionally break Backward Pass and interpret the error.
Recap
- You can explain backward pass clearly.
- You know one mistake to avoid.
- You see how this connects to the next lesson.
Next: torch.nn
