Module 6: Optimization & Tuning
Optimize model loss via Adam and SGD. Configure decay rates, learning rate scheduler steps, dropout matrices, and batchnorm.
Day 11
Optimization Algorithms: SGD vs. Adam
Why this matters
Optimizers: Adam adapts learning rates per parameter; SGD with momentum remains a strong baseline.
SGD updates weights along negative gradient; Adam maintains per-parameter momentum and variance estimates (adaptive LR).
- Start with Adam (
lr=1e-3) for quick experiments. - Learning rate schedulers (
StepLR,CosineAnnealingLR) improve convergence. weight_decayin AdamW adds L2 regularization.
Common mistakes
- Forgetting optimizer.zero_grad() so gradients accumulate across batches.
- Tensor shape mismatches (especially batch/channel dimensions for CNNs).
- Training on GPU but leaving tensors on CPU (or vice versa).
Interview checkpoints
- Q: Explain optimizers in PyTorch. A: One-sentence definition + shape/device note.
- Q: Common bug? A: Gradients, shapes, or device mismatch.
Practice
- Basic: Define Optimizers and sketch a minimal code snippet.
- Intermediate: Run a notebook cell demonstrating Optimizers.
- Advanced: Intentionally break Optimizers and interpret the error.
Recap
- You can explain optimizers clearly.
- You know one mistake to avoid.
- You see how this connects to the next lesson.
Next: Regularization
Day 12
Regularization and Tuning Techniques
Why this matters
Regularization: Dropout and BatchNorm reduce overfitting and stabilize training — standard in vision models.
Dropout randomly zeros activations during training (disabled in model.eval()). BatchNorm normalizes activations per channel using batch statistics.
- Use
model.train()vsmodel.eval()to toggle dropout/BN behavior. - Pair with early stopping on validation loss.
Common mistakes
- Forgetting optimizer.zero_grad() so gradients accumulate across batches.
- Tensor shape mismatches (especially batch/channel dimensions for CNNs).
- Training on GPU but leaving tensors on CPU (or vice versa).
Interview checkpoints
- Q: Explain regularization in PyTorch. A: One-sentence definition + shape/device note.
- Q: Common bug? A: Gradients, shapes, or device mismatch.
Practice
- Basic: Define Regularization and sketch a minimal code snippet.
- Intermediate: Run a notebook cell demonstrating Regularization.
- Advanced: Intentionally break Regularization and interpret the error.
Recap
- You can explain regularization clearly.
- You know one mistake to avoid.
- You see how this connects to the next lesson.
Next: CNN Layers
