Module 3: torch.nn & Training Pipeline
Build deep learning networks with the torch.nn module: define layers and activation functions, forward passes, loss criteria, and training loops.
Day 5
Creating Models using torch.nn
Why this matters
torch.nn: nn.Module organizes layers with parameters; forward() defines the computation.
Subclass nn.Module or stack layers with nn.Sequential. Learnable weights are nn.Parameter objects registered automatically.
import torch
import torch.nn as nn
layer = nn.Linear(784, 10)  # in_features, out_features
x = torch.randn(32, 784)    # batch of 32 flattened inputs
out = layer(x)              # shape (32, 10)
- nn.Linear(in_features, out_features): fully connected layer.
- nn.ReLU(), nn.Sigmoid(): activations.
- model.parameters(): iterator over the learnable parameters, passed to optimizers.
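The two construction styles described above can be sketched side by side. This is a minimal illustration, not part of the lesson material: the class name TinyNet and the hidden size of 64 are assumptions chosen only for the example.

```python
import torch
import torch.nn as nn

class TinyNet(nn.Module):  # hypothetical name, for illustration only
    def __init__(self, in_features=784, hidden=64, out_features=10):
        super().__init__()
        # Assigning nn.Module attributes registers their parameters automatically.
        self.fc1 = nn.Linear(in_features, hidden)
        self.act = nn.ReLU()
        self.fc2 = nn.Linear(hidden, out_features)

    def forward(self, x):
        # forward() defines the computation; invoke it via model(x).
        return self.fc2(self.act(self.fc1(x)))

subclassed = TinyNet()
sequential = nn.Sequential(nn.Linear(784, 64), nn.ReLU(), nn.Linear(64, 10))

x = torch.randn(32, 784)
out_a = subclassed(x)
out_b = sequential(x)

# Both styles expose their learnable weights through parameters().
n_params_a = sum(p.numel() for p in subclassed.parameters())
n_params_b = sum(p.numel() for p in sequential.parameters())
print(out_a.shape, out_b.shape, n_params_a, n_params_b)
```

Subclassing is usually the better fit when forward() needs branching or multiple inputs; nn.Sequential is enough for a plain stack of layers.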
Common mistakes
- Forgetting optimizer.zero_grad() so gradients accumulate across batches.
- Tensor shape mismatches (especially batch/channel dimensions for CNNs).
- Training on GPU but leaving tensors on CPU (or vice versa).
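The shape and device mistakes listed above can be reproduced and avoided with a short sketch. It assumes a single nn.Linear layer as the model and falls back to the CPU when no GPU is present; the variable names are illustrative.

```python
import torch
import torch.nn as nn

# Pick a GPU when one is available; otherwise stay on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(784, 10).to(device)    # move the parameters to the device
x = torch.randn(32, 784, device=device)  # create the input on the same device
out = model(x)

# A shape mismatch: the layer expects 784 features but receives 100.
bad_x = torch.randn(32, 100, device=device)
try:
    model(bad_x)
    shape_error = None
except RuntimeError as err:
    shape_error = str(err)

print(out.shape, out.device)
print(shape_error)
```

Keeping a single device variable and passing it to both the model and every tensor is one simple way to avoid CPU/GPU mismatches.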
Interview checkpoints
- Q: Explain torch.nn in PyTorch. A: One-sentence definition + shape/device note.
- Q: Common bug? A: Gradients, shapes, or device mismatch.
Practice
- Basic: Define torch.nn and sketch a minimal code snippet.
- Intermediate: Run a notebook cell demonstrating torch.nn.
- Advanced: Intentionally break torch.nn and interpret the error.
Recap
- You can explain torch.nn clearly.
- You know one mistake to avoid.
- You see how this connects to the next lesson.
Next: Training Loop
Day 6
The Core Training Loop Lifecycle
Why this matters
Training Loop: The training loop ties together data loading, the forward pass, the loss, the backward pass, and optimizer.step(). It is the core routine of deep learning.
The standard training loop repeats: fetch batch → zero gradients → forward → loss → backward → step.
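import torch
import torch.nn as nn
import torch.optim as optim
model = nn.Sequential(nn.Linear(10, 1))
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=1e-3)
# Placeholder data so the snippet runs; real batches would come from a dataset.
X_batch = torch.randn(32, 10)
y_batch = torch.randn(32, 1)
for epoch in range(10):
    optimizer.zero_grad()             # clear gradients from the previous step
    preds = model(X_batch)            # forward pass
    loss = criterion(preds, y_batch)  # compute the loss
    loss.backward()                   # backpropagate
    optimizer.step()                  # update the parameters
Always call optimizer.zero_grad() (or zero_grad(set_to_none=True)) before backward() to clear old gradients.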
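The loop above reuses a single batch. As a hedged sketch of the full cycle over several mini-batches, the version below slices a tensor by hand. The synthetic regression data, the batch size of 32, and the learning rate of 1e-2 are assumptions made for the example.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Synthetic regression data (an assumption for illustration).
X = torch.randn(256, 10)
true_w = torch.randn(10, 1)
y = X @ true_w

model = nn.Sequential(nn.Linear(10, 1))
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=1e-2)

batch_size = 32
epoch_losses = []
for epoch in range(20):
    running = 0.0
    for start in range(0, X.shape[0], batch_size):
        X_batch = X[start:start + batch_size]  # fetch batch
        y_batch = y[start:start + batch_size]
        optimizer.zero_grad()                  # clear old gradients
        preds = model(X_batch)                 # forward
        loss = criterion(preds, y_batch)       # loss
        loss.backward()                        # backward
        optimizer.step()                       # step
        running += loss.item() * X_batch.shape[0]
    # Average loss over all samples seen in this epoch.
    epoch_losses.append(running / X.shape[0])

print(epoch_losses[0], epoch_losses[-1])
```

Tracking the average loss per epoch gives a quick check that the loop is actually learning.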
Common mistakes
- Forgetting optimizer.zero_grad() so gradients accumulate across batches.
- Tensor shape mismatches (especially batch/channel dimensions for CNNs).
- Training on GPU but leaving tensors on CPU (or vice versa).
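The first mistake in the list can be observed directly. The sketch below runs backward() twice on the same batch without zeroing and compares the stored gradients; the SGD optimizer and the random data are assumptions used only to make the effect visible.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
model = nn.Linear(10, 1)
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)
X_batch = torch.randn(32, 10)
y_batch = torch.randn(32, 1)

# First backward pass: gradients are written into .grad.
criterion(model(X_batch), y_batch).backward()
grad_once = model.weight.grad.clone()

# Second backward pass without zeroing: new gradients are added to the old ones.
criterion(model(X_batch), y_batch).backward()
grad_twice = model.weight.grad.clone()

# Zeroing first restores the single-pass gradient.
optimizer.zero_grad()
criterion(model(X_batch), y_batch).backward()
grad_after_zero = model.weight.grad.clone()

accumulated = torch.allclose(grad_twice, 2 * grad_once)
restored = torch.allclose(grad_after_zero, grad_once)
print(accumulated, restored)
```

Because optimizer.step() is never called here, the weights stay fixed, so any change in .grad comes purely from accumulation.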
Interview checkpoints
- Q: Explain training loop in PyTorch. A: One-sentence definition + shape/device note.
- Q: Common bug? A: Gradients, shapes, or device mismatch.
Practice
- Basic: Define Training Loop and sketch a minimal code snippet.
- Intermediate: Run a notebook cell demonstrating Training Loop.
- Advanced: Intentionally break Training Loop and interpret the error.
Recap
- You can explain training loop clearly.
- You know one mistake to avoid.
- You see how this connects to the next lesson.
Next: Dataset
