Module 3 · PyTorch Deep Learning

Module 3: torch.nn & Training Pipeline

Build deep learning networks with the torch.nn module: define layers and activation functions, implement the forward pass, choose a loss criterion, and write the training loop.

⏱ 22 Min Read • Author: GenAIWallah Team • Updated: May 2026

Day 5

Creating Models using torch.nn

Why this matters

torch.nn: nn.Module is the base class that organizes layers and their learnable parameters; its forward() method defines the computation.

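Either subclass nn.Module and write forward() yourself, or stack layers in order with nn.Sequential. Learnable weights are nn.Parameter objects, and they are registered automatically when you assign a layer (or a Parameter) as an attribute of the module.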
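A minimal sketch of both styles, assuming a hypothetical two-layer network (the class name TinyNet and the hidden size of 128 are illustrative choices, not part of the lesson). Assigning the nn.Linear layers as attributes in __init__ is what registers their weights and biases, so named_parameters() finds them without any extra code.

```python
import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical two-layer network used only for illustration."""

    def __init__(self, in_features: int = 784, hidden: int = 128, out_features: int = 10):
        super().__init__()  # must run before any submodule is assigned
        # Assigning modules as attributes registers their nn.Parameter objects
        self.fc1 = nn.Linear(in_features, hidden)
        self.act = nn.ReLU()  # no learnable parameters
        self.fc2 = nn.Linear(hidden, out_features)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # forward() defines the computation; invoke it as model(x), not model.forward(x)
        return self.fc2(self.act(self.fc1(x)))


# The same architecture expressed with nn.Sequential
seq_model = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))

model = TinyNet()
x = torch.randn(32, 784)
print(model(x).shape)  # torch.Size([32, 10])
print([name for name, _ in model.named_parameters()])
# ['fc1.weight', 'fc1.bias', 'fc2.weight', 'fc2.bias']
```

nn.Sequential is convenient when data flows straight through the layers; a subclass is the usual choice once forward() needs branching, skip connections, or multiple inputs.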

import torch
import torch.nn as nn

layer = nn.Linear(784, 10)  # in_features=784, out_features=10
x = torch.randn(32, 784)    # batch of 32 samples, 784 features each
out = layer(x)              # shape (32, 10)

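nn.Linear(in_features, out_features): fully connected layer computing x @ W.T + b.
nn.ReLU(), nn.Sigmoid(): activation functions, applied element-wise.
model.parameters(): iterator over all learnable parameters, which you pass to an optimizer.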
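A quick sketch of the three items above; the input values and the SGD learning rate are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn
import torch.optim as optim

relu = nn.ReLU()
sigmoid = nn.Sigmoid()

z = torch.tensor([-2.0, 0.0, 3.0])
print(relu(z))                       # tensor([0., 0., 3.]): negatives clipped to zero
print(sigmoid(torch.tensor([0.0])))  # tensor([0.5000]): sigmoid(0) = 0.5

model = nn.Sequential(nn.Linear(784, 10), nn.ReLU())
# parameters() yields the Linear weight (10 x 784) and bias (10,); ReLU has none
n_params = sum(p.numel() for p in model.parameters())
print(n_params)  # 7850

# The same iterator is what an optimizer consumes
optimizer = optim.SGD(model.parameters(), lr=0.01)
```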

Common mistakes

Forgetting optimizer.zero_grad() so gradients accumulate across batches.
Tensor shape mismatches (especially batch/channel dimensions for CNNs).
Moving the model to the GPU but leaving input tensors on the CPU (or vice versa).
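A minimal sketch of keeping the model and its inputs on the same device, reusing the nn.Linear(784, 10) shapes from the earlier snippet; it falls back to the CPU when no GPU is present, so it runs anywhere.

```python
import torch
import torch.nn as nn

# Pick the GPU when one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(784, 10).to(device)  # moves the module's parameters and returns the module
x = torch.randn(32, 784).to(device)    # Tensor.to() returns a new tensor: keep the result

out = model(x)
print(out.device)  # matches the device of the model's parameters
```

The asymmetry in the comments is the usual trap: module.to() works in place, but tensor.to() does not, so a bare x.to(device) without reassignment leaves x where it was.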

Interview checkpoints

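Q: Explain torch.nn in PyTorch. A: torch.nn provides nn.Module, the base class for models, plus layers, activations, and loss functions; a layer such as nn.Linear(784, 10) expects the input's last dimension to be 784, and its parameters live on whatever device the module was moved to.
Q: What is a common bug? A: Stale gradients (a missing zero_grad()), shape mismatches between layers, or a model and its inputs on different devices.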

Practice

Basic: Define torch.nn and sketch a minimal code snippet.
Intermediate: Run a notebook cell demonstrating torch.nn.
Advanced: Intentionally break a torch.nn layer (for example, by passing an input with the wrong number of features) and interpret the error.
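One way to attempt the advanced exercise, as a sketch: feed nn.Linear(784, 10) a batch with 100 features instead of 784. The exact error wording varies between PyTorch versions, but it reports the shapes of both operands.

```python
import torch
import torch.nn as nn

layer = nn.Linear(784, 10)        # expects the last input dimension to be 784
bad_input = torch.randn(32, 100)  # deliberately wrong: 100 features instead of 784

raised = False
try:
    layer(bad_input)
except RuntimeError as err:
    raised = True
    # The message names both operand shapes, e.g. 32x100 for the input vs 784x10 for the weight
    print("RuntimeError:", err)

# Fix: make the input's last dimension match in_features
good_out = layer(torch.randn(32, 784))
print(good_out.shape)  # torch.Size([32, 10])
```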

Recap

You can explain torch.nn clearly.
You know one mistake to avoid.
You see how this connects to the next lesson.

Next: Training Loop

Day 6

The Core Training Loop Lifecycle

Why this matters

Training Loop: The training loop ties together the data, the forward pass, the loss, the backward pass, and optimizer.step(); it is the core routine of deep learning training.

Each iteration of the standard training loop repeats: fetch batch → zero gradients → forward → loss → backward → step.

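import torch
import torch.nn as nn
import torch.optim as optim

# Toy data: 64 samples with 10 features and one regression target each
X_batch = torch.randn(64, 10)
y_batch = torch.randn(64, 1)

model = nn.Sequential(nn.Linear(10, 1))
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(10):
    optimizer.zero_grad()             # clear gradients from the previous step
    preds = model(X_batch)            # forward pass, shape (64, 1)
    loss = criterion(preds, y_batch)  # scalar mean squared error
    loss.backward()                   # backward pass: fills .grad on each parameter
    optimizer.step()                  # update the parameters using their gradients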
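The loop above takes one optimizer step per epoch on a single batch. In practice each epoch walks through many mini-batches. A minimal sketch, assuming synthetic linear-regression data and manual index slicing for batching (the Dataset lesson that comes next covers the DataLoader utilities that usually handle this); the batch size, epoch count, and learning rate are arbitrary.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Hypothetical toy data: y = X @ w_true + small noise
X = torch.randn(256, 10)
w_true = torch.randn(10, 1)
y = X @ w_true + 0.01 * torch.randn(256, 1)

model = nn.Sequential(nn.Linear(10, 1))
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=1e-2)
batch_size = 32

with torch.no_grad():  # evaluation only: no graph needed
    initial_loss = criterion(model(X), y).item()

for epoch in range(20):
    perm = torch.randperm(X.size(0))  # reshuffle the sample order every epoch
    for start in range(0, X.size(0), batch_size):
        idx = perm[start:start + batch_size]
        X_batch, y_batch = X[idx], y[idx]  # fetch batch

        optimizer.zero_grad()              # zero gradients
        preds = model(X_batch)             # forward
        loss = criterion(preds, y_batch)   # loss
        loss.backward()                    # backward
        optimizer.step()                   # step

with torch.no_grad():
    final_loss = criterion(model(X), y).item()
print(f"full-data MSE: {initial_loss:.4f} -> {final_loss:.4f}")
```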

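PyTorch accumulates gradients in each parameter's .grad attribute instead of overwriting them, so call optimizer.zero_grad() before backward() on every step. Since PyTorch 2.0, zero_grad() defaults to set_to_none=True, which resets .grad to None rather than filling it with zeros.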
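A small sketch of why this matters, using a hypothetical one-layer model and a toy loss. With no optimizer in play, it calls model.zero_grad(), which clears .grad on every parameter of the module; optimizer.zero_grad() does the same for the parameters the optimizer was given.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(3, 1)  # hypothetical tiny model
x = torch.randn(4, 3)    # toy input batch


def weight_grad_after_backward() -> torch.Tensor:
    """Run a fresh forward/backward pass and return a copy of weight.grad."""
    loss = model(x).pow(2).mean()  # toy loss; the graph is rebuilt on every call
    loss.backward()
    return model.weight.grad.clone()


g1 = weight_grad_after_backward()  # first backward: .grad holds this gradient
g2 = weight_grad_after_backward()  # no zeroing: the same gradient is added again
print(torch.allclose(g2, 2 * g1))  # True: the gradient doubled

model.zero_grad()                  # clear .grad (set to None by default since 2.0)
g3 = weight_grad_after_backward()
print(torch.allclose(g3, g1))      # True: a fresh gradient, no accumulation
```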

Common mistakes

Forgetting optimizer.zero_grad() so gradients accumulate across batches.
Tensor shape mismatches (especially batch/channel dimensions for CNNs).
Moving the model to the GPU but leaving input tensors on the CPU (or vice versa).
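A sketch of one quiet shape bug, assuming a regression model whose output has shape (N, 1): nn.MSELoss does not raise when the target has shape (N,). It broadcasts both tensors to (N, N), emits a UserWarning, and returns a wrong loss, so training keeps running on a meaningless objective.

```python
import warnings

import torch
import torch.nn as nn

preds = torch.tensor([[1.0], [2.0], [3.0]])  # shape (3, 1), as nn.Linear(..., 1) would return
target = torch.tensor([1.0, 2.0, 3.0])       # shape (3,): the mismatch

criterion = nn.MSELoss()
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    wrong = criterion(preds, target)              # broadcast to a (3, 3) grid of differences
right = criterion(preds, target.unsqueeze(1))     # both (3, 1): element-wise comparison

print(wrong.item(), right.item())  # about 1.3333 vs 0.0, even though preds equal the targets
print(len(caught) > 0)             # True: PyTorch warned about the size mismatch
```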

Interview checkpoints

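Q: Explain the training loop in PyTorch. A: Each step zeroes the gradients, runs the forward pass, computes the loss, calls loss.backward() to compute gradients, and calls optimizer.step() to update the parameters; each batch must match the model's expected input shape and live on the same device as the model.
Q: What is a common bug? A: Stale gradients (a missing zero_grad()), shape mismatches between predictions and targets, or a model and its batch on different devices.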

Practice

Basic: Define the training loop and sketch a minimal code snippet.
Intermediate: Run a notebook cell demonstrating the training loop.
Advanced: Intentionally break the training loop and interpret the error.
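One concrete way to break the loop, sketched with toy tensors: compute the loss once, outside the loop, and call backward() on it every iteration. The first call frees the graph's saved intermediate values, so the second raises a RuntimeError about backpropagating through the graph a second time.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
model = nn.Linear(10, 1)
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)
X_batch, y_batch = torch.randn(8, 10), torch.randn(8, 1)  # toy batch

# Deliberate bug: the forward pass and loss are computed once, outside the loop
loss = criterion(model(X_batch), y_batch)
error_message = None
for step in range(2):
    optimizer.zero_grad()
    try:
        loss.backward()  # step 0 succeeds and frees the graph; step 1 raises
    except RuntimeError as err:
        error_message = str(err)
        print(f"step {step}: RuntimeError: {err}")
        break
    optimizer.step()

# Fix: recompute the forward pass and loss inside the loop so each step gets a fresh graph
for step in range(2):
    optimizer.zero_grad()
    loss = criterion(model(X_batch), y_batch)
    loss.backward()
    optimizer.step()
```

The error message also mentions retain_graph=True, but that flag is for genuinely reusing a graph; in a training loop the right fix is to rebuild the graph each step.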

Recap

You can explain the training loop clearly.
You know one mistake to avoid.
You see how this connects to the next lesson.

Next: Dataset
