Module 5: GPU Acceleration & CUDA
Move models to the GPU using the CUDA and Apple Silicon MPS backends. Understand how to optimize data transfer between CPU and GPU.
Day 9
CPU vs. GPU Compute Acceleration
Why this matters
CPU vs GPU: GPUs accelerate dense matrix math; training large models on CPU is often impractical.
CPUs excel at branching logic; GPUs excel at parallel matrix operations — the core of deep learning.
import torch
print(torch.cuda.is_available())
print(torch.cuda.device_count())
| CPU training | GPU training |
|---|---|
| Small models, debugging | Large models, big batches |
| No CUDA setup | Needs NVIDIA driver + PyTorch CUDA build |
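The claim that GPUs excel at parallel matrix operations can be checked with a rough timing comparison. The sketch below is illustrative only: `time_matmul`, the matrix size, and the repeat count are assumptions chosen for this example, and the numbers depend heavily on hardware. The CUDA branch runs only when a GPU is present.

```python
import time
import torch


def time_matmul(device: torch.device, size: int = 512, repeats: int = 5) -> float:
    """Return the average seconds per (size x size) matrix multiply on `device`."""
    a = torch.randn(size, size, device=device)
    b = torch.randn(size, size, device=device)

    # Warm-up run so one-time setup cost is not included in the measurement.
    torch.matmul(a, b)
    if device.type == "cuda":
        torch.cuda.synchronize()  # CUDA kernels are launched asynchronously

    start = time.perf_counter()
    for _ in range(repeats):
        torch.matmul(a, b)
    if device.type == "cuda":
        torch.cuda.synchronize()  # wait for the queued kernels to finish
    return (time.perf_counter() - start) / repeats


cpu_time = time_matmul(torch.device("cpu"))
print(f"cpu:  {cpu_time:.6f} s per matmul")

if torch.cuda.is_available():
    gpu_time = time_matmul(torch.device("cuda"))
    print(f"cuda: {gpu_time:.6f} s per matmul")
```

Without the `torch.cuda.synchronize()` calls, the timer would mostly measure kernel launch time rather than the computation itself.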
Common mistakes
- Forgetting optimizer.zero_grad() so gradients accumulate across batches.
- Tensor shape mismatches (especially batch/channel dimensions for CNNs).
- Training on GPU but leaving tensors on CPU (or vice versa).
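The first mistake in the list is easy to reproduce on a single scalar parameter. This is a small sketch, not a training loop: the parameter `w`, the factor `3.0`, and the learning rate are arbitrary values picked for illustration.

```python
import torch

w = torch.tensor(1.0, requires_grad=True)
opt = torch.optim.SGD([w], lr=0.1)

# Two backward passes with no zero_grad() in between: the gradients add up.
(3.0 * w).backward()
(3.0 * w).backward()
accumulated = w.grad.item()

# Clearing gradients first gives the gradient of the current pass only.
opt.zero_grad()
(3.0 * w).backward()
fresh = w.grad.item()

print(accumulated, fresh)  # 6.0 3.0
```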
Interview checkpoints
- Q: Explain CPU vs. GPU in PyTorch. A: One-sentence definition + shape/device note.
- Q: Common bug? A: Gradients, shapes, or device mismatch.
Practice
- Basic: Define CPU vs GPU and sketch a minimal code snippet.
- Intermediate: Run a notebook cell demonstrating CPU vs GPU.
- Advanced: Intentionally cause a CPU/GPU device mismatch and interpret the error.
Recap
- You can explain CPU vs. GPU clearly.
- You know one mistake to avoid.
- You see how this connects to the next lesson.
Next: CUDA Devices
Day 10
Moving Models and Tensors to Devices
Why this matters
CUDA Devices: Explicit device management (.to('cuda')) prevents CPU/GPU tensor mismatches, which PyTorch reports as a RuntimeError as soon as one operation mixes devices.
Move models and batches to the active device explicitly.
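Choosing the active device comes first. The sketch below shows one possible selection order covering CUDA and the Apple Silicon MPS backend named in the module overview. The helper name `pick_device` and the CUDA-then-MPS-then-CPU preference are assumptions for this example, not a PyTorch convention.

```python
import torch


def pick_device() -> torch.device:
    """Prefer CUDA, then Apple Silicon MPS, then fall back to CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    # torch.backends.mps is present in PyTorch 1.12+; guard for older builds.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")


device = pick_device()
x = torch.ones(2, 3, device=device)  # allocate directly on the chosen device
print(device, x.device)
```

Creating a tensor with `device=device` avoids allocating it on the CPU and copying it afterwards.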
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)
for x, y in loader:
    x, y = x.to(device), y.to(device)
    # forward / backward on GPU
Check torch.cuda.is_available() and log torch.cuda.get_device_name(0) at job start.
Common mistakes
- Forgetting optimizer.zero_grad() so gradients accumulate across batches.
- Tensor shape mismatches (especially batch/channel dimensions for CNNs).
- Training on GPU but leaving tensors on CPU (or vice versa).
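A cheap guard against the device mistake above is to compare the device of the model's parameters with the device of each batch before the forward pass. The following is a minimal sketch of one training step under assumed names: the `nn.Linear(4, 2)` model, the random batch standing in for a DataLoader, and the learning rate are all hypothetical.

```python
import torch
from torch import nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(4, 2).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

# Hypothetical stand-in for one batch from a DataLoader (created on the CPU).
x, y = torch.randn(8, 4), torch.randn(8, 2)
x, y = x.to(device), y.to(device)

# Guard: the batch and the model parameters must live on the same device type.
param_device = next(model.parameters()).device
assert x.device.type == param_device.type, (
    f"batch on {x.device}, model on {param_device}"
)

optimizer.zero_grad()        # clear gradients from any previous step
loss = loss_fn(model(x), y)  # forward pass on the active device
loss.backward()              # backward pass
optimizer.step()             # parameter update
print(loss.item())
```

The guard compares `device.type` rather than the full device object because `torch.device("cuda")` and `cuda:0` refer to the same GPU but do not compare equal.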
Interview checkpoints
- Q: Explain CUDA devices in PyTorch. A: One-sentence definition + shape/device note.
- Q: Common bug? A: Gradients, shapes, or device mismatch.
Practice
- Basic: Define CUDA Devices and sketch a minimal code snippet.
- Intermediate: Run a notebook cell demonstrating CUDA Devices.
- Advanced: Intentionally leave a tensor on the wrong device and interpret the error.
Recap
- You can explain CUDA devices clearly.
- You know one mistake to avoid.
- You see how this connects to the next lesson.
Next: Optimizers
