Module 1 · 100 Days of DL

Module 1: Deep Learning Foundations & Perceptrons

Master deep learning foundations: compare deep learning with classical machine learning, trace the perceptron from its biological inspiration to its math, work through Rosenblatt's step function, and reason about binary classification boundaries.

⏱ 45 min read · Author: GenAIWallah Team · Updated: May 2026
Day 1

What is DL?

Why this matters

You must know where deep learning sits in the AI stack and when representation learning beats classical ML — this frames every architecture choice later.

Deep Learning (DL) is a subfield of Machine Learning that learns hierarchical representations from data using neural networks with many layers. Instead of hand-engineering features, the model discovers useful features automatically.

AI → ML → DL

  • Artificial Intelligence: Any system that mimics intelligent behavior (rules, search, ML, DL).
  • Machine Learning: Learns patterns from data without explicit rules for every case.
  • Deep Learning: ML with deep neural networks — especially strong on images, audio, and text.
| Aspect | Machine Learning | Deep Learning |
|---|---|---|
| Features | Often manual (domain expertise) | Learned automatically from raw inputs |
| Data | Works on smaller structured datasets | Needs large datasets; shines at scale |
| Compute | CPU is often enough | GPUs/TPUs for parallel matrix math |
| Interpretability | Easier with linear models, trees | Harder (black-box tradeoffs) |
Rule of thumb: Start with classical ML on tabular data; consider DL when you have large-scale images, audio, video, or unstructured text.

Common mistakes

  • Using deep learning on tiny tabular data where gradient boosting wins.
  • Assuming more layers always help without enough data or regularization.
  • Ignoring compute cost (GPU memory, training time) in project planning.

Interview checkpoints

  • Q: ML vs DL in one line? A: DL learns hierarchical features automatically; classical ML often needs hand-crafted features.
  • Q: Why did DL take off after 2012? A: Big data + GPUs + better activations/optimizers + breakthrough architectures.
  • Q: When not to use DL? A: Small data, strict interpretability, or simple rules suffice.

Practice

  1. Basic: List 3 DL applications and the input type (image, text, audio).
  2. Intermediate: Draw the feature-engineering pipeline for ML vs end-to-end DL.
  3. Advanced: Argue DL vs ML for a 5k-row fraud dataset with 40 features.

Recap

  • DL learns representations from raw data using stacked nonlinear layers.
  • It needs data, compute, and careful regularization.
  • Choose DL when signal is in high-dimensional raw inputs.

Next: Day 2 — Biological Neurons

Day 2

Biological Neurons

Why this matters

The biological metaphor explains why networks use weighted sums, thresholds, and layers — it makes perceptrons intuitive, not magical.

Biological neurons inspired early neural network design. An artificial neuron is a simplified mathematical unit — not a literal copy of biology, but a useful mental model.

Biological neuron (simplified)

  • Dendrites receive signals from other neurons.
  • The cell body integrates incoming signals.
  • If the integrated signal exceeds a threshold, the axon fires an output signal.

Artificial neuron mapping

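| Biology | Artificial model |
|---|---|
| Inputs from other cells | Feature values \(x_1, x_2, \ldots\) |
| Synaptic strength | Weights \(w_i\) |
| Resting potential / threshold | Bias \(b\) (shifts the firing point; equivalent to a threshold of \(-b\)) |
| Fire or not | Activation function (e.g. step, ReLU) |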
[Figure: Biological vs. Artificial Perceptron. Inputs x₁, x₂ feed a weighted sum Σ that produces the output y.]
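The mapping above fits in a few lines of code. This is a minimal sketch assuming NumPy (not otherwise used on this page); the input, weight, and bias values are arbitrary illustrations, and `neuron`, `step`, and `relu` are helper names introduced here, not from any library.

```python
import numpy as np

def step(z):
    """Heaviside step: 1 if z >= 0 else 0 (the historical perceptron activation)."""
    return np.where(z >= 0, 1, 0)

def relu(z):
    """ReLU: max(0, z)."""
    return np.maximum(0, z)

def neuron(x, w, b, activation):
    """Artificial neuron: weighted sum of inputs plus bias ("integrate"), then activation ("fire or not")."""
    z = np.dot(w, x) + b
    return activation(z)

x = np.array([0.5, -1.0])   # inputs x1, x2 (illustrative values)
w = np.array([2.0, 1.0])    # weights w1, w2 (illustrative values)
b = 0.25                    # bias (illustrative value)

# z = 2*0.5 + 1*(-1.0) + 0.25 = 0.25
print(neuron(x, w, b, step))   # 1
print(neuron(x, w, b, relu))   # 0.25
```

Swapping the activation changes only the last stage; the weighted sum plus bias is the same building block in both cases.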

Common mistakes

  • Treating artificial neurons as literal copies of biology (they are abstractions).
  • Forgetting that dendrites map to inputs and axon to output.
  • Ignoring that biological spikes are not identical to ReLU outputs.

Interview checkpoints

  • Q: What does a neuron compute? A: Weighted sum of inputs plus bias, then an activation function.
  • Q: Why does bias matter? A: It shifts the decision boundary without changing the input weights.

Practice

  1. Basic: Label inputs, weights, bias, activation on a neuron diagram.
  2. Intermediate: Compare integrate-and-fire vs perceptron step activation.
  3. Advanced: Explain why biological plausibility is not required for useful ANNs.

Recap

  • ANNs are inspired by biology but optimized for math and GPUs.
  • Weighted sum + activation is the universal building block.
  • Next: formal perceptron model.

Next: Day 3 — Perceptron Model

Day 3

Perceptron Model

Why this matters

The perceptron is the simplest trainable classifier — mastering it explains weights, bias, and linear decision boundaries before MLPs.

The perceptron (Frank Rosenblatt, 1958) is a linear binary classifier: it computes a weighted sum and applies a step function to produce 0 or 1.

Perceptron output

$$z = \sum_{i=1}^{n} w_i x_i + b, \quad \hat{y} = \begin{cases} 1 & z \geq 0 \\ 0 & z < 0 \end{cases}$$

Worked example

Let \(x_1=2, x_2=1\), \(w_1=1, w_2=-1\), \(b=0\). Then \(z = 2 - 1 = 1 \geq 0\), so prediction is 1. The decision boundary is the line \(x_1 = x_2\).
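The worked example can be checked numerically. A minimal sketch, assuming NumPy; `perceptron_predict` is a hypothetical helper name, and the two extra points are added only to show the other side of the boundary and a point exactly on it.

```python
import numpy as np

def perceptron_predict(x, w, b):
    """Rosenblatt perceptron: step(w . x + b), returning 0 or 1."""
    z = np.dot(w, x) + b
    return 1 if z >= 0 else 0

w = np.array([1.0, -1.0])
b = 0.0
print(perceptron_predict(np.array([2.0, 1.0]), w, b))  # z = 1  -> 1 (the worked example)
print(perceptron_predict(np.array([1.0, 3.0]), w, b))  # z = -2 -> 0
print(perceptron_predict(np.array([1.5, 1.5]), w, b))  # on the line x1 = x2: z = 0 -> 1
```

Because the step uses \(z \geq 0\), points exactly on the boundary are assigned class 1.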

What a single perceptron can learn

Only linearly separable patterns (AND, OR gates). It cannot solve XOR without hidden layers — that limitation motivated multi-layer networks.
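To see that AND and OR are within reach of a single perceptron, the sketch below hand-picks weights: w = [1, 1] with b = −1.5 for AND and b = −0.5 for OR. These values are one illustrative choice (not learned, not unique), and NumPy is assumed.

```python
import numpy as np

# All four binary input pairs (x1, x2)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

def predict(X, w, b):
    """Vectorized perceptron: 1 where w . x + b >= 0, else 0."""
    return (X @ w + b >= 0).astype(int)

and_out = predict(X, np.array([1, 1]), -1.5)   # only (1, 1) reaches z >= 0
or_out = predict(X, np.array([1, 1]), -0.5)    # any single 1 reaches z >= 0
print(and_out.tolist())  # [0, 0, 0, 1]
print(or_out.tolist())   # [0, 1, 1, 1]
```

Both gates use the same weights; only the bias moves the line. No choice of w and b reproduces XOR's [0, 1, 1, 0].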

Common mistakes

  • Omitting the bias and wondering why the boundary must pass through the origin.
  • Confusing pre-activation score z with output prediction.
  • Applying a perceptron to multi-class problems without a one-vs-rest strategy.

Interview checkpoints

  • Q: Perceptron output rule? A: y = 1 if w·x + b ≥ 0 else 0 (with step activation).
  • Q: What can a single perceptron learn? A: Only linearly separable patterns.

Practice

  1. Basic: Compute perceptron output for w=[1,-1], b=0, x=[2,1].
  2. Intermediate: Plot the decision boundary for 2D weights.
  3. Advanced: Implement perceptron update rule in NumPy on a toy dataset.

Recap

  • Perceptron = linear classifier + step activation.
  • Geometry: hyperplane divides feature space.
  • Single layer cannot solve XOR.

Next: Day 4 — Step Activation

Day 4

Step Activation

Why this matters

Activation functions introduce nonlinearity — without them, stacked layers collapse to one linear map.

The original perceptron used a hard step. Modern networks use smooth or piecewise-linear activations because training by gradient descent needs useful gradients.

Common activations

| Function | Formula / rule | Typical use |
|---|---|---|
| Step | 1 if \(z \geq 0\) else 0 | Historical perceptron |
| Sigmoid | \(\sigma(z) = 1/(1+e^{-z})\) | Binary output (probability) |
| Tanh | \(\tanh(z)\), output in \((-1, 1)\), zero-centered | Hidden layers (older nets) |
| ReLU | \(\max(0, z)\) | Default for hidden layers today |
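The four activations in the table can be written directly from their formulas. A minimal NumPy sketch (NumPy is an assumption; frameworks ship their own versions) that evaluates each one at a few sample inputs:

```python
import numpy as np

def step(z):
    return np.where(z >= 0, 1.0, 0.0)   # hard threshold at 0

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))      # squashes to (0, 1)

def tanh(z):
    return np.tanh(z)                    # squashes to (-1, 1), zero-centered

def relu(z):
    return np.maximum(0.0, z)            # passes positives, zeroes negatives

z = np.array([-2.0, 0.0, 2.0])
for name, f in [("step", step), ("sigmoid", sigmoid), ("tanh", tanh), ("relu", relu)]:
    print(name, np.round(f(z), 3))
```

Note how sigmoid and tanh flatten out for large |z|, while ReLU stays linear for positive inputs; that difference drives the failure mode below.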
⚠️
Failure mode

Using sigmoid in many deep hidden layers causes vanishing gradients — training stalls. Prefer ReLU in hidden stacks.
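Why gradients vanish can be seen from the sigmoid's slope, \(\sigma'(z) = \sigma(z)(1-\sigma(z))\), which never exceeds 0.25. The sketch below (NumPy assumed) finds that maximum numerically, then multiplies one slope per layer under a deliberately simplified assumption: weights are ignored and every unit sits at its best case \(z = 0\).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    """Derivative of the sigmoid: sigma(z) * (1 - sigma(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)

z = np.linspace(-6.0, 6.0, 1201)
max_grad = sigmoid_grad(z).max()     # largest slope, reached at z = 0
print("max sigmoid'(z):", max_grad)  # approximately 0.25

# Simplification: ignore weights, assume every layer is at its best case z = 0.
depth = 10
sigmoid_factor = 0.25 ** depth       # upper bound on the product of sigmoid slopes
relu_factor = 1.0 ** depth           # ReLU slope is exactly 1 for positive inputs
print(f"{depth} sigmoid layers: {sigmoid_factor:.2e}")  # 9.54e-07
print(f"{depth} active ReLU layers: {relu_factor}")     # 1.0
```

Even in this best case, ten sigmoid layers shrink the backpropagated signal by about a million; saturated units (large |z|) make it worse.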

Common mistakes

  • Using step function in deep networks (zero gradient almost everywhere).
  • Applying softmax on hidden layers instead of output for classification.
  • Mixing up activation output range and loss function expectations.

Interview checkpoints

  • Q: Why not linear activation in hidden layers? A: Composition of linear maps is still linear.
  • Q: Step vs sigmoid? A: Sigmoid is smooth with a nonzero gradient everywhere; step has zero gradient almost everywhere (undefined at 0), so it only suits the historical perceptron rule.

Practice

  1. Basic: Sketch step, sigmoid, ReLU on the same axis.
  2. Intermediate: Identify vanishing gradient risk for sigmoid in deep nets.
  3. Advanced: Pick an activation for output layer on binary vs multi-class tasks.

Recap

  • Activations enable nonlinear decision surfaces.
  • Step is for understanding; smooth activations train deep nets.
  • Match activation to loss (sigmoid+BCE, softmax+CE).

Next: Day 5 — Perceptron Learning Rule

Day 5

Perceptron Learning Rule

Why this matters

The perceptron learning rule is the ancestor of gradient descent — it shows how errors drive weight updates geometrically.

The perceptron learning rule updates weights only when the model misclassifies a training example.

Update rule

$$w_i \leftarrow w_i + \eta (y - \hat{y}) x_i, \quad b \leftarrow b + \eta (y - \hat{y})$$

\(\eta\) = learning rate. For a correct prediction \(y - \hat{y} = 0\), so the weights change only when \(y \neq \hat{y}\).
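One manual update step, reusing the Day 3 point \(x = (2, 1)\), \(w = (1, -1)\), \(b = 0\), which the perceptron labels 1. The true label \(y = 0\) and \(\eta = 0.5\) are hypothetical choices made so the step is visible; NumPy is assumed and `perceptron_update` is an illustrative helper name.

```python
import numpy as np

def perceptron_update(w, b, x, y, eta):
    """One perceptron learning-rule step; returns updated (w, b)."""
    y_hat = 1 if np.dot(w, x) + b >= 0 else 0
    error = y - y_hat                  # 0 when the prediction is correct, so nothing changes
    return w + eta * error * x, b + eta * error

x = np.array([2.0, 1.0])
w = np.array([1.0, -1.0])
b = 0.0
y = 0        # hypothetical true label: the current prediction (1) is wrong
eta = 0.5    # hypothetical learning rate

w_new, b_new = perceptron_update(w, b, x, y, eta)
# w -> [1, -1] + 0.5 * (-1) * [2, 1] = [0.0, -1.5];  b -> 0 + 0.5 * (-1) = -0.5
print(w_new, b_new)
print(np.dot(w_new, x) + b_new)   # -2.0, so the point is now classified 0
```

A single step happens to fix this point; with a smaller η (e.g. 0.1) the score would drop from 1 to 0.4 and need further updates.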

Convergence

If the data is linearly separable, the algorithm converges after a finite number of updates. On noisy or non-separable data it keeps making mistakes and never settles; use logistic regression or an MLP instead.
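Both halves of the convergence statement can be reproduced on the AND and XOR truth tables. A minimal sketch assuming NumPy, zero-initialized weights, η = 1, and an arbitrary cap of 100 epochs; `train_perceptron` is a hypothetical helper, and "converged" here means one full epoch with no mistakes.

```python
import numpy as np

def train_perceptron(X, y, eta=1.0, max_epochs=100):
    """Perceptron learning rule from zero weights.
    Returns (w, b, converged), where converged means an epoch with zero mistakes."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            y_hat = 1 if np.dot(w, xi) + b >= 0 else 0
            if y_hat != yi:                  # update only on a mistake
                w += eta * (yi - y_hat) * xi
                b += eta * (yi - y_hat)
                mistakes += 1
        if mistakes == 0:
            return w, b, True
    return w, b, False

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y_and = np.array([0, 0, 0, 1])
y_xor = np.array([0, 1, 1, 0])

w, b, ok = train_perceptron(X, y_and)
print("AND converged:", ok, "w =", w, "b =", b)
_, _, ok_xor = train_perceptron(X, y_xor)
print("XOR converged:", ok_xor)   # False: no separating line exists
```

On AND the run stops after a handful of epochs; on XOR it always hits the cap, because every epoch contains at least one mistake.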

Common mistakes

  • Updating weights when prediction is correct (wastes steps).
  • Expecting the learning rate to fix convergence: with zero-initialized weights, η only rescales w and b, so the sequence of mistakes is identical for any η > 0.
  • Expecting convergence on non-separable noisy data.

Interview checkpoints

  • Q: Perceptron update when wrong? A: w ← w + η(y − ŷ)x (and bias similarly).
  • Q: Convergence guarantee? A: Only if data is linearly separable.

Practice

  1. Basic: Apply one manual update step on a misclassified point.
  2. Intermediate: Train perceptron until convergence on AND gate data.
  3. Advanced: Show failure on XOR with single perceptron.

Recap

  • Each mistake moves the boundary toward classifying that point correctly.
  • Converges only for linearly separable sets.
  • XOR needs hidden layer (MLP).

Next: Day 6 — XOR Problem

Day 6

XOR Problem

Why this matters

XOR is the famous proof that shallow linear models fail — it motivated multi-layer networks and modern deep learning.

The XOR problem shows that a single perceptron cannot learn non-linear boundaries. This limitation contributed to the first AI winter for neural network research and later motivated multi-layer networks.

XOR truth table

| x₁ | x₂ | y (XOR) |
|---|---|---|
| 0 | 0 | 0 |
| 0 | 1 | 1 |
| 1 | 0 | 1 |
| 1 | 1 | 0 |

No single straight line separates the 1s from the 0s in 2D. You need at least one hidden layer with a nonlinear activation so the network can bend the boundary.
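A hidden layer of step units is enough to solve XOR. In the sketch below the weights are set by hand (not trained): hidden unit 1 computes OR, hidden unit 2 computes NAND, and the output unit takes their AND. NumPy is assumed; the weight values are one illustrative choice.

```python
import numpy as np

def step(z):
    return (z >= 0).astype(int)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

# Hidden layer: column j holds the weights of hidden unit j
W1 = np.array([[1, -1],
               [1, -1]])
b1 = np.array([-0.5, 1.5])   # unit 1 = OR, unit 2 = NAND
# Output layer: AND of the two hidden units
w2 = np.array([1, 1])
b2 = -1.5

H = step(X @ W1 + b1)        # shape (4, 2): new representation of each input
y_hat = step(H @ w2 + b2)    # shape (4,)
print(H.tolist())       # [[0, 1], [1, 1], [1, 1], [1, 0]]
print(y_hat.tolist())   # [0, 1, 1, 0], which matches XOR
```

In the hidden space, (0, 1) and (1, 0) both map to (1, 1), so a single line separates the classes there: the hidden layer is what bends the boundary in the original input space.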

Intuition

AND and OR are linearly separable; XOR is not. That is why depth matters — not more neurons alone, but composition of nonlinear layers.

Common mistakes

  • Thinking more epochs will make single-layer perceptron learn XOR.
  • Not visualizing why no single line separates XOR classes.
  • Skipping to deep nets without understanding why depth helps.

Interview checkpoints

  • Q: Why XOR breaks perceptron? A: Not linearly separable in 2D input space.
  • Q: Minimal fix? A: Add hidden layer with nonlinear activation (MLP).

Practice

  1. Basic: Draw XOR points and show no single line works.
  2. Intermediate: Add one hidden unit and sketch new boundary idea.
  3. Advanced: Train 2-layer MLP in Keras on XOR.

Recap

  • XOR requires hidden representations.
  • This limitation contributed to the first AI winter.
  • MLPs solve it with depth + nonlinearity.

Next: Day 7 — Linear Separability

Day 7

Linear Separability

Why this matters

Linear separability tells you whether a single layer suffices — essential before choosing model depth.

A dataset is linearly separable if some hyperplane puts all examples of one class on one side and all examples of the other class on the other side, so a linear classifier can reach zero training error.

How to check (2D)

  • Plot points colored by class.
  • Try to draw one straight line separating them (curved separators are what kernel methods or hidden layers provide).
  • If the convex hulls of the two classes intersect, no line can separate them: the data is not linearly separable.
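The visual check can be complemented by a one-sided programmatic test: run the perceptron with an epoch cap and see whether it reaches zero errors. A minimal sketch assuming NumPy; `is_separable_by_perceptron` is a hypothetical helper, and the cap of 1000 epochs is arbitrary. A `True` result proves separability; `False` only means no separator was found within the cap. The example also shows the kernel-trick intuition: adding the product feature \(x_1 x_2\) makes XOR separable.

```python
import numpy as np

def is_separable_by_perceptron(X, y, max_epochs=1000):
    """True if the perceptron reaches zero training errors within max_epochs.
    True proves linear separability; False only means 'not found within the cap'."""
    Xb = np.hstack([X, np.ones((len(X), 1))])   # constant 1 column so the bias is learned as a weight
    w = np.zeros(Xb.shape[1])
    for _ in range(max_epochs):
        errors = 0
        for xi, yi in zip(Xb, y):
            y_hat = 1 if xi @ w >= 0 else 0
            if y_hat != yi:
                w += (yi - y_hat) * xi
                errors += 1
        if errors == 0:
            return True
    return False

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y_xor = np.array([0, 1, 1, 0])

print(is_separable_by_perceptron(X, y_xor))          # False in raw (x1, x2) space
X_lifted = np.column_stack([X, X[:, 0] * X[:, 1]])   # add the feature x1 * x2
print(is_separable_by_perceptron(X_lifted, y_xor))   # True in (x1, x2, x1*x2) space
```

For instance, \(x_1 + x_2 - 2x_1x_2 - 0.5 \geq 0\) separates the lifted XOR points, so the perceptron is guaranteed to find some separator there.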

Support Vector Machines find the maximum-margin separator. A perceptron finds some separator if one exists, but not necessarily the one with the maximum margin.

Common mistakes

  • Checking separability in raw space when features should be transformed first.
  • Confusing linear separability with linear regression assumptions.
  • Ignoring soft-margin SVM as alternative to perceptron.

Interview checkpoints

  • Q: Define linear separability. A: ∃ hyperplane that separates classes perfectly.
  • Q: Test in 2D? A: Try to draw a line; convex hulls disjoint ⇒ separable.

Practice

  1. Basic: Classify 4 small 2D datasets as separable or not.
  2. Intermediate: Use sklearn LinearSVC vs Perceptron on same data.
  3. Advanced: Kernel trick intuition: separable in higher dimension.

Recap

  • One neuron = one hyperplane.
  • Non-separable ⇒ need features, depth, or kernels.
  • Always visualize 2D/3D when possible.

Next: Day 8 — Decision Boundaries

Day 8

Decision Boundaries

Why this matters

Decision boundaries connect math to intuition — you debug models by seeing where they flip predictions.

The decision boundary is where the model's score equals the threshold (0 for perceptron). In 2D it is a line; in higher dimensions, a hyperplane.
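For a 2D perceptron the boundary \(w_1 x_1 + w_2 x_2 + b = 0\) can be solved for \(x_2\) and plotted as a line. A minimal sketch assuming NumPy; the weights w = [2, 1] and bias b = −3 are illustrative values not taken from the text, and `boundary_x2` is a hypothetical helper.

```python
import numpy as np

# Illustrative 2D perceptron parameters
w = np.array([2.0, 1.0])
b = -3.0

def boundary_x2(x1, w, b):
    """Solve w1*x1 + w2*x2 + b = 0 for x2 (assumes w2 != 0)."""
    return -(w[0] * x1 + b) / w[1]

x1 = np.array([0.0, 0.5, 1.0])
x2 = boundary_x2(x1, w, b)            # points on the line x2 = 3 - 2*x1
on_line = np.column_stack([x1, x2])
print(on_line @ w + b)                # every score is 0 on the boundary

# Points on opposite sides of the line get opposite predictions
for p in (np.array([1.0, 2.0]), np.array([0.5, 0.5])):
    z = p @ w + b
    print(p, "score", z, "-> class", int(z >= 0))
```

The weight vector w is perpendicular to this line and points toward the side predicted as class 1.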

  • Perceptron: One hyperplane — simple but limited.
  • MLP with ReLU: Piecewise-linear boundaries — can approximate complex shapes.
  • Overfitting: Overly wiggly boundaries on training data often fail on validation data.
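The piecewise-linear claim can be checked with a tiny ReLU network. The sketch below (NumPy assumed) uses hand-set weights, not trained ones, so that the output score is \(|x_1| - x_2\); its decision boundary \(x_2 = |x_1|\) is a "V" made of two line segments meeting at the origin.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

# Hand-set 2-layer ReLU network (illustrative weights, not trained).
# Hidden units compute relu(x1), relu(-x1), relu(x2), relu(-x2).
W1 = np.array([[1.0, -1.0, 0.0,  0.0],    # weights from x1
               [0.0,  0.0, 1.0, -1.0]])   # weights from x2
b1 = np.zeros(4)
w2 = np.array([1.0, 1.0, -1.0, 1.0])      # output score = |x1| - x2
b2 = 0.0

def predict(X):
    H = relu(X @ W1 + b1)
    return (H @ w2 + b2 >= 0).astype(int)

X = np.array([[-2.0, 1.0], [2.0, 1.0], [0.0, 1.0], [1.0, 3.0]])
print(predict(X).tolist())   # [1, 1, 0, 0]
```

On the line \(x_2 = 1\), the class-0 point at \(x_1 = 0\) sits between two class-1 points; a single perceptron cannot do this, because its score is linear along that line.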

Common mistakes

  • Plotting boundaries without scaling features (distorted geometry).
  • Ignoring that ReLU nets create piecewise-linear boundaries.
  • Only looking at accuracy, not boundary complexity (overfitting).

Interview checkpoints

  • Q: Effect of more hidden units on boundary? A: More pieces, more complex shapes.
  • Q: L2 regularization effect? A: Simpler, smoother boundaries and less overfitting.

Practice

  1. Basic: Sketch boundary for AND, OR, XOR.
  2. Intermediate: Plot 2D decision regions for small MLP.
  3. Advanced: Compare boundaries: perceptron vs 2-layer ReLU MLP.

Recap

  • Boundaries visualize what the model learned.
  • Depth increases boundary complexity.
  • Regularization keeps boundaries sane.

Next: Day 9 — DL vs ML

Day 9

DL vs ML

Why this matters

Teams waste money choosing DL when sklearn suffices — this day is the decision framework for real projects.

Choosing correctly between ML and DL saves time and money and improves reliability. Not every problem needs a neural network.

| Scenario | Prefer | Why |
|---|---|---|
| 5k rows, 40 tabular features, fraud detection | ML (XGBoost, logistic regression) | Less data, need interpretability |
| 1M labeled images, object detection | DL (CNN + transfer learning) | Raw pixels, representation learning |
| Small text dataset, intent classification | Start with ML; try fine-tuned BERT if needed | Baseline first, then scale the model |

Common mistakes

  • Defaulting to ResNet for 500 labeled tabular rows.
  • Skipping baselines (logistic regression, XGBoost) before CNNs.
  • Underestimating labeling cost for DL data hunger.

Interview checkpoints

  • Q: Image 1M labels vs 500 tabular rows — pick? A: CNN/transfer vs boosted trees.
  • Q: Interpretability requirement? A: Favor classical ML or explainable models.

Practice

  1. Basic: For 5 scenarios, pick ML or DL and justify in one sentence.
  2. Intermediate: Build sklearn baseline then DL model; compare metric/cost.
  3. Advanced: Write a one-page model selection memo for a startup use case.

Recap

  • DL wins on raw high-dimensional data at scale.
  • ML wins on small structured data and interpretability.
  • Always baseline simple first.

Next: Day 10 — Keras & TensorFlow Setup

Day 10

Keras & TensorFlow Setup

Why this matters

A reproducible Keras/TF environment prevents silent GPU/CPU bugs and version skew across the rest of the 100 days.

Set up TensorFlow and Keras once correctly — wrong CUDA versions and missing GPU detection cause hours of debugging later.

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Check the installed version and whether TensorFlow sees a GPU
print("TF version:", tf.__version__)
print("GPUs:", tf.config.list_physical_devices("GPU"))

# Quick sanity check on a 5,000-image MNIST subset
(x_train, y_train), _ = keras.datasets.mnist.load_data()
x_train = x_train[:5000].reshape(-1, 784).astype("float32") / 255.0  # flatten 28x28 images, scale to [0, 1]
y_train = y_train[:5000]                                              # keep labels aligned with the subset

model = keras.Sequential([
    keras.Input(shape=(784,)),               # declare the input shape explicitly (preferred in Keras 3)
    layers.Dense(128, activation="relu"),
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",  # integer class labels 0-9
              metrics=["accuracy"])
model.summary()  # inspect shapes and parameter counts before training
model.fit(x_train, y_train, epochs=3, batch_size=128, validation_split=0.1)
```

  • Pin versions in requirements.txt.
  • Verify GPU with nvidia-smi and TensorFlow device list.
  • Use model.summary() before every long training run.
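One way to act on "pin versions" is to check at startup that installed packages match the pins. The sketch below uses only the standard library (`importlib.metadata`, Python 3.8+). `check_pins` is a hypothetical helper; it handles only exact `name==version` lines (no ranges, extras, or environment markers), and the version numbers in the example are placeholders, not recommendations.

```python
from importlib import metadata

def check_pins(requirements_text):
    """Compare 'name==version' pins with installed packages.
    Returns {name: (pinned, installed_or_None)} for every mismatch."""
    mismatches = {}
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()    # drop comments and surrounding whitespace
        if "==" not in line:
            continue                             # this sketch only handles exact pins
        name, pinned = (part.strip() for part in line.split("==", 1))
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None                     # pinned but not installed
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches

if __name__ == "__main__":
    # Hypothetical pins: replace with the versions your project actually tested.
    example = """
    tensorflow==2.16.1   # placeholder pin
    numpy==1.26.4        # placeholder pin
    """
    for name, (want, have) in check_pins(example).items():
        print(f"{name}: pinned {want}, installed {have}")
```

Running a check like this before a long training job turns silent version skew into an explicit message.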

Common mistakes

  • Installing TF GPU without matching CUDA/cuDNN versions.
  • Not pinning package versions in requirements.txt.
  • Running training on CPU while thinking GPU is active.

Interview checkpoints

  • Q: Check GPU visible in TF? A: tf.config.list_physical_devices('GPU') or nvidia-smi.
  • Q: Keras 3 backend? A: Can use TF, JAX, or PyTorch as backend — know your install.

Practice

  1. Basic: Install TF/Keras; run a "hello world" tensor addition.
  2. Intermediate: Train a 2-layer MLP on MNIST subset in <2 min.
  3. Advanced: Dockerfile with pinned TF-GPU for reproducible training.

Recap

  • Verify GPU and versions before big experiments.
  • Keras Sequential API is enough for early modules.
  • Ready for Module 2: MLPs.

Next: Day 11 — MLP Architecture
