Module 1 · 100 Days of DL

Module 1: Deep Learning Foundations & Perceptrons

Master the foundations of deep learning: how it compares with classical machine learning, the biological inspiration and logic behind the perceptron, Rosenblatt's step-activation model, and the decision boundaries of binary classifiers.

⏱ 45 Min Read • Author: GenAIWallah Team • Updated: May 2026

Day 1

What is DL?

Why this matters

You must know where deep learning sits in the AI stack and when representation learning beats classical ML — this frames every architecture choice later.

Deep Learning (DL) is a subfield of Machine Learning that learns hierarchical representations from data using neural networks with many layers. Instead of hand-engineering features, the model discovers useful features automatically.

AI → ML → DL

Artificial Intelligence: Any system that mimics intelligent behavior (rules, search, ML, DL).
Machine Learning: Learns patterns from data without explicit rules for every case.
Deep Learning: ML with deep neural networks — especially strong on images, audio, and text.

Aspect	Machine Learning	Deep Learning
Features	Often manual (domain expertise)	Learned automatically from raw inputs
Data	Works on smaller structured datasets	Needs large datasets; shines at scale
Compute	CPU is often enough	GPUs/TPUs for parallel matrix math
Interpretability	Easier with linear models, trees	Harder — black-box tradeoffs

Rule of thumb: Start with classical ML on tabular data; consider DL when you have large-scale images, audio, video, or unstructured text.

Common mistakes

Using deep learning on tiny tabular data where gradient boosting wins.
Assuming more layers always help without enough data or regularization.
Ignoring compute cost (GPU memory, training time) in project planning.

Interview checkpoints

Q: ML vs DL in one line? A: DL learns hierarchical features automatically; classical ML often needs hand-crafted features.
Q: Why did DL take off after 2012? A: Big data + GPUs + better activations/optimizers + breakthrough architectures.
Q: When not to use DL? A: When data is small, interpretability is a strict requirement, or simple rules suffice.

Practice

Basic: List 3 DL applications and the input type (image, text, audio).
Intermediate: Draw the feature-engineering pipeline for ML vs end-to-end DL.
Advanced: Argue DL vs ML for a 5k-row fraud dataset with 40 features.

Recap

DL learns representations from raw data using stacked nonlinear layers.
It needs data, compute, and careful regularization.
Choose DL when signal is in high-dimensional raw inputs.

Next: Day 2 — Biological Neurons

Day 2

Biological Neurons

Why this matters

The biological metaphor explains why networks use weighted sums, thresholds, and layers — it makes perceptrons intuitive, not magical.

Biological neurons inspired early neural network design. An artificial neuron is a simplified mathematical unit — not a literal copy of biology, but a useful mental model.

Biological neuron (simplified)

Dendrites receive signals from other neurons.
The cell body integrates incoming signals.
If activation exceeds a threshold, the axon fires an output signal.

Artificial neuron mapping

Biology	Artificial model
Inputs from other cells	Feature values $x_1, x_2, \ldots$
Synaptic strength	Weights $w_i$
Resting potential / threshold	Bias $b$
Fire or not	Activation function (e.g. step, ReLU)
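The mapping in the table can be sketched as a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function names (`neuron`, `step`, `relu`) and the input, weight, and bias values are made up for the example.

```python
def step(z):
    """Fire (1) if the integrated signal reaches the threshold, else 0."""
    return 1 if z >= 0 else 0

def relu(z):
    """Pass positive signals through, clamp negatives to 0."""
    return max(0.0, z)

def neuron(x, w, b, activation):
    """Weighted sum of inputs plus bias, passed through an activation."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return activation(z)

# Illustrative values only (not taken from the lesson).
x = [0.5, -1.0, 2.0]   # inputs from other cells
w = [0.4, 0.3, 0.1]    # synaptic strengths
b = 0.1                # bias (shifts the firing threshold)

fired = neuron(x, w, b, step)
graded = neuron(x, w, b, relu)
print(fired, graded)
```

Swapping the activation changes only the last step: the weighted sum is shared by every row of the table.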

Figure: biological neuron vs. artificial neuron (perceptron)

Common mistakes

Treating artificial neurons as literal copies of biology (they are abstractions).
Forgetting that dendrites map to inputs and axon to output.
Ignoring that biological spikes are not identical to ReLU outputs.

Interview checkpoints

Q: What does a neuron compute? A: Weighted sum of inputs plus bias, then an activation function.
Q: Why bias matters? A: Shifts the decision boundary without changing input weights.

Practice

Basic: Label inputs, weights, bias, activation on a neuron diagram.
Intermediate: Compare integrate-and-fire vs perceptron step activation.
Advanced: Explain why biological plausibility is not required for useful ANNs.

Recap

ANNs are inspired by biology but designed for mathematical tractability and efficient computation on hardware such as GPUs.
Weighted sum + activation is the universal building block.
Next: formal perceptron model.

Next: Day 3 — Perceptron Model

Day 3

Perceptron Model

Why this matters

The perceptron is the simplest trainable classifier — mastering it explains weights, bias, and linear decision boundaries before MLPs.

The perceptron (Frank Rosenblatt, 1958) is a linear binary classifier: it computes a weighted sum and applies a step function to produce 0 or 1.

Perceptron output

$$z = \sum_{i=1}^{n} w_i x_i + b, \quad \hat{y} = \begin{cases} 1 & z \geq 0 \\ 0 & z < 0 \end{cases}$$

Worked example

Let $x_1=2, x_2=1$, $w_1=1, w_2=-1$, $b=0$. Then $z = 2 - 1 = 1 \geq 0$, so prediction is 1. The decision boundary is the line $x_1 = x_2$.
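The worked example can be checked in a few lines of plain Python. The helper name `perceptron_predict` is hypothetical; the numbers are the ones from the example above.

```python
def perceptron_predict(x, w, b):
    """Return (z, y_hat) for a step-activation perceptron."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    y_hat = 1 if z >= 0 else 0
    return z, y_hat

z, y_hat = perceptron_predict([2, 1], [1, -1], 0)
print(z, y_hat)  # 1 1

# A point on the line x1 = x2 has score exactly 0 (the boundary itself).
z_on_line, _ = perceptron_predict([3, 3], [1, -1], 0)
print(z_on_line)  # 0
```

Because the rule uses $z \geq 0$, points exactly on the boundary are assigned to class 1.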

What a single perceptron can learn

Only linearly separable patterns (AND, OR gates). It cannot solve XOR without hidden layers — that limitation motivated multi-layer networks.
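To make the AND/OR claim concrete, here is a small sketch with hand-picked weights. The weights are one possible choice among many, not values from the lesson, and the names `predict`, `AND_PARAMS`, and `OR_PARAMS` are illustrative.

```python
def predict(x, w, b):
    """Step-activation perceptron: 1 if w.x + b >= 0 else 0."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if z >= 0 else 0

INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]

# Hand-picked parameters (assumed for illustration).
AND_PARAMS = ([1, 1], -1.5)   # fires only when both inputs are 1
OR_PARAMS = ([1, 1], -0.5)    # fires when at least one input is 1

and_outputs = [predict(x, *AND_PARAMS) for x in INPUTS]
or_outputs = [predict(x, *OR_PARAMS) for x in INPUTS]
print(and_outputs)  # [0, 0, 0, 1]
print(or_outputs)   # [0, 1, 1, 1]
```

Both gates use the same weights and differ only in the bias, which slides the same line to a different position.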

Common mistakes

Omitting the bias and wondering why the boundary must pass through the origin.
Confusing the pre-activation score z with the output prediction.
Using a perceptron on a multi-class problem without a one-vs-rest strategy.

Interview checkpoints

Q: Perceptron output rule? A: y = 1 if w·x + b ≥ 0 else 0 (with step activation).
Q: What can a single perceptron learn? A: Only linearly separable patterns.

Practice

Basic: Compute perceptron output for w=[1,-1], b=0, x=[2,1].
Intermediate: Plot the decision boundary for 2D weights.
Advanced: Implement perceptron update rule in NumPy on a toy dataset.

Recap

Perceptron = linear classifier + step activation.
Geometry: hyperplane divides feature space.
Single layer cannot solve XOR.

Next: Day 4 — Step Activation

Day 4

Step Activation

Why this matters

Activation functions introduce nonlinearity — without them, stacked layers collapse to one linear map.

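To see why, stack two layers with no activation in between: $W_2(W_1 x + b_1) + b_2 = (W_2 W_1)x + (W_2 b_1 + b_2)$, which is a single linear layer with weights $W_2 W_1$ and bias $W_2 b_1 + b_2$. Depth adds expressive power only when a nonlinearity sits between the layers.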
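The collapse of stacked linear layers can be verified numerically. The sketch below uses small integer matrices so the comparison is exact; the helper names (`matvec`, `matmul`, `add`) and all values are assumptions made for the example.

```python
def matvec(W, x):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in W]

def matmul(A, B):
    """Matrix-matrix product A @ B."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def add(u, v):
    return [ui + vi for ui, vi in zip(u, v)]

# Two linear layers with no activation in between (illustrative values).
W1, b1 = [[1, 2], [0, -1]], [1, 0]
W2, b2 = [[2, 0], [1, 1]], [0, 3]
x = [3, -2]

two_layers = add(matvec(W2, add(matvec(W1, x), b1)), b2)

# The equivalent single layer: W = W2 W1, b = W2 b1 + b2.
W = matmul(W2, W1)
b = add(matvec(W2, b1), b2)
one_layer = add(matvec(W, x), b)

print(two_layers, one_layer)  # [0, 5] [0, 5]
```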

Common activations

Function	Formula / rule	Typical use
Step	1 if $z \geq 0$ else 0	Historical perceptron
Sigmoid	$\sigma(z) = 1/(1+e^{-z})$	Binary output (probability)
Tanh	Output in $(-1, 1)$, zero-centered	Hidden layers (older nets)
ReLU	$\max(0, z)$	Default for hidden layers today
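The four activations in the table can be written directly with the standard library. This is a plain scalar sketch for building intuition; frameworks such as Keras ship vectorized versions.

```python
import math

def step(z):
    return 1 if z >= 0 else 0

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def tanh(z):
    return math.tanh(z)

def relu(z):
    return max(0.0, z)

# Evaluate each function at a few points to compare output ranges.
for z in (-2.0, 0.0, 2.0):
    print(z, step(z), round(sigmoid(z), 3), round(tanh(z), 3), relu(z))
```

At $z = 0$ the functions already differ: step gives 1, sigmoid gives 0.5, and tanh and ReLU give 0.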

Failure mode

Using sigmoid in many deep hidden layers causes vanishing gradients — training stalls. Prefer ReLU in hidden stacks.
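A rough way to see the problem: the sigmoid derivative $\sigma'(z) = \sigma(z)(1 - \sigma(z))$ never exceeds 0.25, and backpropagation multiplies one such factor per sigmoid layer. The sketch below computes that best-case product for a 10-layer chain. It deliberately ignores the weight matrices, which also enter the product, so treat it as an illustration of the trend rather than an exact gradient.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)

max_grad = sigmoid_grad(0.0)        # the derivative peaks at z = 0
depth = 10                          # hypothetical number of sigmoid layers
upper_bound = max_grad ** depth     # best-case product of activation derivatives

print(max_grad)     # 0.25
print(upper_bound)  # below 1e-6
```

ReLU avoids this particular shrinkage because its derivative is exactly 1 for positive inputs.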

Common mistakes

Using step function in deep networks (zero gradient almost everywhere).
Applying softmax on hidden layers instead of output for classification.
Mixing up activation output range and loss function expectations.

Interview checkpoints

Q: Why not linear activation in hidden layers? A: Composition of linear maps is still linear.
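Q: Step vs sigmoid? A: Sigmoid is smooth with a nonzero gradient everywhere; step jumps at 0 and has zero gradient everywhere else, so gradient-based training cannot use it (historical perceptron only).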

Practice

Basic: Sketch step, sigmoid, ReLU on the same axis.
Intermediate: Identify vanishing gradient risk for sigmoid in deep nets.
Advanced: Pick an activation for output layer on binary vs multi-class tasks.

Recap

Activations enable nonlinear decision surfaces.
Step is for understanding; smooth activations train deep nets.
Match the output activation to the loss (sigmoid + binary cross-entropy, softmax + categorical cross-entropy).

Next: Day 5 — Perceptron Learning Rule

Day 5

Perceptron Learning Rule

Why this matters

The perceptron learning rule is an early error-driven training algorithm and a conceptual precursor of the gradient-based updates used in modern networks; it shows geometrically how errors drive weight updates.

The perceptron learning rule updates weights only when the model misclassifies a training example.

Update rule

$$w_i \leftarrow w_i + \eta (y - \hat{y}) x_i, \quad b \leftarrow b + \eta (y - \hat{y})$$

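$\eta$ is the learning rate. With labels in $\{0, 1\}$, the factor $y - \hat{y}$ is 0 for a correct prediction and $\pm 1$ for a mistake, so the weights change only when $y \neq \hat{y}$.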
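A single application of the rule, as a minimal sketch. The starting weights, the training point, and $\eta = 0.5$ are illustrative assumptions; `predict` and `update` are hypothetical helper names.

```python
def predict(x, w, b):
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if z >= 0 else 0

def update(x, y, w, b, eta):
    """One perceptron update; a no-op when the prediction is already correct."""
    error = y - predict(x, w, b)          # -1, 0, or +1
    new_w = [wi + eta * error * xi for wi, xi in zip(w, x)]
    new_b = b + eta * error
    return new_w, new_b

# Start from zero weights: z = 0, so the point (1, 1) is predicted as 1.
w, b = [0.0, 0.0], 0.0
x, y = [1, 1], 0                          # true label is 0, so this is a mistake

w, b = update(x, y, w, b, eta=0.5)
print(w, b)              # [-0.5, -0.5] -0.5
print(predict(x, w, b))  # 0
```

The update pushes the score of the misclassified point down (from 0 to -1.5), which is enough here to flip its prediction.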

Convergence

If the data is linearly separable, the algorithm converges after a finite number of updates (the perceptron convergence theorem). On noisy or non-separable data it never settles, because every pass contains at least one mistake; use logistic regression or an MLP instead.
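Both behaviours can be seen on the two gates used throughout this module. The sketch below trains from zero weights and stops at the first mistake-free pass; `train_perceptron`, the `max_epochs` cap, and $\eta = 1$ are assumptions chosen for the illustration.

```python
def predict(x, w, b):
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if z >= 0 else 0

def train_perceptron(data, eta=1.0, max_epochs=100):
    """Return (w, b, converged); stops after the first epoch with no mistakes."""
    w, b = [0.0] * len(data[0][0]), 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in data:
            error = y - predict(x, w, b)
            if error != 0:
                w = [wi + eta * error * xi for wi, xi in zip(w, x)]
                b += eta * error
                mistakes += 1
        if mistakes == 0:
            return w, b, True
    return w, b, False

AND = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
XOR = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]

w_and, b_and, ok_and = train_perceptron(AND)
_, _, ok_xor = train_perceptron(XOR)
print("AND converged:", ok_and)  # True
print("XOR converged:", ok_xor)  # False
```

Raising `max_epochs` does not help on XOR: a mistake-free pass would require a separating line, and none exists.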

Common mistakes

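Updating weights when the prediction is correct (with this rule the $(y - \hat{y})$ factor is already 0, so any nonzero change is a bug).
Blaming the learning rate for non-convergence: on separable data the rule converges for any $\eta > 0$, and with zero-initialized weights $\eta$ only rescales $w$ and $b$.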
Expecting convergence on non-separable noisy data.

Interview checkpoints

Q: Perceptron update when wrong? A: w ← w + η(y − ŷ)x (and bias similarly).
Q: Convergence guarantee? A: Only if data is linearly separable.

Practice

Basic: Apply one manual update step on a misclassified point.
Intermediate: Train perceptron until convergence on AND gate data.
Advanced: Show failure on XOR with single perceptron.

Recap

Each mistake moves the boundary toward classifying the misclassified point correctly.
Converges only for linearly separable sets.
XOR needs hidden layer (MLP).

Next: Day 6 — XOR Problem

Day 6

XOR Problem

Why this matters

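XOR is the classic example of a problem that a single linear unit cannot solve. It motivated multi-layer networks and, eventually, modern deep learning.

The XOR problem shows that a single perceptron cannot represent a boundary that is not linear. Minsky and Papert's 1969 analysis of this limitation is widely credited with dampening neural network research for years (often described as part of the first AI winter), until multi-layer networks revived the field.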

XOR truth table

x₁	x₂	y (XOR)
0	0	0
0	1	1
1	0	1
1	1	0

No single straight line separates the 1s from the 0s in 2D. You need at least one hidden layer with a nonlinear activation so the network can bend the boundary.
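The impossibility can also be shown algebraically. Suppose, for contradiction, that weights $w_1, w_2$ and bias $b$ reproduce the XOR table under the perceptron rule ($\hat{y} = 1$ when $z \geq 0$). The four rows then require:

$$\begin{aligned} (0,0) \mapsto 0 &: \quad b < 0 \\ (0,1) \mapsto 1 &: \quad w_2 + b \geq 0 \\ (1,0) \mapsto 1 &: \quad w_1 + b \geq 0 \\ (1,1) \mapsto 0 &: \quad w_1 + w_2 + b < 0 \end{aligned}$$

Adding the second and third inequalities gives $w_1 + w_2 + 2b \geq 0$, so $w_1 + w_2 + b \geq -b > 0$, which contradicts the fourth. No choice of $w_1, w_2, b$ works.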

Intuition

AND and OR are linearly separable; XOR is not. That is why depth matters — not more neurons alone, but composition of nonlinear layers.
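One way to see composition at work is to wire XOR by hand from the two separable gates: XOR is "OR but not AND". The sketch below uses two hidden step units and one output unit. The weights are hand-picked for illustration, not learned, and the function name `xor_mlp` is hypothetical.

```python
def step(z):
    return 1 if z >= 0 else 0

def xor_mlp(x1, x2):
    """Two hidden step units (OR, AND) feeding one output unit."""
    h_or = step(x1 + x2 - 0.5)      # 1 when at least one input is 1
    h_and = step(x1 + x2 - 1.5)     # 1 only when both inputs are 1
    return step(h_or - h_and - 0.5) # 1 when OR fires but AND does not

outputs = [xor_mlp(x1, x2) for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]]
print(outputs)  # [0, 1, 1, 0]
```

In the hidden space $(h_{\text{or}}, h_{\text{and}})$ the four inputs land on only three points, and those are linearly separable, so the output unit can finish the job with a single line.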

Common mistakes

Thinking more epochs will make single-layer perceptron learn XOR.
Not visualizing why no single line separates XOR classes.
Skipping to deep nets without understanding why depth helps.

Interview checkpoints

Q: Why XOR breaks perceptron? A: Not linearly separable in 2D input space.
Q: Minimal fix? A: Add hidden layer with nonlinear activation (MLP).

Practice

Basic: Draw XOR points and show no single line works.
Intermediate: Add one hidden unit and sketch new boundary idea.
Advanced: Train 2-layer MLP in Keras on XOR.

Recap

XOR requires hidden representations.
This limitation contributed to the first AI winter.
MLPs solve it with depth + nonlinearity.

Next: Day 7 — Linear Separability

Day 7

Linear Separability

Why this matters

Linear separability tells you whether a single layer suffices — essential before choosing model depth.

A dataset is linearly separable if there exists a hyperplane that separates the classes with zero training error (for a linear classifier).
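Written out for a binary dataset $\{(x_i, y_i)\}$ with $y_i \in \{0, 1\}$ and the perceptron's sign convention, the definition reads:

$$\exists\, w, b \ \text{ such that } \ w^\top x_i + b \geq 0 \ \text{ if } y_i = 1 \quad \text{and} \quad w^\top x_i + b < 0 \ \text{ if } y_i = 0, \ \text{ for all } i.$$

For example, the AND gate is separable with the (non-unique) choice $w = (1, 1)$, $b = -1.5$: only $(1, 1)$ gives $w^\top x + b = 0.5 \geq 0$, while the other three inputs give $-1.5$, $-0.5$, and $-0.5$.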

How to check (2D)

Plot points colored by class.
Try to draw one straight line separating them (a curved boundary does not count as linear separation).
If the convex hulls of the two classes overlap, no such line exists and the data is not linearly separable; if they are disjoint, a separating line exists.

Support Vector Machines find the maximum-margin separator. A perceptron finds some separator if one exists, but it need not be the maximum-margin one.

Common mistakes

Checking separability in raw space when features should be transformed first.
Confusing linear separability with linear regression assumptions.
Ignoring soft-margin SVM as alternative to perceptron.
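The first mistake in the list is easy to demonstrate: XOR is not separable in the raw inputs $(x_1, x_2)$, but it becomes separable once a product feature $x_1 x_2$ is added. The sketch below uses hand-picked weights in the three-dimensional feature space; the feature map `phi` and the weights are assumptions for illustration.

```python
def phi(x1, x2):
    """Hypothetical feature map: append the product x1 * x2."""
    return [x1, x2, x1 * x2]

def predict(features, w, b):
    z = sum(wi * fi for wi, fi in zip(w, features)) + b
    return 1 if z >= 0 else 0

# A single hyperplane in the transformed space (hand-picked, not learned).
w, b = [1, 1, -2], -0.5

XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
predictions = [predict(phi(*x), w, b) for x, _ in XOR]
print(predictions)  # [0, 1, 1, 0]
```

This is the same idea behind kernels and hidden layers: change the representation until one hyperplane is enough.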

Interview checkpoints

Q: Define linear separability. A: ∃ hyperplane that separates classes perfectly.
Q: Test in 2D? A: Try to draw a line; convex hulls disjoint ⇒ separable.

Practice

Basic: Classify 4 small 2D datasets as separable or not.
Intermediate: Use sklearn LinearSVC vs Perceptron on same data.
Advanced: Kernel trick intuition: separable in higher dimension.

Recap

One neuron = one hyperplane.
Non-separable ⇒ need features, depth, or kernels.
Always visualize 2D/3D when possible.

Next: Day 8 — Decision Boundaries

Day 8

Decision Boundaries

Why this matters

Decision boundaries connect math to intuition — you debug models by seeing where they flip predictions.

The decision boundary is where the model's score equals the threshold (0 for perceptron). In 2D it is a line; in higher dimensions, a hyperplane.
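For a 2D perceptron the boundary $w_1 x_1 + w_2 x_2 + b = 0$ can be solved for $x_2$, which is handy for plotting. The sketch below reuses the Day 3 weights $w = (1, -1)$, $b = 0$; the helper names `score` and `boundary_x2` are hypothetical.

```python
def score(x, w, b):
    return w[0] * x[0] + w[1] * x[1] + b

def boundary_x2(x1, w, b):
    """Solve w1*x1 + w2*x2 + b = 0 for x2 (requires w2 != 0)."""
    if w[1] == 0:
        raise ValueError("w2 is zero: the boundary is a vertical line")
    return -(w[0] * x1 + b) / w[1]

w, b = [1, -1], 0

x2_on_line = boundary_x2(2.0, w, b)
print(x2_on_line)                       # 2.0, i.e. the line x1 = x2
print(score([2.0, x2_on_line], w, b))   # 0.0 on the boundary
print(score([2.0, 1.0], w, b))          # 1.0, positive side (class 1)
print(score([1.0, 2.0], w, b))          # -1.0, negative side (class 0)
```

Evaluating `boundary_x2` at two values of $x_1$ gives two points, which is all a plotting library needs to draw the line.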

Perceptron: One hyperplane — simple but limited.
MLP with ReLU: Piecewise-linear boundaries — can approximate complex shapes.
Overfitting: Overly wiggly boundaries on training data often fail on validation data.
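To illustrate the piecewise-linear point, here is a tiny hand-wired ReLU network with two hidden units. Its output is linear between the "kinks" introduced by each ReLU, and thresholding it at 0.5 (an arbitrary cutoff chosen for this example) yields a band-shaped class-1 region bounded by two parallel lines, something a single perceptron cannot produce. The weights are hand-picked, not trained.

```python
def relu(z):
    return max(0.0, z)

def f(x1, x2):
    """Two hidden ReLU units on s = x1 + x2, combined linearly."""
    s = x1 + x2
    return relu(s) - 2.0 * relu(s - 1.0)

def classify(x1, x2, threshold=0.5):
    return 1 if f(x1, x2) >= threshold else 0

xor_outputs = [classify(x1, x2) for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]]
print(xor_outputs)  # [0, 1, 1, 0]

# The class-1 region is the band 0.5 <= x1 + x2 <= 1.5.
print(classify(0.5, 0.5), classify(0.1, 0.1), classify(1.0, 0.9))  # 1 0 0
```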

Common mistakes

Plotting boundaries without scaling features (distorted geometry).
Ignoring that ReLU nets create piecewise-linear boundaries.
Only looking at accuracy, not boundary complexity (overfitting).

Interview checkpoints

Q: Effect of more hidden units on boundary? A: More pieces, more complex shapes.
Q: L2 regularization effect? A: Simpler, smoother boundaries, less overfit.

Practice

Basic: Sketch boundary for AND, OR, XOR.
Intermediate: Plot 2D decision regions for small MLP.
Advanced: Compare boundaries: perceptron vs 2-layer ReLU MLP.

Recap

Boundaries visualize what the model learned.
Depth increases boundary complexity.
Regularization keeps boundaries sane.

Next: Day 9 — DL vs ML

Day 9

DL vs ML

Why this matters

Teams waste money choosing DL when sklearn suffices — this day is the decision framework for real projects.

Choosing ML vs DL saves time, money, and reliability. Not every problem needs a neural network.

Scenario	Prefer	Why
5k rows, 40 tabular features, fraud detection	ML (XGBoost, logistic)	Less data, need interpretability
1M labeled images, object detection	DL (CNN + transfer learning)	Raw pixels, representation learning
Small text dataset, intent classification	Start ML; try fine-tuned BERT if needed	Baseline first, then scale model

Common mistakes

Defaulting to ResNet for 500 labeled tabular rows.
Skipping baselines (logistic regression, XGBoost) before CNNs.
Underestimating labeling cost for DL data hunger.

Interview checkpoints

Q: 1M labeled images vs 500 tabular rows: which model family? A: A CNN with transfer learning for the images; boosted trees for the tabular rows.
Q: Interpretability requirement? A: Favor classical ML or explainable models.

Practice

Basic: For 5 scenarios, pick ML or DL and justify in one sentence.
Intermediate: Build sklearn baseline then DL model; compare metric/cost.
Advanced: Write a one-page model selection memo for a startup use case.

Recap

DL wins on raw high-dimensional data at scale.
ML wins on small structured data and interpretability.
Always baseline simple first.

Next: Day 10 — Keras & TensorFlow Setup

Day 10

Keras & TensorFlow Setup

Why this matters

A reproducible Keras/TF environment prevents silent GPU/CPU bugs and version skew across the rest of the 100 days.

Set up TensorFlow and Keras once correctly — wrong CUDA versions and missing GPU detection cause hours of debugging later.

Python

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

print("TF version:", tf.__version__)
print("GPUs:", tf.config.list_physical_devices("GPU"))  # [] means CPU only

# Quick sanity check on a 5,000-image MNIST subset
(x_train, y_train), _ = keras.datasets.mnist.load_data()
x_train = x_train[:5000].reshape(-1, 784).astype("float32") / 255.0  # flatten, scale to [0, 1]
y_train = y_train[:5000]

model = keras.Sequential([
    keras.Input(shape=(784,)),  # explicit Input layer instead of input_shape= on Dense
    layers.Dense(128, activation="relu"),
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",  # labels are integer class ids
              metrics=["accuracy"])
model.summary()  # inspect shapes and parameter counts before training
model.fit(x_train, y_train, epochs=3, batch_size=128, validation_split=0.1)

Pin versions in requirements.txt.
Verify GPU with nvidia-smi and TensorFlow device list.
Use model.summary() before every long training run.

Common mistakes

Installing TF GPU without matching CUDA/cuDNN versions.
Not pinning package versions in requirements.txt.
Running training on CPU while thinking GPU is active.

Interview checkpoints

Q: How do you check that TF sees the GPU? A: tf.config.list_physical_devices('GPU') should return a non-empty list; nvidia-smi only confirms that the driver sees the GPU.
Q: Keras 3 backend? A: Can use TF, JAX, or PyTorch as backend — know your install.

Practice

Basic: Install TF/Keras; run a "hello world" tensor addition.
Intermediate: Train a 2-layer MLP on MNIST subset in <2 min.
Advanced: Dockerfile with pinned TF-GPU for reproducible training.

Recap

Verify GPU and versions before big experiments.
Keras Sequential API is enough for early modules.
Ready for Module 2: MLPs.

Next: Day 11 — MLP Architecture
