CampusX · 100 Days of NLP

Master Natural Language Processing in 100 Days

A structured 100-day NLP curriculum covering pipeline foundations, preprocessing and tokenization, word representations (TF-IDF, Word2Vec), text classification, sequence labeling with HMMs, and real-world duplicate detection.

100 Days / Lessons
8 Core Modules
100+ Code Examples
Free Forever
Curriculum Overview

8 Modules to NLP Mastery

Each module builds on the previous one. Open any module for detailed notes with code, theory, and exercises.

Module 1
Days 1–12

Introduction to NLP & Ambiguity

Understand what NLP is, the core challenges of natural language, and why ambiguity is the central problem.

  • NLP definition and real-world applications
  • Lexical ambiguity — same word, multiple meanings
  • Syntactic ambiguity — multiple parse trees
  • Semantic and pragmatic resolution strategies
  • Overview of NLP tasks: POS tagging, NER, parsing, machine translation (MT)
Start Module 1 →
Module 2
Days 13–25

End-to-End NLP Pipeline

Trace the full lifecycle of an NLP system from raw data acquisition to model deployment.

  • Data scraping and acquisition strategies
  • Text cleaning and noise removal
  • Embedding computation and representation
  • Model training and evaluation workflow
  • Deployment and serving NLP APIs
Start Module 2 →
Module 3
Days 26–38

Text Preprocessing Techniques

Master all standard text normalization steps before feeding text into any model.

  • Tokenization — word, subword, sentence-level
  • Lowercasing and punctuation handling
  • Stopword removal — when to and when not to
  • Stemming vs. Lemmatization — tradeoffs
  • Regex-based cleaning and custom pipelines
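
To make the steps above concrete, here is a minimal preprocessing sketch using only the Python standard library. The stopword list and the `toy_stem` helper are illustrative assumptions: the suffix stripper is a crude stand-in, not the Porter algorithm, and a real project would more likely rely on NLTK or spaCy.

```python
import re

# Tiny illustrative stopword list (an assumption; real lists are much longer).
STOPWORDS = {"the", "a", "an", "is", "are", "and", "of", "to", "in"}

def tokenize(text):
    """Lowercase the text and extract runs of letters, digits, and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())

def remove_stopwords(tokens):
    """Drop tokens that appear in the stopword list."""
    return [t for t in tokens if t not in STOPWORDS]

def toy_stem(token):
    """Strip a few common suffixes. A crude stand-in, not the Porter stemmer."""
    for suffix in ("ing", "ed", "s"):
        # Keep at least three characters so short words are left alone.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def preprocess(text):
    """Tokenize, remove stopwords, then stem each remaining token."""
    return [toy_stem(t) for t in remove_stopwords(tokenize(text))]

tokens = preprocess("The cats are chasing the mice in the garden!")
print(tokens)  # ['cat', 'chas', 'mice', 'garden']
```

Note how the toy stemmer turns "chasing" into "chas" and leaves "mice" untouched. That is the kind of tradeoff between stemming and lemmatization this module examines.
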
Start Module 3 →
Module 4
Days 39–53

Text Vectorization & TF-IDF

Convert raw text into numerical representations that machine learning models can consume.

  • One-Hot Encoding for vocabulary
  • Bag-of-Words (BoW) representation
  • N-grams and co-occurrence matrices
  • TF-IDF: term frequency weighted by inverse document frequency
  • Sparse vs. dense representation tradeoffs
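
As a rough illustration of TF-IDF weighting, the sketch below computes it from scratch under one common textbook definition: term frequency is the count divided by document length, and inverse document frequency is `log(N / df)`. The three-document corpus is made up for the example. Libraries such as scikit-learn apply smoothed and normalized variants by default, so their numbers will differ.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            # tf = count / document length, idf = log(N / df)
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical toy corpus, already tokenized.
docs = [
    "the cat sat".split(),
    "the dog sat".split(),
    "the cat ran".split(),
]
scores = tf_idf(docs)
print(scores[1])
```

A term that appears in every document ("the") gets weight 0, while a term unique to one document ("dog") gets the highest weight in that document.
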
Start Module 4 →
Module 5
Days 54–65

Word Embeddings (Word2Vec)

Learn dense vector representations that capture semantic meaning and word relationships.

  • Limitations of sparse representations
  • CBOW — predicting target from context
  • Skip-gram — predicting context from target
  • Negative sampling for efficient training
  • GloVe and FastText comparisons
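
The difference between CBOW and Skip-gram is easiest to see in the training examples each one consumes. The sketch below only generates those examples from a token list. It does not train any embeddings, and the function names and the `window` parameter are illustrative choices rather than part of any library.

```python
def skipgram_pairs(tokens, window=2):
    """Skip-gram: each (target, context) pair predicts one context word from the target."""
    pairs = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs

def cbow_examples(tokens, window=2):
    """CBOW: each (context_words, target) example predicts the target from its context."""
    examples = []
    for i, target in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + window + 1]
        if context:
            examples.append((context, target))
    return examples

sentence = ["the", "quick", "brown", "fox"]
sg = skipgram_pairs(sentence, window=1)
cb = cbow_examples(sentence, window=1)
print(sg)
print(cb)
```

With `window=1`, Skip-gram yields six single-word pairs for this sentence, while CBOW yields four examples, one per target word.
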
Start Module 5 →
Module 6
Days 66–78

Text Classification Models

Build classifiers that label text — from spam detection to sentiment analysis and topic categorization.

  • Naive Bayes — conditional probabilities & Laplace smoothing
  • Logistic Regression for multi-class text classification
  • Support Vector Machines with text kernels
  • Evaluation: accuracy, precision, recall, F1
  • Handling class imbalance in NLP
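
One way to see how Laplace smoothing fits into Naive Bayes is a small multinomial classifier written from scratch. This is a sketch under simplifying assumptions: the four training documents are invented, out-of-vocabulary words are ignored at prediction time, and the class name `NaiveBayes` is hypothetical rather than any library's API.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Multinomial Naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc)
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def log_scores(self, doc):
        """Return log P(class) + sum of log P(word | class) for each class."""
        total_docs = sum(self.class_counts.values())
        scores = {}
        for label, n in self.class_counts.items():
            counts = self.word_counts[label]
            # Smoothed denominator: class word total plus alpha per vocabulary word.
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(n / total_docs)
            for word in doc:
                if word in self.vocab:  # ignore out-of-vocabulary words
                    score += math.log((counts[word] + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, doc):
        scores = self.log_scores(doc)
        return max(scores, key=scores.get)

# Hypothetical toy training set.
train_docs = [
    ["win", "free", "prize"],
    ["free", "money", "now"],
    ["meeting", "at", "noon"],
    ["lunch", "at", "noon"],
]
train_labels = ["spam", "spam", "ham", "ham"]

model = NaiveBayes().fit(train_docs, train_labels)
print(model.predict(["free", "prize"]))       # spam
print(model.predict(["lunch", "at", "noon"]))  # ham
```

Without the `alpha` term, a word never seen in a class would contribute `log(0)` and make that class impossible. Smoothing keeps every probability strictly positive.
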
Start Module 6 →
Module 7
Days 79–90

POS Tagging & Hidden Markov Models

Model sequential linguistic structure using probabilistic graphical models and dynamic programming.

  • Part-of-Speech tag sets (Penn Treebank)
  • HMM — states, transitions, emissions
  • Forward-Backward algorithm
  • Viterbi decoding for optimal tag sequence
  • Named Entity Recognition (NER) with HMMs
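
Viterbi decoding can be sketched in a few lines of dynamic programming. The two-tag HMM below uses made-up start, transition, and emission probabilities purely for illustration. A practical tagger would estimate them from a tagged corpus and work in log space to avoid underflow on long sentences.

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most probable state sequence and its probability."""
    # best[t][s] = (probability of the best path ending in s at time t, previous state)
    best = [{s: (start_p[s] * emit_p[s].get(obs[0], 0.0), None) for s in states}]
    for t in range(1, len(obs)):
        column = {}
        for s in states:
            prob, prev = max(
                (best[t - 1][p][0] * trans_p[p][s] * emit_p[s].get(obs[t], 0.0), p)
                for p in states
            )
            column[s] = (prob, prev)
        best.append(column)
    # Backtrack from the most probable final state.
    last = max(states, key=lambda s: best[-1][s][0])
    prob = best[-1][last][0]
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(best[t][path[-1]][1])
    path.reverse()
    return path, prob

# Hypothetical toy model with two tags.
states = ["NOUN", "VERB"]
start_p = {"NOUN": 0.7, "VERB": 0.3}
trans_p = {
    "NOUN": {"NOUN": 0.3, "VERB": 0.7},
    "VERB": {"NOUN": 0.6, "VERB": 0.4},
}
emit_p = {
    "NOUN": {"dogs": 0.8, "bark": 0.2},
    "VERB": {"dogs": 0.1, "bark": 0.9},
}

path, prob = viterbi(["dogs", "bark"], states, start_p, trans_p, emit_p)
print(path, prob)  # ['NOUN', 'VERB'] with probability close to 0.3528
```

The winning path multiplies the start probability of NOUN (0.7), its emission of "dogs" (0.8), the NOUN to VERB transition (0.7), and the VERB emission of "bark" (0.9).
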
Start Module 7 →
Module 8
Days 91–100

Duplicate Question Detection

End-to-end NLP case study using the Quora Question Pairs dataset — a real-world similarity problem.

  • Problem framing: semantic similarity as binary classification
  • Cosine similarity and Jaccard similarity (intersection over union)
  • Fuzzy matching with edit distance
  • Feature engineering from text pairs
  • XGBoost on engineered similarity vectors
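
The similarity features listed above can be sketched with the standard library alone. The example below computes Jaccard similarity, bag-of-words cosine similarity, and Levenshtein edit distance for one question pair. The `pair_features` helper and the sample questions are hypothetical, and the resulting vector would still need to be passed to a classifier such as XGBoost, which is not shown here.

```python
import math
from collections import Counter

def jaccard(a, b):
    """Size of the token-set intersection divided by the size of the union."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0

def cosine(a, b):
    """Cosine similarity between bag-of-words count vectors."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0

def edit_distance(s, t):
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]

def pair_features(q1, q2):
    """Turn a question pair into a small numeric feature vector."""
    t1, t2 = q1.lower().split(), q2.lower().split()
    return [jaccard(t1, t2), cosine(t1, t2), edit_distance(q1.lower(), q2.lower())]

features = pair_features("How do I learn NLP", "How can I learn NLP")
print(features)
```

For this pair, four of the six distinct tokens are shared, so the Jaccard score is about 0.67, and three character edits turn "do" into "can".
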
Start Module 8 →
Quick Navigation

Jump to Any Topic

All 100 NLP topics mapped below — click to navigate directly.

What is NLP? · NLP Applications · Lexical Ambiguity · Syntactic Ambiguity · Semantic Ambiguity · Pragmatic Analysis · NLP Task Overview · NLP vs CV vs RL · Language Models Intro · Corpora & Datasets · NLTK Setup · SpaCy Setup

NLP Pipeline Overview · Data Scraping · Text Acquisition · Noise Removal · Text Cleaning · Embedding Pipeline · Model Training Flow · NLP API Deployment · End-to-End Project · Evaluation Metrics · Pipeline Debugging · Benchmarking · Pipeline Project

Tokenization Basics · Word Tokenization · Subword Tokenization · Stopword Removal · Stemming (Porter) · Lemmatization · Regex Cleaning · Sentence Splitting · Custom Pipelines · Preprocessing Project

One-Hot Encoding · Bag of Words · N-grams · TF-IDF Theory · TF-IDF with Sklearn · Co-occurrence Matrix · Sparse Representations · CountVectorizer · TF-IDF Search Engine · Vectorization Project

Dense Embeddings Intro · CBOW Architecture · Skip-gram Architecture · Negative Sampling · Word Analogies · GloVe Embeddings · FastText · Embedding Visualization

Text Classification Intro · Naive Bayes NLP · Laplace Smoothing · Logistic Regression Text · SVM for Text · Multiclass Classification · Confusion Matrix NLP · Precision & Recall · Sentiment Analysis · Spam Detection · Topic Classification · Classification Project

POS Tagging Intro · Penn Treebank Tags · HMM Theory · HMM Transitions · HMM Emissions · Forward Algorithm · Viterbi Decoding · Baum-Welch Training · NER with HMMs · CRF for Sequence Labeling · SpaCy POS Tagger · Chunking & Parsing · Dependency Parsing · Constituency Parsing · Sequence Labeling Project

Similarity Problem Framing · Quora Dataset EDA · Cosine Similarity · Jaccard Similarity · Edit Distance · Fuzzy Matching · Feature Engineering · TF-IDF Features · Word2Vec Features · XGBoost Classifier · Model Evaluation · Error Analysis · Threshold Tuning · Ensemble Approach · BERT Embeddings Intro · Sentence Transformers · Deployment API · Scalability Considerations · Capstone Project · Final Review 🎓

Ready to Start Your NLP Journey?

Begin with Module 1 — no prior NLP knowledge required. All you need is basic Python and curiosity.

Start Day 1 →