Module 1 · 100 Days of NLP

Module 1: Introduction to NLP & Ambiguity

Understand Natural Language Processing foundations: core definitions, syntactic and lexical ambiguities, and practical text parsing challenges.

⏱ 20 min read · Author: GenAIWallah Team · Updated: May 2026
Day 1

What is NLP?

Why this matters

What is NLP?: NLP foundations explain why language is ambiguous and which tasks exist before you touch models.

Natural Language Processing (NLP) is the branch of AI focused on enabling computers to read, understand, and generate human language. It combines linguistics, statistics, and machine learning.

What NLP systems do

  • Understand: classify intent, extract entities, parse structure.
  • Transform: translate, summarize, rewrite, normalize text.
  • Generate: chatbots, autocomplete, report drafting.

Modern NLP spans classical pipelines (tokenization → TF-IDF → classifier) and neural models (transformers, LLMs). Both views matter for interviews and production.
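The classical pipeline named above (tokenization → TF-IDF → classifier) can be sketched in plain Python. This is a minimal illustration under stated assumptions: the training texts, labels, and helper names (`tokenize`, `fit_idf`, `classify`) are hypothetical, and the classifier is a simple nearest-centroid rule chosen for brevity, not a recommendation.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase and keep runs of letters (a deliberately naive tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())

def fit_idf(docs):
    """Smoothed inverse document frequency for every term seen in training."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}

def tfidf(tokens, idf):
    """Sparse TF-IDF vector; terms unseen in training are ignored."""
    tf = Counter(t for t in tokens if t in idf)
    return {t: c * idf[t] for t, c in tf.items()}

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical toy training set: (text, label)
train = [
    ("refund my payment please", "billing"),
    ("payment failed and money was deducted", "billing"),
    ("app crashes on login", "technical"),
    ("cannot login after the update", "technical"),
]

docs = [tokenize(text) for text, _ in train]
idf = fit_idf(docs)

# Classifier: sum the TF-IDF vectors per label, then pick the closest centroid.
centroids = {}
for tokens, (_, label) in zip(docs, train):
    centroids.setdefault(label, Counter()).update(tfidf(tokens, idf))

def classify(text):
    vec = tfidf(tokenize(text), idf)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))

print(classify("my payment was deducted twice"))
print(classify("login screen crashes"))
```

Each stage is a separate representation: raw text, tokens, weighted vectors, then a label. Swapping any one stage (for example, a neural encoder instead of TF-IDF) leaves the others in place.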

Start here: NLP is not one model — it is a stack of representations and tasks built on how language works.

Common mistakes

  • Treating NLP as only chatbots (ignoring search, extraction, classification).
  • Skipping linguistic levels (lexical vs syntactic vs semantic).
  • Assuming English-only tokenization rules apply everywhere.

Interview checkpoints

  • Q: Explain what NLP is in one minute. A: State the definition, when to use it, and one failure mode.
  • Q: Where does NLP fit in a product pipeline? A: Name inputs, outputs, and what breaks if this step is wrong.

Practice

  1. Basic: Define NLP and give one real product example.
  2. Intermediate: Implement or sketch a minimal NLP example (for instance, tokenization → TF-IDF → classifier).
  3. Advanced: Compare a classical pipeline with a neural model on the same dataset.

Recap

  • You can explain what NLP is clearly.
  • You know one common mistake and how to avoid it.
  • You see how this connects to the next topic.

Next: NLP Applications

Day 2

NLP Applications

Why this matters

NLP Applications: NLP foundations explain why language is ambiguous and which tasks exist before you touch models.

NLP powers products you use daily. Mapping applications to techniques helps you learn systematically.

Common application areas

  • Search & retrieval: ranking, semantic search, FAQ matching.
  • Classification: spam, sentiment, ticket routing, moderation.
  • Extraction: NER, keyphrase mining, form parsing.
  • Generation: summarization, dialogue, code assistants.
  • Speech stack: ASR (speech-to-text) + NLP downstream.

Worked example

A support desk receives: "My payment failed but money was deducted." NLP can (1) classify intent as billing, (2) extract amount/date entities, (3) route to the payments team — often with a classifier + NER pipeline.
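The three steps in the worked example (classify intent, extract entities, route) can be sketched with keyword rules and regular expressions. Everything here is an assumption for illustration: the keyword lists, team names, regex patterns, and the amount and date appended to the ticket text are hypothetical, and a production system would typically use a trained classifier and NER model instead.

```python
import re

# Hypothetical keyword lists and team names, for illustration only.
INTENT_KEYWORDS = {
    "billing": {"payment", "refund", "deducted", "charged", "invoice"},
    "technical": {"crash", "error", "login", "bug"},
}
ROUTES = {"billing": "payments-team", "technical": "support-engineering"}

# Naive patterns: a currency marker followed by digits, and dd/mm/yyyy dates.
AMOUNT_RE = re.compile(r"(?:Rs\.?|₹|\$)\s?\d[\d,]*(?:\.\d+)?")
DATE_RE = re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")

def classify_intent(text):
    """Score each intent by keyword overlap; fall back to 'unknown'."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    scores = {intent: len(words & kws) for intent, kws in INTENT_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"

def extract_entities(text):
    return {"amounts": AMOUNT_RE.findall(text), "dates": DATE_RE.findall(text)}

def route(text):
    intent = classify_intent(text)
    return {
        "intent": intent,
        "entities": extract_entities(text),
        "team": ROUTES.get(intent, "triage"),
    }

# The amount and date are added to the original message so there is something to extract.
ticket = route("My payment failed but money was deducted. Rs. 499 on 03/05/2026.")
print(ticket)
```

Rules like these make a reasonable baseline to measure a learned classifier + NER pipeline against.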

Common mistakes

  • Treating NLP as only chatbots (ignoring search, extraction, classification).
  • Skipping linguistic levels (lexical vs syntactic vs semantic).
  • Assuming English-only tokenization rules apply everywhere.

Interview checkpoints

  • Q: Explain NLP applications in one minute. A: State the definition, when to use it, and one failure mode.
  • Q: How do NLP applications fit in an NLP pipeline? A: Name inputs, outputs, and what breaks if this step is wrong.

Practice

  1. Basic: Define NLP Applications and give one real product example.
  2. Intermediate: Implement or sketch a minimal example for NLP Applications.
  3. Advanced: Compare NLP Applications to the previous topic on the same dataset.

Recap

  • You can explain NLP applications clearly.
  • You know one common mistake and how to avoid it.
  • You see how this connects to the next topic.

Next: Lexical Ambiguity

Day 3

Lexical Ambiguity

Why this matters

Lexical Ambiguity: NLP foundations explain why language is ambiguous and which tasks exist before you touch models.

Lexical ambiguity occurs when a single word has multiple meanings. Disambiguation usually needs context (surrounding words) or a knowledge base.

Example

"Bank" → financial institution vs river bank. "Bat" → cricket equipment vs animal.

[Figure: Lexical ambiguity mapping. "Bank" → financial institution | river-side edge]

Production note: word-sense disambiguation is often implicit in embeddings or LLM context rather than explicit rules.
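To make context-based disambiguation concrete, here is a minimal sketch in the spirit of the simplified Lesk algorithm: pick the sense whose definition shares the most words with the sentence. The sense inventory, glosses, and stopword list below are hand-written assumptions; a real system would draw them from a resource such as WordNet.

```python
import re

# Hypothetical mini sense inventory with hand-written glosses.
SENSES = {
    "bank": {
        "financial institution": "an institution that accepts deposits and lends money to customers with an account",
        "river side": "the sloping land along the edge of a river or stream where water flows",
    },
    "bat": {
        "cricket equipment": "a wooden club used to hit the ball in cricket or baseball",
        "animal": "a small nocturnal flying mammal that lives in caves",
    },
}
STOPWORDS = {"a", "an", "the", "of", "to", "in", "or", "and", "that", "with",
             "on", "at", "he", "she", "we", "i", "his", "her"}

def content_words(text):
    """Lowercased words minus stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}

def disambiguate(word, sentence):
    """Return the sense whose gloss overlaps most with the sentence context."""
    context = content_words(sentence) - {word}
    glosses = SENSES[word]
    return max(glosses, key=lambda sense: len(context & content_words(glosses[sense])))

print(disambiguate("bank", "She opened an account at the bank to deposit money"))
print(disambiguate("bank", "We sat on the bank of the river and watched the water"))
print(disambiguate("bat", "He swung the bat and hit the ball"))
```

Exact word overlap is brittle ("deposit" does not match "deposits"), which is one reason embeddings largely replaced this style of rule in practice.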

Common mistakes

  • Treating NLP as only chatbots (ignoring search, extraction, classification).
  • Skipping linguistic levels (lexical vs syntactic vs semantic).
  • Assuming English-only tokenization rules apply everywhere.

Interview checkpoints

  • Q: Explain lexical ambiguity in one minute. A: State definition, when to use it, and one failure mode.
  • Q: How does lexical ambiguity fit in an NLP pipeline? A: Name inputs, outputs, and what breaks if this step is wrong.

Practice

  1. Basic: Define Lexical Ambiguity and give one real product example.
  2. Intermediate: Implement or sketch a minimal example for Lexical Ambiguity.
  3. Advanced: Compare Lexical Ambiguity to the previous topic on the same dataset.

Recap

  • You can explain lexical ambiguity clearly.
  • You know one common mistake and how to avoid it.
  • You see how this connects to the next topic.

Next: Syntactic Ambiguity

Day 4

Syntactic Ambiguity

Why this matters

Syntactic Ambiguity: NLP foundations explain why language is ambiguous and which tasks exist before you touch models.

Syntactic ambiguity arises when grammar allows multiple parse trees for the same sentence.

Classic example

"I saw the man with binoculars." Did the speaker use binoculars, or did the man have them? Where the prepositional phrase "with binoculars" attaches (to the verb "saw" or to the noun "man") changes the meaning.

Why parsers matter

Dependency and constituency parsers resolve attachment. Even without full parsing, sequence models learn likely structures from data.
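The two readings can be counted mechanically. The sketch below runs a CKY-style chart over a toy grammar in Chomsky normal form and counts how many parse trees cover the sentence. The grammar, lexicon, and function name `count_parses` are assumptions written for this one example, not a general English grammar.

```python
from collections import defaultdict

# Toy grammar in Chomsky normal form (an assumption for this sketch).
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),  # PP attaches to the verb: seeing by means of binoculars
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),  # PP attaches to the noun: the man who has binoculars
    ("PP", "P", "NP"),
]
LEXICON = {
    "i": ["NP"], "saw": ["V"], "the": ["Det"],
    "man": ["N"], "with": ["P"], "binoculars": ["NP"],
}

def count_parses(words, start="S"):
    """Count distinct parse trees rooted at `start` that span all words."""
    n = len(words)
    # chart[i][j][label] = number of trees for `label` covering words[i:j]
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        for label in LEXICON[word]:
            chart[i][i + 1][label] += 1
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):  # split point between left and right child
                for parent, left, right in BINARY_RULES:
                    chart[i][j][parent] += chart[i][k][left] * chart[k][j][right]
    return chart[0][n][start]

print(count_parses("i saw the man with binoculars".split()))
print(count_parses("i saw the man".split()))
```

The grammar alone cannot choose between the two trees; a statistical parser adds scores to rules or arcs so that one attachment wins.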

Common mistakes

  • Treating NLP as only chatbots (ignoring search, extraction, classification).
  • Skipping linguistic levels (lexical vs syntactic vs semantic).
  • Assuming English-only tokenization rules apply everywhere.

Interview checkpoints

  • Q: Explain syntactic ambiguity in one minute. A: State definition, when to use it, and one failure mode.
  • Q: How does syntactic ambiguity fit in an NLP pipeline? A: Name inputs, outputs, and what breaks if this step is wrong.

Practice

  1. Basic: Define Syntactic Ambiguity and give one real product example.
  2. Intermediate: Implement or sketch a minimal example for Syntactic Ambiguity.
  3. Advanced: Compare Syntactic Ambiguity to the previous topic on the same dataset.

Recap

  • You can explain syntactic ambiguity clearly.
  • You know one common mistake and how to avoid it.
  • You see how this connects to the next topic.

Next: Semantic Ambiguity

Day 5

Semantic Ambiguity

Why this matters

Semantic Ambiguity: NLP foundations explain why language is ambiguous and which tasks exist before you touch models.

Semantic ambiguity is meaning-level: same structure, different interpretations due to world knowledge, metaphor, or domain.

  • "He kicked the bucket" — literal vs idiomatic.
  • "Apple released a phone" — company vs fruit (needs entity linking).

Embeddings and large language models encode context-sensitive semantics; classical NLP used WordNet and ontologies.

Common mistakes

  • Treating NLP as only chatbots (ignoring search, extraction, classification).
  • Skipping linguistic levels (lexical vs syntactic vs semantic).
  • Assuming English-only tokenization rules apply everywhere.

Interview checkpoints

  • Q: Explain semantic ambiguity in one minute. A: State definition, when to use it, and one failure mode.
  • Q: How does semantic ambiguity fit in an NLP pipeline? A: Name inputs, outputs, and what breaks if this step is wrong.

Practice

  1. Basic: Define Semantic Ambiguity and give one real product example.
  2. Intermediate: Implement or sketch a minimal example for Semantic Ambiguity.
  3. Advanced: Compare Semantic Ambiguity to the previous topic on the same dataset.

Recap

  • You can explain semantic ambiguity clearly.
  • You know one common mistake and how to avoid it.
  • You see how this connects to the next topic.

Next: Pragmatic Analysis

Day 6

Pragmatic Analysis

Why this matters

Pragmatic Analysis: NLP foundations explain why language is ambiguous and which tasks exist before you touch models.

Pragmatics studies meaning in context: speaker intent, implicature, and discourse.

  • Coreference: "He" refers to which entity?
  • Speech acts: question vs command vs sarcasm.
  • Discourse: meaning spread across multiple utterances.
Chatbots fail pragmatically when they answer the literal question but miss user intent.

Common mistakes

  • Treating NLP as only chatbots (ignoring search, extraction, classification).
  • Skipping linguistic levels (lexical vs syntactic vs semantic).
  • Assuming English-only tokenization rules apply everywhere.

Interview checkpoints

  • Q: Explain pragmatic analysis in one minute. A: State definition, when to use it, and one failure mode.
  • Q: How does pragmatic analysis fit in an NLP pipeline? A: Name inputs, outputs, and what breaks if this step is wrong.

Practice

  1. Basic: Define Pragmatic Analysis and give one real product example.
  2. Intermediate: Implement or sketch a minimal example for Pragmatic Analysis.
  3. Advanced: Compare Pragmatic Analysis to the previous topic on the same dataset.

Recap

  • You can explain pragmatic analysis clearly.
  • You know one common mistake and how to avoid it.
  • You see how this connects to the next topic.

Next: NLP Task Overview

Day 7

NLP Task Overview

Why this matters

NLP Task Overview: NLP foundations explain why language is ambiguous and which tasks exist before you touch models.

NLP is organized into task families. Recognizing them helps you pick metrics, models, and baselines.

Task map

  • Sequence labeling: POS, NER, chunking.
  • Classification: sentiment, topic, spam.
  • Structured prediction: parsing, MT alignment.
  • Similarity / retrieval: duplicate detection, semantic search.
  • Generation: summarization, dialogue (often seq2seq / LLMs).
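Task family determines the metric. As a small sketch, classification is usually scored per example (accuracy), while sequence labeling such as NER is usually scored per entity span (F1). The tags and predictions below are hand-written for illustration, and the `spans` helper makes a simplifying assumption: any `I-` tag continues the currently open span regardless of its type.

```python
def accuracy(gold, pred):
    """Fraction of positions where prediction equals gold."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)

def spans(tags):
    """Collect (start, end, type) entity spans from BIO tags."""
    found, start, kind = set(), None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing span
        if start is not None and not tag.startswith("I-"):
            found.add((start, i, kind))
            start, kind = None, None
        if tag.startswith("B-"):
            start, kind = i, tag[2:]
    return found

def span_f1(gold_tags, pred_tags):
    """Exact-match F1 over entity spans."""
    gold, pred = spans(gold_tags), spans(pred_tags)
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# Classification: one label per text.
print(accuracy(["pos", "neg", "pos", "neg"], ["pos", "neg", "neg", "neg"]))

# Sequence labeling: one tag per token of "Apple released a phone in India".
gold = ["B-ORG", "O", "O", "O", "O", "B-LOC"]
pred = ["B-ORG", "O", "O", "B-MISC", "O", "O"]
print(accuracy(gold, pred), span_f1(gold, pred))
```

On the tagging example, token accuracy looks better than span F1 because most tokens are "O"; this gap is why span-level metrics are preferred for extraction tasks.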

Common mistakes

  • Treating NLP as only chatbots (ignoring search, extraction, classification).
  • Skipping linguistic levels (lexical vs syntactic vs semantic).
  • Assuming English-only tokenization rules apply everywhere.

Interview checkpoints

  • Q: Explain the NLP task families in one minute. A: State the definition, when to use it, and one failure mode.
  • Q: How does the NLP task overview fit in an NLP pipeline? A: Name inputs, outputs, and what breaks if this step is wrong.

Practice

  1. Basic: Define NLP Task Overview and give one real product example.
  2. Intermediate: Implement or sketch a minimal example for NLP Task Overview.
  3. Advanced: Compare NLP Task Overview to the previous topic on the same dataset.

Recap

  • You can explain the NLP task overview clearly.
  • You know one common mistake and how to avoid it.
  • You see how this connects to the next topic.

Next: NLP vs CV vs RL

Day 8

NLP vs CV vs RL

Why this matters

NLP vs CV vs RL: NLP foundations explain why language is ambiguous and which tasks exist before you touch models.

NLP differs from Computer Vision (CV) and Reinforcement Learning (RL) in data type and objectives.

Field | Input                            | Typical output
NLP   | Discrete tokens, variable length | Labels, spans, text
CV    | Pixel grids                      | Boxes, masks, classes
RL    | States and rewards               | Actions (a policy maximizing reward)

Many production systems combine them (e.g., vision + captioning NLP).

Common mistakes

  • Treating NLP as only chatbots (ignoring search, extraction, classification).
  • Skipping linguistic levels (lexical vs syntactic vs semantic).
  • Assuming English-only tokenization rules apply everywhere.

Interview checkpoints

  • Q: Explain NLP vs CV vs RL in one minute. A: State the definition, when to use it, and one failure mode.
  • Q: How does NLP fit alongside CV and RL in a production pipeline? A: Name inputs, outputs, and what breaks if this step is wrong.

Practice

  1. Basic: Define NLP vs CV vs RL and give one real product example.
  2. Intermediate: Implement or sketch a minimal example for NLP vs CV vs RL.
  3. Advanced: Compare NLP vs CV vs RL to the previous topic on the same dataset.

Recap

  • You can explain NLP vs CV vs RL clearly.
  • You know one common mistake and how to avoid it.
  • You see how this connects to the next topic.

Next: Language Models Intro

Day 9

Language Models Intro

Why this matters

Language Models Intro: NLP foundations explain why language is ambiguous and which tasks exist before you touch models.

A language model assigns a probability to a sequence of words (or tokens): $P(w_1, w_2, \ldots, w_n)$. Modern LLMs are large neural LMs.

Chain rule

$$P(w_1\ldots w_n) = \prod_{i=1}^{n} P(w_i \mid w_1, \ldots, w_{i-1})$$

Classical n-gram LMs approximate the full history with only the previous $n-1$ words (a Markov assumption); transformers condition on long contexts using attention.
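As a minimal sketch of the chain rule under a bigram approximation, each factor $P(w_i \mid w_1, \ldots, w_{i-1})$ is replaced by $P(w_i \mid w_{i-1})$ and estimated from counts. The toy corpus, the `<s>`/`</s>` boundary markers, and the add-one smoothing are assumptions chosen to keep the example short.

```python
import math
from collections import Counter

# Hypothetical toy corpus; <s> and </s> mark sentence boundaries.
corpus = [
    "<s> i like nlp </s>",
    "<s> i like tea </s>",
    "<s> you like nlp </s>",
]
sentences = [line.split() for line in corpus]
vocab = {w for s in sentences for w in s}
# Count each word as a history (every position except the last).
unigrams = Counter(w for s in sentences for w in s[:-1])
bigrams = Counter(pair for s in sentences for pair in zip(s, s[1:]))

def bigram_prob(prev, word):
    """Add-one smoothed estimate of P(word | prev)."""
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + len(vocab))

def sentence_logprob(text):
    """Chain rule with a first-order Markov approximation: sum of log P(w_i | w_{i-1})."""
    words = text.split()
    return sum(math.log(bigram_prob(p, w)) for p, w in zip(words, words[1:]))

print(bigram_prob("<s>", "i"))
print(sentence_logprob("<s> i like nlp </s>"))
print(sentence_logprob("<s> nlp like i </s>"))
```

Working in log space avoids numerical underflow when many small probabilities are multiplied. The scrambled sentence scores lower because none of its bigrams were observed.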

Common mistakes

  • Treating NLP as only chatbots (ignoring search, extraction, classification).
  • Skipping linguistic levels (lexical vs syntactic vs semantic).
  • Assuming English-only tokenization rules apply everywhere.

Interview checkpoints

  • Q: Explain language models in one minute. A: State the definition, when to use it, and one failure mode.
  • Q: How do language models fit in an NLP pipeline? A: Name inputs, outputs, and what breaks if this step is wrong.

Practice

  1. Basic: Define Language Models Intro and give one real product example.
  2. Intermediate: Implement or sketch a minimal example for Language Models Intro.
  3. Advanced: Compare Language Models Intro to the previous topic on the same dataset.

Recap

  • You can explain language models clearly.
  • You know one common mistake and how to avoid it.
  • You see how this connects to the next topic.

Next: Corpora & Datasets

Day 10

Corpora & Datasets

Why this matters

Corpora & Datasets: NLP foundations explain why language is ambiguous and which tasks exist before you touch models.

Models learn from corpora — large text collections. Dataset choice affects bias, domain, and metrics.

  • News / web: general vocabulary, noisy HTML.
  • Reviews: sentiment tasks (IMDB, Yelp).
  • Treebanks: POS/parsing (Penn Treebank).
  • Parallel corpora: machine translation.

Always inspect label distribution, language, and license before training.
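A quick inspection pass might look like the sketch below. The four labeled examples are hypothetical stand-ins for a real dataset, and the helper name `label_distribution` is an assumption.

```python
from collections import Counter

# Hypothetical labeled examples standing in for a real dataset.
dataset = [
    ("great movie", "positive"),
    ("loved it", "positive"),
    ("not bad at all", "positive"),
    ("terrible plot", "negative"),
]

def label_distribution(examples):
    """Share of examples per label."""
    counts = Counter(label for _, label in examples)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}

dist = label_distribution(dataset)
# Accuracy of always predicting the most common label: any model must beat this.
majority_baseline = max(dist.values())
# Average whitespace-token length, a rough check on document size.
avg_tokens = sum(len(text.split()) for text, _ in dataset) / len(dataset)

print(dist)
print(majority_baseline, avg_tokens)
```

A skewed distribution like this one means raw accuracy is misleading, so per-class precision and recall are the safer metrics.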

Common mistakes

  • Treating NLP as only chatbots (ignoring search, extraction, classification).
  • Skipping linguistic levels (lexical vs syntactic vs semantic).
  • Assuming English-only tokenization rules apply everywhere.

Interview checkpoints

  • Q: Explain corpora & datasets in one minute. A: State definition, when to use it, and one failure mode.
  • Q: How do corpora & datasets fit in an NLP pipeline? A: Name inputs, outputs, and what breaks if this step is wrong.

Practice

  1. Basic: Define Corpora & Datasets and give one real product example.
  2. Intermediate: Implement or sketch a minimal example for Corpora & Datasets.
  3. Advanced: Compare Corpora & Datasets to the previous topic on the same dataset.

Recap

  • You can explain corpora & datasets clearly.
  • You know one common mistake and how to avoid it.
  • You see how this connects to the next topic.

Next: NLTK Setup

Day 11

NLTK Setup

Why this matters

NLTK Setup: NLP foundations explain why language is ambiguous and which tasks exist before you touch models.

NLTK (Natural Language Toolkit) is the classic Python library for teaching NLP: tokenizers, corpora, stemmers, and parsers.

Setup & first tokenization

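pip install nltk

import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

# One-time data downloads. Newer NLTK releases look for 'punkt_tab' instead of 'punkt'.
nltk.download('punkt')
nltk.download('punkt_tab')
nltk.download('stopwords')

text = "Sachin is a great batsman."
tokens = word_tokenize(text)  # split into word and punctuation tokens
stops = set(stopwords.words('english'))
filtered = [t for t in tokens if t.lower() not in stops]  # drop stopwords, ignoring case
print(tokens)    # all tokens, including the final "."
print(filtered)  # tokens that are not English stopwords

Download the required NLTK data packs (punkt, or punkt_tab on newer NLTK releases, and stopwords) once per environment. Pin versions in production; prefer spaCy for faster pipelines at scale.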

Common mistakes

  • Treating NLP as only chatbots (ignoring search, extraction, classification).
  • Skipping linguistic levels (lexical vs syntactic vs semantic).
  • Assuming English-only tokenization rules apply everywhere.

Interview checkpoints

  • Q: Explain NLTK setup in one minute. A: State the definition, when to use it, and one failure mode.
  • Q: How does NLTK setup fit in an NLP pipeline? A: Name inputs, outputs, and what breaks if this step is wrong.

Practice

  1. Basic: Define NLTK Setup and give one real product example.
  2. Intermediate: Implement or sketch a minimal example for NLTK Setup.
  3. Advanced: Compare NLTK Setup to the previous topic on the same dataset.

Recap

  • You can explain NLTK setup clearly.
  • You know one common mistake and how to avoid it.
  • You see how this connects to the next topic.

Next: spaCy Setup

Day 12

spaCy Setup

Why this matters

spaCy Setup: NLP foundations explain why language is ambiguous and which tasks exist before you touch models.

spaCy provides industrial-strength tokenization, POS tagging, NER, and dependency parsing in one pipeline.

Setup & pipeline inspection

pip install spacy
python -m spacy download en_core_web_sm

import spacy

# Load the small English pipeline (tagger, parser, NER, lemmatizer).
nlp = spacy.load("en_core_web_sm")
doc = nlp("Apple is looking at buying U.K. startup for $1 billion")
for token in doc:
    # surface form, lemma, coarse part-of-speech tag, stopword flag
    print(token.text, token.lemma_, token.pos_, token.is_stop)

  • Token attributes: .lemma_, .pos_, .is_stop.
  • Disable unused pipeline components for speed, e.g. spacy.load("en_core_web_sm", disable=["ner"]).
  • Choose a model package by trade-off: en_core_web_sm (speed) vs en_core_web_trf (accuracy).

Common mistakes

  • Treating NLP as only chatbots (ignoring search, extraction, classification).
  • Skipping linguistic levels (lexical vs syntactic vs semantic).
  • Assuming English-only tokenization rules apply everywhere.

Interview checkpoints

  • Q: Explain spaCy setup in one minute. A: State the definition, when to use it, and one failure mode.
  • Q: How does spaCy setup fit in an NLP pipeline? A: Name inputs, outputs, and what breaks if this step is wrong.

Practice

  1. Basic: Define spaCy Setup and give one real product example.
  2. Intermediate: Implement or sketch a minimal example for spaCy Setup.
  3. Advanced: Compare spaCy Setup to the previous topic on the same dataset.

Recap

  • You can explain spaCy setup clearly.
  • You know one common mistake and how to avoid it.
  • You see how this connects to the next topic.

Next: Module 2: NLP Pipeline