Module 1 · 100 Days of NLP

Module 1: Introduction to NLP & Ambiguity

Complete NLP foundations with real-world examples, Hindi analogies, code snippets, and interview questions. Every concept explained for beginners.

⏱ 45 Min Read • Author: GenAIWallah Team • Updated: June 2026

Day 1

What is Natural Language Processing (NLP)?

Why this matters

NLP is the foundation of every AI application that works with human language — ChatGPT, Google Translate, Siri, spam filters. Without understanding NLP, you cannot build or even properly use modern AI tools. This is Day 1 for a reason.

Definition: NLP in Plain English

Natural Language Processing (NLP) is a branch of Artificial Intelligence that enables computers to understand, interpret, and generate human language — the language we speak and write every day.

Think about it: computers natively understand only numbers and binary (0s and 1s). They don't "know" that "apple" is a fruit, or that "love" is an emotion, or that "Can you pass the salt?" is a request, not a question about ability. NLP is the bridge between human language and computer understanding.

☕ Chai Wala Analogy: Imagine a foreigner who doesn't speak Hindi walks into a chai shop. He needs a translator to understand "ek cutting chai dena." NLP is that translator — but between humans and computers. The computer is the foreigner; NLP translates our words into something the computer can process and act upon.

What NLP Systems Actually Do

NLP systems perform three broad categories of tasks:

Understanding (Analysis): Reading text and extracting meaning. Examples: classifying emails as spam, identifying people's names in news articles, determining if a review is positive or negative.
Transformation (Conversion): Changing text from one form to another. Examples: translating English to Hindi, converting speech to text, summarizing a 10-page article into 3 sentences.
Generation (Creation): Producing new text from scratch. Examples: ChatGPT writing an essay, Gmail suggesting the next sentence in your email, AI writing code comments.

A Brief History of NLP

NLP has evolved through three major eras:

1950s-1990s: Rule-Based NLP. Linguists wrote thousands of hand-crafted rules. "If the sentence starts with 'Can you', it's likely a request." This approach was brittle — new rules had to be written for every new situation. The famous ELIZA chatbot (1966) used simple pattern matching to simulate a therapist.
1990s-2010s: Statistical NLP. Instead of rules, researchers used probability and statistics. "How likely is 'bank' to mean 'financial institution' when the previous word is 'river'?" N-gram models, Hidden Markov Models, and Naive Bayes classifiers dominated this era.
2010s-Present: Deep Learning & LLMs. Neural networks, especially the Transformer architecture (2017), revolutionized NLP. Models like BERT, GPT-4, and Claude read billions of text pages and learn patterns automatically. This is the era of ChatGPT and modern AI assistants.

Worked Example: Spam Detection

Consider this email: "Congratulations! You have won a $1,000,000 prize. Click here to claim."

An NLP spam filter would:

1. Tokenize the text into words: ["Congratulations", "You", "have", "won", ...]

2. Check for spam indicators: words like "prize", "click here", "$1,000,000" have high spam probability

3. Calculate overall probability using a trained model that has seen millions of spam and non-spam emails

4. Classify as spam if probability > 0.95 (threshold)

5. Move to spam folder automatically

Every step here uses NLP techniques we will learn in this course.
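
The five steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production filter: the training sentences in SPAM and HAM are made up, the function names are hypothetical, and the classifier is a bare-bones Naive Bayes with add-one smoothing.

```python
import math
import re
from collections import Counter

# Tiny hand-made training set (illustrative only, not real data).
SPAM = ["win a free prize now", "click here to claim your prize", "you have won cash"]
HAM = ["meeting moved to monday", "please review the attached report", "lunch at noon"]

def tokenize(text):
    # Step 1: lowercase and keep runs of letters, digits, and the dollar sign.
    return re.findall(r"[a-z0-9$]+", text.lower())

def train(docs):
    counts = Counter(tok for doc in docs for tok in tokenize(doc))
    return counts, sum(counts.values())

def spam_probability(text, threshold=0.95):
    spam_counts, spam_total = train(SPAM)
    ham_counts, ham_total = train(HAM)
    vocab = set(spam_counts) | set(ham_counts)
    # Steps 2-3: start from equal priors, then add per-word log-likelihoods
    # with add-one smoothing so unseen words do not zero out the product.
    log_spam = log_ham = math.log(0.5)
    for tok in tokenize(text):
        log_spam += math.log((spam_counts[tok] + 1) / (spam_total + len(vocab)))
        log_ham += math.log((ham_counts[tok] + 1) / (ham_total + len(vocab)))
    p_spam = 1 / (1 + math.exp(log_ham - log_spam))
    # Step 4: compare against the threshold.
    return p_spam, p_spam > threshold
```

Calling spam_probability on the example email returns a probability above the 0.95 threshold on this toy data. A real filter would be trained on far more mail and would use many more signals than words alone.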

Key Insight: NLP is not one algorithm or one model. It's an entire pipeline of techniques, from text cleaning to tokenization to vectorization to machine learning models. Each module in this course covers one critical piece of that pipeline.

Common Mistakes Beginners Make

Thinking NLP = only chatbots: NLP powers search engines, spam filters, translation, voice assistants, and even medical diagnosis from doctor's notes. Don't limit your understanding.
Ignoring linguistics: You don't need to be a linguist, but understanding basic concepts like ambiguity, syntax, and semantics makes you a much better NLP engineer.
Expecting English rules to work for Hindi: Hindi is written in Devanagari with spaces between words, but it uses subject-object-verb word order, postpositions instead of prepositions, and rich morphological inflection. Tokenizers and stemmers tuned for English often handle Hindi poorly.
Jumping to Transformers without basics: Many beginners try to use BERT or GPT without understanding tokenization or vectorization. This leads to cargo-cult coding — copying code without understanding why it works.

Interview Checkpoints

Q: What is NLP and how is it different from regular programming?
A: NLP deals with unstructured human language, whereas regular programming deals with structured data and explicit rules. NLP uses statistical and machine learning methods to handle the ambiguity and variability of natural language.
Q: Name 5 real-world applications of NLP.
A: Search engines, spam detection, machine translation, sentiment analysis, voice assistants, chatbots, medical text extraction, automatic summarization.
Q: Why is NLP considered harder than Computer Vision?
A: Language is highly ambiguous: the same word can mean different things, and words gain meaning from context, grammar, and world knowledge rather than from their surface form. Also, there are 7,000+ languages with different grammar rules.

Hands-On Practice

Basic: Write down 5 apps you use daily that involve NLP. For each, identify what NLP task is being performed (search, translation, classification, etc.).
Intermediate: Open your email inbox. Manually classify 10 emails as spam or not-spam. Write down which words/phrases made you decide. This is roughly what a Naive Bayes classifier does: it learns which words appear more often in spam than in legitimate mail.
Advanced: Research the "Turing Test" (1950). Explain how modern NLP systems like GPT-4 relate to Turing's original vision. How close are we to passing the test?

Day 1 Recap

NLP = teaching computers to understand human language
Three types of NLP tasks: Understanding, Transformation, Generation
Three eras: Rule-based → Statistical → Deep Learning/LLMs
NLP is a pipeline, not a single algorithm

Next: Day 2 — NLP Applications in the Real World

Day 2

NLP Applications in the Real World

Why this matters

Knowing the applications helps you understand WHY you are learning each technique. When you know that TF-IDF-style term weighting is a classic building block of search ranking, you learn it with purpose. When you know that sentiment signals feed into some trading strategies, you pay attention to precision and recall.

NLP is not an academic exercise — it powers products that billions of people use every single day. Let's explore the major application areas with specific examples and the companies behind them.

1. Search & Information Retrieval

Every time you search on Google, Bing, or DuckDuckGo, NLP is working behind the scenes. Modern search engines don't just match keywords — they understand your intent.

Example: You search for "apple release date." The search engine knows you mean the company Apple, not the fruit, because "release date" is semantically associated with products, not produce. This is entity disambiguation using NLP.

Key Techniques: TF-IDF, BM25, Word2Vec, BERT embeddings, semantic search, query expansion.

Companies: Google, Bing, Elastic, Algolia.
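
To make the first technique on that list concrete, here is a minimal TF-IDF ranking sketch. The three documents in DOCS and the function names are invented for illustration, and the "+ 1.0" smoothing on the IDF term is one common variant among several. Production search engines combine this kind of term weighting with many other signals.

```python
import math
from collections import Counter

# Hypothetical mini-corpus.
DOCS = {
    "d1": "apple announces iphone release date",
    "d2": "apple pie recipe with fresh fruit",
    "d3": "cricket match schedule and date",
}

def tf_idf_scores(query, docs):
    tokenized = {name: text.split() for name, text in docs.items()}
    n_docs = len(tokenized)
    scores = {}
    for name, tokens in tokenized.items():
        tf = Counter(tokens)
        score = 0.0
        for term in query.split():
            # Document frequency: in how many documents does the term occur?
            df = sum(1 for toks in tokenized.values() if term in toks)
            if df == 0:
                continue
            idf = math.log(n_docs / df) + 1.0  # rarer terms get a larger weight
            score += (tf[term] / len(tokens)) * idf
        scores[name] = score
    return scores

def rank(query, docs=DOCS):
    scores = tf_idf_scores(query, docs)
    return sorted(scores, key=scores.get, reverse=True)
```

For the query "apple release date", the product-announcement document ranks first because it matches all three terms, including the rare term "release".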

2. Text Classification

Assigning predefined categories to text. This is one of the most common and commercially valuable NLP tasks.

Spam Detection: Gmail serves well over a billion users, and Google reports that its filters block more than 99.9% of spam, phishing, and malware before it reaches the inbox.
Sentiment Analysis: Companies like Zomato and Swiggy analyze customer reviews to understand satisfaction. Stock traders analyze Twitter sentiment to predict market movements.
Topic Classification: News aggregators like Google News automatically categorize articles into "Politics," "Sports," "Technology."
Support Ticket Routing: When you email customer support, NLP automatically routes your ticket to the right department (billing, technical, returns).

Key Techniques: Naive Bayes, Logistic Regression, SVM, LSTM, BERT fine-tuning.

3. Named Entity Recognition (NER) & Information Extraction

Identifying and extracting specific information from text — names, dates, locations, organizations, amounts.

Example: Consider this news headline: "Apple CEO Tim Cook announced a new $1 billion factory in Bangalore on January 15, 2026." An NER system would extract:

Organization: Apple
Person: Tim Cook
Title: CEO
Money: $1 billion
Location: Bangalore
Date: January 15, 2026
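
As a rough sketch of how such an extraction could work on this headline, the snippet below uses regular expressions for the money and date patterns and a small lookup list for the names. The GAZETTEER dictionary is a hypothetical stand-in: real NER systems use trained sequence models rather than fixed lists.

```python
import re

HEADLINE = ("Apple CEO Tim Cook announced a new $1 billion factory "
            "in Bangalore on January 15, 2026.")

MONTHS = ("January|February|March|April|May|June|July|August|"
          "September|October|November|December")
PATTERNS = {
    "Money": re.compile(r"\$\d+(?:\.\d+)?(?: (?:million|billion))?"),
    "Date": re.compile(rf"(?:{MONTHS}) \d{{1,2}}, \d{{4}}"),
}
# Hypothetical gazetteer: a lookup list standing in for a trained model.
GAZETTEER = {"Apple": "Organization", "Tim Cook": "Person",
             "CEO": "Title", "Bangalore": "Location"}

def extract_entities(text):
    entities = []
    # Pattern-based entities: amounts and dates follow predictable formats.
    for label, pattern in PATTERNS.items():
        entities += [(m.group(), label) for m in pattern.finditer(text)]
    # List-based entities: match whole words or phrases from the gazetteer.
    for name, label in GAZETTEER.items():
        if re.search(rf"\b{re.escape(name)}\b", text):
            entities.append((name, label))
    return entities
```

Running extract_entities(HEADLINE) returns (text, label) pairs for all six entities listed above. The limitation is obvious: any name missing from the list is missed, which is why learned models replaced this approach.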

Real-World Use: Banks extract transaction details from emails. Hospitals extract diagnoses and medications from doctor's notes. Law firms extract case details from legal documents.

4. Machine Translation

Converting text from one language to another. This is what powers Google Translate, DeepL, and Microsoft Translator.

Evolution: Early systems used rule-based translation (word-by-word with grammar rules). Statistical systems (1990s to mid-2010s) learned translation probabilities from parallel corpora. Modern Neural Machine Translation (NMT) uses sequence-to-sequence models with attention, producing fluent, natural-sounding translations.

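Key Challenge: Languages are not one-to-one. Hindi "जा रहा हूँ" translates to "I am going," but not word for word: जा (go) + रहा (progressive marker, masculine singular) + हूँ (am, first person). The subject "I" can be dropped and the speaker's gender is marked on the verb, neither of which happens in English. The model must learn these structural differences.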

5. Text Summarization

Condensing long documents into shorter versions while preserving key information.

Extractive: Selecting the most important sentences from the original text. Fast but less natural.
Abstractive: Generating new sentences that capture the meaning. Uses LLMs like GPT. More natural but can hallucinate facts.

Example: "TL;DR"-style browser extensions typically use abstractive summarization to condense web articles. Short-news apps like Inshorts, known for 60-word stories, show the kind of quick-read product where extractive summarization is a natural fit.
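
A minimal sketch of the extractive idea: score each sentence by how frequent its words are in the whole text, then keep the top-scoring sentences. The stopword list and the scoring rule are simplifying assumptions chosen for illustration, and the function name is hypothetical.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "is", "are", "was", "of", "to", "and", "in", "it", "for", "on"}

def summarize(text, n_sentences=1):
    # Split into sentences at whitespace that follows ., ! or ?
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]
    freq = Counter(words)

    def score(sentence):
        tokens = [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOPWORDS]
        # Average word frequency, so long sentences are not favored automatically.
        return sum(freq[w] for w in tokens) / max(len(tokens), 1)

    top = sorted(sentences, key=score, reverse=True)[:n_sentences]
    # Keep the chosen sentences in their original order.
    return " ".join(s for s in sentences if s in top)
```

Because the output is copied from the source, it cannot invent facts, which is the trade-off against abstractive methods described above.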

6. Question Answering & Chatbots

Systems that answer questions based on documents or general knowledge.

Types:

Closed-domain QA: Answers from a specific document (e.g., "What is the refund policy?" answered from a company's FAQ page)
Open-domain QA: Answers from general knowledge (e.g., "Who wrote the Indian Constitution?")
Conversational AI: ChatGPT, Claude, Gemini — can maintain context across multiple turns of conversation

7. Speech Recognition & Synthesis

Converting speech to text (ASR) and text to speech (TTS). Siri, Alexa, Google Assistant all use NLP pipelines that start with speech recognition.

Key Challenge: Accents, background noise, homophones ("write" vs "right"), and code-switching (Hindi + English in the same sentence, common in India).

Case Study: How Zomato Uses NLP

A platform like Zomato receives a very large volume of food reviews. An NLP pipeline for such reviews typically does the following:

1. Sentiment Analysis: Classify each review as positive, negative, or neutral

2. Aspect Extraction: Identify what the review is about — food quality, delivery time, packaging, value for money

3. Named Entity Recognition: Extract restaurant names, dish names, locations

4. Summarization: Generate a 2-line summary of 50 reviews for the restaurant owner

5. Response Generation: Auto-generate replies to common complaints ("We apologize for the late delivery. Here is a 20% off coupon.")

This is the shape of a full production NLP pipeline, and it is exactly what you'll learn to build in this course.
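
The first two stages (sentiment and aspect extraction) can be sketched with simple word lists. Everything here is an assumption made for illustration: the lexicons, the aspect keywords, and the function name are invented, and nothing is known about how any real company implements these steps.

```python
import re

# Hypothetical lexicons; a real system would learn these from labeled reviews.
POSITIVE = {"great", "tasty", "fresh", "quick", "loved"}
NEGATIVE = {"late", "cold", "stale", "rude", "leaked"}
ASPECTS = {
    "food quality": {"food", "taste", "tasty", "stale", "fresh", "cold"},
    "delivery time": {"delivery", "late", "quick"},
    "packaging": {"packaging", "leaked", "box"},
}

def analyze_review(review):
    tokens = set(re.findall(r"[a-z]+", review.lower()))
    # Sentiment: count of positive words minus count of negative words.
    balance = len(tokens & POSITIVE) - len(tokens & NEGATIVE)
    sentiment = "positive" if balance > 0 else "negative" if balance < 0 else "neutral"
    # Aspects: any aspect whose keyword list overlaps the review.
    aspects = sorted(name for name, keywords in ASPECTS.items() if tokens & keywords)
    return {"sentiment": sentiment, "aspects": aspects}
```

For "Delivery was late and the food was cold." this returns a negative sentiment with the aspects "delivery time" and "food quality".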

Common Mistakes

Assuming one model solves everything: Real-world NLP uses pipelines. A chatbot uses ASR → intent classification → entity extraction → response generation. Each step is a different model.
Ignoring domain-specific challenges: Medical NLP needs to handle abbreviations and drug names. Legal NLP needs to handle archaic language. Twitter NLP needs to handle emojis, slang, and typos.
Thinking translation is just word substitution: Good translation requires understanding cultural context, idioms, and grammatical structures. "It's raining cats and dogs" cannot be translated literally to Hindi.

Interview Checkpoints

Q: What is the difference between extractive and abstractive summarization?
A: Extractive summarization selects existing sentences from the source text. Abstractive summarization generates new sentences that capture the meaning, often using LLMs. Extractive is faster and more factual; abstractive is more natural but can introduce hallucinations.
Q: How does Google Search understand queries beyond keywords?
A: Google uses BERT to understand query context and relationships between words. It also uses entity recognition, knowledge graphs, and semantic embeddings to match intent even when keywords differ.
Q: Design an NLP pipeline for a customer support chatbot.
A: ASR (speech-to-text) → Intent Classification (what does the user want?) → Entity Extraction (order ID, product name) → Knowledge Base Retrieval (find relevant FAQ) → Response Generation → TTS (text-to-speech). Include fallback to human agent when confidence is low.
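
A text-only sketch of the middle of that pipeline (intent classification, entity extraction, and the low-confidence fallback) might look like this. The intents, keyword sets, order-ID format, and function name are all hypothetical, and keyword overlap stands in for a trained intent classifier.

```python
import re

# Hypothetical intents and keywords.
INTENT_KEYWORDS = {
    "refund": {"refund", "money", "return"},
    "order_status": {"where", "track", "status", "order"},
}
ORDER_ID = re.compile(r"\b[A-Z]{2}\d{4}\b")  # assumed ID format, e.g. AB1234

def handle_message(message, min_confidence=0.5):
    tokens = set(re.findall(r"[a-z]+", message.lower()))
    # Score each intent by the fraction of its keywords present in the message.
    scores = {intent: len(tokens & kws) / len(kws) for intent, kws in INTENT_KEYWORDS.items()}
    intent = max(scores, key=scores.get)
    confidence = scores[intent]
    # Entity extraction: pull out an order ID if one is mentioned.
    match = ORDER_ID.search(message)
    order_id = match.group() if match else None
    # Fallback: hand off to a human when the classifier is not confident.
    if confidence < min_confidence:
        intent = "handoff_to_human"
    return {"intent": intent, "order_id": order_id, "confidence": confidence}
```

The fallback branch is the important design choice: a wrong automated answer usually costs more than a handoff.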

Hands-On Practice

Basic: Pick 3 apps on your phone. For each, list which NLP techniques they likely use. Example: WhatsApp → spell correction, auto-complete, emoji suggestions.
Intermediate: Go to Google News. Look at 5 articles. For each, identify which NLP tasks were performed to get it onto Google News (topic classification, NER, summarization, duplicate detection).
Advanced: Research "RAG" (Retrieval-Augmented Generation). Explain how it combines search and LLM generation. Why is it becoming the industry standard for enterprise chatbots?

Day 2 Recap

7 major NLP application areas: Search, Classification, NER, Translation, Summarization, QA/Chatbots, Speech
Real-world NLP is always a pipeline of multiple techniques
Domain-specific challenges make NLP hard (medical, legal, social media)

Next: Day 3 — Lexical Ambiguity: Why Computers Get Confused by Words

Day 3

Lexical Ambiguity: When Words Have Multiple Meanings

Why this matters

Lexical ambiguity is the single biggest reason NLP is hard. A human knows that "bank" means "financial institution" in "I went to the bank to deposit money," but means "river edge" in "The fish swam near the bank." Computers have no such intuition unless we teach them.

What is Lexical Ambiguity?

Lexical ambiguity occurs when a single word (or "lexeme") has more than one meaning. In linguistics, these different meanings are called senses or word senses.

Most common English words have more than one meaning, and some have dozens. The verb "run" alone has over 600 senses in the Oxford English Dictionary.

Types of Lexical Ambiguity

Homonymy: Words that share the same spelling and pronunciation but have completely unrelated meanings. "Bank" (financial institution vs. river edge) and "Bat" (cricket equipment vs. flying mammal).
Polysemy: Words with related but distinct meanings. "Foot" can mean the body part, the bottom of a mountain, or a unit of measurement (12 inches). These senses are conceptually related.
Homophony: Words that sound the same but are spelled differently. "Right" (correct) vs. "Write" (put words on paper) vs. "Rite" (ceremony). Speech recognition systems must handle this.

☕ Chai Wala Analogy: "Cutting" in Hindi can mean "cutting chai" (half a glass), "cutting" a line, or "cutting" a deal. If a foreigner hears "cutting karo," they won't know which meaning unless they understand the context. Computers face the same problem — they need context to disambiguate.

Real Examples That Break NLP Systems

Word	Sense 1	Sense 2	Disambiguating Context
Bank	Financial institution	River edge	"deposit money" → financial; "swam near" → river
Date	Fruit	Calendar day	"ate a" → fruit; "set a" → calendar
Book	Reading material	To reserve	"read a" → material; "a flight" → reserve
Light	Not heavy	Illumination	"bag is" → not heavy; "turn on" → illumination
Match	Sports game	Fire starter	"won the" → sports; "strike a" → fire
Spring	Season	Metal coil	"in the" → season; "broken" → coil
Bar	Drinking place	Metal rod	"at the" → drinking; "iron" → rod

How Do Humans Disambiguate?

Humans use several cues automatically:

Contextual words: If you hear "bank" near "money," "deposit," or "account," you think financial. Near "river," "water," or "fishing," you think river edge.
World knowledge: You know fish don't live in financial buildings. You know people don't swim in banks.
Grammatical role: "Book" as a verb ("Book a table") vs. noun ("Read a book") has different grammatical positions.

How Do Computers Disambiguate?

There are several computational approaches:

Word Sense Disambiguation (WSD): Using machine learning to classify which sense is intended. Features include surrounding words, part-of-speech, and word embeddings.
Contextual Embeddings (Modern): BERT and GPT models generate different vector representations for the same word depending on context. "Bank" in "river bank" gets a different embedding than "bank" in "bank account." This is the state-of-the-art approach.
Knowledge Bases (Classical): WordNet is a lexical database that organizes words into synsets (sets of synonyms). "Bank" has separate synsets for financial institution and river edge.
Supervised Learning: Train a classifier on labeled examples where each occurrence of "bank" is tagged with the correct sense. Features: surrounding words, POS tags, etc.

Worked Example: Word Sense Disambiguation

Sentence: "The bank was steep and slippery after the rain."

Step 1: Identify the ambiguous word = "bank"

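Step 2: Extract a context window. "bank" is the second word, so a window of 3 words on each side gives ["The"] before it and ["was", "steep", "and"] after it. Because the sentence is short, we use the whole sentence as context: ["The", "bank", "was", "steep", "and", "slippery", "after", "the", "rain"]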

Step 3: Look for disambiguating words: "steep" and "slippery" are physical properties of terrain, not financial institutions. "rain" is weather-related.

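Step 4: Using a WordNet-based approach (for example, the Lesk algorithm), compare the context with each sense's dictionary definition. WordNet defines the "river edge" sense as sloping land beside a body of water, which fits "steep" and "slippery" far better than the definition of a financial institution, so we select this sense.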

Step 5: Using a BERT-based approach, the contextual embedding of "bank" in this sentence is closer to "river bank" in the vector space than to "financial bank."
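
Step 4 can be sketched as a simplified Lesk-style overlap count: pick the sense whose definition shares the most words with the sentence. The glosses in SENSES are hand-written stand-ins rather than actual WordNet entries, and the function names are hypothetical.

```python
import re

# Hand-written glosses standing in for a lexical database such as WordNet.
SENSES = {
    "financial institution": "an institution that accepts deposits of money and makes loans",
    "river edge": "sloping land beside a river or other body of water, often steep or muddy after rain",
}
STOPWORDS = {"the", "a", "an", "and", "or", "of", "was", "that", "after", "other", "often"}

def content_words(text):
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}

def simplified_lesk(sentence, senses=SENSES):
    context = content_words(sentence)
    # Overlap = number of content words shared by the sentence and the gloss.
    overlaps = {sense: len(context & content_words(gloss)) for sense, gloss in senses.items()}
    return max(overlaps, key=overlaps.get), overlaps
```

On the example sentence, the "river edge" gloss overlaps on "steep" and "rain" while the financial gloss overlaps on nothing, so the river sense wins. The method is fragile (it depends on exact word matches), which is one reason contextual embeddings took over.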

Key Insight: In modern NLP, we rarely do explicit Word Sense Disambiguation. Instead, contextual embeddings (like BERT) handle ambiguity implicitly. The model learns that "bank" in "river bank" and "bank" in "bank account" are different points in vector space. This is why LLMs are so powerful — they don't need explicit rules.

Common Mistakes

Thinking all ambiguity is bad: In some NLP tasks, ambiguity doesn't matter. For spam detection, you rarely need to know if "bank" means river or finance; the classifier only needs to learn how strongly the word, together with others like "prize" or "click here," is associated with spam. The right level of analysis depends on the task.
Using simple dictionary lookup: Just looking up a word in a dictionary doesn't solve ambiguity. Context is everything. A dictionary tells you the POSSIBLE meanings, not the ACTUAL meaning.
Ignoring proper nouns: "Apple" as a company vs. "apple" as a fruit is a special case of lexical ambiguity. NER systems handle this by tagging entities.

Interview Checkpoints

Q: What is lexical ambiguity and how does it affect NLP systems?
A: Lexical ambiguity is when a word has multiple meanings. It affects NLP because models must use context to determine the intended meaning. Without disambiguation, a search for "bank" might return both financial and river-related results.
Q: What is the difference between homonymy and polysemy?
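A: Homonymy: unrelated meanings ("bank" as financial vs. river). Polysemy: related meanings ("foot" as body part vs. unit of measurement). Homonyms are usually easier to disambiguate because unrelated senses appear in very different contexts; polysemous senses are closer in meaning, so the distinctions are finer and harder to draw.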
Q: How does BERT handle lexical ambiguity without explicit WSD?
A: BERT generates contextual embeddings — the vector representation of "bank" changes based on surrounding words. In "river bank," the embedding is closer to "water" and "shore." In "bank account," it's closer to "money" and "deposit." The model learns this from training data.

Hands-On Practice

Basic: Find 10 ambiguous words in English. For each, write 2 sentences showing different meanings. Example: "Match" — "The cricket match was exciting" vs. "He struck a match to light the candle."
Intermediate: For each sentence you wrote, identify which words in the context help disambiguate the meaning. These are the "disambiguating features" that ML models use.
Advanced: Try the online WordNet browser (wordnet.princeton.edu). Look up "bank" and explore its synsets. How many senses does it have? Which are related (polysemy) vs. unrelated (homonymy)?

Day 3 Recap

Lexical ambiguity = one word, multiple meanings
Homonymy (unrelated) vs. Polysemy (related) vs. Homophony (same sound)
Humans disambiguate using context, world knowledge, and grammar
Computers use WSD, WordNet, or contextual embeddings (BERT/GPT)
Modern LLMs handle ambiguity implicitly through embeddings

Next: Day 4 — Syntactic Ambiguity: When Grammar Creates Confusion

Day 4

Syntactic Ambiguity: When Grammar Creates Multiple Meanings

Why this matters

Syntactic ambiguity is why parsers and translation systems often fail. If a sentence can be parsed in two ways, the NLP system must choose the "correct" parse. Wrong choice = wrong meaning. This is a core interview topic for NLP roles.

What is Syntactic Ambiguity?

Syntactic ambiguity (also called structural ambiguity) occurs when a sentence has more than one valid grammatical structure, leading to different interpretations. Unlike lexical ambiguity (where a single word is ambiguous), syntactic ambiguity involves the entire sentence structure.

Classic Examples

Example 1: Prepositional Phrase Attachment

"I saw the man with binoculars."

Two possible parses:

Parse A: I used binoculars to see the man. ["with binoculars" modifies "saw"]
Parse B: I saw a man who had binoculars. ["with binoculars" modifies "man"]

This is the most common type of syntactic ambiguity — deciding which word a prepositional phrase attaches to.

Example 2: Coordination Ambiguity

"Old men and women sat on the bench."

Parse A: [Old men] and [women] sat on the bench. (Only the men are old)
Parse B: Old [men and women] sat on the bench. (Both are old)

Example 3: Scope Ambiguity

"Everyone loves someone."

Reading A: For every person, there exists someone they love. (Everyone loves at least one person — possibly different people)
Reading B: There exists one person who is loved by everyone. (One universal beloved — like a celebrity)
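
The two readings differ only in the order of the quantifiers, which is easiest to see in first-order logic (writing Loves(x, y) for "x loves y"):

```latex
\text{Reading A: } \forall x \, \exists y \; \mathrm{Loves}(x, y)
\qquad
\text{Reading B: } \exists y \, \forall x \; \mathrm{Loves}(x, y)
```

Reading B implies Reading A, but not the other way around.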

Example 4: Gerund vs. Participle

"Visiting relatives can be boring."

Reading A: [Visiting relatives] can be boring, with "visiting" as a gerund and "relatives" as its object. (The act of visiting relatives is boring.)
Reading B: [Visiting relatives] can be boring, with "visiting" as a participle modifying "relatives." (Relatives who visit can be boring.)

☕ Chai Wala Analogy: "Bhaiya cutting chai do." Does "cutting" describe the chai (half a glass) or the action (the bhaiya is cutting something while making chai)? The grammar allows both, but context tells us the first meaning. Syntactic ambiguity is when the grammar alone doesn't tell us the meaning — context is needed.

Parse Trees: Visualizing Syntactic Structure

A parse tree (or syntax tree) represents the grammatical structure of a sentence. When a sentence is syntactically ambiguous, it has multiple valid parse trees.

Parse Tree: "I saw the man with binoculars"
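
One way to see that this sentence really has two trees is to count them with a small chart parser. The sketch below uses the CKY algorithm over a toy grammar written for this one sentence; the grammar, lexicon, and function name are assumptions made for illustration.

```python
from collections import defaultdict

# Toy lexicon: each word maps to its possible categories.
LEXICON = {
    "I": ["NP"], "saw": ["V"], "the": ["Det"],
    "man": ["N"], "with": ["P"], "binoculars": ["NP"],
}
# Binary rules in Chomsky normal form: (parent, left child, right child).
RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),   # PP attaches to the verb phrase ("saw ... with binoculars")
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),   # PP attaches to the noun phrase ("the man with binoculars")
    ("PP", "P", "NP"),
]

def count_parses(words, start="S"):
    n = len(words)
    # chart[i][j][label] = number of distinct trees for words[i:j] rooted at label
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        for label in LEXICON.get(word, []):
            chart[i][i + 1][label] += 1
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):  # every split point inside the span
                for parent, left, right in RULES:
                    chart[i][j][parent] += chart[i][k][left] * chart[k][j][right]
    return chart[0][n][start]
```

count_parses("I saw the man with binoculars".split()) returns 2, one tree per attachment, while "I saw the man" has only 1.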

Types of Syntactic Ambiguity

Type	Description	Example
PP Attachment	Prepositional phrase can attach to different words	"I saw the man with binoculars"
Coordination	Unclear what adjectives modify	"Old men and women"
Scope	Quantifiers have different scopes	"Everyone loves someone"
Gerund/Participle	Word can be noun or adjective	"Visiting relatives can be boring"
Ellipsis	Missing words create multiple readings	"I like tea more than coffee [than I like coffee / than coffee likes tea]"

How NLP Systems Handle Syntactic Ambiguity

Probabilistic Context-Free Grammars (PCFG): Assign probabilities to each grammar rule, estimated from a parsed corpus. The parse with the highest overall probability is chosen. For "I saw the man with binoculars," the parser compares the probability of attaching "with binoculars" to "man" against attaching it to "saw."
Dependency Parsing: Instead of full parse trees, identify direct relationships between words (who did what to whom). Modern parsers like spaCy's use neural networks to predict dependencies.
Neural Parsing (Modern): Transformers like BERT can implicitly resolve syntactic ambiguity through contextual understanding. The model learns that "with binoculars" modifying "man" is more likely in certain contexts.

Worked Example: Resolving PP Attachment

Sentence: "I saw the man with the telescope."

Classical approach: Use a PCFG parser. Calculate P(parse A) vs P(parse B) based on training corpus statistics. If the corpus shows that "with" more often attaches to nouns than verbs, choose Parse B.

Modern approach: Feed the sentence and its surrounding context to a Transformer-based model such as BERT. The contextual embeddings capture how "with" relates to the words around it. If the next sentence is "I focused the lens," the context favors the reading where "I" used the telescope. If the next sentence is "He was birdwatching," it favors the reading where "the man" had the telescope.

Key difference: Classical methods rely on grammar rules and statistics. Modern methods learn from millions of examples and use context to infer the most likely meaning.
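
The classical calculation can be sketched as a product of rule probabilities. The numbers in RULE_PROB are invented for illustration (a real PCFG estimates them from a treebank), and only the rules that differ between the two parses are included, since shared rules scale both parses equally.

```python
from math import prod

# Hypothetical rule probabilities; a real PCFG estimates these from a treebank.
RULE_PROB = {
    ("VP", ("V", "NP")): 0.7,
    ("VP", ("VP", "PP")): 0.3,
    ("NP", ("Det", "N")): 0.6,
    ("NP", ("NP", "PP")): 0.4,
}

# Rules used by each parse of "I saw the man with the telescope".
# Rules common to both (S -> NP VP, PP -> P NP, word rules) are omitted.
PARSE_A = [("VP", ("VP", "PP")), ("VP", ("V", "NP")), ("NP", ("Det", "N"))]  # PP modifies "saw"
PARSE_B = [("VP", ("V", "NP")), ("NP", ("NP", "PP")), ("NP", ("Det", "N"))]  # PP modifies "man"

def parse_probability(rules):
    # A parse's probability is the product of the probabilities of its rules.
    return prod(RULE_PROB[rule] for rule in rules)

def best_parse():
    scores = {"A": parse_probability(PARSE_A), "B": parse_probability(PARSE_B)}
    return max(scores, key=scores.get), scores
```

With these made-up numbers, Parse A scores 0.3 × 0.7 × 0.6 = 0.126 and Parse B scores 0.7 × 0.4 × 0.6 = 0.168, so the parser picks Parse B. Different corpus statistics would flip the decision, which is why a plain PCFG cannot use the specific words or the surrounding sentences to decide.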

Common Mistakes

Assuming one parse is always correct: In real-world text, some sentences are genuinely ambiguous even to humans. NLP systems should flag uncertainty rather than confidently pick the wrong parse.
Ignoring semantic plausibility: A grammar-based parser might produce a valid parse that is semantically impossible. "The dog ate the homework" is grammatically valid; "The homework ate the dog" is also grammatically valid but semantically absurd. Good parsers incorporate semantic knowledge.
Not handling non-standard grammar: Social media text ("u saw the man wit da telescope lol") breaks formal parsers. Preprocessing is essential.

Interview Checkpoints

Q: What is the difference between lexical and syntactic ambiguity?
A: Lexical ambiguity: a single word has multiple meanings ("bank"). Syntactic ambiguity: the sentence structure allows multiple interpretations ("I saw the man with binoculars"). Lexical is about words; syntactic is about grammar.
Q: How do dependency parsers handle syntactic ambiguity?
A: Dependency parsers predict directed relationships between words. For "saw the man with binoculars," the parser must decide whether "with" connects to "saw" or "man." Neural parsers use word embeddings and neural networks to score each possible attachment and choose the highest-scoring one.
Q: Give 5 examples of syntactically ambiguous sentences.
A: (1) "I saw the man with binoculars" (PP attachment), (2) "Old men and women" (coordination), (3) "Visiting relatives can be boring" (gerund vs participle), (4) "The chicken is ready to eat" (who eats?), (5) "Flying planes can be dangerous" (gerund vs participle).

Hands-On Practice

Basic: Write 5 syntactically ambiguous sentences. For each, explain the two possible parses. Try them on friends — see if they instinctively choose one meaning over the other.
Intermediate: Draw parse trees for your ambiguous sentences. Show how the tree structure changes between the two interpretations. This is exactly what NLP parsers do internally.
Advanced: Use spaCy's dependency parser to parse "I saw the man with binoculars." Look at the dependency labels. Does it attach "with" to "saw" or "man"? Try different sentences and observe how the parser resolves ambiguity.

Day 4 Recap

Syntactic ambiguity = multiple grammatical structures for the same sentence
5 types: PP attachment, coordination, scope, gerund/participle, ellipsis
Parse trees visualize grammatical structure; ambiguous sentences have multiple trees
Classical solution: PCFG parsers with probabilities
Modern solution: Neural dependency parsers and Transformer context

Next: Day 5 — Semantic Ambiguity: When Meaning Depends on World Knowledge

Day 5

Semantic Ambiguity: When Meaning Depends on World Knowledge

Why this matters

Semantic ambiguity is why chatbots fail at sarcasm and why translation systems mistranslate idioms. It requires understanding not just language, but the world.

What is Semantic Ambiguity?

Semantic ambiguity occurs when a sentence has a clear grammatical structure, but its meaning is unclear because it requires knowledge about the world, context, or the speaker's intent.

Unlike lexical ambiguity (ambiguous word) or syntactic ambiguity (ambiguous grammar), semantic ambiguity is about meaning — and meaning is deeply tied to what we know about the world.

Types of Semantic Ambiguity

1. Idioms and Figurative Language

Idioms cannot be understood by analyzing the individual words. The meaning is conventional — you must know the idiom as a whole.

"He kicked the bucket." → He died (not that he literally kicked a bucket)
"It's raining cats and dogs." → It's raining heavily (not literal animals)
"Break a leg!" → Good luck (especially in theatre)
"The ball is in your court." → It's your turn to act

Challenge for NLP: A literal NLP system might try to find "bucket" in the "kick the bucket" sentence, or look for falling animals in "raining cats and dogs." Only systems trained on idiomatic expressions can handle this.
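
One simple way to handle known idioms is to look them up as whole phrases before any word-level analysis. The sketch below does this with a small dictionary built from the examples above; the function name is hypothetical, and a real system would need a much larger idiom list plus a check that the phrase is not being used literally.

```python
# Idiom dictionary built from the examples above.
IDIOMS = {
    "kicked the bucket": "died",
    "raining cats and dogs": "raining heavily",
    "break a leg": "good luck",
    "the ball is in your court": "it is your turn to act",
}

def paraphrase_idioms(sentence):
    """Replace known idioms with a literal paraphrase before further processing."""
    result = sentence
    lowered = sentence.lower()
    for idiom, meaning in IDIOMS.items():
        start = lowered.find(idiom)
        if start != -1:
            # Splice the paraphrase in place of the idiom, keeping the rest intact.
            result = result[:start] + meaning + result[start + len(idiom):]
            lowered = result.lower()
    return result
```

After this step, "He kicked the bucket." becomes "He died.", which a word-level sentiment or translation model can handle sensibly.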

2. Metaphor and Analogy

"Time is money." → Time is valuable and should be spent wisely. This is not a literal statement about time being a currency.

"The CEO is a shark." → The CEO is aggressive in business, not that they are a literal marine animal.

Challenge for NLP: Metaphor detection requires understanding that "shark" in a business context has a different meaning from "shark" in an ocean context. LLMs handle this reasonably well because they've seen millions of metaphorical usages in training data.

3. Sarcasm and Irony

Sarcasm is when the literal meaning is the opposite of the intended meaning. This is one of the hardest problems in NLP.

"Great job missing the deadline." → The speaker is NOT praising the person. They are criticizing them.
"Oh, I LOVE waiting in traffic for 2 hours." → The speaker hates traffic.
"Yeah, that's exactly what I needed today." (after spilling coffee on laptop) → The speaker is frustrated, not satisfied.

Why it's hard: Sarcasm requires understanding the speaker's intent, emotional state, and the situational context. Even humans sometimes miss sarcasm in text, because there is no tone of voice to help. Sarcasm detection systems are often reported at only around 70-80% accuracy, and results vary considerably across datasets.

4. Presupposition and Entailment

Some sentences assume background knowledge that isn't explicitly stated.

"John stopped smoking." → Presupposes that John used to smoke. If he never smoked, the sentence is odd.
"Have you stopped beating your dog?" → This is a classic loaded question. It presupposes you were beating your dog.

5. Entity Ambiguity (Named Entity Disambiguation)

"Apple released a new product." → "Apple" could be the company (Apple Inc.) or the fruit. In a tech news context, it's the company. In a cooking blog, it might be the fruit.

☕ Chai Wala Analogy: "Bhaiya, ye chai aag lagi hai!" A literal NLP system might think "aag lagi hai" means the tea is literally on fire. But in Indian slang, it means the tea is amazing. Semantic ambiguity requires cultural knowledge — not just linguistic knowledge. A chai wala in Delhi knows this; a computer trained only on British English might not.

How NLP Systems Handle Semantic Ambiguity

WordNet and Knowledge Graphs: WordNet groups words by meaning. "Bank" as a financial institution is in a different synset from "bank" as a river edge. Knowledge graphs like Wikidata encode relationships between entities.
Contextual Embeddings (BERT, GPT): These models learn semantic meaning from massive training data. "Apple" in "Apple released iPhone" gets a different embedding than "Apple" in "Apple pie recipe." The model learns from seeing millions of examples.
Sentiment Analysis as a Proxy: For sarcasm detection, some systems check if the literal sentiment (positive) contradicts the expected sentiment (negative given context). If the text is positive but the situation is clearly negative, flag as sarcasm.
Multi-Modal Approaches: Sarcasm is easier to detect when you have audio (tone of voice) or video (facial expressions). Text-only sarcasm detection is much harder.
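
The knowledge-based approach in the list above can be sketched with a simplified Lesk-style rule: pick the sense whose definition shares the most words with the surrounding sentence. This is only a minimal sketch. The SENSES table and its glosses are hand-written stand-ins for illustration, not WordNet's actual entries, and the function names are hypothetical.

```python
# Toy sense inventory with hand-written glosses (an assumption, not real WordNet data).
SENSES = {
    "bank": {
        "financial_institution": "an institution that accepts money deposits and gives loans",
        "river_edge": "sloping land beside a river or lake with water",
    }
}

STOPWORDS = {"a", "an", "the", "that", "and", "or", "of", "to", "in", "on", "at", "with", "i", "my"}

def content_words(text):
    """Lowercase the text, strip simple punctuation, and drop stopwords."""
    words = (w.strip(".,!?\"'").lower() for w in text.split())
    return {w for w in words if w and w not in STOPWORDS}

def disambiguate(word, sentence):
    """Return the sense whose gloss overlaps most with the sentence context."""
    context = content_words(sentence) - {word}
    scores = {
        sense: len(context & content_words(gloss))
        for sense, gloss in SENSES[word].items()
    }
    # On a tie, max() keeps the first sense listed.
    return max(scores, key=scores.get)
```

Word overlap is brittle ("deposit" does not match "deposits" here), which is one reason contextual embeddings replaced this style of method in practice.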

Worked Example: Sarcasm Detection

Tweet: "Just what I needed — another 2-hour commute in the rain. My day is made! #blessed"

Step 1: Literal sentiment analysis: "blessed," "my day is made" → positive words

Step 2: Context analysis: "2-hour commute," "rain" → negative situation

Step 3: Detect contradiction: Positive words + negative situation → likely sarcasm

Step 4: Lexical and punctuation clues: the stock phrase "Just what I needed," set off by a dash and followed by a complaint, often indicates frustration

Step 5: Final classification: Sarcastic (negative sentiment, despite positive words)
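
Steps 1 to 3 of this worked example can be sketched as a contradiction check between positive wording and a negative situation. The two word lists below are tiny illustrative assumptions, not a real sentiment lexicon, and a production system would use trained models rather than keyword sets.

```python
import re

# Tiny illustrative lexicons (assumptions for this sketch, not a real resource).
POSITIVE_CUES = {"blessed", "great", "love", "perfect", "made"}
NEGATIVE_SITUATIONS = {"commute", "rain", "traffic", "delay", "spilled"}

def tokenize(text):
    """Lowercase word tokens; hashtag words are kept without the '#'."""
    return re.findall(r"[a-z0-9']+", text.lower())

def sarcasm_signal(text):
    """Flag text whose positive wording contradicts a negative situation."""
    tokens = set(tokenize(text))
    positive = tokens & POSITIVE_CUES
    negative = tokens & NEGATIVE_SITUATIONS
    return {
        "positive_cues": sorted(positive),
        "negative_situations": sorted(negative),
        "likely_sarcastic": bool(positive) and bool(negative),
    }
```

Running sarcasm_signal on the tweet above finds "blessed" and "made" on the positive side and "commute" and "rain" on the situation side, so the contradiction rule fires. The same rule would wrongly flag a sincere sentence such as "I love walking in the rain," which shows why keyword contradiction is only a weak signal.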

Common Mistakes

Treating all text as literal: Beginners often build sentiment analyzers that only look at word polarity. They miss sarcasm, irony, and idioms entirely. A good NLP system needs context-aware models.
Ignoring cultural context: "Chai pe charcha" is political discussion in India. "Chai" in Britain means tea with milk. "Chai" in the US might refer to a spicy latte. Same word, different cultural meanings.
Assuming WordNet is sufficient: WordNet is great for formal English but misses slang, regionalisms, and new words. It has no entry for "rizz" and no slang sense for "sigma." It was last updated in 2011, so it misses most of recent internet culture.

Interview Checkpoints

Q: What is semantic ambiguity and how is it different from lexical and syntactic ambiguity?
A: Semantic ambiguity is when the meaning is unclear despite clear words and grammar. It requires world knowledge to resolve. Lexical ambiguity is about words with multiple meanings; syntactic is about multiple grammatical structures. Semantic is about the gap between what is said and what is meant.
Q: Why is sarcasm detection so difficult for NLP systems?
A: Sarcasm requires understanding intent, tone, and context. Text-only systems lack audio cues (tone of voice) and visual cues (facial expressions). Sarcasm often involves saying the opposite of what you mean, which contradicts literal sentiment analysis. Reported accuracy on sarcasm benchmarks is often only around 70-80% and varies by dataset.
Q: How do LLMs handle idioms like "kick the bucket"?
A: LLMs learn idioms from training data. During training, they see "kick the bucket" in contexts where people are talking about death. The model learns the association between the phrase and the meaning. However, they can still fail with rare idioms or idioms from underrepresented languages.

Hands-On Practice

Basic: Find 10 English idioms. For each, write the literal meaning and the actual meaning. Try translating them word-by-word into Hindi — does the meaning survive? This is exactly why machine translation is hard.
Intermediate: Collect 10 sarcastic tweets or social media posts. For each, write the literal sentiment (positive/negative) and the intended sentiment. Notice how often they contradict. This is the core challenge of sarcasm detection.
Advanced: Use WordNet (wordnet.princeton.edu or NLTK's WordNet) to look up "break." How many senses does it have? Which senses are related (polysemy) and which are completely separate (homonymy)?

Day 5 Recap

Semantic ambiguity = meaning is unclear even with clear words and grammar
5 types: Idioms, Metaphor, Sarcasm, Presupposition, Entity ambiguity
Requires world knowledge, cultural context, and intent understanding
Sarcasm is one of the hardest NLP problems: even humans struggle with text-only sarcasm
LLMs handle this through massive training data but still fail with rare cases

Next: Day 6 — Pragmatic Analysis: Context, Intent, and Discourse

Day 6

Pragmatic Analysis: Context, Intent, and Discourse

Why this matters

Pragmatics is where NLP gets really hard — and where the most valuable applications live. Chatbots, virtual assistants, and enterprise AI agents all need to understand not just what the user said, but what the user MEANT. This is the frontier of modern NLP.

What is Pragmatics?

Pragmatics is the study of how context contributes to meaning. It's not about what words mean in a dictionary (that's semantics), or how they're arranged grammatically (that's syntax). It's about what speakers INTEND to communicate in specific situations.

In other words: semantics tells you what a sentence means; pragmatics tells you what the speaker means by it.

The Four Levels of NLP Analysis

To understand pragmatics, you need to see where it fits in the broader NLP stack:

Level	Focus	Question Answered	Example
Lexical	Words	What does each word mean?	"Bank" = financial institution or river edge
Syntactic	Grammar	How are words arranged?	"I saw the man with binoculars" — who had the binoculars?
Semantic	Meaning	What does the sentence mean literally?	"He kicked the bucket" = he struck a container with his foot
Pragmatic	Context & Intent	What did the speaker intend?	"He kicked the bucket" = he died (euphemism)

Key Concepts in Pragmatics

1. Speech Acts (What the speaker is DOING)

Every utterance is an action. Philosopher of language J.L. Austin distinguished three acts that a single utterance performs at once:

Locutionary act: The literal meaning. "Can you pass the salt?" = asking about ability to pass salt.
Illocutionary act: The intended action. "Can you pass the salt?" = REQUESTING salt (not asking about ability).
Perlocutionary act: The actual effect. The listener passes the salt.

Why this matters for NLP: A chatbot that answers "Yes, I can pass the salt" to "Can you pass the salt?" has failed at the illocutionary level. It understood the words but missed the intent.
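
The gap between the locutionary and illocutionary level can be illustrated with a small pattern matcher for conventional indirect requests. This is a minimal sketch under a strong assumption: that "Can/Could/Would/Will you ...?" is always a request. The pattern and function name are hypothetical.

```python
import re

# Hypothetical surface pattern for conventional indirect requests.
INDIRECT_REQUEST = re.compile(
    r"^(can|could|would|will)\s+you\s+(please\s+)?(?P<action>.+?)\??$",
    re.IGNORECASE,
)

def illocutionary_act(utterance):
    """Return ('request', action) for 'Can you ...?' forms, otherwise a literal reading."""
    text = utterance.strip()
    match = INDIRECT_REQUEST.match(text)
    if match:
        return "request", match.group("action")
    if text.endswith("?"):
        return "question", text
    return "statement", text
```

The sketch maps "Can you pass the salt?" to a request to "pass the salt". It also shows the limits of surface patterns: "Can you swim?" would be misread as a request, and "It's cold in here." would be treated as a plain statement even when the speaker wants the window closed.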

2. Conversational Implicature (Reading between the lines)

Philosopher H.P. Grice proposed that speakers follow a "Cooperative Principle" — they make contributions that are:

Relevant: On-topic
Informative: Saying enough, but not too much
Truthful: Not saying false things
Clear: Not obscure or ambiguous

When speakers deliberately and obviously violate (flout) one of these maxims, listeners infer an unstated meaning. That inferred meaning is the implicature:

"Your code is... interesting." → If the speaker says this with a weird tone, they might mean "Your code is terrible" but are being polite. The implicature is the opposite of the literal meaning.

3. Anaphora Resolution (What does "it" refer to?)

Pronouns and references require tracking across sentences:

"Rahul told Arjun that he failed the exam." → Who failed? Rahul or Arjun? This is anaphora resolution, and it's a core NLP task.

"The dog chased the cat. It barked." → "It" refers to the dog (the cat meows, doesn't bark). World knowledge helps resolve this.

4. Discourse Coherence (How sentences connect)

Individual sentences make sense, but discourse requires understanding how they relate:

"John loves movies. He went to the cinema." → Cause-effect? The second sentence explains what John did because he loves movies.

"John loves movies. He hates popcorn." → Contrast? These are independent facts about John.

☕ Chai Wala Analogy: "Bhaiya, do you have any special chai?" A literal NLP system might answer "Yes, we have chai." A pragmatic system would understand: "The customer wants to know what's special." It should answer: "We have masala chai, adrak chai, and elaichi chai. Masala is our bestseller." This is pragmatics — understanding what the customer really wants, not just what they asked.

How Modern NLP Handles Pragmatics

Intent Classification: In chatbots, the first step is classifying user intent: "book_flight," "check_status," "complaint," "greeting." This is a pragmatic task.
Coreference Resolution: Specialized models (for example, coreference systems built on SpanBERT) identify what pronouns and references point to. Essential for document understanding and question answering.
Dialogue State Tracking: In multi-turn conversations, track what the user has said, what they want, and what information has been provided. "What time?" — the system must remember they were booking a flight.
Context Windows (LLMs): GPT-4, Claude, and Gemini maintain context across thousands of tokens. This lets them resolve anaphora, track discourse, and infer intent across long conversations.
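
The role of world knowledge in coreference resolution can be illustrated with the earlier "The dog chased the cat. It barked." example. The sketch below is a toy: the TYPICAL_ACTIONS table is a hand-written assumption, and real coreference systems learn such preferences from data instead of looking them up.

```python
# Toy world knowledge: verbs typical for each animal (an assumption for illustration).
TYPICAL_ACTIONS = {
    "dog": {"barked", "growled"},
    "cat": {"meowed", "purred"},
}

def resolve_it(candidates, verb):
    """Pick the antecedent of 'it' from candidates listed in order of mention.

    Uses world knowledge when exactly one candidate fits the verb;
    otherwise falls back to the most recently mentioned candidate.
    """
    matches = [c for c in candidates if verb in TYPICAL_ACTIONS.get(c, set())]
    if len(matches) == 1:
        return matches[0]
    return candidates[-1]  # recency heuristic
```

With candidates ["dog", "cat"], the verb "barked" selects the dog, while an uninformative verb such as "ran" falls back to the most recent mention. That fallback is a guess, which mirrors the genuine ambiguity in "Rahul told Arjun that he failed the exam."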

Worked Example: Intent Classification in a Chatbot

User: "I need to get to Delhi by tomorrow morning. What's the earliest flight?"

Step 1 (Intent Classification): The system classifies intent as "book_flight" with confidence 0.95.

Step 2 (Entity Extraction): Extract destination="Delhi", date="tomorrow morning", preference="earliest".

Step 3 (Dialogue State): Missing: origin city. System must ask: "Where are you flying from?"

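Step 4 (Pragmatic Inference): "earliest flight" signals that arrival time matters more to the user than price. The system might tentatively infer a time-sensitive (possibly business) trip and rank results by departure time, but this is a guess, so premium options should be offered rather than assumed.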

Step 5 (Response Generation): "I can help you find the earliest flight to Delhi tomorrow morning. Where will you be departing from?"

Each step is a pragmatic NLP task. Without pragmatics, the chatbot would just search for literal matches and fail.
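
Steps 1 to 3 and the follow-up question in Step 5 can be sketched in a few functions. Everything here is an assumption made for illustration: the keyword lists stand in for a trained intent classifier, the regular expressions stand in for an entity extractor, and the slot schema and follow-up prompts are hypothetical.

```python
import re

# Hypothetical keyword lists standing in for a trained intent classifier.
INTENT_KEYWORDS = {
    "book_flight": {"flight", "flights", "fly", "flying"},
    "check_status": {"status", "delayed", "track"},
}
# Slots each intent needs before the system can act (assumed schema).
REQUIRED_SLOTS = {"book_flight": ["origin", "destination", "date"]}
FOLLOW_UP = {
    "origin": "Where are you flying from?",
    "destination": "Where would you like to go?",
    "date": "When do you want to travel?",
}

def classify_intent(text):
    """Score each intent by keyword hits and return the best one."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    scores = {intent: len(tokens & words) for intent, words in INTENT_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"

def extract_entities(text):
    """Pull out origin, destination, date, and preference with simple patterns."""
    patterns = {
        "origin": r"\bfrom\s+([A-Z][a-z]+)",
        "destination": r"\bto\s+([A-Z][a-z]+)",
        "date": r"\b(tomorrow(?:\s+(?:morning|evening))?|today)\b",
    }
    found = {}
    for slot, pattern in patterns.items():
        match = re.search(pattern, text)
        if match:
            found[slot] = match.group(1)
    if re.search(r"\bearliest\b", text):
        found["preference"] = "earliest"
    return found

def next_question(intent, entities):
    """Return the follow-up for the first missing required slot, or None."""
    for slot in REQUIRED_SLOTS.get(intent, []):
        if slot not in entities:
            return FOLLOW_UP[slot]
    return None
```

For the user message above, this sketch classifies the intent as book_flight, extracts the destination, date, and preference, and asks for the missing origin. A real system would return a confidence score from the classifier instead of raw keyword counts.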

Common Mistakes

Treating every question as a literal question: "Can you pass the salt?" is a request, not a question about ability. Chatbots that answer "Yes, I am capable of passing salt" are failing at pragmatics.
Ignoring conversation history: In multi-turn dialogue, each utterance depends on previous ones. "What about tomorrow?" only makes sense if you know what the user is asking about.
Not handling politeness and indirectness: Indian English often uses indirect requests: "If you could just help me with this..." This is a request, not a conditional statement about ability.
Building keyword-based bots: Old chatbots matched keywords ("flight" → flight booking). Modern systems use intent classification and entity extraction to understand the full pragmatic meaning.

Interview Checkpoints

Q: What is the difference between semantics and pragmatics in NLP?
A: Semantics is about literal meaning — what the words and sentences mean. Pragmatics is about intended meaning — what the speaker means in context, including speech acts, implicature, and discourse. "Can you pass the salt?" semantically asks about ability; pragmatically, it's a request.
Q: What is anaphora resolution and why is it important?
A: Anaphora resolution is determining what pronouns and references point to. In "The dog chased the cat. It barked," "it" refers to the dog. This is essential for document understanding, question answering, and summarization. Coreference resolution systems, often built on encoders such as SpanBERT, handle this.
Q: How do LLMs handle pragmatic tasks like intent classification?
A: LLMs are trained on massive conversational data and learn to infer intent from context. They can be fine-tuned on intent classification datasets. However, they can still fail with subtle pragmatic cues, sarcasm, and culturally-specific indirectness. For production chatbots, explicit intent classification layers are often combined with LLM generation.

Hands-On Practice

Basic: For each sentence below, identify the illocutionary act (what the speaker is doing): (a) "Can you open the door?" (b) "It's cold in here." (c) "I suppose you could help me." (d) "Do you have the time?"
Intermediate: Find 5 examples of anaphora in a news article. For each pronoun (he, she, it, they, this, that), identify what it refers to. Notice how often you need world knowledge to resolve it.
Advanced: Research "BERT for coreference resolution" (SpanBERT). Explain how it works at a high level. What are the challenges it still faces?

Day 6 Recap

Pragmatics = study of context, intent, and what speakers mean beyond literal words
Speech acts: locutionary (literal), illocutionary (intended), perlocutionary (effect)
Grice's maxims: relevance, informativeness, truthfulness, clarity
Anaphora resolution: tracking what pronouns refer to across sentences
Modern NLP: intent classification, coreference resolution, dialogue state tracking, LLM context windows

Next: Day 7 — NLP Task Overview: A Taxonomy of Language Tasks

Day 7

NLP Task Overview: A Complete Taxonomy

Why this matters

Every NLP problem you encounter in your career will fit into one of these task categories. Knowing the taxonomy helps you quickly identify the right tools, models, and evaluation metrics for any new project.

The NLP Task Hierarchy

NLP tasks are organized by level of linguistic analysis and by output type. Here's the complete taxonomy:

1. Word-Level Tasks

Tasks that analyze individual words:

Tokenization: Splitting text into words, subwords, or characters. The foundation of everything.
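Stemming: Chopping off suffixes with simple rules to get a stem ("running" → "run"). Crude but fast, and the stem is not always a real word (the Porter stemmer turns "studies" into "studi").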
Lemmatization: Reducing to dictionary form ("better" → "good"). Uses POS tags and dictionaries.
Part-of-Speech Tagging (POS): Labeling each word with its grammatical role (noun, verb, adjective).
Morphological Analysis: Analyzing word structure — prefixes, suffixes, roots ("unhappiness" = un + happy + ness).
Word Sense Disambiguation (WSD): Determining which meaning of a word is intended ("bank" = financial or river).
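
The difference between tokenization, stemming, and lemmatization is easier to see in code. The sketch below uses only the standard library. The suffix list and the LEMMA_TABLE are deliberately tiny assumptions; real stemmers (such as Porter) and lemmatizers use far richer rules and dictionaries.

```python
import re

def tokenize(text):
    """Split into lowercase word tokens and standalone punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text.lower())

def crude_stem(word):
    """Strip one common suffix; the result need not be a real word."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

# A lemmatizer needs a dictionary; this tiny table is a stand-in.
LEMMA_TABLE = {"better": "good", "ran": "run", "mice": "mouse"}

def lemmatize(word):
    """Look up the dictionary form, falling back to the word itself."""
    return LEMMA_TABLE.get(word, word)
```

Note how the crude stemmer turns "running" into "runn" and "studies" into "studi", while the lemmatizer maps "better" to "good" only because the dictionary says so. That is the speed versus accuracy tradeoff between the two.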

2. Sequence-Level Tasks

Tasks that analyze sequences of words:

Named Entity Recognition (NER): Identifying entities like people, organizations, locations, dates. "Barack Obama visited Paris" → PER: Barack Obama, LOC: Paris.
Chunking / Shallow Parsing: Grouping words into phrases (noun phrases, verb phrases). "[The quick brown fox] [jumped] [over the lazy dog]."
Dependency Parsing: Finding grammatical relationships between words. "fox" → subject of "jumped"; "dog" → object of "over."
Constituency Parsing: Building full parse trees showing nested phrase structure.
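
Sequence labeling models for NER usually emit one tag per token, commonly in the BIO scheme (B = begin, I = inside, O = outside). The sketch below shows how such tags are turned back into entity spans. The tags in the usage note are hand-written for illustration, not the output of a real model.

```python
def bio_to_spans(tokens, tags):
    """Convert BIO tags (B-PER, I-PER, O, ...) into (label, text) entity spans."""
    spans = []
    current_label, current_tokens = None, []

    def close_span():
        if current_tokens:
            spans.append((current_label, " ".join(current_tokens)))

    for token, tag in zip(tokens, tags):
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != current_label
        )
        if starts_new:
            close_span()
            current_label, current_tokens = tag[2:], [token]
        elif tag.startswith("I-"):
            current_tokens.append(token)
        else:  # an "O" tag closes any open span
            close_span()
            current_label, current_tokens = None, []
    close_span()
    return spans
```

For the tokens ["Barack", "Obama", "visited", "Paris"] with tags ["B-PER", "I-PER", "O", "B-LOC"], the function returns the PER span "Barack Obama" and the LOC span "Paris", matching the example above.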

3. Document-Level Tasks

Tasks that analyze entire documents or sentences:

Text Classification: Assigning a category. Sentiment analysis (positive/negative), spam detection, topic classification.
Text Similarity: Measuring how similar two texts are. Cosine similarity, semantic similarity, paraphrase detection.
Text Summarization: Extractive (selecting sentences) or abstractive (generating new text).
Question Answering: Extracting answers from documents or generating them from knowledge.
Machine Translation: Converting text between languages.
Text Generation: Creating new text from prompts. Creative writing, code generation, dialogue.
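
As one concrete instance of a document-level task, text similarity can be sketched with cosine similarity over bag-of-words counts. This is a minimal baseline using only the standard library. It measures word overlap, not meaning, so semantic similarity in practice usually relies on embeddings instead.

```python
import math
import re
from collections import Counter

def bag_of_words(text):
    """Count lowercase word tokens."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine_similarity(text_a, text_b):
    """Cosine of the angle between two word-count vectors (0.0 to 1.0)."""
    va, vb = bag_of_words(text_a), bag_of_words(text_b)
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)
```

Two identical texts score 1.0 and two texts with no shared words score 0.0. Paraphrases with different vocabulary also score near 0.0, which is the main weakness of this baseline.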

4. Dialogue and Interactive Tasks

Intent Classification: Determining what the user wants.
Slot Filling / Entity Extraction: Extracting parameters for the intent ("Book a flight from Mumbai to Delhi on March 15" → origin, destination, date).
Dialogue State Tracking: Maintaining context across conversation turns.
Response Generation: Generating appropriate replies.
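
Slot filling and dialogue state tracking can be sketched together: extract slots from each turn, then merge them into a running state. The patterns below are assumptions tuned to the flight example above (capitalized city names, dates like "March 15"). A real system would use a trained sequence labeler.

```python
import re

# Hypothetical patterns for the flight-booking slots in the example above.
SLOT_PATTERNS = {
    "origin": re.compile(r"\bfrom\s+([A-Z][a-z]+)"),
    "destination": re.compile(r"\bto\s+([A-Z][a-z]+)"),
    "date": re.compile(r"\bon\s+([A-Z][a-z]+\s+\d{1,2})"),
}

def fill_slots(utterance):
    """Return the slots that can be read off a single utterance."""
    slots = {}
    for name, pattern in SLOT_PATTERNS.items():
        match = pattern.search(utterance)
        if match:
            slots[name] = match.group(1)
    return slots

def track_state(turns):
    """Merge slots across turns; later turns overwrite earlier values."""
    state = {}
    for utterance in turns:
        state.update(fill_slots(utterance))
    return state
```

Letting later turns overwrite earlier ones is a simple design choice that handles corrections such as "Actually to Pune", at the cost of trusting whatever the user said most recently.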

5. Knowledge-Level Tasks

Relation Extraction: Finding relationships between entities. "Steve Jobs founded Apple" → (Steve Jobs, founded, Apple).
Knowledge Graph Construction: Building structured knowledge from text.
Entailment / Natural Language Inference: Determining if one sentence logically follows from another. "All cats are mammals. Garfield is a cat." → "Garfield is a mammal."
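
Relation extraction produces (subject, relation, object) triples, which are the building blocks of a knowledge graph. The sketch below handles exactly one hand-written pattern ("X founded Y" with capitalized names). It is an illustrative assumption, not how trained relation extractors work.

```python
import re

# One capitalized name, possibly several words long (e.g. "Steve Jobs").
NAME = r"[A-Z]\w+(?:\s[A-Z]\w+)*"
FOUNDED = re.compile(rf"(?P<subj>{NAME})\s+founded\s+(?P<obj>{NAME})")

def extract_triples(text):
    """Return (subject, relation, object) triples for the 'founded' pattern."""
    return [
        (m.group("subj"), "founded", m.group("obj"))
        for m in FOUNDED.finditer(text)
    ]

def build_graph(triples):
    """Index triples by subject: a very small knowledge graph."""
    graph = {}
    for subj, relation, obj in triples:
        graph.setdefault(subj, []).append((relation, obj))
    return graph
```

A passive sentence such as "Apple was founded by Steve Jobs" expresses the same fact but is missed by this pattern, which is why practical systems learn relations from annotated data.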

Case Study: How These Tasks Combine in a Real Product

Product: A smart email assistant that helps you write replies.

1. Tokenization → Split the received email into words

2. POS Tagging → Identify nouns, verbs, questions

3. NER → Extract names, dates, meeting locations mentioned

4. Intent Classification → Is this a request, a question, a meeting invite, or just FYI?

5. Sentiment Analysis → Is the sender angry, happy, or neutral?

6. Dependency Parsing → Understand who is asking what of whom

7. Text Generation → Generate a draft reply that matches the tone and intent

8. Grammar Checking → Ensure the generated reply is grammatically correct

This is a complete NLP pipeline using 8+ different task types. Each task is a module you will learn in this course.

Task	Input	Output	Example Models	Difficulty
Tokenization	Raw text	List of tokens	NLTK, SpaCy, BPE, WordPiece	Easy
POS Tagging	Tokens	POS tags	HMM, CRF, BERT	Medium
NER	Text	Entity spans + labels	SpaCy, BERT-CRF, Flair	Medium
Dependency Parsing	Text	Parse tree	SpaCy, Stanford Parser, UDPipe	Hard
Sentiment Analysis	Text	Label (+/-)	Naive Bayes, LSTM, BERT	Easy-Medium
Machine Translation	Text in language A	Text in language B	Google Translate, NLLB, GPT-4	Very Hard
Question Answering	Question + context	Answer span	BERT-QA, T5, GPT-4	Hard
Text Summarization	Long document	Short summary	Pegasus, BART, GPT-4	Hard
Dialogue Systems	Conversation history	Response	DialoGPT, LaMDA, GPT-4	Very Hard

Common Mistakes

Thinking every problem is text classification: Beginners try to solve every NLP problem with a text classifier. NER requires sequence labeling, translation requires seq2seq models, and dialogue requires state tracking. Choose the right task type for the problem.
Ignoring evaluation metrics: Different tasks need different metrics. Accuracy for classification, F1 for NER, BLEU for translation, ROUGE for summarization. Using the wrong metric leads to misleading results.
Not understanding task dependencies: You can't do NER without tokenization. Classical dependency parsers also rely on POS tags as input features, although modern neural pipelines often learn these steps jointly. The pipeline matters. Don't skip foundational steps.
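
To make the metric point concrete, here is a minimal sketch of precision, recall, and F1 computed over sets of predicted and gold entities, the usual way NER is scored. The exact-match criterion (label and text must both agree) is an assumption of this sketch. Evaluation toolkits differ in how they treat partial matches.

```python
def precision_recall_f1(predicted, gold):
    """Exact-match precision, recall, and F1 over two collections of items."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

If the gold entities are (PER, Barack Obama) and (LOC, Paris) but the model predicts (PER, Barack Obama) and (ORG, Paris), one of two predictions is right and one of two gold entities is found, so precision, recall, and F1 are all 0.5. Token-level accuracy would look much better on the same output, because most tokens are "not an entity," which is why F1 is preferred for NER.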

Interview Checkpoints

Q: Name the 5 levels of NLP analysis and give one example task from each.
A: (1) Lexical: tokenization, WSD; (2) Syntactic: POS tagging, dependency parsing; (3) Semantic: NER, natural language inference; (4) Discourse: coreference resolution, discourse parsing; (5) Pragmatic: intent classification, dialogue management. (Discourse is listed here as its own level; the four-level table on Day 6 folds it into pragmatics.)
Q: What is the difference between NER and relation extraction?
A: NER identifies entities ("Barack Obama," "United States"). Relation extraction identifies relationships between them, for example (Barack Obama, president_of, United States). NER is about WHAT; relation extraction is about HOW THEY CONNECT.
Q: What evaluation metric would you use for machine translation? For text summarization?
A: Translation: BLEU (n-gram overlap), chrF (character-level), or COMET (neural metric). Summarization: ROUGE (n-gram overlap with reference), BERTScore (semantic similarity), or human evaluation. Never use accuracy for either — there are many valid translations/summaries.
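
The n-gram overlap idea behind BLEU and ROUGE can be sketched for unigrams. This is a simplified illustration only: full BLEU combines several n-gram orders with a brevity penalty, and ROUGE has several variants, none of which are implemented here. The function names are hypothetical.

```python
from collections import Counter

def clipped_overlap(candidate_tokens, reference_tokens):
    """Count shared tokens, clipping each by its count in the reference."""
    cand, ref = Counter(candidate_tokens), Counter(reference_tokens)
    return sum(min(count, ref[token]) for token, count in cand.items())

def unigram_precision(candidate, reference):
    """BLEU-1 style: what fraction of candidate words appear in the reference?"""
    cand, ref = candidate.lower().split(), reference.lower().split()
    return clipped_overlap(cand, ref) / len(cand) if cand else 0.0

def rouge1_recall(candidate, reference):
    """ROUGE-1 style: what fraction of reference words does the candidate cover?"""
    cand, ref = candidate.lower().split(), reference.lower().split()
    return clipped_overlap(cand, ref) / len(ref) if ref else 0.0
```

Clipping is what stops a degenerate candidate like "the the the the" from scoring perfectly against "the cat is on the mat": only two of its four words are credited, giving a precision of 0.5.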

Hands-On Practice

Basic: For each of these real-world scenarios, identify which NLP task(s) are needed: (a) Gmail spam filter, (b) Google Translate, (c) Siri voice assistant, (d) Grammarly, (e) Google Search.
Intermediate: Pick a product you use daily (e.g., WhatsApp, Instagram, Amazon). Map out the NLP tasks it likely performs. For each task, name the input and output.
Advanced: Research the GLUE and SuperGLUE benchmarks. What tasks do they cover? Why were they created? What does it mean that LLMs now score above human baselines on some of these?

Day 7 Recap

NLP tasks organized by level: word, sequence, document, dialogue, knowledge
Every real product combines multiple tasks in a pipeline
Choosing the right task type and evaluation metric is crucial
Tasks have dependencies; a classical pipeline order is tokenization → POS tagging → parsing → NER → relation extraction

Next: Day 8 — NLP vs Computer Vision vs Reinforcement Learning

Day 8

NLP vs Computer Vision vs Reinforcement Learning

Why this matters

Modern AI products combine NLP, CV, and RL. Understanding the differences and synergies helps you design better systems and choose the right approach for each problem. This is a common interview topic.

The Three Pillars of Modern AI

Three areas dominate industry applications of machine learning. NLP and CV are defined by the kind of data they handle, while RL is a learning paradigm:

Field	Input	Output	Core Challenge	Key Architecture
NLP	Text / Language	Text, labels, sequences	Ambiguity, context, world knowledge	Transformer, RNN, LSTM
Computer Vision (CV)	Images / Video	Labels, bounding boxes, masks	Scale, rotation, occlusion, lighting	CNN, Vision Transformer, YOLO
Reinforcement Learning (RL)	States from environment	Actions	Exploration vs exploitation, credit assignment	Q-Learning, Policy Gradient, PPO

Deep Dive: NLP

What makes NLP unique:

Discrete input space: Text is discrete (words are categorical), not continuous like pixels. This requires embedding layers to convert words to vectors.
Variable-length input: A sentence can be 3 words or 300 words. Models must handle variable-length sequences.
Long-range dependencies: "The cat, which was very hungry and had been searching for hours, finally ate the fish." The subject "cat" and verb "ate" are far apart. Transformers handle this with self-attention.
Compositionality: Meaning is built from smaller parts (words → phrases → sentences → paragraphs). Understanding requires hierarchical processing.
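
The first two points (discrete tokens and variable-length input) are usually handled by mapping words to integer IDs and padding each batch to a common length. The sketch below shows that preprocessing step with plain Python lists. The special tokens <pad> and <unk> and the whitespace tokenizer are assumptions of this sketch; the resulting IDs are what an embedding layer would then turn into vectors.

```python
PAD, UNK = "<pad>", "<unk>"

def build_vocab(sentences):
    """Assign each distinct lowercase token an integer ID."""
    vocab = {PAD: 0, UNK: 1}
    for sentence in sentences:
        for token in sentence.lower().split():
            vocab.setdefault(token, len(vocab))
    return vocab

def encode_batch(sentences, vocab):
    """Convert sentences to ID lists, padded to the longest one, plus a mask."""
    ids = [[vocab.get(t, vocab[UNK]) for t in s.lower().split()] for s in sentences]
    max_len = max(len(seq) for seq in ids)
    padded = [seq + [vocab[PAD]] * (max_len - len(seq)) for seq in ids]
    # The mask marks real tokens (1) versus padding (0) so models can ignore padding.
    mask = [[1] * len(seq) + [0] * (max_len - len(seq)) for seq in ids]
    return padded, mask
```

The mask matters because a padded position carries no meaning. Attention layers typically use it to avoid attending to padding.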

Data characteristics: Text data is everywhere — books, websites, social media. But it's noisy, unstructured, and often unlabeled. High-quality labeled data is expensive to create.

Deep Dive: Computer Vision

What makes CV unique:

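Continuous input space: Pixel intensities are stored as integers (typically 0-255 per channel) but represent a continuous quantity, so nearby values mean nearly the same thing. This makes convolution operations natural and effective.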
Spatial structure: Nearby pixels are related. CNNs exploit this with local receptive fields and weight sharing.
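Translation invariance: A cat in the top-left corner is still a cat. Convolutions are translation-equivariant (shift the input and the feature map shifts with it), and pooling turns that into approximate translation invariance.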
High dimensionality: A 1080p image has 2,073,600 pixels. This requires efficient architectures.
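
Weight sharing and the behavior under shifts can be seen in a one-dimensional toy version of convolution. This is a minimal sketch with plain lists. The signal, the kernel, and the function names are made up for illustration, and real CNN layers add channels, padding, and learned kernels.

```python
def correlate1d(signal, kernel):
    """Slide one shared kernel over the signal (valid positions only)."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]

def shift_right(values, steps):
    """Shift a list right by `steps`, padding with zeros on the left."""
    return [0] * steps + values[: len(values) - steps]

signal = [0, 0, 1, 2, 1, 0, 0, 0]  # a small "object" in a 1-D image
edge_kernel = [1, 0, -1]           # a simple edge detector

features = correlate1d(signal, edge_kernel)
shifted_features = correlate1d(shift_right(signal, 1), edge_kernel)

# The pixel count quoted above: width times height of a 1080p frame.
pixels_1080p = 1920 * 1080
```

Shifting the input by one position shifts the feature map by one position, while the maximum response stays the same. Taking that maximum (global max pooling) is what makes the final answer insensitive to where the object sits.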

Data characteristics: Images are structured and dense, but labeling is expensive (need bounding boxes, segmentation masks). Data augmentation (rotation, flipping, cropping) is very effective for CV.

Deep Dive: Reinforcement Learning

What makes RL unique:

No labeled dataset: The agent learns by interacting with an environment and receiving rewards or penalties.
Delayed rewards: A chess move might not show its value until 20 moves later. Credit assignment is hard.
Exploration vs exploitation: Should the agent try a new action (explore) or stick with what works (exploit)? This is the core tradeoff.
Environment dynamics: The agent's actions change the environment, which affects future decisions.
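
Two of these ideas, delayed rewards and exploration versus exploitation, show up directly in tabular Q-learning. The sketch below implements the standard Q-learning update and an epsilon-greedy action choice. The state and action names and the default learning rate and discount factor are assumptions chosen for illustration.

```python
import random

def q_update(q, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """One Q-learning step: move Q(s, a) toward reward + gamma * max Q(s', .)."""
    next_values = q.get(next_state)
    best_next = max(next_values.values()) if next_values else 0.0  # terminal state
    td_target = reward + gamma * best_next
    q[state][action] += alpha * (td_target - q[state][action])
    return q[state][action]

def epsilon_greedy(q, state, epsilon, rng=random):
    """Explore with probability epsilon, otherwise exploit the best known action."""
    actions = list(q[state])
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[state][a])
```

The discounted term gamma * max Q(s', .) is how value flows backward from a later reward to the earlier move that made it possible, which is the credit assignment problem in miniature. Setting epsilon to 0 gives pure exploitation, while higher values trade short-term reward for information.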

Data characteristics: RL generates its own data through interaction. No need for pre-labeled datasets, but training is slow and unstable. Simulators are often used (e.g., game environments, robot simulations).

Where They Overlap: Multi-Modal AI

Modern AI systems combine all three:

Vision-Language Models (VLM): GPT-4V, Claude, Gemini can look at images and describe them in text. They combine CV encoders (process image) with NLP decoders (generate text).
Self-Driving Cars: CV (recognize objects) + RL (make driving decisions) + NLP (understand voice commands, read signs).
Robotics: CV (see the world) + RL (learn to grasp/manipulate) + NLP (follow instructions like "pick up the red block").
AI Agents: LLM (reasoning and planning) + tools (search, code execution) + RL (learn from feedback). This is the hottest area in AI right now.

☕ Chai Wala Analogy: Think of AI as running a chai shop. NLP is understanding what customers say ("Bhaiya, ek kadak chai"). CV is seeing whether the chai is the right color and consistency. RL is learning over time which ingredients and methods make customers come back. A great chai wala uses all three: listens to customers, watches the chai, and improves with experience.

How to Choose the Right Approach

Problem	Best Approach	Why
Customer support chatbot	NLP	Language understanding and generation
Defect detection in manufacturing	CV	Visual pattern recognition
Game-playing AI	RL	Sequential decision making with delayed rewards
Medical diagnosis from X-rays	CV + NLP	Image analysis + report generation
Autonomous driving	CV + RL + NLP	See + Decide + Understand instructions
Content moderation	NLP + CV	Text + image analysis for harmful content

Common Mistakes

Thinking you need to master all three: Most practitioners specialize. Choose one as your primary expertise and learn the others at a high level. A "full-stack AI engineer" knows all three but is an expert in one.
Using CV models for text: Applying CNNs to text (character-level CNNs) works for some tasks but misses long-range dependencies. Transformers are almost always better for text.
Using RL when supervised learning would suffice: RL is powerful but training is unstable and requires simulators. If you have labeled data, use supervised learning. Only use RL when the problem is truly about sequential decision-making.
Ignoring multi-modal trends: The future is multi-modal. Even if you specialize in NLP, understanding how vision models work will help you work on VLMs and agents.

Interview Checkpoints

Q: What are the key differences between NLP and Computer Vision?
A: (1) Input: text is discrete and sequential; images are continuous and spatial. (2) Data augmentation: very effective in CV (rotate, flip, crop) but harder in NLP (synonym replacement, back-translation). (3) Architecture: Transformers dominate NLP; CNNs dominate CV, though Vision Transformers are gaining ground. (4) Interpretability: attention weights in NLP are somewhat interpretable; feature maps in CNNs are harder to interpret.
Q: When would you use RL instead of supervised learning?
A: Use RL when: (1) You don't have a labeled dataset, (2) The problem involves sequential decision-making, (3) Rewards are delayed, (4) The environment is dynamic and the agent must learn from interaction. Examples: game playing, robotics, recommendation systems, trading strategies. Use supervised learning when you have labeled data and the problem is prediction or classification.
Q: What is a multi-modal model and why are they important?
A: A multi-modal model processes multiple types of input (text, image, audio, video). GPT-4V, Gemini, and CLIP are examples. They are important because the real world is multi-modal — we see, hear, and speak simultaneously. Applications include image captioning, visual question answering, and AI agents that can perceive and act in the world.

Hands-On Practice

Basic: For each of these products, identify which AI fields are used: (a) Tesla Autopilot, (b) Amazon Alexa, (c) Instagram content recommendation, (d) Google Lens, (e) Chess.com AI opponent.
Intermediate: Research CLIP (Contrastive Language-Image Pre-training). How does it combine NLP and CV? What are its applications? Try the OpenAI CLIP demo if available.
Advanced: Read about AlphaGo (RL with self-play) and AlphaFold (protein structure prediction, trained largely with supervised learning on known structures). How does AlphaGo's training differ from AlphaFold's? What makes RL training so challenging compared to standard ML?

Day 8 Recap

NLP, CV, and RL are the three pillars of modern AI
NLP: discrete, sequential, context-dependent; Transformers dominate
CV: continuous, spatial, structured; CNNs dominate, ViT rising
RL: no labels, learns from environment, delayed rewards; unstable but powerful
The future is multi-modal: combining all three (VLM, agents, robotics)

Next: Day 9 — Introduction to Language Models