Module 6: Embeddings & Vector Stores
Master Embeddings and Vector Stores in LangChain — OpenAI Embeddings, HuggingFace, FAISS, Chroma, Pinecone, cosine similarity, MMR search, and the retriever interface.
Embeddings are the mathematical bridge between text and meaning. They convert words and sentences into dense numerical vectors in a high-dimensional space where semantically similar texts are geometrically close. This module covers embedding models (OpenAI, HuggingFace), three widely used vector stores (FAISS, Chroma, Pinecone), similarity metrics, and the LangChain retriever interface, which together form the foundation of a RAG system.
Embeddings & Vector Similarity
Why this matters
Embeddings map text to vectors that encode semantic meaning, so semantically similar texts have high cosine similarity in vector space. Similarity search uses this property to find the chunks most relevant to a query, which keeps answers grounded in the source material.
- OpenAI text-embedding-3-small: a strong default.
- Hugging Face models for local/offline embedding.
- Normalize vectors to unit length when comparing across batches, so that dot product and cosine similarity give the same ranking.
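To make the cosine similarity and normalization points above concrete, here is a minimal sketch using only the Python standard library. The three-dimensional vectors are made-up stand-ins for real embeddings (which have hundreds or thousands of dimensions), and the helper names `cosine_similarity` and `normalize` are illustrative, not part of LangChain.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

# Hypothetical toy "embeddings"; a real model would produce these.
refund = [0.9, 0.1, 0.0]
returns = [0.8, 0.2, 0.1]
weather = [0.0, 0.2, 0.9]

sim_related = cosine_similarity(refund, returns)    # close in meaning
sim_unrelated = cosine_similarity(refund, weather)  # far apart in meaning

# After normalization, a plain dot product equals the cosine similarity.
dot_normalized = sum(x * y for x, y in zip(normalize(refund), normalize(returns)))

print(f"related: {sim_related:.3f}, unrelated: {sim_unrelated:.3f}")
```

The related pair scores much higher than the unrelated pair, and once both vectors are unit length the cheaper dot product can stand in for cosine similarity.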
Common mistakes
- Hard-coding API keys in source instead of environment variables.
- Passing raw strings where ChatPromptTemplate expects message tuples.
- Skipping text splitting before embedding large PDFs (context overflow).
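The splitting step in the last mistake above can be sketched as a fixed-size character chunker with overlap. This is a simplified illustration under assumed sizes, not how LangChain's own text splitters (for example `RecursiveCharacterTextSplitter`) are implemented; the function name `split_text` and its parameters are hypothetical.

```python
def split_text(text, chunk_size=200, overlap=50):
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share `overlap` characters so that a sentence
    cut at a boundary still appears whole in one of the chunks.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks

# Stand-in for text extracted from a large PDF.
document = "Refunds are accepted within 30 days. " * 20
chunks = split_text(document, chunk_size=200, overlap=50)
print(len(chunks), "chunks, longest:", max(len(c) for c in chunks))
```

Each chunk is then small enough to embed on its own, instead of sending the whole document to the embedding model at once.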
Interview checkpoints
- Q: Explain embeddings in LangChain. A: One-sentence definition + one API name.
- Q: Common bug? A: Keys, message format, or missing split/embed step.
Practice
- Basic: Sketch a minimal embeddings snippet.
- Intermediate: Run a notebook cell demonstrating Embeddings.
- Advanced: Break Embeddings intentionally and interpret the error.
Recap
- You can explain embeddings clearly.
- You know one mistake to avoid.
- You see how this connects to the next lesson.
Next: Vector Stores
FAISS, Chroma & Retrievers
Why this matters
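Vector stores such as FAISS and Chroma hold embeddings and index them for similarity search; retrievers expose a clean query interface for RAG.
FAISS keeps its index in memory by default (it can be written to disk with save_local), while Chroma can persist to disk when given a persist_directory. Calling .as_retriever() on either store exposes top-k search.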
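As a rough illustration of what top-k search means, the sketch below implements a toy in-memory store with a brute-force scan. `ToyVectorStore` and its methods are hypothetical names invented for this example and are not the FAISS or Chroma API; the vectors are made up rather than produced by an embedding model.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

class ToyVectorStore:
    """Hypothetical store: keeps (vector, text) pairs in a plain list."""

    def __init__(self):
        self._items = []

    def add(self, vector, text):
        self._items.append((vector, text))

    def search(self, query_vector, k=4):
        # Score every stored vector against the query, highest first.
        ranked = sorted(
            self._items,
            key=lambda item: cosine_similarity(query_vector, item[0]),
            reverse=True,
        )
        return [text for _, text in ranked[:k]]

store = ToyVectorStore()
store.add([0.9, 0.1, 0.0], "Refunds are accepted within 30 days.")
store.add([0.1, 0.9, 0.0], "Shipping takes 3 to 5 business days.")
store.add([0.0, 0.1, 0.9], "Our office is closed on public holidays.")

# Pretend this vector is the embedding of "What is our refund policy?"
query_vector = [0.8, 0.2, 0.1]
top_docs = store.search(query_vector, k=2)
print(top_docs)
```

A production store replaces the linear scan with an index so that search stays fast as the collection grows, but the contract is the same: a query vector goes in, the k nearest texts come out.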
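from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings
# Assumes chunks is a list of Document objects from a text splitter, and OPENAI_API_KEY is set in the environment.
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")  # one possible embedding model
store = FAISS.from_documents(chunks, embeddings)  # embed each chunk and build the index
retriever = store.as_retriever(search_kwargs={"k": 4})  # return the 4 most similar chunks
docs = retriever.invoke("What is our refund policy?")
Common mistakes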
- Hard-coding API keys in source instead of environment variables.
- Passing raw strings where ChatPromptTemplate expects message tuples.
- Skipping text splitting before embedding large PDFs (context overflow).
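For the API key mistake listed above, one way to keep keys out of source code is to read them from the environment and fail early when they are missing. This is a minimal sketch; the helper name `load_api_key` is hypothetical, while `OPENAI_API_KEY` is the variable name the OpenAI client libraries read by default.

```python
import os

def load_api_key(name="OPENAI_API_KEY"):
    """Return the key stored in the environment variable `name`.

    Raises RuntimeError with a clear message instead of letting a
    missing key surface later as an opaque authentication error.
    """
    key = os.environ.get(name)
    if not key:
        raise RuntimeError(f"Set the {name} environment variable before running.")
    return key

# Typical use at program start:
# api_key = load_api_key()
```

The key is set once in the shell or in a local file that is excluded from version control, so it never appears in a notebook or commit.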
Interview checkpoints
- Q: Explain vector stores in LangChain. A: One-sentence definition + one API name.
- Q: Common bug? A: Keys, message format, or missing split/embed step.
Practice
- Basic: Sketch a minimal vector stores snippet.
- Intermediate: Run a notebook cell demonstrating Vector Stores.
- Advanced: Break Vector Stores intentionally and interpret the error.
Recap
- You can explain vector stores clearly.
- You know one mistake to avoid.
- You see how this connects to the next lesson.
Next: RAG Pipeline
