Retrieval-Augmented Generation (RAG)
RAG combines a language model with document retrieval. Instead of relying only on what the model learned during training, a RAG workflow fetches relevant information from your documents and passes it to the model alongside the question.
Typical use cases include:
- Answering questions about internal documentation
- Building FAQ systems from product or support content
- Querying large document collections in natural language
- Grounding model responses in up-to-date or domain-specific data
How RAG works
A RAG pipeline has three steps: store documents as vectors, retrieve the most relevant passages for a question, and inject them into the prompt.
- What is RAG?: How retrieval-augmented generation works and when to use it.
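The three steps above can be sketched end-to-end in a few lines. This is a minimal illustration, not a production setup: the `embed` function is a toy hashed bag-of-words stand-in for a real embedding model, and the final prompt would normally be sent to a language model rather than printed.

```python
import re
import zlib

def embed(text, dim=512):
    # Toy embedding: a hashed bag-of-words vector, normalized to unit length.
    # A real pipeline would call an embedding model here instead.
    vec = [0.0] * dim
    for token in re.findall(r"\w+", text.lower()):
        vec[zlib.crc32(token.encode()) % dim] += 1.0
    norm = sum(v * v for v in vec) ** 0.5 or 1.0
    return [v / norm for v in vec]

def cosine(a, b):
    # Vectors are unit-length, so the dot product is the cosine similarity.
    return sum(x * y for x, y in zip(a, b))

# 1. Store documents as vectors.
docs = [
    "Refunds are processed within 5 business days.",
    "Our support team is available Monday through Friday.",
]
store = [(doc, embed(doc)) for doc in docs]

# 2. Retrieve the most relevant passage for a question.
question = "How many business days do refunds take?"
q_vec = embed(question)
best_doc, _ = max(store, key=lambda item: cosine(q_vec, item[1]))

# 3. Inject the retrieved passage into the prompt.
prompt = (
    "Answer using only this context:\n"
    f"{best_doc}\n\n"
    f"Question: {question}"
)
```

In practice the vector store would be a dedicated database with approximate nearest-neighbor search, but the store/retrieve/inject shape stays the same.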
Build a RAG pipeline
- Create a vector store: Split documents into chunks and store them as embeddings.
- Retrieve context: Find the most relevant passages for a user question.
- Inject context into a prompt: Add retrieved passages to the model prompt.
Tutorial
Follow a complete example that builds a product FAQ assistant using a RAG pipeline: