Retrieval-Augmented Generation (RAG)
RAG combines a language model with document retrieval. Instead of relying only on what the model learned during training, a RAG workflow fetches relevant information from your documents and passes it to the model alongside the question.
Typical use cases include:
- Answering questions about internal documentation
- Building FAQ systems from product or support content
- Querying large document collections in natural language
- Grounding model responses in up-to-date or domain-specific data
How RAG works
A RAG pipeline has three steps: store documents as vectors, retrieve the most relevant passages for a question, and inject them into the prompt.
- What is RAG?: How retrieval-augmented generation works and when to use it.
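The three steps above can be sketched end-to-end in a few lines. This is a minimal illustration, not a production setup: the `embed` function is a toy hashed bag-of-words stand-in for a real embedding model, and the final prompt would normally be sent to a language model rather than printed.

```python
import re
import zlib

def embed(text, dim=512):
    # Toy embedding: a hashed bag-of-words vector, normalized to unit length.
    # A real pipeline would call an embedding model here instead.
    vec = [0.0] * dim
    for token in re.findall(r"\w+", text.lower()):
        vec[zlib.crc32(token.encode()) % dim] += 1.0
    norm = sum(v * v for v in vec) ** 0.5 or 1.0
    return [v / norm for v in vec]

def cosine(a, b):
    # Vectors are unit-length, so the dot product is the cosine similarity.
    return sum(x * y for x, y in zip(a, b))

# 1. Store documents as vectors.
docs = [
    "Refunds are processed within 5 business days.",
    "Our support team is available Monday through Friday.",
]
store = [(doc, embed(doc)) for doc in docs]

# 2. Retrieve the most relevant passage for a question.
question = "How many business days do refunds take?"
q_vec = embed(question)
best_doc, _ = max(store, key=lambda item: cosine(q_vec, item[1]))

# 3. Inject the retrieved passage into the prompt.
prompt = (
    "Answer using only this context:\n"
    f"{best_doc}\n\n"
    f"Question: {question}"
)
```

In practice the vector store would be a dedicated database with approximate nearest-neighbor search, but the store/retrieve/inject shape stays the same.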
Build a RAG pipeline
- Create a vector store: Split documents into chunks and store them as embeddings.
- Retrieve context: Find the most relevant passages for a user question.
- Inject context into a prompt: Add retrieved passages to the model prompt.
Tutorial
Follow a complete example that builds a product FAQ assistant using a RAG pipeline: