What is Retrieval-Augmented Generation (RAG)?
Retrieval-Augmented Generation (RAG) is a pipeline that combines a language model with external knowledge at query time.
Instead of relying only on knowledge learned during training, RAG retrieves relevant information from your own documents at query time and injects it into the prompt, so the model can generate responses grounded in up-to-date or private data.
Why RAG is needed
Large language models are powerful, but they have important limitations:
- they cannot access private documents by default
- they do not know about newly created or frequently changing data
- they may hallucinate when asked about unknown topics
RAG addresses these limitations by separating knowledge storage from generation.
The model remains general-purpose, while your data is retrieved dynamically and provided as context only when needed.
The core idea behind RAG
At a high level, RAG works by:
- Converting documents into embeddings
- Storing those embeddings in a vector store
- Retrieving the most relevant content for a given query
- Injecting that content into the prompt sent to the model
The model then generates an answer using both:
- the user’s question
- the retrieved context
This keeps responses grounded in your data without retraining the model.
The main building blocks
Vector stores
A vector store holds embeddings that represent your documents in numerical form.
Similarity search is used to find the most relevant documents for a given query.
How to create one:
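A minimal, illustrative sketch in Python. Real systems use an embedding model and a dedicated vector database (FAISS, pgvector, a hosted service, etc.); here `embed` is a toy bag-of-words stand-in, and the "store" is just an in-memory list of (embedding, text) pairs. All names (`embed`, `cosine`, `store`) are illustrative, not a real library API:

```python
from collections import Counter
import math

def embed(text: str) -> Counter:
    # Toy stand-in for a real embedding model: a sparse bag-of-words vector.
    # In practice you would call an embedding model here instead.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

documents = [
    "RAG retrieves relevant documents at query time",
    "Vector stores hold embeddings for similarity search",
    "The moon orbits the earth",
]

# The "vector store": each document stored alongside its embedding.
store = [(embed(doc), doc) for doc in documents]
```

The essential design point survives the simplification: documents are embedded once at indexing time, and queries are compared against those stored vectors with a similarity measure.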
Context retrieval
When a user asks a question, the query is embedded and compared against the vector store.
The most relevant entries are retrieved and passed downstream.
How to retrieve context:
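A minimal sketch of the retrieval step, again assuming a toy bag-of-words `embed` and an in-memory list as the store (both illustrative, not a real library API). The query is embedded with the same function used for the documents, every entry is scored, and the top-k texts are returned:

```python
from collections import Counter
import math

def embed(text: str) -> Counter:
    # Toy stand-in for a real embedding model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, store, k: int = 2) -> list[str]:
    # Embed the query, score every stored entry, return the k best texts.
    q = embed(query)
    ranked = sorted(store, key=lambda pair: cosine(q, pair[0]), reverse=True)
    return [text for _, text in ranked[:k]]

store = [(embed(d), d) for d in [
    "Vector stores hold document embeddings",
    "Similarity search finds relevant entries",
    "Bananas are yellow",
]]

top = retrieve("How does similarity search find relevant documents?", store, k=2)
```

Production retrievers replace the linear scan with an approximate nearest-neighbor index, but the contract is the same: query in, ranked relevant texts out.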
Prompt augmentation
Retrieved content must be merged into a prompt in a controlled way.
This step defines:
- how much context is included
- how it is framed
- how the model should use it
How to inject context:
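A minimal sketch of prompt augmentation. The template, the `max_chunks` cap, and the instruction wording are all illustrative choices, but they show the three controls listed above: how much context is included, how it is framed, and how the model is told to use it:

```python
def build_prompt(question: str, context_chunks: list[str], max_chunks: int = 3) -> str:
    # Bound how much context is included so the prompt stays within limits.
    context = "\n".join(f"- {chunk}" for chunk in context_chunks[:max_chunks])
    # Frame the context explicitly and instruct the model how to use it.
    return (
        "Answer the question using ONLY the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

prompt = build_prompt(
    "What does a vector store hold?",
    ["A vector store holds embeddings that represent documents numerically."],
)
```

The resulting string is what actually gets sent to the model; because the grounding happens entirely in this prompt, tightening or loosening the instruction line is often the quickest way to tune how strictly answers stick to the retrieved data.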