Create embeddings
Generate vector representations from text using the Text Embedder node.
When this is useful
Embeddings are numerical vectors that capture the semantic meaning of text and can be used for similarity search, clustering, and retrieval-based workflows.
Create embeddings when you want to:
- Compare texts by meaning rather than keywords
- Find similar documents or records
- Group or cluster text data
- Prepare data for retrieval-augmented generation (RAG)
If you only need text generation or classification, check out Prompt a model instead.
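To make "compare texts by meaning" more concrete, here is a minimal Python sketch of cosine similarity, a common way to compare two embedding vectors. It runs outside KNIME and is only an illustration: the three-dimensional vectors are hypothetical, hand-picked values, not the output of any real model, and the function name `cosine_similarity` is our own.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Hypothetical 3-dimensional embeddings, chosen by hand for illustration.
# Real embedding models typically produce hundreds or thousands of dimensions.
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
invoice = [0.0, 0.1, 0.9]

sim_close = cosine_similarity(cat, kitten)   # vectors point in a similar direction
sim_far = cosine_similarity(cat, invoice)    # vectors point in different directions
```

A higher value means the two vectors point in a more similar direction, which is what similarity search and clustering rely on.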
Build an embedding workflow
Prerequisites
Before you start, make sure that you have:
- installed the KNIME AI Extension (see Install the KNIME AI Extension).
- configured credentials for a supported provider (see LLM Providers).
1. Select an embedding model
How you select an embedding model depends on where the model runs.
Hosted models (authentication required)
If you use a hosted provider (for example, OpenAI):
- Store your API key using a Credentials Configuration or Credentials Widget.
- Connect the credentials to the corresponding authenticator, such as the OpenAI Authenticator.
- Select an embedding-capable model using an Embedding Model Selector, for example, the OpenAI Embedding Model Selector.
If the authenticator shows a green status light, the connection is successful.
Local models (no authentication required)
If you use a local model, configure the GPT4All Embedding Model Selector to point to your local model file.
Use the same embedding model consistently
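Always use the same embedding model when creating and querying embeddings. Each model maps text into its own vector space, often with a different number of dimensions, so similarity scores between vectors from different models are not meaningful.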
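If you process embeddings outside KNIME, one possible safeguard is to store the model name next to each vector and refuse to compare vectors that come from different models. The following Python sketch shows the idea. The `TaggedEmbedding` class, the `check_comparable` function, and the model labels `model-a` and `model-b` are all hypothetical names introduced for this illustration; they are not part of the KNIME AI Extension.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TaggedEmbedding:
    model: str     # hypothetical label for the model that produced the vector
    vector: tuple  # the embedding values

def check_comparable(a, b):
    """Raise ValueError if two embeddings should not be compared."""
    if a.model != b.model:
        raise ValueError(
            f"embeddings come from different models: {a.model!r} vs {b.model!r}"
        )
    if len(a.vector) != len(b.vector):
        raise ValueError("embeddings have different dimensions")

# Illustrative values only; not produced by a real model.
stored = TaggedEmbedding(model="model-a", vector=(0.1, 0.7, 0.2))
query_same = TaggedEmbedding(model="model-a", vector=(0.2, 0.6, 0.2))
query_other = TaggedEmbedding(model="model-b", vector=(0.5, 0.5, 0.0))

check_comparable(stored, query_same)  # passes silently

try:
    check_comparable(stored, query_other)
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
```

Note that two models can produce vectors of the same length and still be incompatible, which is why the sketch checks the model label and not only the dimension.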
2. Provide text input
Provide the text you want to embed as a table column.
Each row in the table will be processed independently.
3. Generate embeddings
Use the Text Embedder node.
Configure the node as follows:
- Connect the model input port to the selected embedding model
- Connect the data input port to the table containing the text column
- Select the column that contains the text to embed
The node appends a new column containing one embedding vector per input row.
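Conceptually, the result resembles the following Python sketch: every input row is embedded on its own, and the vector is appended as an extra column. This is not how the Text Embedder node is implemented. The `toy_embed` function is a stand-in that counts characters into buckets so that the example runs without a model, and the column names `text` and `embedding` are assumptions made for this illustration.

```python
def toy_embed(text, dim=4):
    """Stand-in embedder: counts characters into `dim` buckets. Not a real model."""
    vector = [0.0] * dim
    for ch in text.lower():
        vector[ord(ch) % dim] += 1.0
    return vector

def append_embeddings(rows, text_column, output_column="embedding"):
    """Return new rows, each with one embedding vector added in `output_column`."""
    return [
        {**row, output_column: toy_embed(row[text_column])}
        for row in rows
    ]

# A small table with one text column (illustrative data).
table = [
    {"id": 1, "text": "KNIME workflow"},
    {"id": 2, "text": "vector search"},
]

result = append_embeddings(table, "text")
```

The output has the same number of rows as the input, the original columns are kept, and each row gains exactly one vector.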
Result
Each embedding is a high-dimensional numeric representation of the input text, designed for similarity-based operations rather than direct inspection.
To see how embeddings are used for similarity analysis and visualization, check out this example workflow on the KNIME Hub:
➡️ Compare texts by semantic similarity
Next steps
- Build a RAG pipeline using embeddings for context retrieval