Published on January 15, 2025
# Setting up Postgres and pgvector with Docker for building RAG applications
If you’re building anything that involves AI and Large Language Models (LLMs), you have likely heard of Retrieval-Augmented Generation (RAG). RAG is a technique for grounding an AI application’s answers in information stored in, and retrieved from, external sources.
A very common use case for RAG is the “chat with your documents” application: users upload documents and then ask questions about them.
For example, if you upload a very long document about a product, the application will chunk the document into smaller pieces, create vector embeddings for each piece, and store the embeddings in a database. When the user asks a question, the application will use the embeddings to find the most relevant pieces of the document, and then use the LLM to answer the question.
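To make the chunking step concrete, here is a minimal sketch (the function name, chunk size, and overlap are illustrative, not from any particular library) of splitting a document into fixed-size, overlapping pieces:

```javascript
// Minimal chunking sketch: split text into fixed-size pieces with overlap,
// so sentences cut at a boundary still appear intact in a neighboring chunk.
function chunkText(text, chunkSize = 500, overlap = 50) {
  const chunks = [];
  for (let start = 0; start < text.length; start += chunkSize - overlap) {
    chunks.push(text.slice(start, start + chunkSize));
  }
  return chunks;
}

const chunks = chunkText('a'.repeat(1200), 500, 50);
console.log(chunks.length); // 3 chunks for a 1200-character document
```

Each chunk would then be embedded and stored, as shown in the steps that follow.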
In this blog post, I’ll show you how you can set up Postgres with the pgvector extension in Docker for building RAG applications.
## Step 1: Writing the Dockerfile
To start, we need to create a Dockerfile that will build our custom Docker image with PostgreSQL and the pgvector extension. Open a text editor and create a new file named `Dockerfile` in your project directory. Add the following content:
```dockerfile
# Use the official PostgreSQL 16.4 image as the base
FROM postgres:16.4

# Install build dependencies
RUN apt-get update && apt-get install -y \
    build-essential \
    git \
    postgresql-server-dev-16

# Clone and build pgvector
RUN git clone https://github.com/pgvector/pgvector.git /tmp/pgvector \
    && cd /tmp/pgvector \
    && make \
    && make install

# Clean up
RUN apt-get remove -y build-essential git postgresql-server-dev-16 \
    && apt-get autoremove -y \
    && rm -rf /var/lib/apt/lists/* /tmp/pgvector
```
This Dockerfile starts with the official PostgreSQL 16.4 image, which provides a stable and reliable base for our database. It then installs the build tools and development files required to compile and install the pgvector extension. The pgvector repository is cloned from GitHub, and the extension is built and installed using `make`. Finally, the Dockerfile removes the build tools and the cloned repository to keep the final image size smaller, which is important for efficient deployment.
## Step 2: Building the Docker Image
With the Dockerfile ready, we can now build our custom Docker image. Open your terminal, navigate to the directory containing the Dockerfile, and run:
```bash
docker build -t postgres-pgvector .
```
This command tells Docker to read the Dockerfile in the current directory (represented by `.`) and build an image based on its instructions. The `-t` flag tags the resulting image with the name `postgres-pgvector`, making it easier to reference later. The build process may take a few minutes as it downloads the base image, installs dependencies, and compiles the pgvector extension.
## Step 3: Running the Docker Container
After building the image, we need to run a container from it. Use this command in your terminal:
```bash
docker run \
  --name postgres-vector \
  -e POSTGRES_PASSWORD=mysecretpassword \
  -d \
  -p 5432:5432 \
  postgres-pgvector
```
This command starts a new container named "postgres-vector" from our custom image. The `-e` flag sets the PostgreSQL superuser password to "mysecretpassword", which you should replace with a secure password of your choice. The `-d` flag runs the container in detached mode, meaning it will run in the background. The `-p 5432:5432` option maps the container's port 5432 to the host's port 5432, allowing you to connect to the database from outside the container.
## Step 4: Connecting to the Database
Now that our container is running, we need to connect to the PostgreSQL database inside it. Run this command:
```bash
docker exec -it postgres-vector psql -U postgres
```
This command opens an interactive psql session in the running container. The `-it` flags allocate an interactive terminal, and `-U postgres` specifies that you want to connect as the `postgres` user. Because this is a local connection from inside the container, the official image allows it without prompting for the password.
## Step 5: Creating a New Database
Once connected, you might want to create a new database for your RAG application. Here’s how to do it:
```sql
CREATE DATABASE vectordb;
\c vectordb
```
The first command creates a new database named `vectordb`, which will store your RAG data, including the vector embeddings. The second command (`\c vectordb`) switches your current connection to the newly created database. This step matters because keeping your RAG data in a dedicated database gives you better organization and security.
## Step 6: Enabling the pgvector Extension
To use vector operations, we need to enable the pgvector extension in our database:
```sql
CREATE EXTENSION vector;
```
This command installs the `vector` extension, which adds support for vector data types and similarity search functions to PostgreSQL. This is crucial for RAG applications, as it allows you to store and query vector embeddings efficiently.
## Step 7: Using pgvector for RAG
Now that we have pgvector set up, we can start using it for our RAG application. Here’s a simple example of how to create a table with a vector column and perform a similarity search:
```sql
CREATE TABLE items (id bigserial PRIMARY KEY, embedding vector(3));
INSERT INTO items (embedding) VALUES ('[1,2,3]'), ('[4,5,6]');
SELECT * FROM items ORDER BY embedding <-> '[3,1,2]' LIMIT 5;
```
The first command creates a table named `items` with an `id` column and an `embedding` column of type `vector(3)`, a vector with 3 dimensions. The second command inserts two sample vectors into the table. The third command performs a similarity search, ordering the results by their distance (the `<->` operator) from the vector `[3,1,2]` and limiting the output to 5 results. This demonstrates how you can use pgvector to store and query vector embeddings, which is essential for RAG applications.
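To build intuition for what `<->` computes, here is the same Euclidean distance written out in plain JavaScript (a sketch for illustration, not how pgvector implements it internally):

```javascript
// Euclidean (L2) distance, the metric behind pgvector's <-> operator.
function euclideanDistance(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    sum += (a[i] - b[i]) ** 2;
  }
  return Math.sqrt(sum);
}

// The same comparison as the SQL query above:
console.log(euclideanDistance([1, 2, 3], [3, 1, 2])); // ≈ 2.449
console.log(euclideanDistance([4, 5, 6], [3, 1, 2])); // ≈ 5.745, so [1,2,3] ranks first
```

pgvector also provides other distance operators, so check its documentation if you need a different metric, such as cosine distance, for your embeddings.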
## Step 8: Embedding a Document Using OpenAI’s API
Now that we have our PostgreSQL database set up with the pgvector extension, let’s see how we can embed a document using OpenAI’s API. This step is almost always required for RAG applications, as it allows us to convert text into vector embeddings that can be stored and searched efficiently.

To get embeddings, we’ll send our text string to the embeddings API endpoint along with the embedding model name. In this example, we’ll use the `text-embedding-3-small` model. Here’s how you can do it:
```javascript
import OpenAI from 'openai';

const openai = new OpenAI();

const embedding = await openai.embeddings.create({
  model: 'text-embedding-3-small',
  input:
    'Blue whales are the largest animals ever known to have lived on Earth. They can reach lengths of up to 100 feet and weigh as much as 200 tons. They are known for their immense size and distinctive blue-gray coloration.',
  encoding_format: 'float'
});

console.log(embedding);
```
The response from the API will contain the embedding vector along with some additional metadata. Here’s an example of what the response might look like:
```json
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [
        -0.006929283495992422, -0.005336422007530928, -4.547132266452536e-5,
        -0.024047505110502243
      ]
    }
  ],
  "model": "text-embedding-3-small",
  "usage": {
    "prompt_tokens": 5,
    "total_tokens": 5
  }
}
```
By default, the length of the embedding vector will be 1536 for `text-embedding-3-small` or 3072 for `text-embedding-3-large`. You can reduce the dimensions of the embedding by passing in the `dimensions` parameter without the embedding losing its concept-representing properties. You can learn more about this in the OpenAI documentation.
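As a sketch, the request with the `dimensions` parameter mirrors the earlier call; you would pass an object like this to `openai.embeddings.create` (the input text here is just an example):

```javascript
// Request shape for a reduced-dimension embedding: `dimensions: 512`
// asks the API for a 512-number vector instead of the default 1536.
const embeddingRequest = {
  model: 'text-embedding-3-small',
  input: 'Blue whales are the largest animals ever known to have lived on Earth.',
  encoding_format: 'float',
  dimensions: 512
};

// const embedding = await openai.embeddings.create(embeddingRequest);
console.log(embeddingRequest.dimensions); // 512
```

If you do reduce the dimensions, remember to declare your table column with the matching size, e.g. `vector(512)`.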
Once you have the embedding, you can insert it into your PostgreSQL database using the pgvector extension. For example, if you created a table like this:
```sql
CREATE TABLE documents (id bigserial PRIMARY KEY, content text, embedding vector(1536));
```
You can insert the embedding like this:
```sql
INSERT INTO documents (content, embedding) VALUES ('Blue whales are...', '[-0.006929283495992422, -0.005336422007530928, -4.547132266452536e-05, -0.024047505110502243, ...]');
```
This process allows you to store and search vector embeddings in your database, which is essential for building effective RAG applications.
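pgvector accepts vectors written as a bracketed, comma-separated string, so in application code you typically serialize the embedding array before the INSERT. A small hypothetical helper (the function name and the node-postgres usage in the comment are illustrative):

```javascript
// Format a numeric embedding array as a pgvector literal such as '[1,2,3]'.
function toVectorLiteral(embedding) {
  return `[${embedding.join(',')}]`;
}

// Example parameterized insert with node-postgres (assuming a connected `client`):
// await client.query(
//   'INSERT INTO documents (content, embedding) VALUES ($1, $2)',
//   ['Blue whales are...', toVectorLiteral(embeddingValues)]
// );

console.log(toVectorLiteral([0.1, -0.2, 0.3])); // [0.1,-0.2,0.3]
```

Using a parameterized query rather than string concatenation keeps the insert safe against SQL injection.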
## Step 9: Retrieving the Most Relevant Documents
Once you have stored your document embeddings in the PostgreSQL database using pgvector, the next step is to retrieve the most relevant documents based on a user’s query. This is a key part of building RAG applications, as it allows you to find and use the most relevant information to generate responses.
To retrieve the most relevant documents, you’ll first need to generate an embedding for the user’s query using the same OpenAI API we used earlier. Here’s how you can do that:
```javascript
import OpenAI from 'openai';

const openai = new OpenAI();

const queryEmbedding = await openai.embeddings.create({
  model: 'text-embedding-3-small',
  input: 'What is the size of a blue whale?',
  encoding_format: 'float'
});

console.log(queryEmbedding);
```
Once you have the query embedding, you can use it to search for the most similar documents in your database. Here’s an example of how to do this using pgvector:
```sql
SELECT * FROM documents
ORDER BY embedding <-> '[-0.006929283495992422, -0.005336422007530928, -4.547132266452536e-05, -0.024047505110502243, ...]'
LIMIT 5;
```
In this SQL query, `embedding` is the column in your `documents` table that stores the vector embeddings, and the array of numbers represents the query embedding. The `<->` operator calculates the Euclidean distance between the query embedding and each document embedding, and the results are ordered by this distance. The `LIMIT 5` clause returns the top 5 most relevant documents.
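What this query does can be sketched in plain JavaScript: compute the distance from the query embedding to every stored embedding, sort ascending, and keep the closest k. This is only an in-memory illustration; the database does the same work far more efficiently, especially once you add an index:

```javascript
// In-memory sketch of: ORDER BY embedding <-> query LIMIT k
function topK(documents, queryEmbedding, k) {
  const distance = (a, b) =>
    Math.sqrt(a.reduce((sum, v, i) => sum + (v - b[i]) ** 2, 0));
  return [...documents]
    .sort(
      (d1, d2) =>
        distance(d1.embedding, queryEmbedding) -
        distance(d2.embedding, queryEmbedding)
    )
    .slice(0, k);
}

const docs = [
  { id: 1, embedding: [1, 2, 3] },
  { id: 2, embedding: [4, 5, 6] }
];
console.log(topK(docs, [3, 1, 2], 1)); // keeps the id: 1 document, the nearest neighbor
```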
## Going Further
What we covered here is a very basic example of using Postgres and pgvector for RAG applications. If you’re interested in learning more, check out our other blog posts on creating AI agents in Node with AI SDK, which give you more building blocks for your RAG application. You can take the setup even further by adding web scraping and large language models to the mix.
## Conclusion
As you can see, setting up Postgres with the pgvector extension in Docker is quite simple. Once you have the database set up, you can use any embedding model to embed documents, user submissions, or any other data you want to store. You can then generate an embedding for a question and use similarity search to find the most relevant documents.

Finally, once you have the most relevant documents, you can pass them along with the question to a large language model to generate a response. And there you have it: a RAG application that can answer questions about your input data.
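That last step can be sketched as well: assemble the retrieved documents and the user’s question into a prompt and send it to a chat model (the function name and prompt wording below are illustrative, not a fixed recipe):

```javascript
// Build a grounded prompt from retrieved documents and the user's question.
function buildRagPrompt(documents, question) {
  const context = documents.map((doc, i) => `[${i + 1}] ${doc}`).join('\n');
  return `Answer the question using only the context below.\n\nContext:\n${context}\n\nQuestion: ${question}`;
}

const prompt = buildRagPrompt(
  ['Blue whales can reach lengths of up to 100 feet.'],
  'What is the size of a blue whale?'
);

// You would then pass the prompt to the LLM, e.g.:
// const completion = await openai.chat.completions.create({
//   model: 'gpt-4o-mini',
//   messages: [{ role: 'user', content: prompt }]
// });
console.log(prompt);
```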