In a previous post, we explored how Postgres - when paired with the pgvector extension - can serve as a powerful foundation for AI-enabled applications. This follow-up goes further. It's not just about what’s possible - it’s about how to build it.
In this article, we’ll walk through practical steps to turn Postgres into your vector database of choice, combining semantic similarity search with traditional keyword matching, and even enabling Retrieval-Augmented Generation (RAG) workflows - all without adding a new system to your stack.
Why Stay in Postgres?
If you are like a lot of teams, you are already running Postgres. Instead of spinning up a specialised vector database and duplicating your data pipeline, pgvector allows you to embed vector capabilities directly within your existing database. That means:
Fewer moving parts
Easier integration with existing apps
A single place to query, index, and scale
If you already trust Postgres with your data, pgvector lets you extend that trust into the AI domain.
Step 1: Setting Up pgvector
First, make sure pgvector is installed. If you're using Docker, you can use the ankane/pgvector image. Then, inside your database:
CREATE EXTENSION IF NOT EXISTS vector;
Add a table to store documents and their embeddings:
CREATE TABLE documents (
id bigserial PRIMARY KEY,
content text NOT NULL,
fts tsvector GENERATED ALWAYS AS (to_tsvector('english', content)) STORED,
embedding vector(384) -- or 768/1536, depending on your model
);Add indexes to support both semantic and keyword search:
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);
CREATE INDEX ON documents USING gin (fts);
This setup ensures you can efficiently search using both vector distance and traditional full-text matching. In our course, we dive into these different indexing techniques and think about how to optimise them.
Step 2: Generating Embeddings
Here’s a simple Python script using OpenAI to create embeddings and store them in Postgres:
import psycopg
from pgvector.psycopg import register_vector
import openai
openai.api_key = "your-api-key"
conn = psycopg.connect("dbname=yourdb user=youruser password=yourpass")
register_vector(conn)
text = "This is a customer support article about billing issues."
embedding = openai.Embedding.create(
input=[text],
model="text-embedding-ada-002"
)['data'][0]['embedding']
with conn.cursor() as cur:
cur.execute(
"INSERT INTO documents (content, embedding) VALUES (%s, %s)",
(text, embedding)
)
conn.commit()You can adapt this to bulk-load your dataset or trigger embedding generation on save. Equally, we can use local models to handle the embeddings which can enhance security while reducing costs.
Step 3: Semantic Search in SQL
Once you’ve embedded your content, querying is just a matter of computing similarity:
SELECT id, content
FROM documents
ORDER BY embedding <#> '[YOUR_QUERY_VECTOR]' -- cosine distance
LIMIT 5;
This will return the most semantically similar documents. Need higher performance? Use the hnsw index to scale up. While we are using cosine distance here, there are other types of vector search that we can employ. Each comes with their own benefits and considerations.
Step 4: Combine Full-Text and Vector Search (Hybrid Search)
In real-world applications - like product search, knowledge base queries, or recommendation engines - semantic and keyword search are often stronger together.
We can rank results from both approaches using a technique like Reciprocal Rank Fusion (RRF). Here’s a simplified version:
-- Assume you’ve already got your query vector and tsquery
WITH vector_results AS (
SELECT id, row_number() OVER () AS rank
FROM documents
ORDER BY embedding <#> '[...]'
LIMIT 10
),
text_results AS (
SELECT id, row_number() OVER () AS rank
FROM documents
WHERE fts @@ to_tsquery('billing & issue')
ORDER BY ts_rank(fts, to_tsquery('billing & issue')) DESC
LIMIT 10
),
fused AS (
SELECT id, 1.0 / (COALESCE(v.rank, 50) + COALESCE(t.rank, 50)) AS score
FROM vector_results v
FULL OUTER JOIN text_results t USING (id)
)
SELECT documents.*
FROM fused
JOIN documents USING (id)
ORDER BY score DESC
LIMIT 5;
The result? A hybrid ranking that balances semantic meaning and keyword precision. In combining the semantic and the structured data, we can help our users achieve their goals faster.
Step 5: Powering RAG Workflows
If you’re building with LLMs, you’re likely considering Retrieval-Augmented Generation (RAG). With RAG, you:
Convert a user query into an embedding.
Find relevant documents via vector search.
Feed those documents (alongside the query) to a language model.
This pattern makes your AI app grounded, explainable, and far more reliable than prompting alone. With pgvector, all of this can happen in a single SQL query. Combine it with a Python or Node.js wrapper, and you're in production.
Step 6: Why take the course?
With expert guidance, you'll gain practical hands-on experience writing usable code, working with sample data, and exploring use cases for search, recommendations, and support chatbots.
You'll get the opportunity to discuss your own challenges, create elegant solutions to complex problems, and build confidence designing LLM-driven solutions.
In a nutshell, you'll learn:
How to integrate embedding generation into your pipeline
How to optimise indexes and queries for large-scale performance
How to design hybrid search and RAG pipelines that actually work
Whether you're a backend developer, data engineer, or tech lead exploring AI tooling, this course will get you moving fast - with confidence.