Blueprint · beginner · 9 steps

Build a Context-Aware RAG App on Google Cloud

Document Q&A end-to-end on GCP — Cloud SQL Postgres with pgvector, Vertex AI for embeddings and Gemini, FastAPI on Cloud Run. Task-typed embeddings, metadata-filtered retrieval, and optional contextual chunking. IAM auth all the way down, no passwords.


All steps

Step 1: What We're Building (6 min)
A document Q&A service running entirely on Google Cloud: Cloud SQL Postgres with pgvector stores the embeddings, Vertex AI does the embedding and the answering, and Cloud Run serves the API.

Step 2: Set Up Your Google Cloud Project (4 min)
Install the `gcloud` CLI, create a fresh project, link billing, enable the four APIs we need, and log in so your laptop can talk to Google Cloud.

Step 3: Create the Cloud SQL Instance (4 min)
Spin up a tiny Postgres instance on Cloud SQL, create a database and an IAM user for yourself, and turn on the `vector` extension.

Step 4: Project Scaffold and First Connection (5 min)
Create a Python project with `uv`, install the four packages we need, write a tiny `db.py` that uses Google's Cloud SQL Python Connector, and prove it can `SELECT 1` against the database you just made.

Step 5: Ingest: Chunk, Embed, Insert (5 min)
Read every `.txt` file in `data/`, split each one into ~400-token chunks, send each chunk to Vertex AI's `text-embedding-005`, and `INSERT` the chunk + its 768-dim vector into the `chunks` table.

Step 6: Query: Similarity Search + Gemini Answer (5 min)
Embed the user's question, find the top-K closest chunks with cosine distance, hand them to Gemini with a "use only this context" prompt, and return the answer.

Step 7: Make It Context-Aware (9 min)
Add structured metadata to every chunk, pre-filter the candidate set with a Postgres `WHERE` clause before the vector scan, and (optionally) prepend per-chunk document context so the embeddings themselves know more.

Step 8: Wrap in FastAPI, Deploy to Cloud Run (5 min)
Expose `ask()` as a `POST /ask` endpoint, write a four-line `Dockerfile`, create a service account with the right roles, and ship it with one `gcloud run deploy --source .`.

Step 9: What's Next (6 min)
You have a production-shaped RAG app on GCP. Here's where to take it: auth, evals, private networking, scaling pgvector, swapping models, and turning the whole thing into Terraform.
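
To make the retrieval idea in Steps 6 and 7 concrete before you start, here is a minimal in-memory sketch of "pre-filter on metadata, then rank by cosine distance". It is not the blueprint's code: the real app runs this inside Postgres, where pgvector's `<=>` operator computes cosine distance and a `WHERE` clause does the filtering. The names `cosine_distance`, `top_k`, the `source` metadata field, and the three-dimensional toy vectors are all assumptions made for illustration (real `text-embedding-005` vectors have 768 dimensions).

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity, the quantity pgvector's <=> operator computes."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def top_k(query_vec, chunks, k=2, source=None):
    """Pre-filter on metadata (like a SQL WHERE), then keep the k closest chunks."""
    # Hypothetical metadata field: "source" stands in for whatever Step 7 stores.
    candidates = [c for c in chunks if source is None or c["source"] == source]
    ranked = sorted(candidates, key=lambda c: cosine_distance(query_vec, c["embedding"]))
    return ranked[:k]


# Toy data: 3-dim vectors stand in for real 768-dim embeddings.
chunks = [
    {"id": 1, "source": "handbook.txt", "embedding": [1.0, 0.0, 0.0]},
    {"id": 2, "source": "handbook.txt", "embedding": [0.0, 1.0, 0.0]},
    {"id": 3, "source": "faq.txt", "embedding": [0.9, 0.1, 0.0]},
]

query = [1.0, 0.0, 0.0]
unfiltered = [c["id"] for c in top_k(query, chunks, k=2)]
filtered = [c["id"] for c in top_k(query, chunks, k=2, source="handbook.txt")]
print(unfiltered)  # [1, 3]
print(filtered)    # [1, 2]
```

Without the filter, the two closest chunks win regardless of where they came from. With the filter, only chunks from the chosen source are ranked at all, which is the effect the `WHERE` clause in Step 7 is meant to have ahead of the vector scan.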