Blueprint · beginner · 6 steps

Deploy Gemma as an Open API on Cloud Run

Serve Google's open Gemma 4 on a serverless Cloud Run GPU — one Dockerfile, one deploy command, unauthenticated behind a single HTTPS URL. Then call it straight from a static webpage with streaming, no backend of your own.

← All blueprintsSource code on GitHub →
Your progress0 / 6 steps· 0%

All steps

01Step 1: What We're BuildingA public HTTPS endpoint that serves Google's open Gemma 4 model from a serverless GPU on Cloud Run — no login, no API key — plus a single static webpage that streams answers from it straight in the browser.4 min02Step 2: Set Up Google CloudInstall the `gcloud` CLI, log in, create a fresh project, link billing, enable the three APIs the deploy needs, and pick a region that actually has L4 GPUs.3 min03Step 3: Bake Gemma into a ContainerWrite a six-line `Dockerfile` that starts from the Ollama image, bakes the Gemma 4 weights into an image layer, binds to Cloud Run's port, and turns on CORS so a browser can call it.4 min04Step 4: Deploy as an Open APIOne `gcloud run deploy --source .` with a GPU attached and `--allow-unauthenticated` builds the image, provisions an L4, and hands you a public URL — then three curls prove it's open, CORS-ready, and answering.5 min05Step 5: Call It from a WebpageA single `index.html` — no framework, no build step — POSTs a message to `/v1/chat/completions` and streams Gemma's reply token by token. Open it locally or host it anywhere.4 min06Step 6: Costs, Safety & TeardownWhat an open GPU endpoint actually costs, how to keep a stranger from running up your bill, how to lock it down for production, and the two commands that take it all back to zero.5 min