Install the gcloud CLI
gcloud is how you create the project and deploy the service from your terminal. One download.
# macOS (with Homebrew)
brew install --cask gcloud-cli
# Linux (one-liner installer)
curl https://sdk.cloud.google.com | bash
exec -l $SHELL
# Windows: download the installer
# https://cloud.google.com/sdk/docs/install
Verify it landed:
gcloud --version
You should see Google Cloud SDK ... and a version number.
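If later scripts should fail early when the CLI is missing, a small guard can help. This is a minimal sketch; `have_cmd` is a hypothetical helper name, not something gcloud provides.

```shell
# have_cmd: hypothetical helper; succeeds if the named command is on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

if have_cmd gcloud; then
  gcloud --version
else
  echo "gcloud not found on PATH; open a new terminal or re-run the installer" >&2
fi
```

On Linux, a "not found" right after installing usually just means the shell has not been restarted yet, which is what the `exec -l $SHELL` line is for.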
Log In
Just one login this time:
gcloud auth login
This pops a browser; sign in with your Google account. That's all the auth you need.
Why no application-default login? Most GCP tutorials also have you set up Application Default Credentials so your code can call Google APIs. We don't need that here: the running service calls no Google APIs. Ollama serves Gemma from the GPU locally; the only thing that talks to Google is gcloud doing the deploy, which gcloud auth login already covers.
Create a Project
A project is GCP's billing and isolation boundary. Everything we create lives in one project. Delete the project, everything goes with it.
# Project IDs must be globally unique: lowercase, hyphens, numbers, 6–30 chars.
export PROJECT_ID="gemma-run-$(date +%s)"
gcloud projects create $PROJECT_ID --name="Gemma on Cloud Run"
gcloud config set project $PROJECT_ID
The $(date +%s) suffix is just a cheap way to dodge ID collisions; plain gemma-run is almost certainly taken.
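A malformed ID only fails once `gcloud projects create` runs, so it can be handy to check the format locally first. The sketch below encodes the rules from the comment above plus two that Google also enforces: the ID must start with a letter and cannot end with a hyphen. `valid_project_id` is a hypothetical helper, and passing it says nothing about whether the ID is already taken.

```shell
# valid_project_id: hypothetical helper, not a gcloud command.
# 6-30 chars of lowercase letters, digits, and hyphens;
# starts with a letter, does not end with a hyphen.
valid_project_id() {
  printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[a-z][a-z0-9-]{4,28}[a-z0-9]$'
}

candidate="gemma-run-$(date +%s)"
if valid_project_id "$candidate"; then
  echo "format ok: $candidate"
else
  echo "invalid project ID format: $candidate" >&2
fi
```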
Link Billing
A GPU needs a billing account attached, even inside the free-credit window.
gcloud billing accounts list
You should see at least one account with OPEN: True. Grab its ACCOUNT_ID (it looks like 01ABCD-EFGH12-IJKL34) and link it:
gcloud billing projects link $PROJECT_ID --billing-account=YOUR_BILLING_ACCOUNT_ID
No billing account yet? If gcloud billing accounts list comes back empty, create one in the console at console.cloud.google.com/billing (add a card and accept the free-trial credit), then re-run the list command to get your ACCOUNT_ID and link it. New Google Cloud accounts currently get $300 in free credit over 90 days (check cloud.google.com/free for the latest offer), more than enough for this blueprint.
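An easy slip here is pasting the link command with the placeholder still in it. A rough guard, assuming real IDs follow the same three-groups-of-six shape as the sample above (that shape is an assumption taken from the example, not a documented contract); `looks_like_billing_id` is a hypothetical helper.

```shell
# looks_like_billing_id: hypothetical helper, not a gcloud command.
# Assumes IDs look like the sample in the text: XXXXXX-XXXXXX-XXXXXX,
# where each X is a digit or an uppercase letter.
looks_like_billing_id() {
  printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[0-9A-Z]{6}-[0-9A-Z]{6}-[0-9A-Z]{6}$'
}

BILLING_ACCOUNT_ID="YOUR_BILLING_ACCOUNT_ID"  # replace with the value from the list command
if looks_like_billing_id "$BILLING_ACCOUNT_ID"; then
  echo "ID shape looks right: $BILLING_ACCOUNT_ID"
else
  echo "this does not look like a billing account ID: $BILLING_ACCOUNT_ID" >&2
fi
```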
Enable the APIs
A new project starts without the services this blueprint uses. You need exactly three:
gcloud services enable \
run.googleapis.com \
artifactregistry.googleapis.com \
cloudbuild.googleapis.com

| API | Used for |
|---|---|
| run.googleapis.com | Cloud Run: the service that runs the container with a GPU |
| cloudbuild.googleapis.com | Cloud Build: turns our Dockerfile into an image when we deploy with --source . |
| artifactregistry.googleapis.com | Where the built container image is stored |
This takes ~30 seconds. No Vertex AI, no Secret Manager — there's nothing to authenticate against.
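To confirm that all three services are on, one option is to diff the enabled list against the required names. This is a sketch: `missing_services` is a hypothetical helper, and the `value(config.name)` format string assumes the field name that recent gcloud releases report for `gcloud services list`.

```shell
# missing_services: hypothetical helper. Reads enabled service names on stdin
# (one per line) and prints each required name (given as arguments) that is absent.
missing_services() {
  enabled=$(cat)
  for svc in "$@"; do
    printf '%s\n' "$enabled" | grep -Fxq "$svc" || printf '%s\n' "$svc"
  done
}

# Only query gcloud if it is installed; prints nothing when all three are enabled.
if command -v gcloud >/dev/null 2>&1; then
  gcloud services list --enabled --format="value(config.name)" \
    | missing_services run.googleapis.com artifactregistry.googleapis.com cloudbuild.googleapis.com
fi
```

Empty output means nothing is missing; any name printed can be passed straight back to `gcloud services enable`.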
Pick a Region with L4 GPUs
GPUs are only available in some regions. As of mid-2026, Cloud Run L4 GPUs are in us-central1 (Iowa), us-east4, europe-west1, europe-west4, and asia-southeast1 (with asia-south1 invitation-only). Pick the closest:
# us-central1 is a good default — it also has the free CPU/memory tier.
export REGION="us-central1"
Every command for the rest of the blueprint references $REGION.
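A typo in the region name only surfaces at deploy time, so a quick local check can save a round trip. The sketch below hard-codes the self-serve regions named above; `is_l4_region` is a hypothetical helper, and the list will go stale as Google adds regions.

```shell
# is_l4_region: hypothetical helper. The list mirrors the self-serve regions
# named in the text; asia-south1 is left out because it is invitation-only.
is_l4_region() {
  case "$1" in
    us-central1|us-east4|europe-west1|europe-west4|asia-southeast1) return 0 ;;
    *) return 1 ;;
  esac
}

REGION="${REGION:-us-central1}"  # fall back to the default if REGION is unset
if is_l4_region "$REGION"; then
  echo "region ok: $REGION"
else
  echo "no self-serve L4 GPUs listed for: $REGION" >&2
fi
```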
Do I need to request GPU quota? No. L4 GPUs on Cloud Run are generally available and self-serve. The first time you deploy a GPU service in a region, Google auto-grants you a small quota (3 GPUs, with zonal redundancy off — the mode we use). No quota form, no waiting.
Verify Your Setup
gcloud config list
You should see your account and project:
[core]
account = you@gmail.com
project = gemma-run-1750000000
What You Have Now
- gcloud installed and logged in
- A fresh project with billing linked
- Three APIs enabled (run, cloudbuild, artifactregistry)
- A GPU-capable region picked
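One thing gcloud config list does not show: PROJECT_ID and REGION are plain shell exports, so they disappear when you open a new terminal. A small preflight along these lines can catch that before the next step; `require_vars` is a hypothetical helper name.

```shell
# require_vars: hypothetical helper. Reports each named variable that is unset
# or empty and returns non-zero if any were missing.
# Only pass trusted variable names: the lookup uses eval.
require_vars() {
  missing=0
  for name in "$@"; do
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

if require_vars PROJECT_ID REGION; then
  echo "ready: project=$PROJECT_ID region=$REGION"
else
  echo "re-run the export lines above in this shell" >&2
fi
```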
Next: build the container that holds Gemma.
Reference: Install the gcloud CLI · Cloud Run GPU regions · Project lifecycle · Google Cloud free tier