Step 2: Set Up Google Cloud — Deploy Gemma as an Open API on Cloud Run

Install the gcloud CLI

gcloud is how you create the project and deploy the service from your terminal. One download.

# macOS (with Homebrew)
brew install --cask gcloud-cli
 
# Linux (one-liner installer)
curl https://sdk.cloud.google.com | bash
exec -l $SHELL
 
# Windows: download the installer
# https://cloud.google.com/sdk/docs/install

Verify it landed:

gcloud --version

You should see Google Cloud SDK ... and a version number.
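
If you end up scripting these steps, a small guard can stop the script before any gcloud call runs on a machine where the CLI is missing. A minimal sketch; `require_cmd` is a hypothetical helper name of our own, not something gcloud ships:

```shell
# require_cmd: hypothetical helper. Succeeds if the named command is on PATH,
# otherwise prints a hint to stderr and returns non-zero.
require_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    return 0
  fi
  echo "error: '$1' not found on PATH; install it and re-open your shell" >&2
  return 1
}

# Usage at the top of a setup script:
# require_cmd gcloud || exit 1
```

On Linux, a "not found" right after the installer usually just means the shell hasn't reloaded its PATH yet, which is what the `exec -l $SHELL` line above takes care of.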

Log In

Just one login this time:

gcloud auth login

This opens a browser window; sign in with your Google account. That's all the auth you need.

Why no application-default login? Most GCP tutorials also have you set up Application Default Credentials so your code can call Google APIs. We don't need that here — the running service calls no Google APIs. Ollama serves Gemma from the GPU locally; the only thing that talks to Google is gcloud doing the deploy, which gcloud auth login already covers.

Create a Project

A project is GCP's billing and isolation boundary. Everything we create lives in one project. Delete the project, everything goes with it.

# Project IDs must be globally unique: 6–30 characters, lowercase letters, digits, and hyphens, starting with a letter.
export PROJECT_ID="gemma-run-$(date +%s)"

gcloud projects create "$PROJECT_ID" --name="Gemma on Cloud Run"
gcloud config set project "$PROJECT_ID"

The $(date +%s) suffix is just a cheap way to dodge ID collisions — plain gemma-run is almost certainly taken.
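
If you'd rather choose your own ID, it can help to check the format locally before asking gcloud to create the project. The sketch below encodes the format rules (6 to 30 characters; lowercase letters, digits, and hyphens; starts with a letter; doesn't end with a hyphen). `is_valid_project_id` is a hypothetical helper name, and passing the check says nothing about whether the ID is still available:

```shell
# is_valid_project_id: hypothetical helper that checks the ID format only.
# Regex: one leading letter, 4-28 middle characters, one trailing letter or
# digit, which adds up to a total length of 6-30.
is_valid_project_id() {
  printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[a-z][a-z0-9-]{4,28}[a-z0-9]$'
}

# Usage:
# is_valid_project_id "$PROJECT_ID" || echo "bad project ID: $PROJECT_ID" >&2
```

Uniqueness can only be confirmed by gcloud projects create itself, which fails if the ID is already taken.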

Link Billing

Deploying a GPU service requires a billing account linked to the project, even while you're spending free credit.

gcloud billing accounts list

You should see at least one account with OPEN: True. Grab its ACCOUNT_ID (it looks like 01ABCD-EFGH12-IJKL34) and link it:

gcloud billing projects link $PROJECT_ID --billing-account=YOUR_BILLING_ACCOUNT_ID

No billing account yet? If gcloud billing accounts list comes back empty, create one in the console at console.cloud.google.com/billing — add a card and accept the free-trial credit — then re-run the list command to get your ACCOUNT_ID and link it. New Google Cloud accounts currently get $300 in free credit over 90 days (check cloud.google.com/free for the latest offer) — more than enough for this blueprint.
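
If you'd rather not copy the ID by hand, something like the following could pick it up for you. Treat it as a sketch under assumptions: it takes the first open account, which may not be the one you want if you have several, and `pick_billing_account` and `BILLING_ACCOUNT_ID` are hypothetical names of our own. The `--filter` and `--format` expressions reflect our understanding of the fields behind the list output; if they return nothing, compare against `gcloud billing accounts list --format=json`.

```shell
# pick_billing_account: hypothetical helper. Prints the ID of the first open
# billing account, or nothing if there is none.
pick_billing_account() {
  gcloud billing accounts list \
    --filter="open=true" \
    --format="value(name.basename())" \
    --limit=1
}

# Usage:
# BILLING_ACCOUNT_ID=$(pick_billing_account)
# gcloud billing projects link "$PROJECT_ID" --billing-account="$BILLING_ACCOUNT_ID"
```

Copying the ID manually, as shown above, is just as good for a one-off setup.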

Enable the APIs

Most Google Cloud APIs are disabled in a new project. This blueprint needs exactly three services:

gcloud services enable \
  run.googleapis.com \
  artifactregistry.googleapis.com \
  cloudbuild.googleapis.com

API	Used for
`run.googleapis.com`	Cloud Run — the service that runs the container with a GPU
`cloudbuild.googleapis.com`	Cloud Build — turns our `Dockerfile` into an image when we deploy with `--source .`
`artifactregistry.googleapis.com`	Where the built container image is stored

This takes ~30 seconds. No Vertex AI, no Secret Manager — there's nothing to authenticate against.
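
To confirm the three services really are on, you can compare the project's enabled list against what this step needs. A minimal sketch; `REQUIRED_APIS` and `missing_apis` are hypothetical names introduced here for illustration:

```shell
# The three services this blueprint needs.
REQUIRED_APIS="run.googleapis.com artifactregistry.googleapis.com cloudbuild.googleapis.com"

# missing_apis: hypothetical helper. Prints every required API that does not
# appear in the project's enabled list; prints nothing when all are enabled.
missing_apis() {
  enabled=$(gcloud services list --enabled --format="value(config.name)")
  for api in $REQUIRED_APIS; do
    # -x matches whole lines, -F treats the name as a literal string.
    printf '%s\n' "$enabled" | grep -qxF "$api" || echo "$api"
  done
}

# Usage:
# missing_apis    # no output means you're set
```

If it prints anything, re-run gcloud services enable with the names it lists.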

Pick a Region with L4 GPUs

GPUs are only available in some regions. As of mid-2026, Cloud Run L4 GPUs are in us-central1 (Iowa), us-east4, europe-west1, europe-west4, and asia-southeast1 (with asia-south1 invitation-only). Pick the closest:

# us-central1 is a good default — it also has the free CPU/memory tier.
export REGION="us-central1"

Every command for the rest of the blueprint references $REGION.
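
A typo in the region name only shows up later, at deploy time, so a quick local check can save a round trip. The sketch below hard-codes the self-serve regions listed above (asia-south1 is left out because it is invitation-only), so it goes stale whenever Google changes the list. `GPU_REGIONS` and `is_gpu_region` are hypothetical names:

```shell
# Regions this section lists as having Cloud Run L4 GPUs, self-serve.
# Assumption: copied from the text above, not fetched from Google.
GPU_REGIONS="us-central1 us-east4 europe-west1 europe-west4 asia-southeast1"

# is_gpu_region: hypothetical helper. Succeeds if $1 is in GPU_REGIONS.
is_gpu_region() {
  for r in $GPU_REGIONS; do
    [ "$r" = "$1" ] && return 0
  done
  return 1
}

# Usage: optionally make it the default region for Cloud Run commands too.
# is_gpu_region "$REGION" && gcloud config set run/region "$REGION"
```

Setting run/region is optional here, since the blueprint's commands reference $REGION explicitly.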

Do I need to request GPU quota? No. L4 GPUs on Cloud Run are generally available and self-serve. The first time you deploy a GPU service in a region, Google auto-grants you a small quota (3 GPUs, with zonal redundancy off — the mode we use). No quota form, no waiting.

Verify Your Setup

gcloud config list

You should see your account and project:

[core]
account = you@gmail.com
project = gemma-run-1750000000
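
One thing gcloud config list won't show is whether $PROJECT_ID and $REGION are still set: export only lasts for the current terminal session, so a new tab starts without them. A minimal preflight sketch, with `check_setup` as a hypothetical helper name:

```shell
# check_setup: hypothetical helper. Verifies that both variables are set and
# that gcloud's active project matches PROJECT_ID.
check_setup() {
  [ -n "${PROJECT_ID:-}" ] || { echo "PROJECT_ID is not set" >&2; return 1; }
  [ -n "${REGION:-}" ] || { echo "REGION is not set" >&2; return 1; }
  active=$(gcloud config get-value project 2>/dev/null)
  if [ "$active" != "$PROJECT_ID" ]; then
    echo "active project is '$active', expected '$PROJECT_ID'" >&2
    return 1
  fi
  echo "setup looks good: $PROJECT_ID in $REGION"
}

# Usage:
# check_setup || echo "fix the issue above before continuing" >&2
```

If the variables are gone, re-export them with the same values you used earlier rather than generating a new project ID.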

What You Have Now

gcloud installed and logged in
A fresh project with billing linked
Three APIs enabled (run, cloudbuild, artifactregistry)
A GPU-capable region picked

Next: build the container that holds Gemma.

Reference: Install the gcloud CLI · Cloud Run GPU regions · Project lifecycle · Google Cloud free tier