IAM and Security for Agent Data Paths

What's the Concept?

When you wire an agent to a tool that queries BigQuery, somewhere a service account is making that query. Which service account? With which permissions? Against which datasets? If those answers aren't deliberate, you've created a security gap — the agent can potentially see, and through its responses leak, data it has no business reading.

The principle: every distinct "consumer role" gets its own service account, scoped to exactly what it needs. The agent's account has read access to gold tables it needs and nothing else — no silver, no bronze, no other agents' gold, no PII columns it doesn't need.

How It Works

The standard service-account topology for an agent stack:

                                  Cloud Run
                                  agent-runtime-sa
                                  (read-only, gold-only)
                                       │
                              ┌────────┴─────────┐
                              ▼                  ▼
                         BigQuery           Vertex AI
                         gold.* only        Embeddings API
                         (specific tables)
 
   pipeline-orchestrator-sa  ──────▶  Composer/Cloud Run + BQ silver/gold rw
 
   ingestion-stripe-sa       ──────▶  GCS bronze write, Secret Manager read
 
   data-engineer-sa          ──────▶  All datasets read; silver/gold write
   (human, via group membership)
 
   pii-reviewer-sa           ──────▶  Granted time-bounded for PII review

The agent's agent-runtime-sa has only:

bigquery.tables.getData on the specific gold tables it queries (not the dataset — the tables).
bigquery.jobs.create on its project.
aiplatform.endpoints.predict on the Vertex AI embedding model.
Nothing else.
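
The "nothing else" rule can be checked mechanically. The sketch below is a minimal illustration, not a Google Cloud API call: it assumes the permissions held by agent-runtime-sa have already been collected into a list (how they are collected is out of scope here), and the function name excess_permissions is hypothetical.

```python
# Allow-list copied from the three permissions listed above.
ALLOWED_PERMISSIONS = frozenset({
    "bigquery.tables.getData",
    "bigquery.jobs.create",
    "aiplatform.endpoints.predict",
})


def excess_permissions(granted):
    """Return, sorted, every granted permission that is not on the allow-list."""
    return sorted(set(granted) - ALLOWED_PERMISSIONS)


# Hypothetical input: permissions someone collected for agent-runtime-sa.
granted = [
    "bigquery.tables.getData",
    "bigquery.jobs.create",
    "aiplatform.endpoints.predict",
    "bigquery.tables.update",  # write access the agent should not have
]
print(excess_permissions(granted))  # ['bigquery.tables.update']
```

Run as a CI check, a non-empty result would fail the build before an over-broad grant reaches production.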

If the agent is compromised — prompt injection, a leaked tool call, a misbehaving plugin — the blast radius is "the gold tables the agent already could read." It cannot escalate to silver, bronze, or sibling agents' data.

Within a gold table, column-level security via Data Catalog policy tags can hide individual columns even from accounts with table read access:

gold.billing_agent_context columns:
  customer_id     [public]
  email           [pii:contact]      ← policy tag
  plan_name       [public]
  spend_last_90d  [public]
  payment_method  [pii:financial]    ← policy tag

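The agent's service account is granted the Fine-Grained Reader role on the pii:contact tag (so it can quote emails) but not on pii:financial (so it can never see payment methods). BigQuery does not silently drop protected columns: a query from the agent's connection that references payment_method, including a bare SELECT *, fails with an access-denied error. The agent's queries must list the columns they need or use SELECT * EXCEPT (payment_method). If a data masking policy is attached to the tag and the account holds the Masked Reader role, the column returns masked values instead of an error.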
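
Building the column list explicitly, from the tags the account holds, keeps each query inside what the account can read. The sketch below assumes the column-to-tag mapping from the table above is available to the tool as a plain dictionary; the helper names readable_columns and build_select are hypothetical, and in practice the mapping would come from table metadata rather than being hard-coded.

```python
# Column -> policy tag (None means untagged), copied from the table above.
COLUMNS = {
    "customer_id": None,
    "email": "pii:contact",
    "plan_name": None,
    "spend_last_90d": None,
    "payment_method": "pii:financial",
}


def readable_columns(columns, granted_tags):
    """Keep untagged columns and columns whose tag the caller has been granted."""
    return [
        name for name, tag in columns.items()
        if tag is None or tag in granted_tags
    ]


def build_select(table, columns, granted_tags):
    """Build a SELECT that names only the columns the caller can read."""
    cols = readable_columns(columns, granted_tags)
    return f"SELECT {', '.join(cols)} FROM `{table}`"


# The agent holds pii:contact but not pii:financial.
query = build_select("gold.billing_agent_context", COLUMNS, {"pii:contact"})
print(query)
```

With the grants described above, the generated query includes email and omits payment_method.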

Why It Matters

Prompt injection becomes a contained risk. A malicious user prompt can manipulate the agent's reasoning, but it can't widen the IAM grants. The data the agent can access is bounded by IAM, not by prompt rules.
Tenant isolation in multi-tenant SaaS. A query template that forgets to add WHERE tenant_id = ? is a critical bug. With row-level security policies in place, the storage layer applies the tenant filter even when the template omits it, so a single missing WHERE clause no longer leaks one tenant's rows to another.
Audit logs become useful. Cloud Audit Logs record every BigQuery query with the calling service account. "Which queries touched table X in the last hour?" is a single log filter.
Rotation is mechanical. Service-account keys (when used) rotate on schedule. Attaching the service account to the runtime (preferred) means no keys at all: short-lived credentials come from the runtime environment.
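
The "single log filter" for table reads might look like the sketch below. It is an assumption-laden illustration: the field names (protoPayload.resourceName, protoPayload.metadata.tableDataRead) follow the BigQueryAuditMetadata log format as best understood here and should be checked against the entries in your own project, and my-project is a placeholder project ID.

```python
from datetime import datetime, timedelta, timezone


def table_read_filter(project, dataset, table, now, window=timedelta(hours=1)):
    """Build a Cloud Logging filter for reads of one BigQuery table.

    Field names are assumptions based on the BigQueryAuditMetadata format.
    """
    since = (now.astimezone(timezone.utc) - window).strftime("%Y-%m-%dT%H:%M:%SZ")
    resource = f"projects/{project}/datasets/{dataset}/tables/{table}"
    clauses = [
        'protoPayload.serviceName="bigquery.googleapis.com"',
        f'protoPayload.resourceName="{resource}"',
        "protoPayload.metadata.tableDataRead:*",  # ":*" matches when the field is present
        f'timestamp>="{since}"',
    ]
    return " AND ".join(clauses)


# Fixed timestamp so the example is reproducible; use datetime.now(timezone.utc) in practice.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
log_filter = table_read_filter("my-project", "gold", "billing_agent_context", now)
print(log_filter)
```

Each matching entry carries the caller in protoPayload.authenticationInfo.principalEmail, which is where the service account shows up.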

Key Technical Details

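Prefer keyless authentication over long-lived service-account keys whenever possible. On Cloud Run, attach agent-runtime-sa as the service identity: the service authenticates as the SA through the metadata server without a key file ever existing. Workload Identity Federation fills the same role for workloads that run outside Google Cloud.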
BigQuery row-level security policies filter rows transparently. Define them at the table level: CREATE ROW ACCESS POLICY tenant_isolation ON gold.foo GRANT TO ('group:tenant-a@myco.com') FILTER USING (tenant_id = 'a').
For high-regulation workloads, VPC Service Controls add a network perimeter — even with valid credentials, BigQuery queries from outside the perimeter are blocked. Significant operational overhead; only worth it where compliance demands.
Enable Data Access audit logs once per service. They are off by default for most services (BigQuery's are always on) and are necessary for SOC 2 and most security reviews.
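
With many tenants, the row access policies are usually generated rather than hand-written. The sketch below is one possible generator for the statement shown above. The function name tenant_policy_ddl, the per-tenant policy name suffix, and the validation rules are all assumptions made for illustration; the input checks exist because the values are interpolated into DDL text.

```python
import re

_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
_TENANT = re.compile(r"[a-z0-9_]+")
_EMAIL = re.compile(r"[A-Za-z0-9._-]+@[A-Za-z0-9.-]+")


def tenant_policy_ddl(table, tenant_id, group_email):
    """Return a CREATE ROW ACCESS POLICY statement for one tenant.

    Inputs are validated strictly because they end up inside DDL text.
    """
    dataset, _, name = table.partition(".")
    if not (_IDENT.fullmatch(dataset) and _IDENT.fullmatch(name)):
        raise ValueError(f"unexpected table name: {table!r}")
    if not _TENANT.fullmatch(tenant_id):
        raise ValueError(f"unexpected tenant id: {tenant_id!r}")
    if not _EMAIL.fullmatch(group_email):
        raise ValueError(f"unexpected group email: {group_email!r}")
    return (
        f"CREATE ROW ACCESS POLICY tenant_isolation_{tenant_id} ON {table} "
        f"GRANT TO ('group:{group_email}') "
        f"FILTER USING (tenant_id = '{tenant_id}')"
    )


ddl = tenant_policy_ddl("gold.foo", "a", "tenant-a@myco.com")
print(ddl)
```

The policy name carries the tenant ID because several policies on one table need distinct names.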

Common Misconceptions

"IAM at the project level is fine." It's the most common bug. Project-level grants give too much access for the convenience saved. Spend the extra 10 minutes to scope at dataset or table level.

"The agent runs as a user, so it has the user's permissions." That's one pattern (impersonation), and it's appropriate sometimes — a customer-support agent might act as the support rep. But the default should be a dedicated service account; impersonation needs explicit thought.

"PII filtering happens at retrieval." It can, but defense in depth means PII is also masked at the storage layer (policy tags), and audited at the log layer. Three lines of defense, not one.

Connections to Other Concepts

Data Governance From Day One — Labels and ownership feed into IAM grants.
Handling PII and Redaction Pipelines — Active redaction on top of static IAM.
The Retrieval Contract Between Pipeline and Agent — The contract enforces what the agent can ask for; IAM enforces what it can actually receive.