Silver to Gold: Modeling for Agents

What's the Concept?

Silver tables are general-purpose. They represent the truth about an entity. Gold tables are opinionated — they represent the answer to a specific question the agent will ask.

An agent that helps customers troubleshoot billing has different gold needs than one that helps support engineers triage tickets. Both might read from the same silver customers, orders, and tickets tables. But their gold tables are completely different shapes.

The discipline: one gold table per agent use case, joined and shaped to that use case alone. Don't try to build a universal gold table; that's what silver is for.

How It Works

A gold table for a "billing support agent" might look like:

CREATE OR REPLACE TABLE `myco.gold.billing_agent_context`
PARTITION BY DATE(last_activity_at)
CLUSTER BY customer_id
AS
SELECT
  c.customer_id,
  c.email,
  c.plan_name,
  c.signup_date,
 
  -- Recent billing activity (last 90 days)
  ARRAY_AGG(STRUCT(
    o.order_id,
    o.amount_cents,
    o.currency,
    o.status,
    o.created_at
  ) ORDER BY o.created_at DESC LIMIT 10)        AS recent_orders,
 
  -- Pre-computed summary the agent can quote
  SUM(IF(o.created_at >= DATE_SUB(CURRENT_DATE(), INTERVAL 90 DAY),
         o.amount_cents, 0))                    AS spend_last_90d_cents,
  SUM(IF(o.status = 'refunded', o.amount_cents, 0))
                                                 AS total_refunded_cents,
 
  -- Most recent open issue, if any
  ANY_VALUE(
    IF(t.status = 'open', STRUCT(t.ticket_id, t.subject, t.created_at), NULL)
    HAVING MAX t.created_at
  )                                              AS open_issue,
 
  MAX(GREATEST(o.created_at, t.created_at))      AS last_activity_at,
  CURRENT_TIMESTAMP()                            AS _refreshed_at
FROM `myco.silver.customers` c
LEFT JOIN `myco.silver.orders` o   USING (customer_id)
LEFT JOIN `myco.silver.tickets` t  USING (customer_id)
GROUP BY c.customer_id, c.email, c.plan_name, c.signup_date;

The properties of a good gold table for an agent:

One row per natural lookup key. Here it's customer_id — the agent will always know that. If the agent has to ask "which row?" the table is wrong.
The shape mirrors the question. Billing agents need recent orders + spend total + open issue. That's what's pre-computed and embedded.
Arrays / structs are okay. BigQuery handles nested types natively, and the agent's tool can return them as nested JSON. Don't over-flatten.
Bounded by design. The LIMIT 10 on recent orders ensures the tool response fits in the agent's context budget, no matter the customer.
Provenance preserved. _refreshed_at tells the agent how stale the row is (and lets retrieval logic decide whether to fall back to live data).

Why It Matters

Retrieval tools become trivial. The agent's get_billing_context(customer_id) tool is a one-line SELECT * FROM gold.billing_agent_context WHERE customer_id = ?. No joins at query time.
Context windows stay under control. Pre-aggregation enforces the token budget at pipeline time, not retrieval time.
The agent's answer becomes verifiable. Because gold rows are derived deterministically from silver, you can always reproduce exactly what the agent saw at any past timestamp.

Key Technical Details

Refresh cadence depends on the use case. A billing agent's gold table can refresh hourly; a real-time ops agent's might need streaming updates via incremental MERGE.
Keep gold tables under ~5 GB when you can. Smaller means cheaper queries, faster refreshes, and the option to keep the whole thing in BI Engine cache.
Don't put embeddings in this table — they belong in a sibling table or as a separate column (we'll cover both in Module 05). Mixing makes refresh harder.
The _refreshed_at column is non-negotiable. Tool calls should surface staleness explicitly to the agent.

Common Misconceptions

"One gold table per entity." Tempting and wrong. One per use case. The same customer can appear in three different gold tables shaped for three different agents.

"Gold should be the same for analytics and for agents." It usually shouldn't. Analyst dashboards want wide, flexible tables for ad-hoc slicing. Agent tools want narrow, opinionated tables for one specific question. Build both, on top of the same silver.

"Materializing is expensive." Storage in BigQuery is $0.02/ GB - m o n t h . A 5 GB g o l d t ab l ecos t sy o u$ 0.10/month to keep. The expensive thing is querying it badly, which materialization prevents.

Connections to Other Concepts

04-incremental-and-idempotent-pipelines — How gold tables refresh efficiently as silver changes.
Course 05-serving-data-to-agents/01-structured-retrieval-bigquery-as-a-tool — Wiring gold into an agent's tool API.
03-dbt-for-versioned-transforms — Version-controlling these gold definitions.