What's the Concept?
When data engineering and AI engineering teams ship things separately, the friction lives at the boundary: an agent that worked yesterday breaks today because a column got renamed; a new agent capability is blocked because the data team doesn't know what shape it needs.
The fix is to make the boundary explicit. The set of tools the agent can call — their names, parameters, return shapes, performance and freshness guarantees — is a versioned interface between the pipeline and the agent. Both sides commit to it. Breaking changes go through deprecation; capabilities are added in versioned form.
How It Works
The contract has five parts, each worth writing down:
1. Tool inventory. A canonical list of the tools the agent can call. For each: name, one-paragraph description, schema, owning team. It lives as a YAML file in the repo or a Confluence/Notion page, but either way it is the source of truth, not an informally maintained doc. For example:
```yaml
# agent_tools/billing_agent.yaml
agent_id: billing_agent
version: 1.4.0
owning_team: agents
tools:
  - name: get_billing_context
    description: Fetch a customer's billing summary.
    input_schema: { ... }
    output_schema:
      type: object
      properties:
        customer:
          type: object
          properties: { customer_id, email, plan_name, spend_last_90d_cents, ... }
    freshness_sla: 1 hour
    cost_ceiling_bytes: 10000000   # 10 MB
    owning_team: platform-data
  - name: search_similar_tickets
    description: Find tickets similar to a query for this customer.
    input_schema: { ... }
    output_schema:
      type: array
      items: { ... }
    freshness_sla: 24 hours
    cost_ceiling_bytes: 50000000   # 50 MB
    owning_team: platform-data
```
2. Freshness contract. Each tool declares how stale its data can be: "refreshed hourly" vs. "near-real-time, sub-30-second lag" vs. "daily batch, latest available is yesterday's close of day." The agent's prompt can reference this, and the tool response should include a `_refreshed_at` timestamp, as in the sketch below.
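A sketch of what that looks like on the agent side, with illustrative payload values (the `dw.billing_summary_hourly` source name is made up) and the 1-hour SLA from the inventory above:
```python
from datetime import datetime, timedelta, timezone

# Illustrative get_billing_context response carrying its own freshness metadata.
response = {
    "customer": {"customer_id": "c_123", "plan_name": "pro", "spend_last_90d_cents": 48200},
    "_refreshed_at": "2025-06-02T14:05:00+00:00",
    "_source": "dw.billing_summary_hourly",
}

def is_within_sla(resp: dict, sla: timedelta = timedelta(hours=1)) -> bool:
    # Compare the payload's own timestamp against the tool's declared freshness SLA.
    refreshed = datetime.fromisoformat(resp["_refreshed_at"])
    return datetime.now(timezone.utc) - refreshed <= sla
```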
3. Schema versioning. Output schemas are explicit. Adding fields is non-breaking; renaming or removing fields requires a new tool version. The deprecation policy is written: "v1.x is supported for 90 days after v2 ships."
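A minimal sketch of that distinction using hypothetical Pydantic models (the `delinquent` and `plan_tier` fields are invented for illustration):
```python
from pydantic import BaseModel

# v1 output model for get_billing_context, mirroring the inventory sketch above.
class BillingContextV1(BaseModel):
    customer_id: str
    plan_name: str
    spend_last_90d_cents: int

# v1.1: adding an optional field is non-breaking; existing consumers ignore it.
class BillingContextV1_1(BillingContextV1):
    delinquent: bool | None = None

# Renaming plan_name to plan_tier is breaking, so it ships as a new tool
# version and v1.x stays supported through the 90-day deprecation window.
class BillingContextV2(BaseModel):
    customer_id: str
    plan_tier: str
    spend_last_90d_cents: int
```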
4. Cost ceiling. Each tool declares a maximum bytes-billed per call. The implementation enforces it (BigQuery's maximum_bytes_billed). Agent-side rate limiting on top of that gives a cost-per-conversation bound.
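A sketch of the data-side enforcement, assuming a BigQuery-backed tool; the helper name is illustrative:
```python
from google.cloud import bigquery

client = bigquery.Client()

def run_tool_query(sql: str, cost_ceiling_bytes: int) -> list[dict]:
    # BigQuery rejects the query outright if it would bill more than the
    # ceiling declared in the tool inventory, so a runaway scan can't happen.
    job_config = bigquery.QueryJobConfig(maximum_bytes_billed=cost_ceiling_bytes)
    rows = client.query(sql, job_config=job_config).result()
    return [dict(row) for row in rows]

# e.g. run_tool_query("SELECT ... FROM billing_summary WHERE ...", cost_ceiling_bytes=10000000)
```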
5. Failure modes. What does each tool return when the lookup fails? Empty array? Null object? Error code? Pick a convention and stick to it. Surprising error shapes are the most common cause of "the agent went off the rails."
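One workable convention, sketched below (the envelope shape is an assumption, not a standard): every tool returns the same wrapper, so a failure never produces a surprise shape.
```python
from typing import Any

def tool_envelope(data: Any = None, error: str | None = None) -> dict:
    # One consistent shape for every tool response, success or failure.
    return {
        "ok": error is None,
        "data": data,    # None on failure, never a half-populated object
        "error": error,  # stable machine-readable code, e.g. "customer_not_found"
    }

# Success and failure parse the same way on the agent side:
found = tool_envelope(data={"customer_id": "c_123", "plan_name": "pro"})
missing = tool_envelope(error="customer_not_found")
```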
Why It Matters
- Coordination cost drops. With the contract written, the data team can change implementations without the agent team noticing. The agent team can build against the contract before all the data lands.
- Drift becomes detectable. Every release that touches a tool runs a contract test: "does the live response match the declared schema?" (see the sketch after this list). Schema drift can no longer ship silently.
- Rollback is cheap. A bad new tool version can be hidden behind a feature flag and rolled back without touching the agent code. Just point the registry back at v1.
- New agents come up faster. A second agent that needs the same data uses the same tools. Tools are reusable across agents.
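What such a contract test can look like, as a sketch: it loads the inventory YAML above and validates a live (staging) response against each declared output schema. `call_tool_staging` is a hypothetical helper, and the placeholder `{ ... }` schemas in the example file would need to be filled in for this to pass.
```python
import pytest
import yaml
from jsonschema import ValidationError, validate

def load_inventory(path: str = "agent_tools/billing_agent.yaml") -> dict:
    with open(path) as f:
        return yaml.safe_load(f)

def call_tool_staging(tool_name: str) -> dict:
    # Hypothetical helper: invoke the live tool in staging with fixture inputs.
    raise NotImplementedError

@pytest.mark.parametrize("tool", load_inventory()["tools"], ids=lambda t: t["name"])
def test_live_response_matches_contract(tool):
    response = call_tool_staging(tool["name"])
    try:
        validate(instance=response, schema=tool["output_schema"])
    except ValidationError as exc:
        pytest.fail(f"{tool['name']} drifted from its declared schema: {exc.message}")
```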
Key Technical Details
- The output schema is the most important part. If it's not strict, downstream parsing has to handle every variation. Use JSON Schema or Pydantic models; reject responses that don't validate before they reach the agent (see the sketch after this list).
- Treat tool inventory as code. PRs that modify it require review from both the data and agent teams. CI runs the schema validation tests.
- Tool descriptions are part of the model's prompt, so they're billed on every request. Keep them tight but unambiguous: "tight enough that the model knows when to call it, loose enough that the description still makes sense to a human."
- Always include `_refreshed_at` and `_source` in the response. The agent can include them in its answer if confidence demands it.
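A minimal sketch of that validation gate with Pydantic v2, using the field names from the inventory above (the aliases map the underscore-prefixed metadata keys):
```python
from datetime import datetime
from pydantic import BaseModel, Field, ValidationError

class BillingCustomer(BaseModel):
    customer_id: str
    email: str
    plan_name: str
    spend_last_90d_cents: int

class BillingContextResponse(BaseModel):
    customer: BillingCustomer
    refreshed_at: datetime = Field(alias="_refreshed_at")
    source: str = Field(alias="_source")

def parse_tool_response(raw: dict) -> BillingContextResponse:
    # Reject anything that violates the contract before the model ever sees it.
    try:
        return BillingContextResponse.model_validate(raw)
    except ValidationError as exc:
        raise RuntimeError(f"get_billing_context violated its contract: {exc}") from exc
```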
Common Misconceptions
"We can iterate later." Tools without a contract become legacy in weeks. Each fix to one agent breaks another; each new agent re-invents fields. Write the YAML on day one — five lines saves a quarter.
"The model handles missing fields." It does, usually, but unpredictably. Defining the schema means the agent's behavior is predictable, which is the actual product property you care about.
"Tools are owned by the agent team." The implementation might be, but the data shape inside the tool is the data team's responsibility. Co-ownership, with the contract as the boundary, works best.
Connections to Other Concepts
- Module 06 — Pipeline orchestration uses the contract to schedule refreshes that meet the freshness SLA.
- Module 07 — Monitoring tracks compliance with the contract: schema validation rates, freshness lag, cost-per-call.
- Module 08 — The capstone implements a small but complete contract end-to-end.
Further Reading
- Andrew Jones, Driving Data Quality with Data Contracts (Packt, 2023) — The book that made "data contracts" a first-class engineering term. The producer-consumer pattern with SLA + schema is exactly what this lesson applies to agent tools.
- Anthropic, "Building Effective Agents" (Dec 2024) — Frames the prompt-chaining / routing / orchestrator-worker patterns in terms of the tool interfaces they need. https://www.anthropic.com/research/building-effective-agents
- Anthropic, "Tool use with Claude" docs — Reference for tool-schema design; updated alongside model releases.
- OpenAI Function Calling and Structured Outputs docs — Equivalent on OpenAI's side; the contract pattern is identical.
- Vertex AI Agent Builder docs — Google's first-party agent framework; same tool-call mechanics.
- JSON Schema / Pydantic / OpenAPI — Pick whichever your stack already uses; the contract belongs in a schema language you can validate at runtime, not free prose.