The Goal

The companion drip, Agent Long-Term Memory, makes one argument: an agent forgets not because its context is full, but because nobody wrote anything down. This blueprint builds the thing that writes it down.

By the end you'll have a memory.py module and a tiny agent loop that:

  • Extracts durable facts from each user turn with a local LLM.
  • Stores them in SQLite as facts with validity intervals — so a new fact can retire an old one.
  • Recalls the relevant, currently-valid facts each turn using embedding similarity.
  • Summarizes each session so nothing important is lost when the transcript rolls off.

Everything runs offline through Ollama — no API keys, no cloud, no vector database. SQLite is the store; Ollama is both the embedder and the small model.
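To make "Ollama is the embedder" concrete, here is a minimal sketch of an embedding call using only the standard library. It assumes Ollama is running on its default local address and that nomic-embed-text has already been pulled. The helper names build_embed_request and embed are hypothetical, not taken from the companion repo.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumption: Ollama's default local address
EMBED_MODEL = "nomic-embed-text"


def build_embed_request(text: str) -> urllib.request.Request:
    """Build (but do not send) a POST to Ollama's /api/embed endpoint."""
    payload = json.dumps({"model": EMBED_MODEL, "input": text}).encode("utf-8")
    return urllib.request.Request(
        f"{OLLAMA_URL}/api/embed",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(text: str) -> list[float]:
    """Return one embedding vector. Requires a running Ollama server."""
    with urllib.request.urlopen(build_embed_request(text), timeout=30) as resp:
        body = json.load(resp)
    # /api/embed returns a list of vectors, one per input
    return body["embeddings"][0]
```

Splitting request construction from the network call keeps the payload easy to inspect without a server running.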

The test we're building toward

The whole build is validated by one scenario, straight from the drip:

  1. Session 1 — the user tells the agent six things about herself (name, role, city, project, a preference, a constraint).
  2. Session 8 — one fact changes: she moves from Toronto to Berlin.
  3. Session 15 — we ask two questions: what do you remember about me? and where do I live?

A naive memory (raw retrieval over past turns) passes the first question and fails the second: it serves the stale "Toronto" because similarity search has no notion of which fact is current. Our store passes both, because the write path invalidates the old city instead of appending next to it.
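The Toronto-to-Berlin update can be sketched in a few lines of SQLite. This is an illustration under stated assumptions, not the schema the later steps build: the column names subject, attribute, and value, the use of session numbers as timestamps, and the helpers write_fact and current_value are all hypothetical. Only valid_from and valid_to come from the blueprint itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE facts (
           id INTEGER PRIMARY KEY,
           subject TEXT NOT NULL,
           attribute TEXT NOT NULL,
           value TEXT NOT NULL,
           valid_from INTEGER NOT NULL,  -- session number, for simplicity
           valid_to INTEGER              -- NULL means the fact is still current
       )"""
)


def write_fact(subject, attribute, value, session):
    """Close any current row for (subject, attribute), then insert the new one."""
    with conn:  # one transaction: retire and insert together
        conn.execute(
            "UPDATE facts SET valid_to = ? "
            "WHERE subject = ? AND attribute = ? AND valid_to IS NULL",
            (session, subject, attribute),
        )
        conn.execute(
            "INSERT INTO facts (subject, attribute, value, valid_from) "
            "VALUES (?, ?, ?, ?)",
            (subject, attribute, value, session),
        )


def current_value(subject, attribute):
    """Return the value of the open (valid_to IS NULL) row, if any."""
    row = conn.execute(
        "SELECT value FROM facts "
        "WHERE subject = ? AND attribute = ? AND valid_to IS NULL",
        (subject, attribute),
    ).fetchone()
    return row[0] if row else None


write_fact("user", "city", "Toronto", session=1)
write_fact("user", "city", "Berlin", session=8)

print(current_value("user", "city"))  # Berlin
all_rows = conn.execute(
    "SELECT value, valid_from, valid_to FROM facts ORDER BY id"
).fetchall()
print(all_rows)  # [('Toronto', 1, 8), ('Berlin', 8, None)]
```

The Toronto row is closed rather than deleted, so the history survives while queries for the current city only ever see Berlin.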

Architecture

One file does the work (memory.py); a second wires it into an agent turn (agent.py); a third runs the session-15 test.

Why this stack

Choice | Why
SQLite | Zero-setup, single-file, and it already ships with Python. Facts, embeddings (as blobs), and summaries live in one .db you can inspect with any SQLite browser.
Ollama | Local embeddings (nomic-embed-text) and a local small model (llama3.2) for extraction and summaries. No keys, no per-token bill, works on a plane.
Temporal facts | The one idea that separates a real memory from a transcript search: every fact has a valid_from / valid_to, so "moved to Berlin" closes the "lives in Toronto" row.

What's deliberately not here: a vector database, an embeddings API, a framework. You can add those later — but the whole point of the drip is that the hard part is the temporal write path, not the infrastructure.
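As a rough illustration of why a vector database is optional at this scale, the sketch below stores embeddings as SQLite blobs and ranks only currently-valid facts by cosine similarity in plain Python. The three-dimensional vectors are toy stand-ins for real nomic-embed-text output, and the table layout and the helpers to_blob, from_blob, cosine, and recall are hypothetical.

```python
import math
import sqlite3
from array import array


def to_blob(vec):
    """Pack a list of floats into float32 bytes for a BLOB column."""
    return array("f", vec).tobytes()


def from_blob(blob):
    """Unpack float32 bytes back into a list of floats."""
    out = array("f")
    out.frombytes(blob)
    return list(out)


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (text TEXT, embedding BLOB, valid_to INTEGER)")
rows = [
    ("lives in Toronto", to_blob([0.9, 0.1, 0.0]), 8),     # retired in session 8
    ("lives in Berlin", to_blob([0.8, 0.2, 0.0]), None),   # still current
    ("prefers short answers", to_blob([0.0, 0.1, 0.9]), None),
]
conn.executemany("INSERT INTO facts VALUES (?, ?, ?)", rows)


def recall(query_vec, k=2):
    """Brute-force top-k over facts whose validity interval is still open."""
    candidates = conn.execute(
        "SELECT text, embedding FROM facts WHERE valid_to IS NULL"
    ).fetchall()
    scored = [(cosine(query_vec, from_blob(blob)), text) for text, blob in candidates]
    scored.sort(reverse=True)
    return [text for _, text in scored[:k]]


# The query vector matches the Toronto row exactly, yet Toronto is never returned.
top = recall([0.9, 0.1, 0.0], k=1)
print(top)  # ['lives in Berlin']
```

The validity filter runs in SQL before any similarity is computed, so a retired fact cannot win on score alone.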

The companion repo

Every step is captured in a runnable repo: github.com/maraja/give-your-agent-memory. Build it yourself from the blueprint, or clone and read the blueprint as commentary.

What's coming

Seven short steps:

  1. What we're building (you're here)
  2. The store — SQLite schema for temporal facts + summaries
  3. Embeddings & recall — embed with Ollama, search only valid facts
  4. The write path — extract facts and invalidate on change
  5. Rolling summaries — compress a session without losing durable facts
  6. The memory manager — wire write/select/compress/invalidate around an agent turn
  7. The session-15 test — prove recall and the update, then what's next

Reference: Ollama · nomic-embed-text · Agent Long-Term Memory (drip) · Run an Open Model with Ollama (blueprint)