Step 1: What We're Building — Give Your Agent Long-Term Memory

The Goal

The companion drip, Agent Long-Term Memory, makes one argument: an agent forgets not because its context is full, but because nobody wrote anything down. This blueprint builds the thing that writes it down.

By the end you'll have a memory.py module and a tiny agent loop that:

Extracts durable facts from each user turn with a local LLM.
Stores them in SQLite as facts with validity intervals — so a new fact can retire an old one.
Recalls the relevant, currently-valid facts each turn using embedding similarity.
Summarizes each session so nothing important is lost when the transcript rolls off.

Everything runs offline through Ollama — no API keys, no cloud, no vector database. SQLite is the store; Ollama is both the embedder and the small model.
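To make "recall the relevant, currently-valid facts" concrete, here is a minimal sketch in plain Python. It assumes each fact is a dict with hypothetical `text`, `vec`, and `valid_to` keys. It uses tiny hand-written vectors in place of real `nomic-embed-text` embeddings, and the `valid_to` string is a placeholder. The `recall` function is illustrative only, not the module's actual API.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def recall(query_vec, facts, k=3):
    """Return the text of the k most similar facts that are still valid."""
    # Assumption: valid_to is None while a fact is current.
    live = [f for f in facts if f["valid_to"] is None]
    live.sort(key=lambda f: cosine(query_vec, f["vec"]), reverse=True)
    return [f["text"] for f in live[:k]]

# Toy vectors standing in for real embeddings.
facts = [
    {"text": "lives in Toronto", "vec": [0.9, 0.1, 0.0], "valid_to": "session-8"},
    {"text": "lives in Berlin", "vec": [0.8, 0.2, 0.0], "valid_to": None},
    {"text": "prefers short answers", "vec": [0.0, 0.1, 0.9], "valid_to": None},
]

top = recall([1.0, 0.0, 0.0], facts, k=1)
print(top)  # ['lives in Berlin']
```

The Toronto row is the closest vector to the query, but it never reaches the ranking because the validity filter runs first. That ordering (filter, then rank) is the design choice this sketch is meant to show.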

The test we're building toward

The whole build is validated by one scenario, straight from the drip:

Session 1 — the user tells the agent six things about herself (name, role, city, project, a preference, a constraint).
Session 8 — one fact changes: she moves from Toronto to Berlin.
Session 15 — we ask two questions: what do you remember about me? and where do I live?

A naive memory (raw retrieval over past turns) passes the first and fails the second — it serves the stale "Toronto" because similarity search has no notion of which fact is current. Our store passes both, because the write path invalidates the old city instead of appending next to it.
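The difference between appending and invalidating can be sketched with the standard-library `sqlite3` module. The table layout, the `write_fact` and `current` helpers, and the use of session numbers as validity bounds are all assumptions made for illustration. The real schema is the subject of a later step.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE facts ("
    " id INTEGER PRIMARY KEY,"
    " subject TEXT, attribute TEXT, value TEXT,"
    " valid_from INTEGER NOT NULL,"
    " valid_to INTEGER)"  # NULL means the fact is still valid
)

def write_fact(conn, subject, attribute, value, session):
    """Close any open row for (subject, attribute), then insert the new value."""
    with conn:  # one transaction: both statements commit together
        conn.execute(
            "UPDATE facts SET valid_to = ? "
            "WHERE subject = ? AND attribute = ? AND valid_to IS NULL",
            (session, subject, attribute),
        )
        conn.execute(
            "INSERT INTO facts (subject, attribute, value, valid_from) "
            "VALUES (?, ?, ?, ?)",
            (subject, attribute, value, session),
        )

def current(conn, subject, attribute):
    """Return the currently valid value, or None if there is none."""
    row = conn.execute(
        "SELECT value FROM facts "
        "WHERE subject = ? AND attribute = ? AND valid_to IS NULL",
        (subject, attribute),
    ).fetchone()
    return row[0] if row else None

write_fact(conn, "user", "city", "Toronto", session=1)
write_fact(conn, "user", "city", "Berlin", session=8)
print(current(conn, "user", "city"))  # Berlin
```

Both rows remain in the table, so the history is preserved. Only the Berlin row has an open `valid_to`, which is why a query restricted to valid facts cannot return the stale city.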

Architecture

One file does the work (memory.py); a second wires it into an agent turn (agent.py); a third runs the session-15 test.

Why this stack

Choice	Why
SQLite	Zero-setup and single-file, and the `sqlite3` module already ships with Python's standard library. Facts, embeddings (as blobs), and summaries live in one `.db` you can inspect with any SQLite browser.
Ollama	Local embeddings (`nomic-embed-text`) and a local small model (`llama3.2`) for extraction and summaries. No keys, no per-token bill, works on a plane.
Temporal facts	The one idea that separates a real memory from a transcript search: every fact has a `valid_from` / `valid_to`, so "moved to Berlin" closes the "lives in Toronto" row.
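Storing "embeddings (as blobs)" could look roughly like the sketch below. It packs a vector into 32-bit floats with the standard-library `array` module and reads it back from a BLOB column. The `fact_vectors` table, the `to_blob` and `from_blob` helpers, and the choice of float32 are assumptions, not necessarily what the repo does.

```python
import sqlite3
from array import array

def to_blob(vec):
    """Pack a list of floats into raw float32 bytes."""
    return array("f", vec).tobytes()

def from_blob(blob):
    """Unpack raw float32 bytes back into a list of floats."""
    out = array("f")
    out.frombytes(blob)
    return out.tolist()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fact_vectors (fact_id INTEGER PRIMARY KEY, embedding BLOB NOT NULL)"
)
# Values chosen to be exactly representable in float32.
conn.execute("INSERT INTO fact_vectors VALUES (?, ?)", (1, to_blob([0.25, -0.5, 1.0])))

blob = conn.execute(
    "SELECT embedding FROM fact_vectors WHERE fact_id = 1"
).fetchone()[0]
restored = from_blob(blob)
print(restored)  # [0.25, -0.5, 1.0]
```

Float32 halves the storage of Python's native doubles at the cost of some precision, which is usually acceptable for similarity ranking. Arbitrary values will not round-trip exactly the way these three do.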

What's deliberately not here: a vector database, a hosted embeddings API, a framework. You can add those later, but the whole point of the drip is that the hard part is the temporal write path, not the infrastructure.

The companion repo

Every step is captured in a runnable repo: github.com/maraja/give-your-agent-memory. Build it yourself from the blueprint, or clone the repo and read the blueprint as commentary.

What's coming

Seven short steps:

What we're building (you're here)
The store — SQLite schema for temporal facts + summaries
Embeddings & recall — embed with Ollama, search only valid facts
The write path — extract facts and invalidate on change
Rolling summaries — compress a session without losing durable facts
The memory manager — wire write/select/compress/invalidate around an agent turn
The session-15 test — prove recall and the update, then what's next

Reference: Ollama · nomic-embed-text · Agent Long-Term Memory (drip) · Run an Open Model with Ollama (blueprint)