Two memory layers, not one

The fact store answers precise questions ("what's the user's city?"). But some things are fuzzy: the shape of a long conversation, an ongoing decision, the tone the user prefers in practice. That's what a rolling summary is for. It's the cheapest memory layer from the drip's lab: durable and small, but it loses detail each time it is re-summarized.

We run both. Facts for the sharp, updatable stuff; a summary for the ambient context.

Summarize a session

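import ollama  # harmless if memory.py already imports it

# CHAT_MODEL and connect() are assumed to be defined earlier in memory.py.

SUMMARIZE_PROMPT = """Summarize this session in 2-3 sentences.
Preserve any durable facts, decisions, and the user's stated preferences.
Be concise; drop small talk.

Session transcript:
{turns}"""

def summarize_session(con, session: int, turns: list[str]) -> str:
    """Summarize one session's turns and store the result under its session id."""
    transcript = "\n".join(turns)
    r = ollama.chat(
        model=CHAT_MODEL,
        messages=[{"role": "user",
                   "content": SUMMARIZE_PROMPT.format(turns=transcript)}],
    )
    summary = r["message"]["content"].strip()
    # One row per session: re-running replaces the old summary.
    con.execute(
        "INSERT OR REPLACE INTO summaries(session, text) VALUES(?, ?)",
        (session, summary),
    )
    con.commit()
    return summary

def recent_summaries(con, limit: int = 3) -> str:
    """Return the newest `limit` summaries, joined oldest-first."""
    rows = con.execute(
        "SELECT text FROM summaries ORDER BY session DESC LIMIT ?", (limit,)
    ).fetchall()
    # The query returns newest first; reverse so the text reads chronologically.
    return "\n".join(t for (t,) in reversed(rows))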

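INSERT OR REPLACE keys on session, so re-summarizing a session overwrites its summary rather than duplicating it. This relies on session being the primary key (or carrying a UNIQUE constraint) in the summaries table; without one, every call would add a new row.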
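
To see the overwrite behaviour without calling a model, here is a minimal sketch against an in-memory SQLite database. The schema is an assumption (the real table is whatever connect() creates), and store_summary is a hypothetical stand-in for the write inside summarize_session. The only point is that session is declared as the primary key.

```python
import sqlite3

# Assumed schema: the real table is created elsewhere. What matters here is
# that `session` is the PRIMARY KEY, which is what INSERT OR REPLACE keys on.
SCHEMA = "CREATE TABLE summaries(session INTEGER PRIMARY KEY, text TEXT NOT NULL)"

def store_summary(con: sqlite3.Connection, session: int, text: str) -> None:
    """Hypothetical stand-in for the write inside summarize_session."""
    con.execute(
        "INSERT OR REPLACE INTO summaries(session, text) VALUES(?, ?)",
        (session, text),
    )
    con.commit()

def demo() -> list[tuple[int, str]]:
    con = sqlite3.connect(":memory:")
    con.execute(SCHEMA)
    store_summary(con, 1, "first draft of session 1")
    store_summary(con, 2, "session 2")
    store_summary(con, 1, "revised summary of session 1")  # replaces row 1
    return con.execute(
        "SELECT session, text FROM summaries ORDER BY session"
    ).fetchall()

if __name__ == "__main__":
    for row in demo():
        print(row)
```

After three writes the table holds two rows, and session 1 carries only the revised text.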

The erosion is real — and fine

Re-summarizing loses detail; that's the drip's "rolling summary decays" result. The mitigation isn't to fight it but to avoid relying on the summary for anything sharp. Anything that must stay exact (the city, a constraint) lives in the fact table, which doesn't erode. The summary only carries what's tolerable-if-fuzzy. Split the job and each layer plays to its strength.
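
One way to picture the split is a prompt preamble that renders the two layers separately, so exact values are copied verbatim and never paraphrased. The sketch below is illustrative only: the facts(key, value) schema, the memory_preamble helper, and the sample rows are all assumptions, not the lab's actual code.

```python
import sqlite3

def memory_preamble(con: sqlite3.Connection, limit: int = 3) -> str:
    """Render exact facts verbatim and recent summaries as background."""
    facts = con.execute("SELECT key, value FROM facts ORDER BY key").fetchall()
    rows = con.execute(
        "SELECT text FROM summaries ORDER BY session DESC LIMIT ?", (limit,)
    ).fetchall()
    lines = ["Known facts (exact):"]
    lines += [f"- {k}: {v}" for k, v in facts]
    lines.append("Earlier sessions (approximate):")
    # Query is newest-first; reverse so summaries read chronologically.
    lines += [f"- {t}" for (t,) in reversed(rows)]
    return "\n".join(lines)

def demo() -> str:
    con = sqlite3.connect(":memory:")
    # Hypothetical schemas and sample data, for illustration only.
    con.execute("CREATE TABLE facts(key TEXT PRIMARY KEY, value TEXT)")
    con.execute("CREATE TABLE summaries(session INTEGER PRIMARY KEY, text TEXT)")
    con.execute("INSERT INTO facts VALUES('city', 'Lisbon')")
    # An updated fact replaces the old value exactly; nothing is paraphrased.
    con.execute("INSERT OR REPLACE INTO facts VALUES('city', 'Porto')")
    con.execute(
        "INSERT INTO summaries VALUES(1, 'Maya leads the billing rewrite.')"
    )
    con.execute(
        "INSERT INTO summaries VALUES(2, 'Discussed migration order; prefers terse answers.')"
    )
    return memory_preamble(con)

if __name__ == "__main__":
    print(demo())
```

The updated city appears exactly as stored, while the session summaries are labelled as approximate background.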

Try it

if __name__ == "__main__":
    con = connect()
    turns = [
        "User: I'm Maya, a backend engineer.",
        "Assistant: Nice to meet you, Maya.",
        "User: I'm leading the billing rewrite and I prefer terse answers.",
    ]
    print(summarize_session(con, session=1, turns=turns))

Running it prints something like the following; the exact wording depends on the model:

$ python memory.py
Maya is a backend engineer leading the billing rewrite. She prefers terse,
to-the-point answers.

Now we have both layers. Step 6 wires them around an actual agent turn.


Reference: Ollama chat API · Agent Long-Term Memory — the session-15 lab · Agentic Context Engineering (drip)