Embeddings, locally
nomic-embed-text turns a string into a 768-dimension vector. We store it as a packed float32 blob in the embedding column — no extension, no vector DB, just bytes in SQLite.
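To make "packed float32 blob" concrete, here is a minimal sketch of the storage cost and the precision trade-off. The vector is a made-up stand-in rather than real model output, and the 768 figure is the dimension quoted above.

```python
import struct

DIM = 768  # embedding size quoted above for nomic-embed-text

# Stand-in vector for illustration; a real one would come from the model.
vec = [i / DIM for i in range(DIM)]

# Pack as float32: 4 bytes per component.
blob = struct.pack(f"{DIM}f", *vec)
restored = struct.unpack(f"{len(blob) // 4}f", blob)

print(len(blob))  # 3072 bytes per stored fact

# Python floats are 64-bit, so the round trip through float32 is
# approximate (about 7 significant digits), not bit-for-bit.
max_err = max(abs(a - b) for a, b in zip(vec, restored))
print(max_err < 1e-6)  # True
```

Roughly 3 KB per fact means a few hundred facts occupy about a megabyte, and the float32 rounding is far too small to change a similarity ranking in practice.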
Add to memory.py:
import math
import struct

import ollama

EMBED_MODEL = "nomic-embed-text"


def embed(text: str) -> list[float]:
    # Ask the local Ollama server for the embedding of `text`.
    return ollama.embeddings(model=EMBED_MODEL, prompt=text)["embedding"]


def to_blob(vec: list[float]) -> bytes:
    # Pack as float32, 4 bytes per component.
    return struct.pack(f"{len(vec)}f", *vec)


def from_blob(blob: bytes) -> list[float]:
    return list(struct.unpack(f"{len(blob) // 4}f", blob))


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    # The small epsilon avoids division by zero for an all-zero vector.
    return dot / (na * nb + 1e-9)

At a few hundred facts, a linear scan in Python is fast enough: a few tens of milliseconds at most, which is negligible next to the embedding call itself. This is the honest scale of personal-agent memory. (When you outgrow it, at tens of thousands of facts, swap the scan for sqlite-vec or pgvector; the interface below doesn't change.)
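To check the cost of the scan on your own machine, a rough timing sketch along these lines may help. It uses random vectors in place of real embeddings, and the sizes (`DIM`, `N_FACTS`) are assumptions chosen to match the scale discussed here; the helpers are repeated so the snippet runs on its own.

```python
import math
import random
import struct
import time

DIM, N_FACTS = 768, 500  # assumed sizes, for illustration only


def to_blob(vec):
    return struct.pack(f"{len(vec)}f", *vec)


def from_blob(blob):
    return list(struct.unpack(f"{len(blob) // 4}f", blob))


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-9)


random.seed(0)
# Random stand-ins for stored embeddings and for the query vector.
blobs = [to_blob([random.gauss(0, 1) for _ in range(DIM)]) for _ in range(N_FACTS)]
query = [random.gauss(0, 1) for _ in range(DIM)]

start = time.perf_counter()
# Same work recall() does per row: unpack the blob, then score it.
scores = [cosine(query, from_blob(b)) for b in blobs]
elapsed = time.perf_counter() - start

print(f"scanned {N_FACTS} facts in {elapsed * 1000:.1f} ms")
```

The number will vary by machine, so no expected output is shown. If it grows uncomfortable as the table fills up, that is the signal to move the scan into an index.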
Recall — but only what's true now
This is the function that makes our memory different from transcript search. It filters to valid_to IS NULL before ranking, so a retired fact can never be returned:
def recall(con, query: str, k: int = 5) -> list[tuple[str, str, str]]:
    q = embed(query)
    rows = con.execute(
        "SELECT subject, predicate, value, embedding"
        " FROM facts WHERE valid_to IS NULL"
    ).fetchall()
    scored = [
        (cosine(q, from_blob(emb)), s, p, v)
        for (s, p, v, emb) in rows
        if emb is not None
    ]
    scored.sort(reverse=True)
    return [(s, p, v) for (_, s, p, v) in scored[:k]]

That WHERE valid_to IS NULL is the entire fix for the update failure. Naive retrieval over raw turns ranks every past mention by similarity; both "I live in Toronto" and "I moved to Berlin" match "where do I live?", and the stale one can win. Here, the Toronto row is already closed by the time we recall, so it isn't even a candidate.
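The effect of the filter can be seen without any embeddings at all. The sketch below uses an in-memory database; the column names come from the INSERT statement later in this section, while the column types and the timestamps are assumptions made for the example.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Assumed schema: names match this section's INSERT, types are a guess.
con.execute(
    "CREATE TABLE facts("
    "subject TEXT, predicate TEXT, value TEXT, embedding BLOB,"
    " valid_from TEXT, valid_to TEXT, session INTEGER)"
)
con.executemany(
    "INSERT INTO facts VALUES(?,?,?,?,?,?,?)",
    [
        # Closed row: the fact was retired when the user moved (hypothetical dates).
        ("user", "city", "Toronto", None,
         "2025-01-01T00:00:00", "2026-01-01T00:00:00", 1),
        # Open row: valid_to is NULL, so this is what is true now.
        ("user", "city", "Berlin", None,
         "2026-01-01T00:00:00", None, 2),
    ],
)

all_rows = con.execute(
    "SELECT value FROM facts ORDER BY valid_from"
).fetchall()
current = con.execute(
    "SELECT value FROM facts WHERE valid_to IS NULL"
).fetchall()

print(all_rows)  # [('Toronto',), ('Berlin',)]
print(current)   # [('Berlin',)]
```

Because the stale row is dropped in SQL, it never reaches the similarity ranking, however well it matches the query.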
Try it
if __name__ == "__main__":
    con = connect()
    # (pretend these were written by the pipeline in Step 4)
    for pred, val in [("role", "backend engineer"), ("city", "Berlin")]:
        emb = to_blob(embed(f"user {pred} {val}"))
        con.execute(
            "INSERT INTO facts(subject,predicate,value,embedding,valid_from,valid_to,session)"
            " VALUES('user',?,?,?,'2026-01-01T00:00:00',NULL,1)",
            (pred, val, emb),
        )
    con.commit()
    print(recall(con, "where does the user live?", k=2))

$ python memory.py
[('user', 'city', 'Berlin'), ('user', 'role', 'backend engineer')]

The city fact ranks first for a location query, exactly as you'd hope. Next we make the writes real, and temporal.
Reference: Ollama embeddings API · nomic-embed-text · sqlite-vec (when you outgrow the scan)