AI Agent Memory in 2026: Mem0 vs Zep vs Letta vs Cognee

June 12, 2026
9 min read

Every agent project hits the same wall around week three. The demo works, the agent answers questions, everyone’s happy — and then someone closes the chat, opens a new one, and the agent has no idea who they are. It forgot the user’s name, their preferences, the decision you made together yesterday. A 200K context window doesn’t fix this. Context is short-term recall that vanishes when the session ends. Memory is the part that survives.

So you go looking for a memory layer, and within an hour you’ve got four browser tabs open: Mem0, Zep, Letta, Cognee. They all promise persistent memory for agents. They are not the same thing, and picking the wrong one means rebuilding your data layer six months in. I’ve wired up all four. Here’s how they actually differ and which one I’d reach for depending on what you’re building.

What a “memory layer” actually does

Strip away the marketing and a memory layer does three jobs. It decides what’s worth keeping from a conversation (extraction), it stores those facts somewhere durable, and it pulls the relevant ones back into context when they matter (retrieval). The hard part isn’t storage — any vector DB can hold embeddings. The hard part is extraction and retrieval quality: knowing that “actually, make it dairy-free” should overwrite the earlier “add extra cheese,” and surfacing that fact two weeks later when the user orders again.
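The three jobs can be sketched in a few lines of plain Python. This is a toy under loud assumptions: the `ToyMemory` class, its hard-coded extraction rules, and the `pizza_cheese` key are all hypothetical, and a real memory layer would use an LLM for extraction and embeddings for retrieval. The point is only to show where the overwrite happens.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToyMemory:
    """Hypothetical three-job memory layer: extract, store, retrieve."""
    facts: dict = field(default_factory=dict)

    def extract(self, turn: str) -> Optional[tuple]:
        """Job 1: decide whether a turn holds a durable fact (hard-coded rules)."""
        text = turn.lower()
        if "dairy-free" in text:
            return ("pizza_cheese", "dairy-free, no cheese")
        if "extra cheese" in text:
            return ("pizza_cheese", "extra cheese")
        return None  # small talk, nothing worth keeping

    def add(self, turn: str) -> None:
        """Job 2: store the fact; the same key means the later statement wins."""
        fact = self.extract(turn)
        if fact is not None:
            key, value = fact
            self.facts[key] = value

    def search(self, query: str) -> list:
        """Job 3: return facts whose key shares a word with the query."""
        words = set(query.lower().split())
        return [v for k, v in self.facts.items() if words & set(k.split("_"))]


memory = ToyMemory()
for turn in ["Add extra cheese", "Actually, make it dairy-free", "Thanks!"]:
    memory.add(turn)

print(memory.search("order a pizza again"))  # ['dairy-free, no cheese']
```

Everything difficult about a real memory layer lives in `extract` and `search`; the dictionary in the middle is the easy part.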

It helps to name the kinds of memory, because the four tools optimize for different ones:

  • Working memory — what’s in the context window right now. Everyone has this; it’s just the prompt.
  • Episodic — specific past events. “On June 3rd the user said the deploy failed.”
  • Semantic — distilled facts about the world or user. “The user prefers Postgres over MySQL.”
  • Temporal — facts that change over time, with the “when” attached. “The user worked at Stripe (2023–2025), now at a startup.”

That last one is where most naive setups fall apart. If you just dump facts into a vector store, you end up with “user works at Stripe” and “user works at a startup” both ranking high, and since nothing marks which one is current, the agent effectively picks one at random. Keep that distinction in mind: it’s the axis that separates these four tools more than anything else.
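A rough illustration of that failure mode, using word overlap as a crude stand-in for embedding similarity. The scoring function and the trailing-"s" stemming are assumptions made to keep the sketch dependency-free; real vector search behaves differently in detail but has the same blind spot when facts carry no timestamps.

```python
def tokens(text: str) -> set:
    """Lowercase words with a trailing 's' stripped (very crude stemming)."""
    return {word.rstrip("s") for word in text.lower().split()}


def overlap_score(query: str, fact: str) -> int:
    """Count shared tokens between the query and a stored fact."""
    return len(tokens(query) & tokens(fact))


facts = [
    "user works at Stripe",
    "user works at a startup",
    "user prefers Postgres",
]
query = "where does the user work"

for fact in facts:
    print(overlap_score(query, fact), fact)
# Both employment facts get the same score, so ranking alone
# cannot tell the agent which employer is the current one.
```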

Mem0 — the default pick for most people

Mem0 is the one I’d start with if you have no strong reason to do otherwise. It’s a drop-in API: you send it conversation turns, it figures out what to remember, and you query it with natural language to get relevant memories back. Under the hood it runs a hybrid of vector search, a graph store, and a key-value store, but you don’t have to think about any of that to get started. Two function calls, add() and search(), and you have a working memory layer.

It’s earned the default slot partly on benchmarks. Mem0 reports roughly a 26% accuracy improvement over a baseline OpenAI memory setup on the LoCoMo long-conversation benchmark, alongside lower latency and token use because it retrieves a focused slice instead of replaying whole histories. Self-reported vendor benchmarks always deserve a side-eye, but the architecture at least makes the latency and token claims plausible: smaller, more relevant retrieval means fewer tokens and faster responses.

The pricing is where you need to read the fine print. As of June 2026 the tiers run: Hobby (free, 10K memories, 1K retrievals/month), Starter at $19/month (50K memories), Pro at $249/month, and custom Enterprise. The catch is that graph memory (entity extraction, relationship mapping, multi-hop queries) is gated behind the Pro tier. That’s a roughly 13× jump from $19 to $249 the moment you need the feature that makes the tool architecturally interesting. For simple user-preference storage you’ll never hit that wall. For anything relationship-heavy, budget for Pro or self-host the open-source version and run the graph yourself.
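To make the pricing cliff concrete, here is a small tier picker built only from the figures quoted above. The `cheapest_tier` helper is hypothetical, and the Pro memory cap is not stated in this article, so the sketch treats it as uncapped; check the current pricing page before relying on any of it.

```python
# (name, USD per month, memory cap, graph memory included)
# Pro's cap is not given in the article; None means "treated as uncapped here".
TIERS = [
    ("Hobby", 0, 10_000, False),
    ("Starter", 19, 50_000, False),
    ("Pro", 249, None, True),
]


def cheapest_tier(memories: int, needs_graph: bool) -> tuple:
    """Return (name, monthly price) of the first tier that satisfies both needs."""
    for name, price, cap, has_graph in TIERS:
        if needs_graph and not has_graph:
            continue
        if cap is not None and memories > cap:
            continue
        return name, price
    # Only reachable if TIERS is edited so that no tier is uncapped.
    raise ValueError("no listed tier fits; Enterprise is a sales conversation")


print(cheapest_tier(5_000, needs_graph=False))  # ('Hobby', 0)
print(cheapest_tier(5_000, needs_graph=True))   # ('Pro', 249)
print(round(249 / 19, 1))                       # 13.1
```

The second call is the one to notice: the memory count did not change, only the graph requirement did, and the bill went from free to $249.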

Pick Mem0 when you’re building personalization — chatbots, assistants, support agents that should remember a user across sessions. It’s the path of least resistance and the one with the largest community when you get stuck.

Zep — when “when” matters

Zep solves the temporal problem head-on. Its engine, Graphiti, is a temporally-aware knowledge graph: every fact gets a timestamp and a validity window. So when the user changes jobs, Zep doesn’t overwrite the old fact and doesn’t leave two contradictory ones floating — it marks the old one as no longer valid and records when the new one took over. Ask “where does the user work?” and you get the current answer; ask about last year and the history is still there.
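The validity-window idea can be sketched without any graph database. This is not Zep's or Graphiti's API: `Fact`, `TemporalStore`, and the method names are hypothetical, and the exact dates are invented to fit the Stripe example from earlier.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Fact:
    subject: str
    predicate: str
    value: str
    valid_from: date
    valid_to: Optional[date] = None  # None means "still valid"


class TemporalStore:
    """Hypothetical store that closes old facts instead of overwriting them."""

    def __init__(self) -> None:
        self.facts: list = []

    def assert_fact(self, subject: str, predicate: str, value: str, when: date) -> None:
        # Close the validity window of any open fact for the same slot.
        for fact in self.facts:
            if (fact.subject, fact.predicate) == (subject, predicate) and fact.valid_to is None:
                fact.valid_to = when
        self.facts.append(Fact(subject, predicate, value, valid_from=when))

    def query(self, subject: str, predicate: str, as_of: Optional[date] = None) -> Optional[str]:
        for fact in self.facts:
            if (fact.subject, fact.predicate) != (subject, predicate):
                continue
            if as_of is None:
                if fact.valid_to is None:
                    return fact.value  # the current answer
            elif fact.valid_from <= as_of and (fact.valid_to is None or as_of < fact.valid_to):
                return fact.value  # the answer that held on that date
        return None


store = TemporalStore()
store.assert_fact("user", "works_at", "Stripe", date(2023, 1, 1))
store.assert_fact("user", "works_at", "a startup", date(2025, 6, 1))

print(store.query("user", "works_at"))                    # a startup
print(store.query("user", "works_at", date(2024, 6, 1)))  # Stripe
```

Nothing is deleted: the Stripe fact stays in the store with a closed window, which is what lets the historical query still work.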

This sounds like a niche feature until you build anything that tracks state over time. Account status, subscription tiers, project phases, a patient’s symptoms, a deal moving through a pipeline — all of it is temporal. An agent that can’t reason about when a fact was true will confidently tell a customer their cancelled plan is still active. Zep is built to not do that.

Pricing runs on credits. The Flex plan starts at $25/month for 20,000 credits and scales to Flex Plus at $475/month for 300,000. The Graphiti engine underneath is open source, so you can self-host the graph and skip the SaaS bill — but you take on running it, and the temporal-graph machinery is more involved to operate than a vector store. The credit model also makes cost forecasting fuzzier than Mem0’s flat memory counts; you’ll want to run a realistic week of traffic before trusting your estimate.
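One way to turn "run a realistic week of traffic" into a number is to extrapolate from measured credit usage. The plan figures below are the two quoted above; the helper names and the 30-day month are assumptions, and how many credits a given operation consumes has to come from your own usage dashboard, not from this sketch.

```python
# plan name -> (USD per month, credits included), as quoted in this article
PLANS = {"Flex": (25, 20_000), "Flex Plus": (475, 300_000)}


def projected_monthly_credits(credits_in_sample_week: float, days_in_month: int = 30) -> float:
    """Scale one measured week of credit usage up to a month."""
    return credits_in_sample_week / 7 * days_in_month


def plans_that_fit(credits_in_sample_week: float) -> list:
    """Names of plans whose included credits cover the projected month."""
    needed = projected_monthly_credits(credits_in_sample_week)
    return [name for name, (_, credits) in PLANS.items() if credits >= needed]


print(projected_monthly_credits(3_500))  # 15000.0
print(plans_that_fit(3_500))             # ['Flex', 'Flex Plus']
print(plans_that_fit(14_000))            # ['Flex Plus']

for name, (price, credits) in PLANS.items():
    print(name, "USD per credit:", price / credits)
```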

Reach for Zep when temporal correctness isn’t optional — anything where a stale fact causes a wrong action rather than just a clumsy answer.

Letta — memory as an operating system

Letta (the project formerly known as MemGPT) comes at this from a completely different angle, and it’s the most conceptually interesting of the four. Instead of being a memory service your agent calls, Letta is the agent framework, and it treats memory like an OS treats RAM and disk. Core memory lives in the context window and the agent edits it directly. Recall memory is searchable conversation history sitting just outside context, like a disk cache. Archival memory is cold storage the agent queries through tool calls. The agent itself decides what to promote into context and what to page out.
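The RAM-and-disk analogy maps onto a small data structure. What follows is a sketch of the tiering idea only: the `TieredMemory` class and its method names are hypothetical and are not Letta's API, and substring matching stands in for real search.

```python
from collections import deque


class TieredMemory:
    """Hypothetical three-tier layout: core, recall, archival."""

    def __init__(self, context_turns: int = 2) -> None:
        self.core = {}          # editable blocks, always rendered into the prompt
        self.context = deque()  # recent turns that still fit in the window
        self.recall = []        # evicted turns, searchable on demand
        self.archival = []      # cold storage, reached through explicit calls
        self.context_turns = context_turns

    def core_replace(self, label: str, text: str) -> None:
        """The agent rewrites a core block directly."""
        self.core[label] = text

    def add_turn(self, turn: str) -> None:
        """Append a turn; page the oldest turns out to recall when over budget."""
        self.context.append(turn)
        while len(self.context) > self.context_turns:
            self.recall.append(self.context.popleft())

    def recall_search(self, term: str) -> list:
        return [t for t in self.recall if term.lower() in t.lower()]

    def archival_insert(self, text: str) -> None:
        self.archival.append(text)

    def archival_search(self, term: str) -> list:
        return [t for t in self.archival if term.lower() in t.lower()]

    def prompt(self) -> str:
        """What the model actually sees: core blocks plus in-window turns."""
        blocks = [f"[{label}] {text}" for label, text in self.core.items()]
        return "\n".join(blocks + list(self.context))


mem = TieredMemory(context_turns=2)
mem.core_replace("human", "Name: Sam. Prefers Postgres.")
for turn in ["user: the deploy failed", "agent: checking logs", "user: it works now"]:
    mem.add_turn(turn)

print(mem.prompt())
print(mem.recall_search("deploy"))  # ['user: the deploy failed']
```

In the real design the agent calls operations like these as tools and decides for itself when to use them; here they are driven by hand.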

The payoff is agents that genuinely manage their own state. Give a Letta agent a memory block describing the user, and over a long conversation it’ll rewrite that block as it learns new things — no separate extraction pipeline, the agent does it inline. The 2026 addition is sleep-time compute: a background agent that refines memory while the main agent is idle, instead of stalling a live response to reorganize what it knows. Memory cleanup happens asynchronously, so the user-facing latency stays low and the memory quality keeps improving between turns.
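The sleep-time pattern boils down to keeping the reply path cheap and deferring consolidation to idle time. A minimal single-threaded sketch, assuming a made-up `key: value` note format and a hypothetical `SleepTimeAgent` class; a real system would run the refinement pass in a separate background agent.

```python
class SleepTimeAgent:
    """Hypothetical agent that defers memory cleanup to idle time."""

    def __init__(self) -> None:
        self.block = {}    # consolidated memory block
        self.pending = []  # raw notes waiting for refinement

    def respond(self, user_msg: str) -> str:
        """Hot path: record the raw note and reply, with no reorganizing."""
        self.pending.append(user_msg)
        return f"ack: {user_msg}"

    def sleep_time_pass(self) -> int:
        """Idle path: fold pending 'key: value' notes into the block.

        Later notes overwrite earlier ones for the same key.
        Returns how many notes were merged.
        """
        merged = 0
        for note in self.pending:
            if ":" in note:
                key, value = note.split(":", 1)
                self.block[key.strip().lower()] = value.strip()
                merged += 1
        self.pending.clear()
        return merged


agent = SleepTimeAgent()
for msg in ["db: MySQL", "db: Postgres", "hello"]:
    agent.respond(msg)

print(agent.block)              # {} because nothing is consolidated yet
print(agent.sleep_time_pass())  # 2
print(agent.block)              # {'db': 'Postgres'}
```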

It’s fully open source and self-hostable, which is the main reason teams with data-residency or compliance constraints land here. The managed cloud has a free tier with paid plans from $20/month; API usage is metered at $0.00015 per second of tool execution, and a modest self-hosted deployment runs maybe $5–10/month in infrastructure. Enterprise is a sales conversation.

The trade-off is that Letta asks you to adopt its agent model, not just bolt memory onto an agent you already wrote. If you’ve already built on LangGraph or the Claude Agent SDK and just want memory, that’s a real migration cost. But if you’re starting fresh and want long-running agents that learn — the kind of thing you’d leave running for weeks — Letta’s design is the one built for exactly that.

Cognee — for graphs over messy documents

Cognee is the odd one out, and it’s aimed less at “remember what the user said” and more at “build a structured brain from a pile of documents.” Its ECL pipeline — Extract, Cognify, Load — ingests data from 38+ sources, pulls out entities and relationships, and assembles a knowledge graph with embeddings layered on top. It unifies relational, vector, and graph storage into one engine, so you’re not stitching three databases together yourself. A memify layer then tunes the graph over time, feeding rated responses back into edge weights so retrieval sharpens with use.
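The shape of that pipeline, reduced to a toy: pull triples out of documents, assemble a graph that keeps provenance, then nudge edge weights from feedback. None of this is Cognee's API. The regex, the function names, the document names, and the fixed 0.1 weight step are all assumptions, and the load step is collapsed into an in-memory dict.

```python
import re

# Hypothetical pattern: "<Entity> <relation> <Entity>" with capitalised entities.
TRIPLE = re.compile(r"([A-Z]\w+) (\w+) ([A-Z]\w+)")


def extract(docs: dict) -> list:
    """Pull (subject, relation, object, source) tuples out of raw text."""
    return [
        (subj, rel, obj, source)
        for source, text in docs.items()
        for subj, rel, obj in TRIPLE.findall(text)
    ]


def cognify(triples: list) -> dict:
    """Assemble an edge map that remembers which documents support each edge."""
    graph = {}
    for subj, rel, obj, source in triples:
        edge = graph.setdefault((subj, rel, obj), {"weight": 1.0, "sources": []})
        edge["sources"].append(source)
    return graph


def apply_feedback(graph: dict, edge_key: tuple, helpful: bool, step: float = 0.1) -> None:
    """Feedback loop: rated answers push the weights of the edges they used."""
    graph[edge_key]["weight"] += step if helpful else -step


def neighbours(graph: dict, entity: str) -> list:
    """Edges touching an entity, strongest first."""
    hits = [(key, data) for key, data in graph.items() if entity in (key[0], key[2])]
    return sorted(hits, key=lambda item: item[1]["weight"], reverse=True)


docs = {
    "handbook.pdf": "PolicyA supersedes PolicyB.",
    "memo.pdf": "PolicyB covers Travel.",
}
graph = cognify(extract(docs))
apply_feedback(graph, ("PolicyB", "covers", "Travel"), helpful=True)

for key, data in neighbours(graph, "PolicyB"):
    print(key, data["sources"])
```

Keeping `sources` on every edge is what makes the provenance question answerable later.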

This shines when your “memory” is really a corpus. Research notes, policy PDFs, internal wikis, scientific literature — unstructured stuff where the value is in the relationships between entities, not in a tidy log of chat turns. The real-world deployments reflect that: Bayer has used it for scientific research workflows, and the University of Wyoming built an evidence graph from scattered policy documents with page-level provenance. That provenance angle matters for any domain where you need to cite which document a fact came from.

Cognee claims benchmark leadership on LongMemEval, LoCoMo, and ConvoMem — but as of late 2025 those numbers are self-reported and haven’t been independently verified, so weight them accordingly. The company raised a $7.5M seed to keep building, and it plugs into the Claude Agent SDK, OpenAI Agents SDK, and LangGraph, so it’s not a walled garden. Just know you’re adopting a knowledge-graph pipeline, which is more machinery than a personalization layer needs. For a chatbot that should remember a user’s name, Cognee is overkill. For an agent that has to reason over a thousand documents, it’s the right shape.

A few that didn’t get their own section

The four above aren’t the whole field. LangMem is LangChain’s own memory toolkit — convenient if your stack already lives in LangChain, less compelling otherwise. Supermemory targets a more consumer/personal-knowledge angle. And there’s a steady stream of newer entrants benchmarking themselves against Mem0 every month. None of them change the basic decision below; they’re variations on the same four archetypes.

So which one?

The honest answer is that these tools barely compete, because they’re solving different shapes of the problem. Here’s how I’d actually decide:

  • Building personalization — a chatbot or assistant that remembers users across sessions? Mem0. Easiest integration, biggest community, and the free tier covers a real MVP. Only graduate to Pro when you genuinely need the graph.
  • Tracking facts that change over time — subscriptions, account state, anything where a stale fact triggers a wrong action? Zep. The temporal graph is the whole point and nobody else does it as cleanly.
  • Long-running agents that should learn and self-improve, especially with compliance pushing you to self-host? Letta. You adopt its agent model, but you get autonomous memory management and sleep-time refinement.
  • Reasoning over a large body of documents where relationships and provenance matter? Cognee. It’s a knowledge-graph pipeline, not a chat-memory store, and that’s exactly when you want it.
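The list above can be written down as a small decision function. The flag names are hypothetical, and the order in which the checks run is an assumption for cases where more than one applies; the article itself does not rank them.

```python
def pick_memory_layer(
    *,
    document_corpus: bool = False,
    facts_change_over_time: bool = False,
    long_running_self_hosted: bool = False,
) -> str:
    """Map the dominant shape of the problem to one of the four tools."""
    if document_corpus:
        return "Cognee"  # relationships and provenance across documents
    if facts_change_over_time:
        return "Zep"     # stale facts would trigger wrong actions
    if long_running_self_hosted:
        return "Letta"   # agent manages its own memory, self-hostable
    return "Mem0"        # default: cross-session personalization


print(pick_memory_layer())                             # Mem0
print(pick_memory_layer(facts_change_over_time=True))  # Zep
```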

One caution that applies to all four: benchmark numbers in this space are overwhelmingly self-published, and the LoCoMo/LongMemEval leaderboards shift with every release. Don’t choose based on a vendor’s accuracy chart. Choose on architecture fit, then run your own eval on a week of realistic traffic before you commit. Retrieval quality on your data is the only number that matters.
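A homegrown eval does not need to be elaborate. Here is a minimal recall@k harness, assuming you have hand-labelled which memory IDs each query should surface; `fake_search` is a stand-in for whichever tool's search call you are testing, and the IDs are invented.

```python
def recall_at_k(retrieved: list, relevant: set, k: int) -> float:
    """Fraction of the relevant items that appear in the top k results."""
    if not relevant:
        raise ValueError("each test case needs at least one relevant item")
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def evaluate(search, cases: list, k: int = 3) -> float:
    """Mean recall@k over (query, relevant_ids) pairs."""
    scores = [recall_at_k(search(query), relevant, k) for query, relevant in cases]
    return sum(scores) / len(scores)


def fake_search(query: str) -> list:
    """Hypothetical stand-in for a memory layer's search call."""
    canned = {"q1": ["m1", "m2"], "q2": ["m9"]}
    return canned[query]


cases = [("q1", {"m1"}), ("q2", {"m3"})]
print(evaluate(fake_search, cases))  # 0.5
```

Swap `fake_search` for a thin wrapper around each candidate and run the same `cases` through all of them; the comparison is only as good as the labels.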

If you’ve got an afternoon, wire Mem0’s free tier into whatever you’re building and watch what it chooses to remember versus what you’d have kept. That gap — between what the tool extracts and what you actually care about — tells you more about whether you need something heavier than any comparison table will.