Memory Benchmarks
This page defines how smarna evaluates memory quality for long-running AI agents and MCP (Model Context Protocol) server environments.
Metrics
- Time-to-degradation under sustained interactions
- Token efficiency vs. baseline retrieval approaches
- Recall accuracy for old and recent facts
- Latency stability across long interaction horizons
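This page does not give formulas for these metrics. The following is a minimal Python sketch of one way each could be computed from a per-turn log. Everything in it is an assumption for illustration, not smarna's definition: the hypothetical `Turn` record, the rolling-window rule for degradation, the accuracy-per-token ratio against a baseline, the age cutoff separating old from recent facts, and the coefficient of variation as a stability measure.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Optional, Sequence


@dataclass
class Turn:
    """One recall probe in a sustained interaction (hypothetical log schema)."""
    index: int          # position in the interaction, 0-based
    correct: bool       # whether the agent recalled the probed fact correctly
    fact_age: int       # turns elapsed since the probed fact was introduced
    tokens_used: int    # context tokens the memory layer spent on this turn
    latency_ms: float   # end-to-end response latency


def accuracy(turns: Sequence[Turn]) -> float:
    """Fraction of probes answered correctly."""
    return sum(1 for t in turns if t.correct) / len(turns)


def time_to_degradation(turns: Sequence[Turn], threshold: float = 0.8,
                        window: int = 10) -> Optional[int]:
    """Index of the first turn at which rolling accuracy over the last
    `window` turns falls below `threshold`; None if it never does."""
    for end in range(window, len(turns) + 1):
        if accuracy(turns[end - window:end]) < threshold:
            return turns[end - 1].index
    return None


def token_efficiency(system: Sequence[Turn], baseline: Sequence[Turn]) -> float:
    """Accuracy per token relative to a baseline run on the same workload.
    A value above 1.0 means more correct recall per token than the baseline."""
    def per_token(turns: Sequence[Turn]) -> float:
        return accuracy(turns) / sum(t.tokens_used for t in turns)
    return per_token(system) / per_token(baseline)


def recall_by_age(turns: Sequence[Turn], old_after: int = 50) -> dict:
    """Recall accuracy split by fact age; `old_after` is an assumed cutoff."""
    recent = [t for t in turns if t.fact_age < old_after]
    old = [t for t in turns if t.fact_age >= old_after]
    return {
        "recent": accuracy(recent) if recent else None,
        "old": accuracy(old) if old else None,
    }


def latency_cv(turns: Sequence[Turn]) -> float:
    """Coefficient of variation of latency (stdev / mean); lower is more stable."""
    latencies = [t.latency_ms for t in turns]
    return pstdev(latencies) / mean(latencies)
```

For example, with the default window of 10 and threshold of 0.8, a run whose first 10 probes succeed and whose remaining probes all fail degrades at turn 12, the first point where fewer than 8 of the last 10 probes were correct. A rolling window is used so that a single missed probe early on does not count as degradation.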
Evaluation setup
We compare memory quality across equivalent workloads with controlled token budgets and fixed evaluation prompts.
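A minimal sketch of this kind of setup, assuming a hypothetical `MemoryBackend` interface with `store` and `retrieve` methods: every backend receives the same facts and the same fixed probe prompts, and its retrieved context is truncated to the same token budget before scoring. The whitespace token counter, the two toy backends, and the substring-match scoring are illustrative stand-ins, not smarna's actual harness.

```python
from dataclasses import dataclass, field
from typing import List, Protocol, Tuple


def count_tokens(text: str) -> int:
    # Crude whitespace tokenizer (assumption); a real harness would use the model's tokenizer.
    return len(text.split())


class MemoryBackend(Protocol):
    """Hypothetical interface every compared memory approach implements."""
    def store(self, text: str) -> None: ...
    def retrieve(self, query: str) -> List[str]: ...


@dataclass
class RecencyBackend:
    """Toy baseline: returns stored items newest first."""
    items: List[str] = field(default_factory=list)

    def store(self, text: str) -> None:
        self.items.append(text)

    def retrieve(self, query: str) -> List[str]:
        return list(reversed(self.items))


@dataclass
class KeywordBackend:
    """Toy backend: ranks stored items by word overlap with the query."""
    items: List[str] = field(default_factory=list)

    def store(self, text: str) -> None:
        self.items.append(text)

    def retrieve(self, query: str) -> List[str]:
        q = set(query.lower().split())
        # sorted() is stable, so ties keep insertion order.
        return sorted(self.items, key=lambda s: -len(q & set(s.lower().split())))


def fit_budget(snippets: List[str], budget: int) -> List[str]:
    """Keep snippets in ranked order until the next one would exceed the budget."""
    kept, used = [], 0
    for s in snippets:
        cost = count_tokens(s)
        if used + cost > budget:
            break
        kept.append(s)
        used += cost
    return kept


def evaluate(backend: MemoryBackend, facts: List[str],
             probes: List[Tuple[str, str]], budget: int) -> float:
    """Store the same facts, then score each fixed probe: it counts as recalled
    if its expected answer appears in the budget-limited context."""
    for fact in facts:
        backend.store(fact)
    hits = 0
    for prompt, answer in probes:
        context = fit_budget(backend.retrieve(prompt), budget)
        if any(answer in s for s in context):
            hits += 1
    return hits / len(probes)
```

Because each backend sees identical facts, identical probes, and the same token budget, a difference in score reflects only how the backend ranks what it stored. With a tight budget, a recency-only baseline can miss older facts that a relevance-ranked backend still surfaces.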