Token Diets, Talking Agents, and Smarter Memories
For engineers, designers & product people. Stay up to date with free daily digest.
TLDR: Today is about cutting token burn, fixing tool calls, and giving your agents decent voices without lighting your GPU bill on fire.
If your infra bill has been creeping up since you went all-in on MCP and tool use, this one is for you. We have a neat CLI pattern for on-demand tools, a framework release that quietly fixes a lot of annoying tool-call flakiness, and an open TTS stack that can finally keep up with your agents.
Key Signal
mcp2cli slashes MCP token overhead with on-demand tool discovery
Turns out your “tools everywhere” setup might be a very expensive security blanket.
What happened: The new mcp2cli project hit Hacker News with a blunt stat: a typical Model Context Protocol (MCP) setup injects full tool schemas into context on every turn. Thirty tools cost about 3,600 tokens per turn, whether the model calls them or not. Over 25 turns with 120 tools, that is roughly 362,000 tokens burned just on schemas.
mcp2cli instead turns any MCP server or OpenAPI spec into a command line interface at runtime. The language model discovers tools on demand through cheap metadata calls, for example listing tools at around 16 tokens per tool and invoking --help at roughly 120 tokens per command. No more eager stuffing of the entire tool universe into every prompt.
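The savings are easy to check with back-of-envelope arithmetic. This sketch compares the two strategies using the approximate per-item token costs cited above; the defaults (120 tokens per schema, one tool call per turn) are assumptions, and real mcp2cli costs will vary.

```python
# Back-of-envelope comparison of eager schema injection vs. on-demand
# discovery, using the rough per-item token costs from the article.

def eager_cost(tools: int, turns: int, tokens_per_schema: int = 120) -> int:
    """Full schemas for every tool are injected on every turn."""
    return tools * tokens_per_schema * turns

def lazy_cost(tools: int, turns: int, calls_per_turn: int = 1,
              list_tokens_per_tool: int = 16, help_tokens: int = 120) -> int:
    """One cheap tool listing up front, then a --help lookup per actual call."""
    return tools * list_tokens_per_tool + turns * calls_per_turn * help_tokens

print(eager_cost(tools=120, turns=25))  # schemas alone, every turn
print(lazy_cost(tools=120, turns=25))   # listing once plus per-call --help
```

Under these assumptions the eager setup spends 360,000 tokens on schemas over 25 turns, while on-demand discovery spends under 5,000, which is why the pattern matters long before you hit 120 tools.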
Why it matters: If you are building serious agents, you are almost certainly overpaying for tool visibility. Eager schema injection scales cost superlinearly as you add tools, sessions, and users. For you, this means it is time to treat tool discovery as a protocol, not a prompt template.
What to watch / what to do: Expect this on-demand pattern to show up in MCP clients and agent frameworks. You should profile your current tool token spend and prototype a discovery step before full schema injection.
Mastra 1.10.0 ships input examples for tools and better MCP hooks
Your tools have been crying out for unit tests; Mastra gave them examples instead.
What happened: Mastra released @mastra/[email protected] with two notable changes for agent builders. First: tool definitions can now include inputExamples. These get passed through to models that support example-guided tool calling, such as Anthropic models with input_examples, so the model sees concrete, valid inputs next to schema definitions.
Second: the @mastra/mcp package now pipes a RequestContext into custom fetch hooks for MCP HTTP servers. That means you can forward auth headers, cookies, or other per-request metadata when your agents call MCP tools, instead of hacking global state.
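To make the pattern concrete, here is an illustrative sketch of a tool definition that pairs a JSON schema with canonical example inputs. This is not Mastra's actual TypeScript API; the tool name and field names are assumptions chosen to show the shape of example-guided tool calling.

```python
# Illustrative only: a hypothetical "create_ticket" tool whose schema is
# accompanied by concrete, valid example inputs for the model to imitate.

create_ticket_tool = {
    "name": "create_ticket",
    "description": "Open a support ticket in the tracker.",
    "input_schema": {
        "type": "object",
        "properties": {
            "title": {"type": "string"},
            "priority": {"type": "string", "enum": ["low", "medium", "high"]},
            "tags": {"type": "array", "items": {"type": "string"}},
        },
        "required": ["title", "priority"],
    },
    # Canonical "golden path" inputs, shown to the model next to the schema.
    "input_examples": [
        {"title": "Login page 500s on submit", "priority": "high",
         "tags": ["auth", "regression"]},
        {"title": "Typo in billing email", "priority": "low"},
    ],
}

# Sanity check: every example key should exist in the schema's properties.
for ex in create_ticket_tool["input_examples"]:
    assert all(k in create_ticket_tool["input_schema"]["properties"] for k in ex)
```

The payoff is largest for nested or enum-heavy inputs, where a schema alone leaves the model guessing at valid shapes.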
Why it matters: Malformed tool calls quietly kill reliability in production agents. Example-guided tools reduce that by giving the model a few canonical shapes to imitate, especially for complex nested inputs. For you, this means you can finally encode the “golden path” for each tool directly in the schema instead of relying on docs the model never reads.
What to watch / what to do: You should start adding inputExamples to your highest-value tools first, then monitor malformed call rates and latencies across models that support examples.
OpenMOSS releases MOSS‑TTS, an open expressive speech and sound family
Your agents no longer have to sound like a 2014 navigation app.
What happened: The OpenMOSS team and MOSI.AI released MOSS‑TTS, an open-source family of text-to-speech and sound generation models that already has more than 800 GitHub stars. The models target high fidelity and high expressiveness across several hard modes: long-form stable speech, multi-speaker dialogue, character and voice design, environmental sound effects, and real-time streaming TTS.
The project positions itself as an end-to-end audio stack for “complex real-world scenarios,” which in practice means call centers, in-game agents, and any workflow where you need different characters and emotions, not just a generic voice.
Why it matters: Voice is quickly becoming the default interface for many agents, but proprietary TTS can be costly or restrictive for custom voices, on-prem use, or edge deployment. For you, this means you can experiment with expressive, multi-speaker, and streaming voices without handing the entire stack to a single vendor.
What to watch / what to do: You should evaluate MOSS‑TTS quality versus your current TTS on your own target accents, languages, and device constraints before committing it to user-facing flows.
Worth Reading 📚
Hybrid LanceDB memory plugin levels up OpenClaw agent retrieval
memory-lancedb-pro is a LanceDB-based memory plugin for OpenClaw that combines vector search with BM25 keyword search, cross-encoder reranking, and multi-scope isolation. It also ships a management CLI so you can inspect and curate memory directly.
So what: If your agent “memory” feels like a junk drawer, you should study this hybrid pattern for your own retrieval-augmented generation (RAG) stack.
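A minimal sketch of the hybrid half of that pattern: merge a vector-search ranking and a BM25 ranking with reciprocal rank fusion, then hand the fused top-k to a reranker. The doc IDs and rankings here are toy data, and a real stack like memory-lancedb-pro would score the fused list with a cross-encoder rather than stop at fusion.

```python
# Minimal hybrid-retrieval sketch: fuse vector and BM25 rankings with
# reciprocal rank fusion (RRF). A real pipeline would then rerank the
# fused top-k with a cross-encoder.

def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal rank fusion over several ranked doc-id lists."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["m3", "m1", "m7"]  # nearest-neighbor order
bm25_hits = ["m1", "m9", "m3"]    # keyword-match order

fused = rrf([vector_hits, bm25_hits])
print(fused)  # memories ranked high in both lists float to the top
```

RRF is popular for exactly this job because it needs no score normalization between the dense and sparse retrievers, only their rank orders.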
X2-AQFormer brings interpretable air quality forecasting with attention maps
A Nature paper on X2-AQFormer details a transformer for multi-day, hourly air pollution forecasting that exposes which temporal and feature segments drive each prediction. The model uses attention weights and feed-forward networks to create an explicit mapping from input patches to forecasted outputs.
So what: Interpretability techniques like these can inspire how you design transparent agent decision logs, especially for regulated environments.
Karpathy’s “autoresearch” agent loops on its own code improvements
Quantum Zeitgeist covers Andrej Karpathy’s recently released “autoresearch” repository, a minimal three-file system where an AI agent iteratively designs experiments, runs them, and refines its own code. The piece stresses how this brings “autonomous research” closer to practice, not just theory.
So what: If you are experimenting with self-improving agents, you should compare your loop design and safety checks with autoresearch’s simplicity.
India’s sovereign LLM needs better evaluations, not just more GPUs
Forbes argues India can train an Indic sovereign model but still lacks a credible way to prove its quality. The article calls for an independent, well-funded evaluation body to benchmark models across 22 scheduled languages using suites like MILU, PARIKSHA, and Indic LLM-Arena.
So what: If you serve multilingual markets, you should not trust generic benchmarks; you need domain and language-specific evals that look more like this.
On the Radar 👀
Ask HN: Please restrict new accounts from posting
Hacker News users debate rising AI-generated content from new accounts and propose restrictions or filters to keep the site from turning into a low-signal bot farm.
New Research Reassesses the Value of Agents.md Files for AI Coding
InfoQ reviews new research on how much agents actually benefit from large agents.md context files when writing code, and when these files just add noise.
Pentagon AI deals and commoditized model performance
Simon Willison highlights Bruce Schneier and Nathan E. Sanders' analysis of Pentagon contracts with OpenAI and Anthropic, and what commoditized model performance means for vendor choice.
New Tools & Repos 🧰
mcp2cli
CLI that turns any MCP server or OpenAPI spec into an on-demand tool interface, reducing token usage versus eager schema injection.
@mastra/[email protected]
Mastra release adding inputExamples for tool-call accuracy and a request-scoped RequestContext for MCP HTTP clients.
MOSS-TTS
Open-source family of high-fidelity, expressive TTS and sound generation models for long-form, multi-speaker, character-voice, and streaming use.
memory-lancedb-pro
Enhanced LanceDB-based memory plugin for OpenClaw with hybrid vector and BM25 retrieval, cross-encoder reranking, multi-scope isolation, and a management CLI.
All evolving technical claims and adoption trends above are current as of 2026-03-09.
Key Takeaways
- MCP tool schemas can silently burn hundreds of thousands of tokens in longer agent sessions
- mcp2cli shows a pattern for on-demand tool discovery instead of eager schema stuffing
- Mastra now supports input examples to cut malformed tool calls in production
- MOSS-TTS offers an open, expressive TTS stack for multi-speaker, long-form, and real-time use
- Hybrid vector plus BM25 plus rerank is becoming the default pattern for serious agent memory