Mastra ships runtime model routing and schema unification
For engineers, designers & product people. Stay up to date with free daily digest.
TLDR: Mastra levels up agent ergonomics, AWS pushes disaggregated inference, and NVIDIA shows its hand on safer long-running agents.
Mastra 1.11.0 adds dynamic model routing and schema unification
Mastra core 1.11.0 introduces dynamic model fallback arrays: agents can provide model functions that return full ModelWithRetries[] stacks, with context-driven routing and maxRetries inheritance. The same release adds a Standard Schema layer that normalizes across Zod v3 and v4, AI SDK Schema, and JSON Schema, with helpers like toStandardSchema and standardSchemaToJSONSchema, as of 2026-03-17.
For agent builders, this means you can encode routing logic in code instead of YAML sprawl: pick models by tier, region, or workload at runtime, with nested and async selection. The schema compatibility work is equally important if you are juggling tools, evaluators, and multi-framework validation in one project.
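The release notes describe model functions that return ordered fallback stacks with per-model retry budgets. A minimal, framework-agnostic TypeScript sketch of that pattern is below; the names here (`ModelWithRetries`, `selectModels`, `callWithFallback`) are modeled on the description above, not Mastra's actual API.

```typescript
// Illustrative shape modeled on the described ModelWithRetries[] pattern.
interface ModelWithRetries {
  model: string;      // provider/model identifier
  maxRetries: number; // per-model retry budget
}

interface RequestContext {
  tier: "free" | "premium";
  region?: string;
}

// A "model function": picks an ordered fallback stack at runtime from context.
function selectModels(ctx: RequestContext): ModelWithRetries[] {
  if (ctx.tier === "premium") {
    return [
      { model: "big-model", maxRetries: 2 },
      { model: "mid-model", maxRetries: 1 }, // fallback
    ];
  }
  return [{ model: "small-model", maxRetries: 1 }];
}

// Walk the stack, retrying each model up to its budget before falling back.
async function callWithFallback(
  ctx: RequestContext,
  invoke: (model: string) => Promise<string>,
): Promise<string> {
  for (const { model, maxRetries } of selectModels(ctx)) {
    for (let attempt = 0; attempt <= maxRetries; attempt++) {
      try {
        return await invoke(model);
      } catch {
        // exhausted this attempt; retry, then move to the next model
      }
    }
  }
  throw new Error("all models in the fallback stack failed");
}
```

Because selection is just a function, it can be nested or async, which is what makes the "routing logic in code" claim interesting.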
If you are committing to Mastra, this is a strong signal that they want to be the “glue” for heterogeneous model stacks and schema tooling, not just another agent runtime.
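To make the schema-bridging idea concrete, here is a toy TypeScript sketch of what a normalization layer does: lower one common descriptor into JSON Schema. The `StandardObjectSchema` shape and `toJSONSchema` helper are invented for illustration and are far simpler than Mastra's actual toStandardSchema / standardSchemaToJSONSchema helpers.

```typescript
// A minimal common field descriptor, loosely inspired by the idea of one
// layer sitting between Zod, AI SDK Schema, and JSON Schema.
type FieldType = "string" | "number" | "boolean";

interface StandardObjectSchema {
  fields: Record<string, { type: FieldType; required?: boolean }>;
}

// Hypothetical converter: common descriptor -> JSON Schema document.
function toJSONSchema(schema: StandardObjectSchema): {
  type: "object";
  properties: Record<string, { type: FieldType }>;
  required: string[];
} {
  const properties: Record<string, { type: FieldType }> = {};
  const required: string[] = [];
  for (const [name, field] of Object.entries(schema.fields)) {
    properties[name] = { type: field.type };
    if (field.required) required.push(name);
  }
  return { type: "object", properties, required };
}
```

The payoff of such a layer is that tools, evaluators, and validators can all consume one representation regardless of which schema library each package was written against.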
AWS debuts disaggregated inference powered by llm-d
AWS introduced disaggregated inference powered by llm-d, combining disaggregated serving, intelligent request scheduling, and expert parallelism on Amazon SageMaker HyperPod and Amazon Elastic Kubernetes Service, as of 2026-03-17. The blog post walks through how to wire this up for large language model serving, with claims of improved inference performance, higher GPU utilization, and better operational efficiency.
If you are running bigger models on AWS, this is essentially their answer to the modular, expert-style serving stacks people have been building in house. You split the compute-heavy parts of the workload from routing and lightweight pieces, then let llm-d coordinate experts across nodes. It is still early and the numbers are marketing grade, but the architecture matters.
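The split described above, compute-heavy prefill separated from lighter decode and routing, can be pictured as a scheduler choosing workers per phase. The sketch below is a deliberately naive illustration of that routing idea, not llm-d's actual API or scheduling policy.

```typescript
// Toy disaggregated scheduler: prefill (heavy prompt processing) and decode
// (token-by-token generation) run on separate worker pools.
type Phase = "prefill" | "decode";

interface Worker {
  id: string;
  phase: Phase;
  queueDepth: number; // pending requests, used for least-loaded routing
}

function pickWorker(workers: Worker[], phase: Phase): Worker {
  const pool = workers.filter((w) => w.phase === phase);
  if (pool.length === 0) throw new Error(`no ${phase} workers available`);
  // "Intelligent scheduling" reduced to its simplest form: least-loaded wins.
  return pool.reduce((best, w) => (w.queueDepth < best.queueDepth ? w : best));
}
```

The hard part a real disaggregated stack handles, and this toy omits, is moving the KV cache produced during prefill over to the decode node efficiently.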
Practically, adopting this means deeper commitment to Amazon SageMaker HyperPod and Amazon Elastic Kubernetes Service primitives, so factor that into portability and vendor lock-in plans.
NVIDIA OpenShell targets safer autonomous, self-evolving agents
NVIDIA announced NVIDIA OpenShell and the NVIDIA Agent Toolkit, positioned as an open-source runtime for autonomous AI agents, called claws, that can run long-lived and self-evolving workloads, as of 2026-03-17. The stack builds on NVIDIA NeMoClaw with policy-based privacy and security controls plus enterprise-oriented deployment patterns.
If you are experimenting with production-grade agents that keep state, learn over time, and touch sensitive data, this is the kind of reference architecture vendors will converge on. OpenShell focuses on isolation, policy enforcement, and monitoring rather than yet another orchestration DSL. Actual maturity will depend on how opinionated the policies are and how easily it integrates with your existing observability and identity stacks.
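The isolation-and-policy angle amounts to putting a gate in front of every action an agent takes. Here is a hypothetical TypeScript sketch of such a gate; none of these names (`Policy`, `isAllowed`) come from OpenShell, they only illustrate the pattern.

```typescript
// Hypothetical policy gate: every agent action is checked against
// declarative rules before it is allowed to run.
interface Policy {
  allowTools: string[];       // tools the agent may invoke
  denyPathPrefixes: string[]; // filesystem areas that are off limits
}

interface Action {
  tool: string;
  path?: string; // set for filesystem-touching tools
}

function isAllowed(policy: Policy, action: Action): boolean {
  if (!policy.allowTools.includes(action.tool)) return false;
  const path = action.path;
  if (path !== undefined) {
    return !policy.denyPathPrefixes.some((prefix) => path.startsWith(prefix));
  }
  return true;
}
```

For long-lived, self-evolving agents the interesting question is who is allowed to edit the policy itself; a gate like this is only useful if the agent cannot rewrite it.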
The big picture: NVIDIA is trying to define the baseline for “safe” autonomous agents on their hardware, so expect tighter integration with NVIDIA GPUs, NVIDIA networking, and NVIDIA cloud partners.
Quick Hits
Introducing deploy cli The new langgraph deploy command in the langgraph-cli package lets you deploy agents directly from your terminal into LangSmith Deployment, which is useful if you already live in the LangChain ecosystem and want a lighter-weight DevOps story.
Claude Code skills that build complete Godot games Godogen is a pipeline that takes a text prompt and produces a full Godot 4 game project, including 2D or 3D assets, GDScript, and visual tests. The Hacker News thread digs into how the author tackled data scarcity for GDScript and reliability issues.
Why Codex Security Doesn’t Include a SAST Report OpenAI explains why Codex Security favors AI-driven constraint reasoning and validation over traditional static application security testing reports, aiming to cut false positives and focus on exploitable issues.
Nvidia's 'ChatGPT moment' for self-driving cars, and other key AI announcements at GTC 2026 ZDNET summarizes NVIDIA GTC 2026 physical AI news, including Cosmos 3 for synthetic world generation and Isaac GR00T N1.7, an open reasoning vision-language-action model for humanoid robots.
mastra @mastra/[email protected] Mastra 1.13.0 adds Zod-based schemas and in-memory implementations for observability signals, plus a new Turso- or SQLite-backed @mastra/agentfs workspace filesystem, which should help with persistent agent workspaces and type-safe metrics.
AWS and NVIDIA deepen strategic collaboration to accelerate AI from pilot to production Amazon Web Services and NVIDIA are tightening integration across GPUs, networking, and managed services, so if you are already on AWS with NVIDIA hardware, expect smoother paths from prototypes to large-scale training and inference.
Launch HN: Chamber (YC W26) – An AI Teammate for GPU Infrastructure Chamber pitches an AI agent that manages GPU clusters for you, from provisioning to debugging failed jobs, built by ex-Amazon GPU infrastructure engineers and currently being discussed on Hacker News.
Build an offline feature store using Amazon SageMaker Unified Studio and SageMaker Catalog Amazon Web Services shows how to stand up an offline feature store with a publish-subscribe pattern, so producers publish curated feature tables while consumers discover and reuse them for model development.
LinkedIn Uses New AI Models To Rebuild Feed Algorithm LinkedIn describes a unified retrieval system backed by new large language models and a GPU powered ranking model, meant to give creators more reach and keep feeds fresher.
How coding agents work Simon Willison outlines the internals of coding agents, treating them as harnesses around large language models wired up with tools, memory, and invisible prompts so you can reason more clearly about reliability.
Coding agents for data analysis Companion material for a NICAR 2026 workshop that shows data journalists how to use coding agents like Claude Code and OpenAI Codex for exploration, cleaning, and analysis workflows.
crewAI 1.11.0rc1 The latest release candidate adds a plan-execute pattern and Plus API token auth for A2A enterprise, and fixes a code interpreter sandbox-escape bug, which matters if you rely on crewAI for multi-agent workflows.