Elixir agents, on-call bots, and AI that actually ships code
TLDR — Elixir gets a serious agent framework, on-call pages go to bots before humans, and scheduling finally admits it was a constraint solver all along.
If you feel like every tedious part of your job is quietly being wrapped in an agent, you are not hallucinating. Today is very much "everything is an agent" day, from BEAM-native frameworks to on-call responders and calendar whisperers.
As of 2026-03-07.
Key Signal ☕
Jido 2.0 turns the BEAM into an agent supercluster
Elixir fans just got a first-class excuse to ship agents into production.
Jido 2.0 is an Elixir agent framework that runs on the BEAM VM, built for production workloads rather than weekend demos. It ships tool calling, agent skills, and multi-agent orchestration across distributed BEAM processes, leaning on OTP supervision for reliability. It also bakes in multiple reasoning strategies (ReAct, chain of thought, tree of thought) plus workflow and durability features.
If you already run Elixir in prod, this is a direct path to agents with real supervision trees, restarts, and clustering instead of fragile Python sidecars. If you do not, Jido is still a strong reference architecture for what a production-hardened agent runtime should look like.
Expect to see more teams treat agents as long-lived, supervised processes rather than stateless function calls.
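Jido itself is Elixir, but the core idea, an agent as a long-lived process under a supervisor that restarts it on crashes, translates anywhere. Here is a minimal Python sketch of that pattern; the agent class, crash logic, and restart budget are all hypothetical, not Jido's API.

```python
# Sketch: an agent run as a supervised, restartable process (OTP-supervisor style).
attempts = {"n": 0}  # shared counter so the simulated crash happens only once

class FlakyAgent:
    """Hypothetical stand-in for a long-lived agent that fails transiently."""
    def step(self):
        attempts["n"] += 1
        if attempts["n"] == 1:
            raise RuntimeError("transient failure")  # simulate a first-call crash
        return "ok"

def supervise(agent_factory, max_restarts=3):
    """Run the agent; on crash, replace it with a fresh instance, up to a restart budget."""
    restarts = 0
    agent = agent_factory()
    while True:
        try:
            return agent.step()
        except Exception:
            restarts += 1
            if restarts > max_restarts:
                raise  # give up, like a supervisor exceeding its restart intensity
            agent = agent_factory()  # fresh state, analogous to a supervisor restart

print(supervise(FlakyAgent))  # prints "ok" after surviving one crash
```

The contrast with stateless function calls is the point: the supervisor, not the caller, owns failure handling.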
Steadwing promises to take the 2 a.m. page before you do
Your future on-call rotation might start with a bot triaging the blast radius.
Steadwing pitches itself as an autonomous on-call engineer that consumes alerts, correlates evidence across your stack, and attempts remediation. It hooks into the usual suspects like metrics, logs, version control, and deployment systems, then uses agents to diagnose and sometimes resolve incidents before escalating to a human. There is a demo mode plus a credit-card-free signup.
For AI engineers, it is another concrete example of agents with real authority in production, not just copilots suggesting runbooks. The hard problems are safe action spaces, auditability, and rollback.
If you explore tools like this, start with read-only integrations, then progressively expand the action surface with strict guardrails and postmortems.
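The "read-only first, then expand" progression can be made concrete as an action gate: every tool call passes through an allowlist, mutations require explicit approval, and everything lands in an audit trail. A minimal sketch (action names and the gating policy are illustrative, not Steadwing's API):

```python
# Sketch: gate an ops agent's actions behind explicit allowlists with an audit trail.
READ_ONLY = {"fetch_metrics", "read_logs", "list_deploys"}   # safe by default
MUTATING = {"restart_service", "rollback_deploy"}            # never implicit

def execute(action, approved_mutations=frozenset(), audit=None):
    """Allow read-only actions; allow mutations only if explicitly approved;
    escalate everything else to a human. Returns True if the action ran."""
    audit = audit if audit is not None else []
    if action in READ_ONLY:
        audit.append(("allowed", action))
        return True
    if action in MUTATING and action in approved_mutations:
        audit.append(("allowed-mutation", action))
        return True
    audit.append(("escalated-to-human", action))
    return False

log = []
execute("read_logs", audit=log)                              # read-only: allowed
execute("rollback_deploy", audit=log)                        # not approved: escalated
execute("restart_service", {"restart_service"}, audit=log)   # expanded surface: allowed
print(log)
```

Expanding the action surface then means growing `approved_mutations` one action at a time, with a postmortem behind each addition.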
Vela treats scheduling as the constraint solver it always was
Your calendar chaos now has a type: multi-constraint search problem.
In a Launch HN post, Vela (YC W26) describes AI agents that handle multi-party, multi-channel scheduling across email, chat, and changing constraints. They treat the problem as constraint satisfaction with unstructured natural language inputs, shifting requirements mid-solve, and social dynamics baked into the objective function. The agents coordinate across channels to reach workable times without you mediating every thread.
If you design agents for ops, sales, or support, this is a useful blueprint for combining LLMs with explicit constraint solvers and state tracking instead of pure text prediction.
Watch how they handle edge cases like partial availability, last-minute changes, and conflicting power dynamics inside orgs.
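The constraint-satisfaction framing is easy to see in miniature: intersect everyone's availability, then filter by additional constraints that can arrive mid-solve. A toy sketch (the names and the integer time slots are invented for illustration; Vela's actual solver is not public):

```python
# Sketch: scheduling as constraint satisfaction over discrete time slots (hours).
from functools import reduce

availability = {
    "alice": {9, 10, 13, 15},
    "bob":   {10, 11, 13, 16},
    "carol": {10, 13, 14, 15},
}

def feasible_slots(avail, constraints=()):
    """Intersect all participants' availability, then keep slots that
    satisfy every extra constraint (e.g. requirements that shift mid-solve)."""
    common = reduce(set.__and__, avail.values())
    return sorted(s for s in common if all(c(s) for c in constraints))

print(feasible_slots(availability))                        # [10, 13]
# A requirement arrives mid-solve: no meetings before noon.
print(feasible_slots(availability, [lambda s: s >= 12]))   # [13]
```

The hard part Vela describes is upstream of this: extracting the availability sets and constraints from messy natural-language threads in the first place.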
Worth Reading 📚
llmfit benchmarks what actually runs on your hardware
llmfit provides a single command to probe hundreds of models and providers against your local hardware. It reports what fits, how fast it runs, and what tradeoffs you face. Instead of guessing from blog posts, you get concrete viability data for your deployment targets.
picolm squeezes a 1B LLM into 256 MB RAM
picolm runs a 1-billion parameter model on a 10 dollar board with only 256 MB of RAM, via aggressive quantization and hardware-friendly kernels. It targets ARM and RISC-V devices and lower-end SBCs like Raspberry Pi. This is a strong signal that "tiny edge agents" are not just marketing decks.
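Back-of-envelope arithmetic shows why this requires aggressive quantization: weight memory is roughly parameters times bits per weight divided by eight, and at 256 MB a 1B model cannot afford even 4-bit weights once runtime overhead is counted. A quick sketch (the 10% overhead factor is an assumption, not picolm's measured figure):

```python
# Sketch: rough memory footprint of a model at various quantization levels.
def model_memory_mb(params, bits_per_weight, overhead=1.10):
    """Weights only: params * bits/8 bytes, plus an assumed ~10% for
    activations and runtime buffers. Returns mebibytes."""
    return params * bits_per_weight / 8 / 2**20 * overhead

for bits in (16, 8, 4, 2):
    print(f"1B params @ {bits}-bit ~ {model_memory_mb(1e9, bits):.0f} MB")
# 16-bit ~ 2098 MB, 8-bit ~ 1049 MB, 4-bit ~ 524 MB, 2-bit ~ 262 MB
```

Even 4-bit weights are roughly double the board's RAM, which is why sub-4-bit schemes and streaming tricks dominate at this scale.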
Balyasny and OpenAI detail an AI research engine for investing
OpenAI profiles how Balyasny Asset Management built an AI research stack using GPT-5.4, automated evaluation, and agent workflows for investment analysis. The system orchestrates research tasks at scale while enforcing quality gates through evals and review loops. If you are building internal research agents, this is a rare look at an enterprise-grade pipeline.
AWS shows how to plug SageMaker LLMs into Strands agents
An AWS ML Blog post walks through using Llama 3.1 with SGLang on SageMaker while integrating with Strands agents. They build custom model parsers to adapt non-Bedrock endpoints into the Bedrock Messages API format. This is practical glue code for teams standardizing on Strands but running custom models.
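The adapter idea is simple even without the blog's actual parser code: map the response shape one API returns onto the shape the agent framework expects. The sketch below converts an OpenAI-style chat completion (the format SGLang endpoints typically emit) into a messages-style dict; both shapes here are assumptions for illustration, not the post's implementation.

```python
# Sketch: hypothetical parser adapting an OpenAI-style completion to a
# Bedrock-Messages-shaped dict (role plus a list of content blocks).
def to_bedrock_messages(openai_resp):
    """Pull the first choice's message and rewrap its text as a content block."""
    choice = openai_resp["choices"][0]
    return {
        "role": choice["message"]["role"],
        "content": [{"text": choice["message"]["content"]}],
    }

resp = {"choices": [{"message": {"role": "assistant", "content": "Hello!"}}]}
print(to_bedrock_messages(resp))
# {'role': 'assistant', 'content': [{'text': 'Hello!'}]}
```

Real parsers also have to translate tool-call payloads and stop reasons, which is where most of the glue-code effort goes.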
On the Radar 🔎
zeroclaw: autonomous AI assistant infrastructure you can deploy anywhere
Fast, small agentic infrastructure with pluggable components and OpenClaw support, focused on fully autonomous assistants.
Agentic manual testing practices for coding agents
Simon Willison outlines patterns for safely testing coding agents by executing generated code rather than trusting it.
Multi-developer CI/CD for Amazon Lex bots
AWS details a pipeline pattern so multiple developers can build Lex conversational agents with isolated environments and automated tests.
OpenAI’s Codex Security agent enters research preview
Codex Security analyzes codebases to detect, validate, and patch complex vulnerabilities with higher confidence and less noise.
Descript scales multilingual dubbing with OpenAI models
Descript shares how it uses OpenAI to synchronize translated speech with timing constraints for natural-sounding dubbed video.
Google’s Workspace CLI plugs OpenClaw into your docs and calendar
Ars Technica covers Google’s new command-line tool that exposes Workspace data and 40-plus skills to agentic systems via structured JSON.
Prompt injection chain compromises Cline’s production releases
Simon Willison links to Adnan Khan’s report on "Clinejection" where a poisoned GitHub issue title hijacked an AI-powered issue triage workflow.
DeepRare beats specialists on rare disease diagnosis
The Next Web reports that DeepRare, an agentic system using 40 tools, outperformed doctors in a Nature study on rare disease diagnosis.
New Tools & Repos 🧰
- llmfit — 12.3k stars. One-command probe to benchmark which LLMs and providers run on your hardware.
- picolm — 1.3k stars. Runs a 1B parameter model on 256 MB RAM edge boards using quantization.
- zeroclaw — 24.2k stars. Autonomous AI assistant infrastructure with OpenClaw support, deployable across environments.
Topic Trends
Top recurring themes across today’s items:
- LLM and OpenClaw ecosystems are converging on shared abstractions for tools and skills.
- AI agents in production now target ops, security, scheduling, and finance workflows.
- Multi-agent patterns and supervision trees are showing up in real frameworks, not just papers.
- Security and prompt injection are front and center for any automated workflow.
Key Takeaways
- Use mature runtimes like Jido or zeroclaw instead of hand-rolled agent orchestration.
- Start ops agents in read-only mode before granting remediation powers.
- Model scheduling and similar workflows as constraint problems plus stateful agents.
- Execute and sandbox all code from coding agents; never trust static review alone.
- Design every agent integration with prompt injection and abuse cases in mind.