
Anything turns English into production software: mobile apps, web, design, backend, data, payments, infra. Our agent already builds and ships apps end to end.
Max is the next level up: an autonomous engineer you spin up for max power.
You give Max a goal. It runs the app like a person, edits code, writes tests, and keeps iterating and re-trying the app until the change works. It solved 97% of the hardest problems we threw at it in early access.
Max doesn’t replace the base agent; it sits above it for moments that need more intelligence, more time, and more autonomy. It's also designed to help the people we care about serving most: those who couldn't otherwise debug the code.
Why we created Max
Inside the company we repeat a line: “Don’t just listen to users. Listen to the agent.”
When the Anything agent stalls, it uses a feedback tool to report the current project context, along with its diagnosis of why it can't move forward, to the eng team. This small trick often gives us better bug reports than even early users file, because the agent has the user's chat history, the current project context, what it's trying to do, the toolset, and the infra all in view. We've shipped specific tools based on the agent's feedback about which tools would have helped it help users.
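To make this concrete, a stall report can be as simple as a structured payload the agent files when it gives up. This is a hypothetical sketch; the field names and endpoint below are illustrative, not Anything's actual internal tool.

```ts
// Hypothetical sketch of a stall-report tool. Field names and the
// endpoint are illustrative, not Anything's actual internal API.
interface StallReport {
  projectId: string;
  goal: string;          // what the agent was trying to do
  diagnosis: string;     // its own theory of why it is stuck
  chatSummary: string;   // condensed user chat history
  toolsTried: string[];  // tools invoked before stalling
  recentLogs: string[];  // runtime/infra context
}

// Instead of failing silently, the agent files a structured bug
// report with full context attached for the eng team.
async function reportStall(report: StallReport): Promise<void> {
  await fetch("https://eng.example.com/agent-feedback", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(report),
  });
}
```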
But looking at user prompts sent just before the agent got stuck, a pattern emerged: a builder tries their app, sees something broken, and can't describe the issue, or potential fixes, in enough detail for the agent to have a chance. The prompts here are often "when I login, its broken" or "Something weird happens. Why can't you fix it?"
They shouldn't have to be that specific. We want them focused on the product they want to create. Expecting every user to become a master of prompt engineering (or regular engineering) is unrealistic.
So instead we need an agent that can use the app the way our builders do to gain context, and then plan and make the changes itself.
Owning the runtime
We spent two years building the full stack inside Anything because code is meaningless until it runs. If you own where it runs, you can increase agent reliability.
Every app ships with Neon-backed Postgres, auth, serverless compute, payments, AI integrations, and a deployment surface we control end to end. When we rely on partners, it’s because they support agent-first workflows (Neon’s branching is a good example).
Owning the runtime lets our agents read logs, branch databases, patch infra, and verify changes. That visibility is the foundation of autonomy. It also just makes for a better user experience, where everything a builder needs is one click away.
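As a sketch of what that enables, consider verifying a change against a database branch before it ever touches production. The helpers here are hypothetical stand-ins for internal runtime APIs (Neon-style branching is real, but the exact calls below are illustrative):

```ts
// Hypothetical stand-ins for internal runtime APIs; the shapes are
// illustrative, not the actual Anything or Neon client.
declare function createDbBranch(projectId: string): Promise<string>;
declare function deployPreview(projectId: string, dbBranch: string): Promise<string>;
declare function runSmokeTests(previewUrl: string): Promise<boolean>;
declare function readRuntimeLogs(previewUrl: string): Promise<string[]>;

// Owning the runtime means the agent can try a change on a branched
// database plus a preview deploy, verify it, and only then promote it.
async function verifyChange(projectId: string): Promise<boolean> {
  const dbBranch = await createDbBranch(projectId); // copy-on-write, cheap
  const preview = await deployPreview(projectId, dbBranch);
  const ok = await runSmokeTests(preview);
  if (!ok) {
    // The same access that enables autonomy also enables diagnosis.
    console.warn((await readRuntimeLogs(preview)).slice(-20).join("\n"));
  }
  return ok;
}
```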
Intelligence ∝ time ∝ cost
For Max, we max out model intelligence and let the agent interact with the app directly. Given the current state of browser automation tools, that comes at a real cost in latency and wall-clock time.
Max intentionally embraces that tradeoff. If you're aiming to get customers for your app, it's worth having an always-on, on-call engineer that can do the work for you.
Launch-blocking bugs, new features, or plain time savings go to Max. For founders replacing an agency or filling in for missing engineers, $200 is often worth it compared to the cost of downtime.
Max runs in the background so you reclaim time while it works.
Autonomous
We think about Max the way car companies talk about driving. An automatic transmission still needs a driver. Full self-driving senses the environment and makes decisions end to end. Max lives in that second category.
- Agent (automatic). You describe a feature. The agent plans, edits, deploys, and hands back results, but it expects you to validate them.
- Max (autonomous). You give it a goal. It senses the app by interacting with it and through runtime APIs, visual diffing, logs, and database branches. It decides what to do, makes the changes, and loops until the goal is met.
With Max, you as the builder are the product manager: you decide the destination, then let Max drive. You can watch every step or walk away and return to diffs, tests, and an audit trail.
To avoid going off course over long time horizons, the key feedback loop is runtime behavior: what the live app actually does after each change. Checking that after every step keeps Max on course.
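In code, that loop might look like the sketch below. This is our illustrative reconstruction, not Max's actual implementation; the types and helpers are hypothetical.

```ts
// Illustrative reconstruction of Max's outer loop; types and helpers
// are hypothetical stand-ins, not the production implementation.
interface RuntimeHandle { appUrl: string; }
interface Observation { logs: string[]; uiDiff: string; dbState: unknown; }
interface Goal { description: string; isMet(app: RuntimeHandle): Promise<boolean>; }

declare function sense(app: RuntimeHandle): Promise<Observation>; // drive the UI, read logs, diff screenshots
declare function plan(goal: Goal, obs: Observation): Promise<string>;
declare function act(step: string, app: RuntimeHandle): Promise<void>; // edit code, write tests, deploy

async function runMax(goal: Goal, app: RuntimeHandle, maxSteps = 100): Promise<boolean> {
  for (let step = 0; step < maxSteps; step++) {
    // Runtime behavior after each change is the ground-truth signal
    // that keeps a long-horizon run from drifting off course.
    if (await goal.isMet(app)) return true;
    const obs = await sense(app);
    const next = await plan(goal, obs);
    await act(next, app);
  }
  return false; // hand back diffs, tests, and the audit trail either way
}
```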
Things we learned building Max
- Runtime tools are powerful. Max relies on privileged tools to use the app both as a user and at the level of an engineer (logs, tests, auth, etc.). It might seem obvious, since this is what real human engineers do, but just giving it these tools improves performance significantly.
- Principles. Inspired by Andrej Karpathy’s “LLMs as sprite creatures,” we bias Max's system prompt with debugging principles (“reproduce first,” “verify end-to-end”) instead of brittle if-this-then-that rules; there's a sketch of the flavor after this list. The principles themselves came from great engineers on the team.
- Novel merging. Builders can run multiple Max tasks at once, so we built novel merge systems that coordinate what happens when multiple Maxes need to change the same project.
- Cost is next. V1 runs on frontier models. Open-source models are catching up on tool use, and fine-tuning and infra providers like Groq promise 10x speedups. Quality first, then scale. As we speed this up, we'll need to rethink the UX of background agents.
- UX frontier. Max can run for 30 minutes. We designed a new threading model that supports mid-run intervention and summaries while it operates. But there's so much more to do here. Soon you'll have many Maxes working at the same time as a swarm. What's the best way for a non-technical user to control that swarm?
- Read. Read. Read. We repeat this internally too: when building with LLMs, nothing beats reading the entire context and outputs, many thousands of times over, across many thousands of projects. You build intuition fast. Having read it all, ask: “Could a human engineer fix this issue with the same context?” If not, you already have ideas for the next tool that unlocks the next frontier of agent performance.
- Leap to real-world data ASAP. In beta, support routed the hardest tickets to “debugging mode”: Max in disguise. It resolved 97% of them. It also exposed many things we hadn't considered, but once it was resolving those tickets automatically, we knew it was ready for prime time.
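For the principle-biased prompt mentioned in the list above, the flavor is something like the snippet below. Only “reproduce first” and “verify end-to-end” come from the post itself; the other lines and the framing are our illustrative additions, and the real prompt is internal.

```ts
// Illustrative flavor of a principle-based system prompt. Only
// "reproduce first" and "verify end-to-end" are from the post; the
// rest is a hypothetical reconstruction.
const DEBUGGING_PRINCIPLES = [
  "Reproduce first: see the bug in the running app before editing code.",
  "Verify end-to-end: click through the real flow after every change.",
  "Prefer runtime evidence (logs, DB state) over guessing.",    // illustrative
  "Make the smallest change that fixes the observed behavior.", // illustrative
];

const systemPrompt =
  "You are Max, an autonomous engineer. Follow these principles:\n" +
  DEBUGGING_PRINCIPLES.map((p, i) => `${i + 1}. ${p}`).join("\n");
```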
Where Max fits
- You can’t reproduce or describe the bug, but you know it’s real.
- You want a multi-surface feature shipped without babysitting every prompt.
- You need a pre-launch QA sweep across critical flows.
- You want to finish 4 hours of work in 15 minutes.
We keep finding more use cases for agents with full runtime access. It's clear that eventually this will just be the base agent; it unlocks that much power. Early users can already solve most of the issues they run into in so-called "vibe coding."
It's clear to us that software entrepreneurship will soon mean describing what you want to a team of agent engineers.
If you find this interesting from a design, infra, or AI perspective, we're hiring in San Francisco. Try Max, and reach out to tell us what you think.