The 70% Idle Tax: How AetherFS Is Repricing the Agent Compute Stack
Most agent VMs sit idle while the model thinks.
Published
Feb 10, 2026
Topic
Engineering

A new class of overlay filesystem wants to make that idle time disappear from the bill.
There is an economic anomaly at the center of agent infrastructure that almost no one talks about directly. Agent platforms today pay for compute that, by the platforms' own internal estimates, sits idle for the majority of its billable life. The team building AetherFS, an overlay filesystem for agent workloads, has put a number on it: roughly 70 percent of typical agent VM time is spent in idle states (model inference round trips, planning steps, human review windows) rather than in active computation.
That is a remarkable claim to throw at a category that markets itself as cost-efficient. Whether the 70 percent figure holds for any specific platform depends on workload mix, but the directional pattern is uncontroversial. Anyone who has watched an agent VM run for 90 seconds and burn 60 of those seconds waiting for a model response has seen the math. The question is not whether the idle time exists. The question is whether it is structural to the architecture or whether the architecture is wrong.
The Math of Idle
Per-agent VM pricing, as offered by e2b, Modal, Daytona, and the cloud hyperscalers' sandbox products, is structured around the assumption that a VM-second is the unit of value. A platform pays for the seconds the VM exists, regardless of what it is doing during those seconds. The pricing is rational in that it matches the underlying resource cost (the VM is consuming compute capacity whether or not it is busy), but it produces a curious incentive: an agent that thinks faster saves the platform money, while an agent that uses more compute per second of real work does not.
Three categories of cost compound the issue. First is cold start: boot, repository clone, dependency installation. Public reports from agent platform engineers have placed cold start in the 90-to-180-second range for typical coding workloads. Second is idle-while-thinking: the seconds the VM exists while the agent waits for a model response, which on current frontier models can run 30 to 60 seconds for a meaningful generation. Third is post-completion teardown: the seconds between the agent finishing and the platform actually releasing the VM, which is rarely zero in production.
For an agent task of two minutes of active work, those three costs can comfortably double the total billable time. The platform pays for four minutes to deliver two. At consumer agent volumes, the math is significant. At enterprise volumes (tens of thousands of agent runs per day), it is a meaningful share of cost of revenue.
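The arithmetic is easy to check. The numbers below are illustrative, picked from the ranges quoted above rather than measured from any particular platform:

```python
# Back-of-envelope check of the idle tax. All numbers are illustrative,
# drawn from the ranges quoted in the text, not measured from any platform.

cold_start_s = 90     # boot + clone + dependency install (90-180 s range)
active_work_s = 120   # two minutes of actual computation
think_wait_s = 60     # one long model round trip (30-60 s per generation)
teardown_s = 30       # lag between task completion and VM release

billable_s = cold_start_s + active_work_s + think_wait_s + teardown_s
idle_s = billable_s - active_work_s

print(f"billable: {billable_s} s for {active_work_s} s of work")  # 300 s for 120 s
print(f"idle share: {idle_s / billable_s:.0%}")                   # 60%
print(f"billable/active: {billable_s / active_work_s:.1f}x")      # 2.5x
```

Nudge any input toward the top of its quoted range and the ratio climbs past 3x, which is why "comfortably double" is the conservative phrasing.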
This is not a hypothetical. It is the cost structure under which most production agent platforms are operating today.
Overlay Sessions vs. VMs
AetherFS's pitch, paraphrased from public materials, is that the substrate is the wrong unit for agent compute. A VM is a unit designed for long-running, stateful workloads. Agent runs are short, often parallel, and sit on top of state that mostly does not change between runs. The mismatch between unit and workload is what produces the idle tax.
The AetherFS alternative replaces the VM as the unit with the session, defined as a copy-on-write overlay over a shared base. The base contains the materialized repository, dependencies, and any precomputed state that the platform wants to share across runs. Sessions fork from the base in O(1) metadata operations. The marginal cost of an additional session is the delta the session writes, not the cost of materializing a new VM.
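The session model above can be sketched in a few lines. The classes and method names here are hypothetical illustrations of the copy-on-write idea, not AetherFS's actual API:

```python
# Minimal copy-on-write session sketch. Class and method names are
# hypothetical; this illustrates the model, not AetherFS's API.

class Base:
    """Shared, immutable base: materialized repo, dependencies, precomputed state."""
    def __init__(self, files: dict):
        self.files = files


class Session:
    """A writable delta over the shared base.

    Forking is O(1) in metadata: it allocates an empty delta and copies nothing.
    """
    def __init__(self, base: Base):
        self.base = base
        self.delta = {}       # only this session's writes are stored here
        self.deleted = set()  # whiteouts: paths hidden from the base

    def read(self, path: str) -> bytes:
        if path in self.deleted:
            raise FileNotFoundError(path)
        if path in self.delta:            # the session's own writes win
            return self.delta[path]
        return self.base.files[path]      # fall through to the shared base

    def write(self, path: str, data: bytes) -> None:
        self.deleted.discard(path)
        self.delta[path] = data           # the base is never touched

    def unlink(self, path: str) -> None:
        self.delta.pop(path, None)
        self.deleted.add(path)


# Many concurrent sessions share one base; marginal cost is delta size.
base = Base({"src/app.py": b"print('hello')"})
s1, s2 = Session(base), Session(base)
s1.write("src/app.py", b"print('patched')")
assert s1.read("src/app.py") == b"print('patched')"
assert s2.read("src/app.py") == b"print('hello')"   # s2 is unaffected
```

The design choice to track deletions separately (the `deleted` set) mirrors the whiteout mechanism real overlay filesystems use: the base stays immutable, so hiding a base file has to be recorded in the delta layer.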
The implications for the cost stack compound. Cold start drops from minutes to seconds because the base is already materialized. Idle-while-thinking still happens, but the cost of holding an idle session is the cost of an open file handle and a small amount of metadata, not the cost of a running VM. Storage costs scale with delta size rather than with workspace size, which the team describes as below 1 percent of base size for typical edit patterns.
Stack these effects and the headline claim from the AetherFS team (a 5x reduction in infrastructure cost for agent workloads) becomes a derivable result rather than a marketing number. The components are: faster cold start, near-zero idle holding cost, and dedup of shared state across sessions. Each component contributes a multiplier. The 5x is the product, not a single line item.
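One way to sanity-check a headline multiplier like this is to decompose the bill into components and shrink each one. The numbers below are invented for the sketch, not the team's actual decomposition; note that active compute does not shrink, which is what keeps the overall multiple bounded:

```python
# Illustrative decomposition of a billable-time reduction into components.
# The figures are made up for the sketch; only the shape of the math is
# the point. "active" is the work itself and does not shrink.

baseline = {"active": 60, "cold_start": 90, "idle_hold": 120, "teardown": 30}
overlay  = {"active": 60, "cold_start": 4,  "idle_hold": 1,   "teardown": 0}

before = sum(baseline.values())   # 300
after = sum(overlay.values())     # 65

print(f"overall reduction: {before / after:.1f}x")  # 4.6x
```

The overall figure is sensitive to the cost mix: a workload with less active compute per run pushes the multiple up, one dominated by active compute pushes it down. A 5x headline is plausible for idle-heavy agent workloads under these assumptions, but it is a property of the workload mix, not a constant.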
What Density Buys
The cost story is interesting on its own. The density story is what changes pricing models.
When the unit of work shifts from a VM to a session, the number of concurrent units a single host can support shifts by an order of magnitude or more. The team has described targets of thousands of concurrent sessions per node, against the dozens of concurrent VMs that a comparable host supports today. The density gain comes from the same architectural choices that drive the cost reduction: shared base, copy-on-write deltas, and bounded memory through reference counting in the content-addressable store.
What density buys, beyond the obvious cost reduction, is a different shape of pricing. At the low density of per-VM architectures, per-session pricing has to be priced at a level that recovers a meaningful share of a VM-hour. At the high density of overlay architectures, per-session pricing can approach the cost of a database row insert. The economic unit shifts from "fraction of a machine" to "fraction of a metadata operation."
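The density claim reduces to host-sizing arithmetic. The footprints below are invented round numbers, not AetherFS measurements, but they show why the jump is an order of magnitude or more:

```python
# Hypothetical host-sizing sketch for the density claim. Footprints are
# invented round numbers, not measurements from any platform.

host_mem_mb = 256 * 1024        # 256 GB node

vm_footprint_mb = 8 * 1024      # full guest kernel + userspace per agent VM
session_footprint_mb = 20       # delta pages + overlay metadata per session

vms_per_host = host_mem_mb // vm_footprint_mb            # 32: "dozens"
sessions_per_host = host_mem_mb // session_footprint_mb  # 13107: "thousands"

print(f"per-VM units per host: {vms_per_host}")
print(f"per-session units per host: {sessions_per_host}")
```

The two or three orders of magnitude between a VM footprint and a delta footprint is the entire distance between "fraction of a machine" pricing and "fraction of a metadata operation" pricing.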
This matters for agent platforms because their downstream pricing has been moving toward per-run or per-task billing. A platform that charges per task on top of a substrate priced per VM-hour is bridging two pricing models with different scaling properties, and that bridge compresses margin. A platform that charges per task on top of a substrate priced per session has a much cleaner unit economics story. Whether AetherFS or its eventual successors actually price this way is unclear; the team has not yet published a commercial pricing model. But the substrate makes the pricing possible, and the demand for the pricing is already present in the market.
The Build vs. Buy Question
Most large agent platforms have already started building their own version of this substrate, internally. The motivation has been straightforward and almost always cost-driven. When a platform's substrate cost crosses a threshold (typically when usage moves from tens of thousands of runs per day to hundreds of thousands), the build option starts to pencil. The team builds a caching layer, then a snapshot system, then a deduplication store, then a session model. They have, by accident, built a filesystem.
The cost of building this in-house is significant. It absorbs a small team of senior engineers indefinitely, because workspace infrastructure has a long tail of correctness corner cases that surface only at scale. The opportunity cost of having those engineers working on substrate rather than on agent capabilities is the harder cost to measure but often the larger one.
AetherFS's bet, paraphrased, is that the substrate has reached the point where it is worth building once, well, and offering as infrastructure. The pitch to a build-leaning platform is not that the platform's existing in-house version is bad. The pitch is that the work is duplicate, the maintenance is ongoing, and the team's senior engineers would produce more value working on something else.
The Forward Bet
Speculative framing, flagged. If overlay infrastructure becomes the substrate of choice for agent platforms over the next two years, the line item that currently reads "compute" on agent platform P&Ls will compress significantly, and the line item that reads "inference" will expand to occupy the freed share of cost of revenue. That is a healthier shape for the category, because inference cost is at least correlated with delivered value, while idle compute cost is correlated with nothing in particular.
Whether AetherFS specifically captures this transition is unsettled. Whether the transition happens is, by the math, a question of when rather than whether.