Running AI Agent Pools Against Your Live Kubernetes Cluster, with mirrord
AI coding is evolving fast. Editor autocomplete gave way to agents that take whole tasks (Claude Code, Cursor’s agent mode), and those agents are now moving off developer laptops into the cloud. The cloud variant runs unattended: it picks up a task, works it through, and produces a result with no developer watching.
The first question for teams adopting cloud agents is what environment the agent works in. The reflex is a per-agent sandbox: a small copy of the relevant services and a personal database to mutate freely. It works for trivial cases, and fails the moment the agent’s work depends on services or data the sandbox doesn’t include: the agent ends up generating code that won’t survive contact with the real system. A sandbox is also static, so it drifts: real services keep being updated, real data keeps changing, and the sandbox falls out of sync with the actual staging setup the code will run against.
The alternative is your existing staging cluster, which already has the real services, data, and queues, kept current by your team. Agents could use it directly, but plain access isn’t enough. Every change would go through a deploy cycle, wasting time and tokens on every iteration. And multiple agents working in parallel would step on each other and corrupt shared state for everyone else on staging.
mirrord fixes both of these problems. It connects an agent’s process to the cluster with no deploy needed, so iteration takes seconds, and it isolates each agent’s work so agents (and developers) don’t collide. This isn’t theoretical: monday.com has recently launched this for their team using Cursor’s self-hosted agents, and Podium runs the same pattern on agent pools they built in-house.
How it works
The agent runs in a pool of pods you control. Each pod includes mirrord and credentials for an agent identity (a Kubernetes service account representing “any agent running in this pool”) in your cluster. When the agent runs the service it’s modifying with mirrord, that process behaves as one of the cluster’s own workloads: same services, data, environment variables, and state a deployed pod would see.
The flow:
- You provision a pool of agent worker pods, idle until tasks arrive. Each pod’s image includes mirrord and credentials for an agent identity, with RBAC scoped to the namespaces you want the agent to access.
- Each task is triggered through the agent platform (PR comment, Slack message, web UI, or API), and the platform’s control plane routes it to an available worker in your pool. The agent runs inside that worker, calling mirrord like any other CLI tool.
- To understand the live system, it uses mirrord to read real cluster state: what a service returns, what’s in a queue, what config the deployed workload runs with. Its code is informed by what the cluster does now, not by stale repo files.
- To verify a change, it runs the service under test with mirrord, which gives that process the cluster’s real traffic, env vars, and downstream calls.
- For changes that write to a database or consume from a queue, mirrord’s DB branching and queue splitting give it an isolated database branch or a filter that routes only the agent’s own messages to it.
- When done, the agent opens a PR. The pool reclaims the worker after a TTL.
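The verification step in the flow above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a prescribed setup: the target deployment, namespace, header name, agent ID, and file path are all hypothetical, and the config keys shown (target, feature.network.incoming, feature.env, feature.fs) should be checked against the mirrord configuration reference for the version you run.

```python
import json
import shlex


def build_mirrord_config(target: str, namespace: str, agent_id: str) -> dict:
    """Build a mirrord config for one agent task. All values are illustrative."""
    return {
        # Which deployed workload the local process should impersonate.
        "target": {"path": target, "namespace": namespace},
        "feature": {
            "network": {
                "incoming": {
                    # Take over only the requests tagged for this agent.
                    # "x-agent-id" is a hypothetical header name; agent IDs
                    # are assumed to contain no regex metacharacters.
                    "mode": "steal",
                    "http_filter": {"header_filter": f"x-agent-id: {agent_id}"},
                },
            },
            # Use the target's environment variables and read its files.
            "env": True,
            "fs": "read",
        },
    }


def build_exec_command(config_path: str, service_cmd: list[str]) -> list[str]:
    """Wrap the service-under-test command so it runs under mirrord."""
    return ["mirrord", "exec", "--config-file", config_path, "--", *service_cmd]


if __name__ == "__main__":
    config = build_mirrord_config("deployment/orders", "staging", "agent-7")
    print(json.dumps(config, indent=2))
    print(shlex.join(build_exec_command("mirrord.json", ["npm", "run", "start"])))
```

An agent would write the config to a file in its workspace and run the resulting command like any other CLI call; nothing is deployed to the cluster.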
Compared to a per-agent sandbox, this avoids the maintenance, the drift, the per-agent cloud cost of full architecture copies, and the setup time. The agent uses the real, current staging services directly, and only the specific state it needs to mutate (a database, a queue) gets branched, on demand, and discarded after. The agent’s work also never leaves your network: pool, cluster, data, and traffic all stay inside your perimeter.
Why this matters more for agents than for humans
mirrord exists because human developers have always had this problem, just in slower motion: local environments drift, mocks encode what was assumed when they were written, integration bugs surface after the PR is open. Humans compensate with experience, but agents can’t. They generate code from the patterns they’ve seen and the context they have, and that context, limited to your repo and stale docs, is often wrong in ways that only show up against real state and behavior:
- A query that’s correct against the repo’s migration files but behaves differently against a live database shaped by past operations work.
- Code that depends on configuration and feature flags whose real values live in the deployed environment, not in any committed file.
- Code that calls another service following the contract in the repo, while the deployed service has changed underneath: renamed fields, new required parameters, stricter validation.
These aren’t agent failures, they’re context failures. mirrord provides exactly that context, live from the staging cluster: the agent reads from the running system, runs against it for verification, and closes the loop on its own, without a deploy or a human in between.
Setting it up
Most of mirrord’s features work the same for an agent in a pod as for a developer on a laptop. The setup work specific to agent pools involves these five things:
- Bake mirrord into the worker image. Extend whatever base image the agent platform uses and add mirrord to it. One-time per pool.
- Create a service account for the agent identity. A dedicated identity representing any agent in the pool; RBAC and mirrord policies hang off it.
- Get its kubeconfig into the pod. Whatever credential delivery your platform supports (secret mount, in-cluster service-account token, etc.). The kubeconfig is scoped to dev or staging namespaces, the same access a developer already has.
- Scope mirrord policies to the agent identity. With MirrordPolicy and MirrordClusterPolicy, allow the operations the agent needs and constrain the ones you don’t. DB branches and queue splits are useful where you want write isolation.
- Configure TTLs. mirrord’s sessions, DB branches, and queue splits all have TTLs. Set them to fit your iteration time; the operator reclaims anything past its TTL. See the DB branching and queue splitting docs for the config.
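As a rough sketch of the service-account step, the script below generates a ServiceAccount, a namespaced Role, and a RoleBinding for the agent identity using standard Kubernetes RBAC objects. The names and namespace are hypothetical, and the single Role rule is a placeholder: the permissions mirrord actually needs depend on your installation and should be taken from the mirrord documentation, not from this example.

```python
import json


def agent_identity_manifests(name: str, namespace: str) -> dict:
    """Return a v1 List with the RBAC objects for one agent-pool identity."""
    role_name = f"{name}-access"
    service_account = {
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": {"name": name, "namespace": namespace},
    }
    role = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"name": role_name, "namespace": namespace},
        # Placeholder rule (read-only workload discovery). Replace it with
        # the rules your mirrord setup requires.
        "rules": [
            {
                "apiGroups": ["", "apps"],
                "resources": ["pods", "deployments"],
                "verbs": ["get", "list"],
            }
        ],
    }
    binding = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": f"{name}-binding", "namespace": namespace},
        "subjects": [
            {"kind": "ServiceAccount", "name": name, "namespace": namespace}
        ],
        "roleRef": {
            "apiGroup": "rbac.authorization.k8s.io",
            "kind": "Role",
            "name": role_name,
        },
    }
    return {
        "apiVersion": "v1",
        "kind": "List",
        "items": [service_account, role, binding],
    }


if __name__ == "__main__":
    # JSON is valid input for kubectl, so the output can be piped to it.
    print(json.dumps(agent_identity_manifests("agent-pool", "staging"), indent=2))
```

Because the Role and RoleBinding are namespaced, the identity gets access only in the namespaces where you create them, which keeps the agent’s reach limited to dev or staging.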
The skills at metalbear-co/skills can be bundled into the worker image so the agent has ready-made instructions for the common workflows. mirrord’s operator logs cluster-side operations per identity, so an agent’s activity is as visible as any team member’s.
Closing the loop: preview environments
The agent’s own verification catches whether the code is correct, and to a large extent, whether it behaves correctly. One question remains: does the change do what was actually requested?
Preview environments answer that. When the agent opens a PR, mirrord spins up a minimal preview: just the changed service, running in isolation against the main staging environment, with no full environment per PR. A reviewer clicks a link and sees the change running against real cluster state.
This means the reviewer doesn’t have to be an engineer. At monday.com, product managers drive the loop end to end: a PM files a request, an agent works it in the pool, the PR comes back with a mirrord preview link, and the PM tests the running result without reading code.
Getting started
The pattern works today in real enterprise settings, on stable mirrord features that developer teams have used at scale for years.
To adopt it, you need a pool of agent worker pods, a worker image with mirrord, a service account for the agent identity, policies scoping it, and (for the PM workflow) preview environment infrastructure.
If you’re running cloud agents at scale and haven’t solved cluster access, this is the pattern to evaluate.
Frequently asked cloud agent questions
How can multiple AI agents run in parallel without stepping on each other?
With DB branching and queue splitting, each AI agent gets its own isolated state, for example, its own database branch and its own queue messages. This means agents don’t overwrite each other’s data or consume messages intended for another agent.
At the same time, resources that agents only read, such as read-only outgoing traffic, environment variables, and configs, can be shared by any number of agents concurrently without isolation.
For incoming traffic, mirrord supports HTTP filters. Each agent can receive only the requests tagged for it, while all other traffic in the shared cluster continues normally.
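To make the filtering idea concrete, here is a small local simulation of per-agent header routing. It is not mirrord’s implementation: the header name x-agent-id, the agent IDs, and the "Name: value" matching format are assumptions made for illustration only.

```python
import re


def matches_filter(header_filter: str, headers: dict[str, str]) -> bool:
    """True if any header, rendered as 'Name: value', matches the regex."""
    pattern = re.compile(header_filter)
    return any(pattern.search(f"{name}: {value}") for name, value in headers.items())


def route(headers: dict[str, str], agent_filters: dict[str, str]) -> str:
    """Return the agent whose filter matches, else the deployed workload."""
    for agent, header_filter in agent_filters.items():
        if matches_filter(header_filter, headers):
            return agent
    return "deployed-workload"


# Anchored patterns keep "agent-7" from also matching "agent-70".
AGENT_FILTERS = {
    "agent-7": r"^x-agent-id: agent-7$",
    "agent-8": r"^x-agent-id: agent-8$",
}

if __name__ == "__main__":
    for headers in (
        {"x-agent-id": "agent-7"},
        {"x-agent-id": "agent-70"},
        {"accept": "application/json"},
    ):
        print(headers, "->", route(headers, AGENT_FILTERS))
```

Requests carrying an agent’s tag reach that agent’s process, and everything else falls through to the deployed workload, which is why other users of staging are unaffected.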
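Does the AI agent's traffic leave our network?
No. The agent pool, the cluster, the data, and the traffic all stay inside your perimeter, so the agent’s work never leaves your network.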
How is this different from a per-agent sandbox?
A sandbox is a static copy of a subset of services. It starts drifting away from staging as soon as it’s created unless you invest significant time and effort maintaining it. Even then, it costs nearly as much to run as the real environment, multiplied by the number of concurrent agents.
With mirrord, multiple agents share the real staging cluster directly, including the current versions of services, live data, and configuration. Only the state an agent actually writes to (such as a database) is branched on demand and discarded when the session ends.
Which AI agent platforms does this work with?
Any platform whose agents run in workers you control and can call mirrord exec like any other CLI tool. Cursor’s self-hosted agents, Devin, and Claude managed agents all fit this model. If you’ve built an in-house agent pool on Kubernetes, as Podium did, it works the same way.