How I Learned to Stop Worrying and Trust AI Coding Agents
I first joined the hype cycle around AI coding tools about four years ago with GitHub Copilot. Like everyone else back then, I was curious if AI could actually code. I found Copilot useful for autocomplete and little helpers, but when it came to actually solving an issue or adding a feature end to end, it was just… average. It kept going into loops and making silly mistakes, so I quickly went back to using it only for code completion.
Then Cursor launched, and everyone got excited (again). I tried it and gave it some tasks. It did fine for a few, but for others it suffered from the same problem of spiraling into loops without actually fixing the issue. Reviewing its code felt like reviewing the pull request of a junior developer who just doesn’t get the hint. Eventually, I dropped it too.
So when people started raving about Claude as a coding agent, I rolled my eyes. Another “next big thing” in AI coding? Sure.
One of Those Bugs
Fast forward. We hit a nasty issue with the mirrord Operator: installing it through Terraform would just halt. Terraform complained that our OpenAPI schema was broken.
We handed it to the engineer on our team, the one we call when we have a Problem that needs solving. He spent days digging into the Rust proc macro that generates the schema, with no luck. I poked at it too: I created a reproducible test case, hacked at the structs, and tried to see what broke. The schema was deep and messy, full of nested types, and every change I made seemed to make things worse.
We gave the customer a workaround and moved on. But then it happened again. And again. By the third report, I took it personally.
At this point, I had been playing with Claude for other mirrord-related work and was impressed with the UX and the output. My initial skepticism was gone. Claude was clean, fast, and didn’t make me click through endless menus. So when that third customer hit the same issue, I thought: fine. Let’s see what Claude can do.
I handed it the test case - a test that runs the Operator’s schema generation code, then runs Terraform’s schema validation on it - and explained the problem. Claude went to work. It found the first issue and “fixed” it. Tests still failed. It tried again and “fixed” something else, but tests were still failing. Normally, this is when AI agents spiral into just repeating the same loop, and I was ready to dismiss Claude too. But then it did something different.
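For concreteness, the feedback loop Claude started from looked roughly like the sketch below. This is an illustration, not the actual test: the cargo package name, the flags, and the paths are all made up.

# Rough sketch of the slow, end-to-end feedback loop (illustrative names only).
# Step 1: regenerate the CRD schema from the Operator's Rust code.
# Step 2: run `terraform plan` on a config that installs it; that's where
# Terraform parses the schema and fails if the schema is broken.
import subprocess

def generate_schema(out_path: str) -> None:
    # Hypothetical: assume schema generation is exposed as a cargo run target.
    subprocess.run(
        ["cargo", "run", "-p", "schema-gen", "--", "--out", out_path],
        check=True,
    )

def terraform_accepts(tf_dir: str) -> bool:
    # A broken OpenAPI schema surfaces as a failed plan.
    return subprocess.run(["terraform", "plan"], cwd=tf_dir).returncode == 0

if __name__ == "__main__":
    generate_schema("deploy/crd.yaml")
    print("ok" if terraform_accepts("deploy/") else "Terraform rejected the schema")

Every round trip through this loop is slow, and a failed plan mostly tells you “something, somewhere in the schema, is wrong.”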
The Moment of Magic
Claude wrote its own helper script. The script analyzed the generated schema and specifically looked for the problematic pattern that was tripping up Terraform. Basically, it optimized its own feedback loop. Instead of relying on my test case (generate schema, validate the whole thing using Terraform), it wrote a faster, more targeted script to guide its debugging (generate schema, look for the specific erroneous pattern).
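To make that concrete, here’s a rough sketch of that kind of helper script. The structure is mine, not Claude’s actual code, and the pattern it flags (an object node that declares properties but no type) is just a stand-in for whatever was actually tripping up Terraform.

# Illustrative sketch of a targeted schema checker: walk the generated OpenAPI
# schema and print every node matching a suspect pattern, instead of re-running
# the full Terraform validation on each attempt.
# The pattern below ("properties" present but "type" missing) is a stand-in,
# not necessarily the real offending pattern.
import json
import sys

def find_suspects(node, path="root"):
    """Recursively collect paths of schema nodes matching the stand-in pattern."""
    hits = []
    if isinstance(node, dict):
        if "properties" in node and "type" not in node:
            hits.append(path)
        for key, value in node.items():
            hits.extend(find_suspects(value, f"{path}.{key}"))
    elif isinstance(node, list):
        for i, item in enumerate(node):
            hits.extend(find_suspects(item, f"{path}[{i}]"))
    return hits

if __name__ == "__main__":
    with open(sys.argv[1]) as f:
        schema = json.load(f)
    suspects = find_suspects(schema)
    for hit in suspects:
        print(f"suspect node: {hit}")
    sys.exit(1 if suspects else 0)

A run like this finishes in a fraction of a second and points at the exact nodes that need fixing.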
The script it wrote for itself is what helped it break out of the loop. Normally, it would make a small change, run the tests I had written, fail, and then try again, without ever learning anything new, because the tests didn’t provide enough information. By writing its own helper script, Claude gave itself better feedback. It wasn’t just fixing code anymore; it was improving the way it understood the problem, and that’s what got it unstuck. And when the helper script finally stopped finding errors, Claude ran the original test case again. This time, it passed and everything worked.
The full PR can be seen here for reference.
This is a huge shift. Most AI coding tools I’ve tried just bang their heads against the wall. When Copilot, Cursor, and even older Claude versions hit a problem, they just keep looping: generate → test → fail → repeat. What Claude did here was on a whole new level.
AI Coding Agents Are the Future
Okay, I know that’s debatable. A year ago, if you’d asked me whether AI was going to change how we develop software, I probably would’ve shrugged it off and said it’s just a fancier autocomplete that impresses junior engineers without adding much value to the lives of senior engineers.
Today, I’m not so sure. Watching Claude tackle this problem, not just by patching code but by inventing its own way to debug it, showed me something different. It wasn’t acting like a shortcut for writing functions; it was acting like an engineer who pauses, builds the right tool for the job, and then uses it to finish the work faster. That’s what feels meaningful: seeing AI move from code generation to problem-solving.
I still don’t think coding agents are ready to run wild on a codebase unsupervised. They need direction, they make mistakes, and sometimes they fail in ways no human would. But after this experience, it’s clear to me that AI coding agents are starting to move beyond autocomplete. They’re beginning to reason about code. We’re crossing a line from AI that completes code to AI that understands it. I’ve heard similar stories from customers who are using mirrord with AI agents: the agents use mirrord to get feedback on their own code and then iterate on it. They’re reaching a point where they can deliver a feature end to end (but with supervision!). If you’re interested in learning how Claude and mirrord do that together, check out the post I wrote earlier about this.