AWS Had a 13-Hour Outage. Its Own AI Tool May Have Caused It.

Cloud outages happen. But when your own AI coding tool might be the one pulling the plug? That’s a story worth unpacking.

According to reporting by the Financial Times, a December outage at Amazon Web Services lasted roughly 13 hours and was reportedly triggered by Kiro, Amazon’s own agentic AI coding tool. Four people familiar with the matter say engineers had deployed Kiro to make routine changes when the bot decided it needed to “delete and recreate the environment.” Things went sideways from there.

Amazon disputes that framing pretty strongly. But the story raises some genuinely interesting questions about AI autonomy, access controls, and what happens when the tools you build start causing the problems they’re supposed to solve.

What Kiro Actually Did

Kiro isn’t your average code assistant. It’s an agentic AI tool, which means it doesn’t just suggest code. It can take real, autonomous actions on your behalf.

That distinction matters a lot here. A regular coding tool might flag a problem or recommend a fix. Kiro can actually go ahead and make changes without someone manually clicking every step. That’s powerful when it works. It’s also potentially risky when it doesn’t.

In this case, the tool reportedly determined on its own that deleting and rebuilding an environment was the right move. Whether that was the correct call is debatable. What isn’t debatable is that a 13-hour disruption followed, primarily hitting AWS services in China.

Amazon’s Side of the Story

Amazon pushed back hard on the Financial Times report, and its statement is worth reading carefully.

The company says the outage was the result of “user error, not AI error.” Specifically, Amazon points to misconfigured access controls. The staffer involved apparently had “broader permissions than expected,” which allowed the action to happen in the first place.

Amazon also clarified the scope of the disruption. According to their statement, it was “an extremely limited event” affecting a single service, AWS Cost Explorer, in just one of their 39 geographic regions worldwide. Compute, storage, databases, and the vast majority of AWS services were reportedly unaffected. The company says it received zero customer inquiries about the interruption.

By default, Amazon says, Kiro “requests authorization before taking any action.” So, in Amazon’s telling, the tool wasn’t rogue. It had permission. The problem was that the permission was broader than it should have been.

This Apparently Wasn’t a One-Time Thing

Here’s where the story gets more interesting. Multiple Amazon employees told the Financial Times that this was “at least” the second time in recent months that the company’s AI tools sat at the center of a service disruption.

“The outages were small but entirely foreseeable,” said one senior AWS employee.

Amazon flatly denies that a second event impacted AWS, calling that claim from the Financial Times “entirely false.”

Still, the fact that employees are speaking up at all suggests some internal discomfort with how quickly Kiro has been rolled out. Amazon launched the tool in July 2025 and has since pushed teams to adopt it aggressively. Leadership reportedly set an 80% weekly use goal and has been actively tracking adoption rates across the company. Amazon also sells Kiro to outside customers as a monthly subscription product.

The Bigger Outage Picture

It’s worth noting that this December incident comes after a far more disruptive event in October. That outage lasted 15 hours and knocked out services including Alexa, Snapchat, Fortnite, and Venmo, among others. Amazon attributed that one to a bug in its automation software.

Amazon also took issue with the word “outage” being applied to the December incident at all, given the limited scope of what was actually affected.

So there are two separate events here with two separate causes. Amazon’s position is that neither was fundamentally about AI going rogue. In the company’s telling, October was an automation software bug and December was a permissions misconfiguration, the kinds of problems that could theoretically happen with any developer tool, AI-powered or not.

Who’s Right?

Honestly, both sides have a point, and that’s what makes this worth paying attention to.

Amazon is correct that misconfigured access controls are a boring, well-understood problem that predates AI by decades. A human developer with overly broad permissions can cause the same kind of damage. Blaming the AI tool for a permissions problem isn’t entirely fair.

But the Financial Times and the AWS employees who spoke to it are pointing at something real too. Agentic AI tools amplify the consequences of bad configurations. A human developer will often pause, double-check, maybe ask a colleague. An agentic tool with broad permissions can move fast. Very fast.

Plus, when you’re aggressively pushing employees toward 80% adoption of a tool that can take autonomous actions in production environments, you probably want your access control hygiene to be absolutely airtight before you hit that gas pedal.

Amazon says it has since added safeguards, including mandatory peer review for production access. That’s a sensible response. It also quietly suggests that those guardrails weren’t fully in place before the incident.
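To make the idea concrete, here is a minimal sketch of what permission scoping plus peer review can look like in code. It is purely illustrative: the action names, the ActionRequest class, and the is_allowed function are hypothetical, and nothing here reflects how Kiro or AWS actually implement their safeguards.

```python
from dataclasses import dataclass, field

# Hypothetical action names for illustration; not taken from Kiro or any AWS API.
DESTRUCTIVE_ACTIONS = {"delete_environment", "recreate_environment"}


@dataclass
class ActionRequest:
    actor: str          # the person or agent asking to act
    action: str         # e.g. "delete_environment"
    environment: str    # e.g. "production" or "staging"
    approvals: set = field(default_factory=set)  # reviewers who signed off


def is_allowed(request: ActionRequest, granted: set) -> bool:
    """Allow an action only if it is in scope and, where required, peer reviewed."""
    # Permission scoping: the actor must hold this exact (action, environment) grant.
    if (request.action, request.environment) not in granted:
        return False
    # Peer review: destructive production actions need a reviewer other than the actor.
    if request.environment == "production" and request.action in DESTRUCTIVE_ACTIONS:
        reviewers = request.approvals - {request.actor}
        return len(reviewers) >= 1
    return True
```

The point of the sketch is that the two checks are independent. Even if a grant turns out to be broader than anyone intended, a destructive action in production still stalls until someone other than the requester approves it.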

The lesson here isn’t that AI coding tools are dangerous. It’s that autonomy and broad permissions are a combination that demands more caution than speed-of-adoption goals tend to allow. Whether you’re talking about AI or any other powerful developer tool, the boring stuff like access control, peer review, and permission scoping matters just as much as the flashy capabilities.

For now, watch this space. Agentic AI tools are moving into production environments at companies everywhere, not just Amazon. How the industry handles the governance side of that shift will matter a lot.
