Amazon.com Inc. said automated processes in its cloud computing business caused cascading outages across the internet this week that affected everything from Disney parks and Netflix videos to robot vacuums and Adele ticket sales.
In a highly technical statement, the company said “unexpected behavior from a large number of clients inside the internal network” caused “a large surge of connection activity that overwhelmed the networking devices between the internal network and the main AWS network, resulting in delays for communication between these networks”.
The problems began at about 10.30am New York time on December 7 and lasted several hours before Amazon managed to fix them. In the meantime, social media lit up with complaints from consumers angered that their smart home gadgetry and other internet-connected services had suddenly ceased to work.
“They don’t explain what this unexpected behavior was and they didn’t know what it was so they were guessing when trying to fix it which is why it took so long,” said Corey Quinn, cloud economist at Duckbill Group.
AWS is generally a reliable service. Amazon’s cloud division last suffered a major incident in 2017, when an employee accidentally turned off more servers than intended during repairs of a billing system. Still, the latest outage reminded the world how many products and services are centralised in common data centres run by just a handful of big tech companies like Amazon, Microsoft Corp. and Alphabet Inc.’s Google.
There is no easy fix to the problem. Some analysts believe companies should duplicate their services across multiple cloud computing providers so that no single crash puts them out of commission. Others say a “multi-cloud” strategy would be impractical and could make companies even more vulnerable because they would be exposed to every provider’s outages, not just AWS’s.
“We know this event impacted many customers in significant ways,” the company said in the jargon-filled statement. “We will do everything we can to learn from this event and use it to improve our availability even further.”