The global internet outage, which paralyzed much of the internet for more than an hour, was caused by a software bug, said the company behind it.
Fastly said a problem was triggered in its software when one of its customers changed their settings.
When that happened, a series of problems began that took many of the world’s largest websites offline. Reddit, Amazon, the UK government, and many of the largest news organizations in the world were unavailable, and users instead saw a series of error messages.
“This outage was widespread and severe, and we sincerely regret the impact it has had on our customers and those who rely on them,” the company said in a blog post written by Nick Rockwell, senior engineering and infrastructure executive.
He said the problem should have been foreseen.
Fastly operates a group of servers strategically placed around the world to allow customers to quickly and securely move and store content near their end users.
The company's post included a timeline of events and promised to investigate and explain why Fastly had failed to detect the software bug during its own testing process.
Fastly said the bug was in a software update that was shipped to customers on May 12 but lay dormant until an unidentified customer made a settings change that triggered it, which resulted in 85 percent of its network returning errors.
Fastly noticed the failure within a minute of its occurrence at 9:47 a.m., and engineers found the cause at 10:27 a.m. Once the settings that triggered the problem were disabled, most of the company's network quickly recovered.
“Within 49 minutes, 95 percent of our network was up and running normally,” said the company.
The network was fully restored at 12:35 p.m., and a permanent software fix began deploying at 5:25 p.m., Fastly said.
Additional coverage from Reuters