Link Search Menu Expand Document

The damage of one second

Update: According to the AWS status page the incident was a problem related to BGP route leaking. AWS does not hint on a leap second related incident as originally suggested by this post! Tonight we had another leap second and not without suffering at the same time. At the end of the post you can find two screenshots of outages suggested by downdetector.com. The screenshots were taken shortly after midnight UTC and you can easily spot those sites with problems by the disting peak at the right site of the graph.

AWS Outage

What is common to many of the affected sites: them being hosted at AWS which had some problems. Quote:
[RESOLVED] Internet connectivity issues Between 5:25 PM and 6:07 PM PDT we experienced an Internet connectivity issue with a provider outside of our network which affected traffic from some end-user networks. The issue has been resolved and the service is operating normally. The root cause of this issue was an external Internet service provider incorrectly accepting a set of routes for some AWS addresses from a third-party who inadvertently advertised these routes. Providers should normally reject these routes by policy, but in this case the routes were accepted and propagated to other ISPs affecting some end-user’s ability to access AWS resources. Once we identified the provider and third-party network, we took action to route traffic around this incorrect routing configuration. We have worked with this external Internet service provider to ensure that this does not reoccur.

Incident Details

Graphs from downdetector.com

Note that those graphs indicate user reported issues: