The damage of one second
Update: According to the AWS status page, the incident
was caused by a BGP route leak. AWS does not point to
a leap-second-related cause, as this post originally
suggested!
Tonight we had another leap second, and it did not pass without
outages happening at the same time. At the end of the post you can
find two screenshots of outages reported on downdetector.com.
The screenshots were taken shortly after midnight UTC,
and you can easily spot the sites with problems by the
distinct peak at the right side of the graph.
AWS Outage
What many of the affected sites have in common is that they are hosted at AWS, which had some problems. Quote:
[RESOLVED] Internet connectivity issues
Between 5:25 PM and 6:07 PM PDT we experienced an Internet connectivity issue with a provider outside of our network which affected traffic from some end-user networks. The issue has been resolved and the service is operating normally. The root cause of this issue was an external Internet service provider incorrectly accepting a set of routes for some AWS addresses from a third-party who inadvertently advertised these routes. Providers should normally reject these routes by policy, but in this case the routes were accepted and propagated to other ISPs affecting some end-user’s ability to access AWS resources. Once we identified the provider and third-party network, we took action to route traffic around this incorrect routing configuration. We have worked with this external Internet service provider to ensure that this does not reoccur.
Incident Details
- Mashable: AWS Disruption
- ThousandEyes: Route leak causes Amazon and AWS outage
- Leap second causes ~5 minutes of transient global routing instability
Graphs from downdetector.com
Note that these graphs show user-reported issues:
