01 Mar 2017

AWS S3 is down!

You might have noticed that the internet was struggling today. This was due to a problem with Amazon’s Simple Storage Service (S3).

When Amazon S3 is down. #awscloud #awss3 pic.twitter.com/KQo4sVvkAl
— Fernando (@fmc_sea) February 28, 2017

The above is pretty much the use case for the majority of those using S3 as a backend. S3 offers durable, available, automatically scalable storage for minimal cost.

It’s a great service I rave about in most posts on this blog.

S3 Availability

The key point of note here is availability.

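S3 is designed for 99.99% availability of objects stored in the Standard storage class. That figure is a design target rather than a contractual guarantee, and some other storage classes, such as Standard-IA, are designed for less.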

This translates to just under 53 minutes of downtime per year.

0.01% of 1 year = 52.59 minutes
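The arithmetic is easy to check. Here is a minimal Python sketch, assuming a 365.25-day year (which is what produces the figure above). The function name `downtime_budget_minutes` is only an illustrative choice, not anything AWS provides.

```python
# Minutes in a year, assuming 365.25 days to account for leap years.
MINUTES_PER_YEAR = 365.25 * 24 * 60


def downtime_budget_minutes(availability_percent: float) -> float:
    """Return the minutes of downtime per year allowed at a given availability."""
    unavailable_fraction = (100.0 - availability_percent) / 100.0
    return MINUTES_PER_YEAR * unavailable_fraction


if __name__ == "__main__":
    # Compare a few common availability targets side by side.
    for level in (99.9, 99.99, 99.999):
        minutes = downtime_budget_minutes(level)
        print(f"{level}% available -> {minutes:.2f} minutes of downtime per year")
```

Each extra nine cuts the yearly allowance by a factor of ten, which is why the gap between 99.9% and 99.99% matters so much in practice.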

According to Tarsnap, the actual downtime was roughly 3 hours and 20 minutes (when looking at GETs).

The breakdown was as follows:

37:29 UTC: First InternalError response from S3
37:32 UTC: Last successful request
37:56 UTC: S3 switches from 100% InternalError responses to 503 responses
37 UTC: AWS notified of 'high error rates'
34:36 UTC: S3 switches from 503 responses back to InternalError responses
35:50 UTC: First successful request
54: AWS notified of GET requests partially succeeding
~21:03 UTC: Most GET requests succeeding
13 UTC: AWS notifies of GET requests fully restored
~21:52 UTC: Most PUT requests succeeding
11 UTC: AWS notifies of PUT requests fully restored
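To put the roughly 3 hours and 20 minutes of GET downtime in perspective, it can be set against the 99.99% figure. The sketch below is a rough calculation only: it assumes a 365.25-day year, treats this outage as the only downtime in the year, and uses helper names that are mine.

```python
# Minutes in a year, assuming 365.25 days.
MINUTES_PER_YEAR = 365.25 * 24 * 60


def implied_availability_percent(downtime_minutes: float) -> float:
    """Best-case yearly availability if this were the only downtime all year."""
    return 100.0 * (1.0 - downtime_minutes / MINUTES_PER_YEAR)


def budget_multiple(downtime_minutes: float, availability_percent: float = 99.99) -> float:
    """How many yearly downtime budgets a single outage consumed."""
    budget = MINUTES_PER_YEAR * (100.0 - availability_percent) / 100.0
    return downtime_minutes / budget


if __name__ == "__main__":
    outage_minutes = 3 * 60 + 20  # roughly 3 hours 20 minutes, per Tarsnap's GET figures
    print(f"Budgets consumed: {budget_multiple(outage_minutes):.2f}")
    print(f"Best-case availability for the year: {implied_availability_percent(outage_minutes):.4f}%")
```

Under those assumptions, a single afternoon used up nearly four years' worth of the downtime that 99.99% allows.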

The gap between customers seeing the problem, AWS acknowledging it and AWS notifying the community was terribly large. For a company that prides itself on reliability, this was a scary thing to witness.

Hey @awsstatus how long will #s3 be down? Are you going to reduce the 99.99% available rating after this? #ouch
— Michael Standen (@_MichaelStanden) February 28, 2017

I’m not expecting a response.

AWS Status

One of the most interesting takeaways was that the AWS Status page wasn’t showing any errors. This was because the status page used S3 as part of its backend.

Wow!

@awscloud please don't host your status service on the service it's reporting a status of...
— Michael Standen (@_MichaelStanden) February 28, 2017

Amazon admits the status page can’t be updated because the images are in S3: pic.twitter.com/gTtWajirSh
— MikeTalonNYC (@MikeTalonNYC) February 28, 2017

Amazon #s3 Outage pic.twitter.com/h4xYZKkoHe
— Andrew J Oldaker (@weatherdrew) February 28, 2017

The status page has since been updated to reflect the real state without relying on the service it was reporting about.
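This kind of circular dependency is straightforward to check for if you keep a map of what depends on what. The sketch below is a hypothetical example: the service names and the dependency map are made up for illustration and say nothing about how AWS actually wires its status page.

```python
from collections import deque
from typing import Dict, Iterable, List


def depends_on(graph: Dict[str, List[str]], start: str, target: str) -> bool:
    """Return True if `start` depends on `target`, directly or transitively."""
    seen = set()
    queue = deque(graph.get(start, ()))
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        if node in seen:
            continue
        seen.add(node)
        queue.extend(graph.get(node, ()))
    return False


def circular_monitors(graph: Dict[str, List[str]], status_page: str,
                      monitored: Iterable[str]) -> List[str]:
    """List the monitored services that the status page itself relies on."""
    return [service for service in monitored if depends_on(graph, status_page, service)]


# Hypothetical dependency map: the status page loads images that live in S3.
graph = {
    "status-page": ["image-assets", "web-server"],
    "image-assets": ["s3"],
    "web-server": [],
}

if __name__ == "__main__":
    print(circular_monitors(graph, "status-page", ["s3", "ec2"]))
```

Any service that shows up in the result is one whose outage could also take out the page that is supposed to report it.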

Interestingly, I never once saw an issue listed on the status page for regions outside North America, but I definitely could not access a bucket in Sydney, Asia Pacific. Maybe this was a partial fix? Stay tuned, I guess.

Redundancy is the key to survival

Did you notice this website go down? No?

That’s because I have AWS CloudFront in front of S3, caching responses. For a static website, this is a simple solution that also offers a number of other benefits.
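The general idea of a cache that keeps answering while its origin is down can be sketched in a few lines. This is not CloudFront's implementation, only a toy illustration of the "serve stale on error" pattern; the class name, the TTL and the injectable clock are all assumptions made for the example.

```python
import time
from typing import Callable, Dict, Tuple


class StaleOnErrorCache:
    """Tiny cache that falls back to an expired copy when the origin fails."""

    def __init__(self, fetch_origin: Callable[[str], bytes],
                 ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch_origin = fetch_origin
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, bytes]] = {}

    def get(self, path: str) -> bytes:
        now = self._clock()
        cached = self._entries.get(path)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]  # fresh hit: the origin is not contacted at all
        try:
            body = self._fetch_origin(path)
        except Exception:
            if cached is not None:
                return cached[1]  # origin failed: serve the stale copy instead
            raise  # nothing cached for this path, so the error reaches the caller
        self._entries[path] = (now, body)
        return body
```

Pages that were requested before the outage keep working; anything never cached still fails, which is why this helps a small static site far more than a write-heavy service.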

Some of my other services (Cat Facts) would have failed had anyone been using them during the outage. Luckily for me, at the time of writing, all the services I host on AWS are low traffic.

The correct way to get around a problem like this, where a third-party service goes down, is to not keep all your eggs in one bucket.

This #aws #s3 #outage is a good reminder not to keep all your eggs in one bucket
— Michael Standen (@_MichaelStanden) February 28, 2017

Google Cloud Storage (GCS) offers direct support for the S3 XML API and has multi-regional support at a fraction of the cost of S3. Find out how to port S3 to GCS here.

But why stop there?

You can also use Azure or the lesser known Backblaze B2.
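Spreading data across providers can be as simple as writing to all of them and reading from the first one that answers. The sketch below shows that pattern with in-memory stand-ins; `InMemoryBackend` and `ReplicatedStore` are hypothetical names, and a real setup would wrap each provider's own client behind the same two methods.

```python
from typing import Dict, List, Optional, Protocol


class Backend(Protocol):
    def put(self, key: str, data: bytes) -> None: ...
    def get(self, key: str) -> bytes: ...


class InMemoryBackend:
    """Stand-in for a real storage provider, with a switch to simulate an outage."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.available = True
        self._objects: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        if not self.available:
            raise ConnectionError(f"{self.name} is unavailable")
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        if not self.available:
            raise ConnectionError(f"{self.name} is unavailable")
        return self._objects[key]


class ReplicatedStore:
    """Write to every backend; read from the first backend that responds."""

    def __init__(self, backends: List[Backend]) -> None:
        if not backends:
            raise ValueError("at least one backend is required")
        self._backends = backends

    def put(self, key: str, data: bytes) -> None:
        failures = 0
        for backend in self._backends:
            try:
                backend.put(key, data)
            except Exception:
                failures += 1  # keep going so healthy backends still get the write
        if failures == len(self._backends):
            raise ConnectionError("no backend accepted the write")

    def get(self, key: str) -> bytes:
        last_error: Optional[Exception] = None
        for backend in self._backends:
            try:
                return backend.get(key)
            except Exception as exc:
                last_error = exc  # try the next provider
        raise ConnectionError(f"all backends failed for {key!r}") from last_error
```

One trade-off to keep in mind: a write that only reaches some backends leaves the copies out of sync, so a production version would need some way to repair the replicas once the failed provider comes back.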

I prefer to trust no one and look after the data myself, and you should too.

Thanks for reading

If you enjoyed the content please consider leaving a comment, sharing or hiring me.

Cheers,
Michael


AWS S3 Failure

Photo by Lukas Budimaier