On 28 February 2017 the cloud went down. Or at least a large section of Amazon Web Services’ (AWS) Simple Storage Service (Amazon S3), in its Northern Virginia (US-EAST-1) region, went down, taking a lot of things down with it. The list of affected sites is long, but the outage points to a much bigger problem: what does an outage mean when our cities, buildings and homes depend on the cloud?
Amazon S3 is used by 148,213 websites and 121,761 unique domains, according to SimilarTech. The outage completely took down well-known web services such as Adobe, Amazon-owned Twitch, Coindesk, Expedia, FanDuel, FiftyThree, Flipboard, GitHub, GitLab, Google-owned Fabric, IFTTT, JSTOR, Kickstarter, Lonely Planet, Mailchimp, Microsoft HockeyApp, MIT Technology Review, Quora, Square, Talkdesk, Trello, The Verge, the U.S. Securities and Exchange Commission and, ironically, isitdownrightnow.com. Other big names affected in some way include Airbnb, Apple, Pinterest, Snapchat and Time Inc.
This wasn’t a power outage, a hack, a DDoS attack or a physical attack. It seems one unfortunate employee made one unfortunate error that knocked a large part of S3 offline for more than four hours, with customer losses estimated in the hundreds of millions of dollars. Amazon’s official summary of the outage reads:
“The Amazon Simple Storage Service (S3) team was debugging an issue causing the S3 billing system to progress more slowly than expected. At 9:37AM PST, an authorized S3 team member using an established playbook executed a command which was intended to remove a small number of servers for one of the S3 subsystems that is used by the S3 billing process. Unfortunately, one of the inputs to the command was entered incorrectly and a larger set of servers was removed than intended. The servers that were inadvertently removed supported two other S3 subsystems.”
There is no report of what happened, or will happen, to this unfortunate employee, but the silver lining of this cloud crash is a timely reminder that too much dependence on the cloud could make us vulnerable. It’s one thing to not be able to search for flights on Expedia or post on Snapchat, but it’s an entirely different proposition when your video surveillance, access control, or heating, ventilation and air conditioning (HVAC) systems are involved.
In fact, they were. A number of internet of things (IoT) systems went down, most notably home security cameras offered by Google-owned Nest. Nest Cam uses Amazon S3 for backend storage, so its cameras weren’t recording video footage for several hours during the outage. Amazon’s own home automation system was also affected, prompting one Twitter user to say “I had to turn off my lights with the switch instead of Alexa – this Amazon outage is really taking a toll.”
Unsurprisingly, the Twittersphere was full of jokes, ranging from “we are at code red – the coffeemaker also ran on AWS” to “Amazon is now Amazoff” and “Snapchat is down, millions of millennials just looked up for the first time in years.” While we can make light of it now, a major outage in a heavily cloud-based IoT future is no laughing matter.
The cloud promises secure storage for a wide range of essential services, from healthcare to security. Amazon itself says S3 is designed for "99.999999999% durability", but durability describes the chance of stored data being lost, not the chance of being able to reach it. No stored data was reported lost on February 28th; it was simply unreachable, and we got a glimpse of what that looks like. It doesn’t look pretty, especially when you consider not being able to get into your home, others being able to, or pretty much anything going wrong in a hospital.
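Durability and availability percentages answer different questions, and a little arithmetic makes the gap concrete. The Python sketch below is illustrative only: the availability targets, the four-hour outage length and the ten-million-object store are assumed example values, not Amazon’s contractual commitments, and `allowed_downtime_hours` and `expected_annual_losses` are hypothetical helper names.

```python
# Illustrative sketch: durability and availability percentages answer different questions.
# All targets below are assumed example values, not any provider's contractual SLA.

HOURS_PER_YEAR = 365 * 24  # ignores leap years for simplicity


def allowed_downtime_hours(availability_pct: float, period_hours: float = HOURS_PER_YEAR) -> float:
    """Hours a service may be unreachable over `period_hours` and still meet the target."""
    return period_hours * (1 - availability_pct / 100)


def expected_annual_losses(num_objects: int, durability_pct: float) -> float:
    """Expected number of stored objects lost per year, treating durability as a per-object annual figure."""
    return num_objects * (1 - durability_pct / 100)


if __name__ == "__main__":
    outage_hours = 4  # assumed: an illustrative multi-hour outage
    for target in (99.9, 99.99, 99.999):
        budget = allowed_downtime_hours(target)
        print(f"{target}% availability: {budget:.3f} h/year allowed; "
              f"a {outage_hours} h outage is {outage_hours / budget:.1f}x that budget")

    losses = expected_annual_losses(10_000_000, 99.999999999)
    print(f"11 nines over 10 million objects: about one loss every {1 / losses:,.0f} years")
```

At eleven nines of durability, ten million stored objects imply an expected loss of roughly one object every 10,000 years, while even a 99.99% availability target leaves under an hour of downtime a year. An outage of several hours can blow through that budget without losing a single byte.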
“Analytics is now regarded as one of the key components that will determine the future of physical security,” explains our recent article on video surveillance and access control ‘as a service’. “End users are showing a great interest in the use of neural networks and deep learning and at this time, these algorithms require the cloud to be most effective.”
The AWS outage is an important reminder that the cloud is not perfect. Even if Amazon and other cloud service providers were to create safeguards against the same type of error happening again, there are dozens if not hundreds of other things that could go wrong. On the other hand, forgoing cloud-based analytics in favor of less vulnerability is a step backwards for our smart society.
It goes without saying that a major outage like this will shake market confidence in cloud services. IoT, access control and healthcare customers who weren’t already considering shunning the cloud may now think twice. Whatever happens next, this has been a wake-up call for cloud computing, and another major event like this would surely have huge implications for the entire IoT, smart tech and cloud computing sector.
Half the web went down that day; half the city might go down tomorrow.