Cloud giant Amazon has published a detailed explanation of Thursday’s cloud service failure, confirming that the culprit was a power outage at its Northern Virginia data center. Many popular sites, including Pinterest and Hipchat, were down for a few hours on Thursday evening.
Amazon was honest and forthcoming with customers during the outage and kept them updated on its status. Its handling of the situation was exemplary and shows that hosting companies should maintain steady communication with customers during failures like this one.
Amazon has verified that the power outage at its Ashburn, Virginia data center was initially set off by a “cable fault in the high voltage Utility power distribution system” at about 8:44 p.m. PDT Thursday.
“Two Utility substations that feed the impacted Availability Zone went offline, causing the entire Availability Zone to fail over to generator power,” said Amazon in a statement. “All EC2 instances and EBS volumes successfully transferred to back-up generator power.
“At 8:53PM PDT, one of the generators overheated and powered off because of a defective cooling fan. At this point, the EC2 instances and EBS volumes supported by this generator failed over to their secondary back-up power (which is provided by a completely separate power distribution circuit complete with additional generator capacity).
“Unfortunately, one of the breakers on this particular back-up power distribution circuit was incorrectly configured to open at too low a power threshold and opened when the load transferred to this circuit. After this circuit breaker opened at 8:57PM PDT, the affected instances and volumes were left without primary, back-up, or secondary back-up power.”
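The chain of failures Amazon describes can be pictured as a simple failover model. The Python sketch below is only an illustration and not Amazon's actual system: the `PowerSource` class, the `select_source` helper, and the load and breaker figures (500 kW and 300 kW) are all hypothetical, chosen to show how a breaker set to open at too low a threshold can leave equipment with no power source at all.

```python
from dataclasses import dataclass


@dataclass
class PowerSource:
    """Hypothetical model of one link in a power failover chain."""
    name: str
    healthy: bool = True
    # Load (in kW) above which the circuit breaker opens; infinite means no limit.
    trip_threshold_kw: float = float("inf")

    def can_carry(self, load_kw: float) -> bool:
        # A source is usable only if it is running and its breaker stays closed.
        return self.healthy and load_kw <= self.trip_threshold_kw


def select_source(chain, load_kw):
    """Return the name of the first source able to carry the load, or None."""
    for source in chain:
        if source.can_carry(load_kw):
            return source.name
    return None


load_kw = 500.0  # assumed load, for illustration only
chain = [
    PowerSource("utility"),
    PowerSource("generator"),
    # Assumed misconfiguration: breaker opens below the real load.
    PowerSource("secondary back-up", trip_threshold_kw=300.0),
]

timeline = []

chain[0].healthy = False  # utility substations go offline
timeline.append(("utility fault", select_source(chain, load_kw)))

chain[1].healthy = False  # generator overheats and powers off
timeline.append(("generator overheats", select_source(chain, load_kw)))

for event, source in timeline:
    print(f"{event}: powered by {source}")
```

In this model the load moves to the generator after the utility fault, but once the generator fails the secondary circuit cannot accept the load, so no source remains. That matches the sequence in Amazon's statement, although the real breaker stayed closed until the load actually transferred to it.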