
Designed for failure?

Designed for failure.

This phrase has been showing up a lot more since the most recent AWS outage. It's a pretty simple concept: your design should expect component failure and be able to deal with it. It's something enterprise organizations have done for years in their infrastructure. The key word there (and the reason this is getting a lot of press now) is infrastructure. RAID, clustering, redundant networks, load balancers, etc. have been used by service providers and internal IT for years to avoid single points of failure within their infrastructure design and to provide availability for their applications. The applications themselves (for the most part) have not been designed for failure.

Some applications have evolved to add some level of increased availability (Oracle ADG, Microsoft Exchange DAGs, etc.), and some new applications are being written from the ground up to handle all sorts of component failures. A great and frequently touted example is the Netflix design, which leverages the famed Chaos Monkey.
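To make "designed for failure" in the application a bit more concrete, here is a minimal sketch of the idea, not anything Netflix actually runs. It assumes a service has several interchangeable replicas and simply moves on to the next one when a call fails. The names `ReplicaDown`, `call_with_failover`, `healthy`, and `broken` are all made up for illustration.

```python
class ReplicaDown(Exception):
    """Raised by a (hypothetical) replica that cannot serve the request."""


def call_with_failover(replicas, request):
    """Try each replica in order and return the first successful response.

    `replicas` is a list of callables standing in for redundant copies
    of the same service (an assumption made for this sketch).
    """
    errors = []
    for replica in replicas:
        try:
            return replica(request)
        except ReplicaDown as exc:
            # Expect failure: record it and fall through to the next replica.
            errors.append(exc)
    # Only give up once every redundant component has failed.
    raise RuntimeError(f"all {len(replicas)} replicas failed: {errors}")


def healthy(request):
    """Stand-in for a working replica."""
    return f"ok: {request}"


def broken(request):
    """Stand-in for a replica that is down."""
    raise ReplicaDown("simulated component failure")


if __name__ == "__main__":
    # The first replica is down, so the call is served by the second one.
    print(call_with_failover([broken, healthy], "ping"))
```

The point is that the caller treats a dead component as a normal case to route around, rather than relying on the layer underneath to hide the failure.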

The problem is, this is the exception, not the rule. Most organizations have tons of apps from tons of vendors, and I'm sure some still contain legacy code from punch-card days. It's easier to say "rewrite your apps for the cloud" than it is to actually do it. Even if most of your apps have some level of availability baked in, each one has its own methodology, which can be a management nightmare.

This has been one of the driving forces behind virtualization of legacy applications beyond simple consolidation: leveraging the increased availability provided by solutions like VMware HA, DRS, FT, SRM, etc. This way your creaky old app written with a FoxPro back-end can get higher levels of availability. VMware (and its customers) have been highly successful with this methodology: designing for failure in the infrastructure is cheaper and quicker than rewriting old apps or migrating to new ones.


This dovetails nicely with the previous discussion around hardware commoditization and value-added software layers. Hardware is a commodity, but your 'availability/designed for failure' secret sauce isn't. If you're Netflix or Google and can put that secret sauce in your software, sure, you can use whatever white-box hardware you can buy by the pallet and be fine. But if you're the average enterprise and your apps aren't built that way, then you're going to need some layer on top of those dumb disks and CPUs that provides that availability. Today those layers are called things like VMware vSphere, EMC Enginuity, VNX OE, and Cisco UCS Manager.