Netflix is known for its Simian Army, a suite of cloud-testing tools that it periodically lets loose on its own service. The cloud demands high availability and reliability, and rigorous testing is the only way to ensure both. The naming is playful, but the implementation is a commendable piece of engineering. Latency Monkey, Doctor Monkey, Janitor Monkey, and Security Monkey are all part of the Simian Army at Netflix.
Netflix recently decided to share one of its earliest cloud-testing tools, Chaos Monkey, with the world by open sourcing it. Netflix describes Chaos Monkey as:
A tool that randomly disables our production instances to make sure we can survive this common type of failure without any customer impact. The name comes from the idea of unleashing a wild monkey with a weapon in your data center (or cloud region) to randomly shoot down instances and chew through cables — all the while we continue serving our customers without interruption.
Chaos Monkey runs on Amazon Web Services (AWS). Its schedule is configurable and defaults to running from 9 AM to 3 PM, which makes it a useful tool for system downtime drills.
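To make the idea concrete, here is a minimal Python sketch of the behaviour described above: check whether the current time falls inside the 9 AM to 3 PM window and, if it does, pick one instance at random to terminate. This is not Netflix's code. The function names, the instance IDs, and the `terminate` callback are all hypothetical, and no real AWS call is made.

```python
import random
from datetime import datetime

# Assumed defaults mirroring the 9 AM to 3 PM window mentioned in the article.
START_HOUR = 9
END_HOUR = 15


def in_window(now, start_hour=START_HOUR, end_hour=END_HOUR):
    """Return True if `now` falls inside the daily termination window."""
    return start_hour <= now.hour < end_hour


def pick_victim(instances, rng):
    """Pick one instance ID at random, or None if the group is empty."""
    if not instances:
        return None
    return rng.choice(instances)


def run_once(instances, now, rng, terminate):
    """Terminate one random instance if the schedule allows it.

    `terminate` is a caller-supplied callback (an assumption of this
    sketch); in a real drill it would call the cloud provider's API.
    """
    if not in_window(now):
        return None
    victim = pick_victim(instances, rng)
    if victim is not None:
        terminate(victim)
    return victim


if __name__ == "__main__":
    terminated = []
    # Hypothetical instance IDs; the callback only records the choice.
    run_once(["i-aaa", "i-bbb", "i-ccc"], datetime.now(),
             random.Random(), terminated.append)
    print("terminated:", terminated)
```

Passing the clock, the random generator, and the termination callback in as arguments keeps the sketch easy to run as a dry drill before pointing it at anything real.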
The world of streaming media is expanding, and high availability is key to the entire industry. Netflix has done well to give something back to its ecosystem. This is just the beginning: Netflix plans to release its other simian tools as well.