The state of private cloud is dire, according to a number of pundits. Twitter’s de facto cloud prognosticator warns: Do not build private clouds. Matt Asay declares private cloud to be a failure for a number of reasons, including the failure to change the way enterprises do business:
Private cloud lets enterprises pretend to be innovative, embracing pseudo-cloud computing even as they dress up antiquated IT in fancy nomenclature. But when Gartner surveyed enterprises that had deployed private clouds, only 5% claimed success
But he also lays blame on the most-hyped infrastructure technology of the past few years, OpenStack:
An increasing number of contributing companies are trying to steer OpenStack in highly divergent directions, making it hard for the newbie to figure out how to successfully use OpenStack. No wonder, then, that Joyent’s Bryan Cantrill hinted that the widespread failure of private clouds may be “due to OpenStack complexities.”
A large part of this complexity appears to be networking-related:
- OpenStack’s default network subsystem is an unscalable mess of opaque tunnels and single points of failure.
- Neutron plugins promise a panacea but almost always introduce a radically different network architecture (usually overlay-based) that is unfamiliar to the network team.
No wonder most touted OpenStack successes have bespoke network architectures:
- @WalmartLabs says they have 100k cores running, but
SDN is going to be our next step. Network is one area we need to put a lot of effort into. When you grow horizontally, you add compute, and the network is kind of the bottleneck for everything. That’s an area where you want more redundancy
- PayPal runs a large (8500 servers) cloud, but uses VMware’s NVP for networking
- CERN runs a large OpenStack cloud but uses a custom network driver
In a different article, Matt Asay even cites industry insiders to state that OpenStack’s “dirty little secret” is that it doesn’t scale, largely due to broken networking.
In fact, as I’ve heard from a range of companies, a dirty secret of OpenStack is that it starts to fall over and can’t scale past 30 nodes if you are running plain vanilla main trunk OpenStack software
Frustrated cloud operators might look to the newest darling on the block to solve their complexities: Docker. At least it has a single voice and the much-vaunted BDFL. Things should be better, right? Well, not yet. Hopes are high, but both networking and storage are pretty much “roll your own”. There are exotic options like Kubernetes, which pretty much only work in public clouds, SDN-like solutions (this, this, this, and more) and patchworks of proxies. Like the network operator needs yet another SDN solution rammed down her throat.
There is a common strand here: tone-deafness. Are folks thinking about how network operators really work? This lack of empathy sticks out like a sore thumb. If the solutions offered a genuine improvement to the state of networking, then operators might take a chance on something new. Network operators hoping to emulate web-scale operators such as AWS, Google and Facebook face a daunting task as well: private cloud solutions often add gratuitous complexity and take away none.
My favorite cloud software, Apache CloudStack, is not immune to these problems. The out-of-the-box network configuration is often a suboptimal choice for private clouds. Scalable solutions such as Basic Networking are ignored because, well, who wants something “basic”? In future posts, I hope to outline how private cloud operators can architect their CloudStack networks for a better, more scalable experience.
