The solution NephoScale implemented using Cumulus Linux enables affordable, high-capacity 10Gbps/400Gbps networking that is highly automated and very easy to manage.
We use Quanta 10Gbps/400Gbps switches running Cumulus Linux as top-of-rack switches. These switches connect to spine switches in a Layer 3 Clos architecture to provide maximum flexibility and scalability.
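To make the leaf-spine wiring concrete, here is a minimal Python sketch of a two-tier Clos fabric in which every top-of-rack (leaf) switch has one uplink to every spine. The switch names and counts are hypothetical and are not a description of our actual topology.

```python
from itertools import product


def clos_links(num_spines, num_leaves):
    """Return the (leaf, spine) links of a two-tier Clos fabric.

    Every leaf (top-of-rack switch) connects to every spine, so traffic
    between any two racks crosses exactly one spine. Names are hypothetical.
    """
    spines = [f"spine{i + 1}" for i in range(num_spines)]
    leaves = [f"leaf{i + 1}" for i in range(num_leaves)]
    return [(leaf, spine) for leaf, spine in product(leaves, spines)]


links = clos_links(num_spines=2, num_leaves=3)
for leaf, spine in links:
    print(f"{leaf} <-> {spine}")
```

Adding a rack in this model adds one leaf and one link per spine, and leaves every existing link untouched, which is where the scalability of the design comes from.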
Extensive use of BGP
Our network makes extensive use of BGP not only on switches but also on some compute nodes. It also leverages IS-IS in parts of the network. Being able to utilize Quagga on both switches and servers helped provide a consistent and familiar experience.
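As an illustration of that consistency, the sketch below renders a minimal Quagga `bgpd.conf` from the same Python function for both a switch and a compute node. The AS numbers, addresses, and prefixes are hypothetical examples (private ASNs and RFC 1918 addresses), and a production configuration would carry more than these few statements.

```python
def render_bgpd_conf(local_as, router_id, neighbors, networks):
    """Render a minimal Quagga bgpd.conf as a string.

    neighbors: list of (peer_ip, peer_as) tuples
    networks:  list of prefixes to advertise, e.g. "10.0.1.0/24"
    All values passed in below are hypothetical examples.
    """
    lines = [f"router bgp {local_as}", f" bgp router-id {router_id}"]
    for peer_ip, peer_as in neighbors:
        lines.append(f" neighbor {peer_ip} remote-as {peer_as}")
    for prefix in networks:
        lines.append(f" network {prefix}")
    return "\n".join(lines) + "\n"


# A top-of-rack switch peering with one spine and advertising its rack subnet.
leaf_conf = render_bgpd_conf(
    65001, "10.0.0.1", [("10.1.0.1", 65000)], ["10.0.1.0/24"]
)

# A compute node peering with that switch and advertising a service address.
server_conf = render_bgpd_conf(
    65101, "10.0.1.10", [("10.0.1.1", 65001)], ["10.0.200.10/32"]
)

print(leaf_conf)
print(server_conf)
```

Because the switch and the server both run Quagga, the same template and the same tooling can produce both files.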
“With Cumulus Linux, manual configuration is no longer needed for provisioning new racks, applying new policy or responding to security advisories. The lifecycle is now entirely automated.”
— Alan Meadows, Chief Architect at NephoScale
With Cumulus Linux, the process of provisioning and configuration management is entirely automated. When the switch boots for the first time, it retrieves the Cumulus Linux OS image and installs the OS almost instantly. Configuration management tools then take over the auto provisioning process. There is no need for an administrator to log in to the box, as all server and network provisioning is automated by software.
We make extensive use of Chef as a configuration management tool on the network infrastructure. Using configuration management tools to provision and update switches is extremely important because it guarantees that there is no discrepancy among all the switches running Cumulus Linux in various environments. This could not be guaranteed with manual configuration. Switches running Cumulus Linux are provisioned with Chef, and all user accounts and configurations in the infrastructure are driven by Chef.
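The kind of discrepancy that manual configuration lets in can be illustrated with a small drift check. This is a sketch in Python, not how Chef itself works: Chef enforces the desired state by converging each node, whereas this example only compares fingerprints of configuration text. The switch names and configuration lines are hypothetical.

```python
import hashlib


def fingerprint(config_text):
    """Hash a config after dropping blank lines and trailing whitespace."""
    normalized = "\n".join(
        line.rstrip() for line in config_text.splitlines() if line.strip()
    )
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_drift(expected_config, running_configs):
    """Return the sorted names of switches whose config differs from expected."""
    expected = fingerprint(expected_config)
    return sorted(
        name for name, text in running_configs.items()
        if fingerprint(text) != expected
    )


# Hypothetical shared settings that every switch should carry.
expected = "ntp server 10.0.0.5\nlogging host 10.0.0.6\n"
running = {
    "leaf1": expected,
    "leaf2": "ntp server 10.0.0.9\nlogging host 10.0.0.6\n",   # hand-edited
    "leaf3": "ntp server 10.0.0.5  \n\nlogging host 10.0.0.6\n",  # whitespace only
}

drifted = find_drift(expected, running)
print(drifted)
```

Here only `leaf2` is reported, because whitespace-only differences are normalized away before hashing.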
The benefits of using Linux as a platform are not limited to configuration management. We use both industry-standard and internal tools to monitor systems. We have written plugins for Nagios that run on Cumulus Linux switches and publish Cumulus Linux statistics to Graphite. The entire network leverages sFlow (through InMon’s sFlow) to help trend capacity and pinpoint anomalies such as an attack. We have also written our own agent to interact with the switch. Further simplification is being introduced by leveraging Prescriptive Topology Manager (PTM) from Cumulus Networks. PTM is a cable and peer verification utility that ensures that new zones are cabled the way they are expected to be.
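A plugin of the kind described above could look roughly like the following Python sketch. It is not our actual plugin: the metric path, thresholds, and the error-counter reading are hypothetical. What it relies on is the standard Nagios exit-code convention (0 OK, 1 WARNING, 2 CRITICAL) and Graphite's plaintext protocol, which accepts `path value timestamp` lines, by default on TCP port 2003.

```python
import socket
import time

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def graphite_line(path, value, timestamp):
    """Format one metric in Graphite's plaintext protocol."""
    return f"{path} {value} {int(timestamp)}\n"


def check_threshold(value, warn, crit):
    """Map a metric value to a Nagios exit code and status label."""
    if value >= crit:
        return CRITICAL, "CRITICAL"
    if value >= warn:
        return WARNING, "WARNING"
    return OK, "OK"


def send_to_graphite(line, host, port=2003):
    """Send one plaintext metric line to a Graphite (carbon) listener over TCP."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))


# Hypothetical reading: receive errors seen on switch port swp1.
rx_errors = 12
code, label = check_threshold(rx_errors, warn=10, crit=100)
line = graphite_line("switches.leaf1.swp1.rx_errors", rx_errors, time.time())

# Nagios reads the first line of output: status text, then "|" and perfdata.
print(f"{label} - swp1 rx_errors={rx_errors} | rx_errors={rx_errors}")
# A real plugin would now call send_to_graphite(line, host=...) and
# finish with sys.exit(code) so that Nagios sees the status.
```

Keeping the formatting and threshold logic in small pure functions means they can be tested without a running Nagios or Graphite server.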
With the SDN/NFV and software-defined data center technology we have developed at NephoScale, and our partnership with Cumulus Networks, we successfully created a highly agile and flexible network infrastructure that requires few manual interventions, so fewer people are needed to operate and manage it. Doing more with less is our goal, and in this case that goal has been achieved with fantastic results. From a savings standpoint, we have experienced a drastic reduction in OpEx by using automation, expanding our use of existing data center tools, and leveraging the transparency of a native Linux distribution. We also realized additional savings from CapEx cost reductions of at least 3x per 10G port over “traditional” 10G providers.
We realized OpEx savings because:
Network Deployment Time is Much Faster than Before
We can deploy new racks and new zones faster today. Where it used to take a couple of days to expand the network, it now takes less than a day, including the time to cable and install the rack. Installing the OS and applying standard configurations with configuration management tools accounts for most of the savings.
The Network is Extremely Reliable due to Increased Automation and Consistency
Consistent configurations: Leveraging Chef for configuration management eliminates hard-to-troubleshoot, esoteric networking problems that arise from minor discrepancies in switch configurations because the configuration.