Imagine you’ve just designed and deployed a data center. It was hell, but you are smiling. Your design is homogeneous, simple, and elegant: a greenfield data center full of shiny, identical network devices. Because the design was so consistent and repeatable, you scripted the generation of the device configurations without too much hassle. This is a network with an easily ‘provisioned’ network configuration.
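When every device is identical, that scripted generation really is a loop over a seed list and a template. A minimal sketch, assuming a Python toolchain; the hostnames, addresses, and config syntax are illustrative, not from any particular vendor:

```python
# Minimal sketch: rendering identical leaf-switch configs from one template.
# Hostnames, IPs, and the config dialect are illustrative assumptions.
from string import Template

TEMPLATE = Template(
    "hostname $hostname\n"
    "interface uplink1\n"
    " ip address $uplink_ip/31\n"
)

def render_config(hostname: str, uplink_ip: str) -> str:
    """Render one device's config from the shared template."""
    return TEMPLATE.substitute(hostname=hostname, uplink_ip=uplink_ip)

# Because every leaf is identical, generation is a simple loop over seed data.
seeds = [("leaf01", "10.0.0.0"), ("leaf02", "10.0.0.2")]
configs = {name: render_config(name, ip) for name, ip in seeds}
print(configs["leaf01"])
```

The whole approach leans on the homogeneity: one template, one loop, no special cases.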
But day-one provisioning is only one part of the puzzle. The real prize is a centrally ‘controlled’ network configuration, where all config changes happen centrally and a configuration policy is enforced for the lifetime of the network. Whilst this seems like the holy grail, you need to understand that you will have to trade some flexibility to reach this easy-to-operate nirvana.
Your network is dragged to hell
So… emm… Network Sherpa… I’m going to have to go ahead and ask you to… hack your beautiful datacenter design and make it do crazy sh1t that wasn’t part of your original design. And emm… actually… I’m going to ask you to go ahead and do some policy-based routing to solve another problem tomorrow.
The sad news is that the needs of your business are working against your desire for a homogeneous, simple network. The business wants flexibility. You’ll need flexibility to deal with errored links, bugs, capacity problems, extra racks, higher speeds on individual servers, next-gen chassis replacements, and so on; all on different devices in your once-homogeneous system. This increased flexibility leads to complex configurations which are harder to maintain.
It also becomes much, much harder to tell whether a network device configuration conforms to a policy, or to make the changes that bring it back into conformance. The more variety you have in your network, the more branching decisions are made by your configuration validation software, and thus your configuration conformance code becomes more brittle and less reliable (see cyclomatic complexity). Of course you could just cop out and add an exception to your policy, uttering the magic words, “just this once” or “it’s only temporary”.
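You can see the rot in miniature with a hypothetical conformance check. Every “just this once” exception becomes another branch, and each branch raises the cyclomatic complexity of the validator; the policy, device fields, and exception names below are all made up for illustration:

```python
# Hypothetical conformance check. Each exception to the policy adds a branch,
# and the validator's cyclomatic complexity climbs with every one.
# The policy (MTU 9214 everywhere) and the device fields are assumptions.
def mtu_is_conformant(device: dict) -> bool:
    """Policy: all fabric interfaces run MTU 9214."""
    if device.get("role") == "legacy-uplink":   # exception: "just this once"
        return True
    if device.get("site") == "lab":             # exception: "it's only temporary"
        return True
    return all(iface["mtu"] == 9214 for iface in device["interfaces"])
```

Two exceptions in and the function already has four paths through it; a few more and nobody can say with confidence what “conformant” means.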
I see a very large gap between scripted config generation and centralised configuration control. Dell have done some really awesome work here with their Dell Fabric Manager: a pod of high-speed, low-cost L3 switches built from commodity ASICs with an externally connected fabric. More importantly, Dell have delivered a centralised configuration provisioning and control tool to help solve some of the problems outlined below.
Can you commit to centralised configuration control?
So, why doesn’t everyone just deploy a centralised configuration enforcement system up front? Because it requires you to make some very tough policy decisions, and to introduce… inflexibility! How would you handle the questions below if you were designing a centralised configuration control system?
- How will you deal with everyday requests for heterogeneity within your controlled network? Will you allow unlimited feature and hardware variance, as in today’s classical networks? Or will you try to present the network to your customers as if it were a single large-chassis switch with limited variance (akin to line-card differences)?
- How will you handle incremental configuration changes, e.g. new route-maps? Will you allow these changes to be merged with the running config (flexible & complex), or blow away and re-write the configs each time (simple & risky)?
- Can you perform all maintenance and troubleshooting activities (which would require configuration changes) from the central location? Can you foresee all of these in advance?
- Can you keep track of those maintenance/troubleshooting config changes and ensure they can be easily backed out by the system if the engineer forgets to?
- Can you enforce zero CLI interaction on individual routers and switches? Are you ready for the operator backlash that will result?
- If not… who gets authority when there is a conflict between the config on the controller and the config on the device-under-control (DUC)?
- What do you do if you find a conflict? Raise an alert, or enforce conformance upon the wayward DUC (potentially blowing away any workarounds / hacks)?
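That last alert-or-enforce question can be sketched concretely. Here is a toy reconciler, under the loud assumption that configs are modelled as flat sets of lines (real running configs are hierarchical, and the `enforce` flag is my invention, not any product’s behaviour):

```python
# Toy reconciler for the alert-vs-enforce question. Modelling configs as flat
# sets of lines, and the `enforce` flag itself, are simplifying assumptions.
def reconcile(intended: set, running: set, enforce: bool):
    """Compare intended vs running config; return (alerts, lines_to_push)."""
    drift = running - intended      # hand-applied workarounds on the DUC
    missing = intended - running    # policy lines someone removed
    alerts = [f"extra: {line}" for line in sorted(drift)]
    alerts += [f"missing: {line}" for line in sorted(missing)]
    # Enforcing re-pushes policy lines; note it does nothing about the
    # drift lines here, and a harsher system would delete them too,
    # blowing away the workarounds.
    push = sorted(missing) if enforce else []
    return alerts, push

alerts, push = reconcile(
    intended={"mtu 9214", "lldp enable"},
    running={"mtu 9214", "ip policy route-map HACK"},
    enforce=False,
)
```

With `enforce=False` you get alerts and a human decides; with `enforce=True` the controller silently wins, and whoever applied that route-map hack at 3am finds out the hard way.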
You’re still in control
Controlling network configurations centrally can be done, and in my opinion it is the future of large-scale networking. But you, the engineer, are still required to make some tough policy decisions. Even the Dell Fabric Manager will not ‘enforce’ the correct policy. It can quickly highlight that there’s a discrepancy, but you need to act.
Does your central configuration controller depend upon a highly structured, and thus inflexible, topology? Does it assume homogeneous hardware? Will you fight to keep the network simple, and more importantly, will you win?
You will need to understand the behavior, constraints, and implications of this new breed of centralised configuration controller. Or you can ignore the questions and watch as your beautiful new network gets dragged to hell like its predecessors.