An emergency switch replacement can ruin your day, and having network config backups is not enough to save it. Restoring full service may not be as easy as just copying the running configuration from your RANCID CVS repo, or your colleague’s hard drive. Restoring the ‘identity’ of your original switch is a multi-step and somewhat complicated process.
What’s the problem?
I’ve gone through my ‘painful learning experiences’ archives to produce a list of issues that can cause problems restoring the original behaviour of a switch. I’m going to repeat that phrase because it’s important: your goal is to “restore the original behaviour of the switch” rather than just get it up and running.
I’m using the word switch in this post, but these challenges apply to all network devices.
Many important settings on network devices are not stored in the startup configuration. Some examples of this are:
- Cisco vlan.dat
- Extreme switches require individual files for ACLs
This is an easy enough problem to address. You just need to be aware that these files are important and have a way of getting them backed up. Lastly, you need a process for restoring them to the new hardware device.
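As a rough illustration of "being aware that these files are important", here is a minimal Python sketch that checks a backup directory against a per-platform list of extra files. The platform keys, the `EXTRA_FILES` manifest, the `*.pol` pattern for Extreme policy files and the `missing_extra_files` helper are all assumptions made for this example, not part of any vendor tool.

```python
from pathlib import Path

# Hypothetical manifest: platform name -> glob patterns for files that live
# outside the startup configuration. Patterns are illustrative assumptions;
# adjust them to match what your own devices actually store.
EXTRA_FILES = {
    "cisco_ios": ["vlan.dat"],
    "extreme_xos": ["*.pol"],
}


def missing_extra_files(backup_dir, platform):
    """Return the manifest patterns that match no file in backup_dir."""
    backup_dir = Path(backup_dir)
    return [
        pattern
        for pattern in EXTRA_FILES.get(platform, [])
        if not any(backup_dir.glob(pattern))
    ]
```

Running something like `missing_extra_files("/backups/sw1", "cisco_ios")` after each backup job would flag a device whose backup contains the config but not the files that sit beside it.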
Invisible configuration
By invisible configuration I mean ‘a command you apply to your switch that changes its behaviour but is not stored in an accessible configuration file’. I’ve listed some examples below, but I’m sure we can add more in the comments.
- ‘sdm prefer’ – a biggie
- ‘no snmp-server group public v1’ – and similar lite commands
- ‘crypto key generate’ – odd because it doesn’t appear in the config, but it produces other config (a key) that does.
On next reboot
Some commands require a reboot to take effect. I’ve listed a few examples below, but almost every vendor and platform has similar constraints.
- SDM template changes – Require a reboot on Cisco.
- Jumbo MTU – Requires a reboot to take effect.
- Juniper HA for SRX – Requires a reboot after HA configuration.
SDM is a tricky one, because it isn’t stored in the startup config yet requires a reboot to take effect. You need to record the output of ‘show sdm prefer’, know to re-apply any non-default template to the replacement device, and then reboot it.
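To make the SDM check less dependent on memory, you could keep the template name alongside your config backups and compare it against the replacement. The Python sketch below is a minimal example that assumes the output of ‘show sdm prefer’ contains a line of the form `The current template is "desktop default" template.`; that wording varies by platform and software release, so treat the regular expression and the function names as assumptions to adapt.

```python
import re

# Assumed output format: a line such as
#   The current template is "desktop default" template.
# Check this against your own platform before relying on it.
_TEMPLATE_RE = re.compile(r'current template is "([^"]+)"', re.IGNORECASE)


def parse_sdm_template(show_output):
    """Return the template name found in the command output, or None."""
    match = _TEMPLATE_RE.search(show_output)
    return match.group(1) if match else None


def sdm_mismatch(recorded_template, replacement_output):
    """True if the replacement's template differs from the recorded one."""
    return parse_sdm_template(replacement_output) != recorded_template
```

If `sdm_mismatch` returns True, the replacement needs the original template applied and a reboot before it will behave like the old switch.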
Hrm… feels like this is a bit of a process.
Config that should not be copied
Some config in a startup configuration will be rejected when loading into a replacement switch.
- Most keys and certs are generated locally, so the switch will reject attempts to restore those config entries.
- Information about stack members (may be needed but be careful)
- ‘ntp clock-period’ – grr.
So … now we need to strip config lines from our backups in order to faithfully restore the switch behaviour.
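Stripping those lines by hand is error-prone, so here is a minimal Python sketch of a filter you could run over a backup before loading it. The `DROP_PREFIXES` list and the `strip_unrestorable` function are hypothetical names for this example; the list only covers single-line entries such as Cisco’s ‘ntp clock-period’, and multi-line blocks (certificate chains, for instance) would need extra handling that is not shown here.

```python
# Hypothetical list of line prefixes to drop before restoring a backup.
# Extend it with whatever your own platforms refuse to accept.
DROP_PREFIXES = ("ntp clock-period",)


def strip_unrestorable(config_text, drop_prefixes=DROP_PREFIXES):
    """Split a config into (cleaned_text, dropped_lines).

    A line is dropped when, ignoring leading whitespace, it starts with
    one of drop_prefixes. Dropped lines are returned for review rather
    than silently discarded.
    """
    kept, dropped = [], []
    for line in config_text.splitlines():
        if line.strip().startswith(tuple(drop_prefixes)):
            dropped.append(line)
        else:
            kept.append(line)
    return "\n".join(kept), dropped
```

Returning the dropped lines as well as the cleaned config means you can log exactly what was removed, which helps when someone later asks why the restored switch differs from the backup.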
Licenses
Ah… licenses. I’ll keep it brief. The process of swapping licenses from one piece of hardware to another is getting slightly less dreadful. The key points are:
- A ‘license swap’ is another task to complete to get back to full service
- The process varies by vendor and product
- You should have a rough idea of that process before your kit fails
And the rest
- Have your SNMP ifIndexes changed? Does your SNMP monitoring tool need to rediscover your device?
- What hacks or workarounds did you use to shift traffic away from the failed network device? Have you recorded them so that you can back them out?
What’s the fix?
There are enough steps required to perform a vanilla switch replacement that it’s worth having a documented process to rebuild and restore your device to full health.
- Get a device replacement / rebuild process in place
- Make people aware of this process / Make it hard for them not to follow it
- Can you make it a habit to ask in design or change review meetings, ‘how does your design impact the rebuild process’?
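If a wiki page feels too easy to ignore, the process can also live as data that a script walks through. The Python sketch below is one possible shape for that: the steps paraphrase the issues raised in this post, while their ordering and the `Step`, `new_checklist` and `remaining` names are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Step:
    description: str
    done: bool = False


def new_checklist():
    """Return a fresh rebuild checklist. The ordering is an assumption."""
    return [
        Step("Restore files stored outside the startup config"),
        Step("Re-apply invisible configuration and reboot if required"),
        Step("Load the backup config with unrestorable lines stripped"),
        Step("Regenerate local keys and certs"),
        Step("Swap licenses to the replacement hardware"),
        Step("Check SNMP ifIndexes and rediscover in monitoring"),
        Step("Back out temporary traffic workarounds"),
    ]


def remaining(steps):
    """Return descriptions of the steps not yet marked done."""
    return [step.description for step in steps if not step.done]
```

A replacement is only finished when `remaining(checklist)` comes back empty, which is a stricter bar than "the switch is up and passing traffic".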
A hardware replacement is a complicated, multi-step process, yet it’s almost never documented.
Do you have a device rebuild process? Let me know in the comments.