SPAN Scaling Challenge

December 15, 2015 John Harrington

I’m facing a mini scaling challenge with a Cisco SPAN (Switched Port ANalyzer) session and thought it would be good to share it with you fine folk.

SPAN Challenge

A 3750X switch is currently SPAN-ing a 10Gbps interface to a 1Gbps egress port. A server is directly attached and is using dumpcap to capture a subset (5%) of the overall traffic for analysis.
The 10G link is under-utilised, but it is running close to the 1Gbps traffic limit in the Rx direction. Tx traffic is very low by comparison, but the SPAN session is capturing both directions.
The aggregated flow from both directions is overrunning the SPAN destination 1Gbps port. The challenge is to ensure we can continue to capture without discarding any interesting data. Let’s explore the options together.
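To make the arithmetic concrete, here is a minimal Python sketch of why a both-directions SPAN overruns the destination port. The traffic figures (950 Mbps Rx, 80 Mbps Tx) are hypothetical, chosen only to match the "Rx close to 1Gbps, Tx very low" description; the function names are made up for illustration.

```python
def span_destination_load(rx_mbps: float, tx_mbps: float,
                          both_directions: bool = True) -> float:
    """Traffic offered to the SPAN destination port, in Mbps.

    A both-directions SPAN copies Rx and Tx of the source port onto the
    single transmit side of the destination port, so the two rates add up.
    """
    return rx_mbps + tx_mbps if both_directions else rx_mbps


def is_overrun(offered_mbps: float, dest_capacity_mbps: float = 1000.0) -> bool:
    """True when the mirrored traffic exceeds the destination port speed."""
    return offered_mbps > dest_capacity_mbps


# Hypothetical figures, not measurements from the real link.
rx, tx = 950.0, 80.0
offered = span_destination_load(rx, tx)
print(offered, is_overrun(offered))  # 1030.0 True
```

With these assumed numbers the Rx direction alone would still fit in 1Gbps, but only just, which is the situation described above.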

Options explored

Here are the options I explored and my analysis of each one. You may have a different option and I haven’t solved the problem yet, so I’d be interested in your thoughts.

1. Passive Optical Tap

Everyone has their favourites, and my default answer when people think about scaling SPAN sessions is to not SPAN at all. I strongly favour the use of passive optical taps, like the one in the photo above. I’ve included an IXIA NetOptics Tap overview for background information if you haven’t seen these before. I love their relative simplicity and their complete independence from ASIC/software limitations and bugs.
However, I don’t think that a tap is the best solution for this particular challenge for the following reasons:

I can’t tap a 10G stream and send it to a 1G port. Even if we ignore the speed mismatch, 10G and 1G Ethernet use different line codes.
I could upgrade the sniffer NIC to 10Gbps, but there’s no budget available and it would take too long to procure NICs and transceivers.
Lastly, we really only need a subset of the traffic, as we discard most of it when doing the tshark/dumpcap capture.
Perhaps we could only SPAN (or mirror in Junos terminology) the required subset of the traffic?
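For context, the server-side filtering looks roughly like the sketch below, which builds a dumpcap command line from Python. The `-i`, `-f` and `-w` options are standard dumpcap options; the interface name, the capture filter and the output path are hypothetical placeholders, not the real ones from this setup.

```python
import shlex


def build_dumpcap_command(interface: str, capture_filter: str,
                          output_path: str) -> list[str]:
    """Build a dumpcap argument list.

    -i selects the capture interface, -f sets a libpcap capture filter,
    and -w names the output file.
    """
    return ["dumpcap", "-i", interface, "-f", capture_filter, "-w", output_path]


# Hypothetical interface, filter and path, for illustration only.
cmd = build_dumpcap_command("eth1", "host 192.0.2.10 and tcp port 443",
                            "/tmp/span.pcapng")
print(shlex.join(cmd))
# To actually start the capture: subprocess.run(cmd, check=True)
```

The catch is that a capture filter like this only discards packets after they have already crossed the 1Gbps SPAN destination port, so it does nothing to relieve the congested port. That is why filtering on the switch is the interesting direction.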

2. Span to port-channel

Sadly this option is a figment of my imagination and is not supported. I strongly suspect that the ASIC logic which supports SPAN doesn’t have a way of recirculating the packet through the Port-channel hashing logic. Moving on.

3. Split the streams

The most basic solution would be to SPAN each direction separately on both the ingress and egress 10G ports. This would involve a 1G SPAN destination port for each source direction. I would have to capture Rx on interface 10G-external and Rx on 10G-internal or do the same with a Tx capture on each port.
This option isn’t viable either:

I would have to SPAN all input ports, including the cross-link between the switches. Not a show-stopper but inconvenient, as the cross-link is carrying other non-interesting traffic.
This would only buy me a small amount of runway. The Rx direction is very close to hitting 1Gbps. Removing the load of the Tx direction would help a little but I’d face this problem again in a month or two as Rx traffic grows.
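The "month or two" of runway can be sanity-checked with a quick compound-growth estimate. This is a rough sketch under assumed numbers: 950 Mbps of Rx traffic today and 3% growth per month are hypothetical values, not measurements.

```python
import math


def months_until_limit(current_mbps: float, limit_mbps: float,
                       monthly_growth: float) -> float:
    """Months until traffic growing at a compound monthly rate hits the limit.

    Solves current * (1 + growth) ** months = limit for months.
    """
    if current_mbps >= limit_mbps:
        return 0.0
    if monthly_growth <= 0:
        return math.inf  # flat or shrinking traffic never reaches the limit
    return math.log(limit_mbps / current_mbps) / math.log(1.0 + monthly_growth)


# Hypothetical: 950 Mbps of Rx today, growing 3% per month, 1Gbps port.
print(round(months_until_limit(950.0, 1000.0, 0.03), 1))  # 1.7
```

Under those assumptions, an Rx-only SPAN would be back at the 1Gbps limit in under two months.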

4. Span with access-list

A colleague asked if it was possible to apply an access list to the egress SPAN port, and … the answer is no.
Well, it may be possible to hack this by converting the session to an RSPAN session and applying a VACL to the RSPAN VLAN. The sniffing server should be able to discard the RSPAN encapsulation and still access the packet, but I’ll need to think this over a little. This option seems plausible on paper.

5. VACL Capture

I hadn’t heard of this feature until I hit the internets to do some research. The idea here is to use a VACL inline on the monitored 10Gbps port, matching the ‘interesting traffic’ with a ‘capture’ keyword, and a permit all for the uninteresting traffic.
I haven’t done this before so I was a little scared about applying a filter to the production traffic at source. I was actually quite relieved to hear it was a 6500-only feature.
Option 5 is off the table. Phew!

6. FSPAN – Flow-based SPAN

Wait, what’s this? Flow-based SPAN, an alternative to VACL capture. Bingo! This is exactly what I’m looking for. You can apply a regular ACL filter to the SPAN session to limit the SPAN destination traffic.
The only downside I can see is that the FSPAN feature requires an advanced IP services image. I’m okay with this tradeoff. I know that option 4 could work, but FSPAN seems to involve no hacks, so the simpler option wins in my book. Next step is to configure FSPAN and test it.
I’ll let you know how it goes.
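To show the effect an ACL filter should have on the destination port load, here is a simplified Python model. It is not switch configuration: it matches only on source address, whereas a real ACL can match many more fields, and the flows, rates and permitted network are all hypothetical.

```python
from ipaddress import ip_address, ip_network


def mirrored_mbps(flows: list[tuple[str, float]],
                  permitted_networks: list[str]) -> float:
    """Sum the rate of flows whose source address a permit entry matches.

    Models an ACL with only permit entries and an implicit deny at the
    end: anything not matched is never copied to the destination port.
    """
    nets = [ip_network(n) for n in permitted_networks]
    total = 0.0
    for src, mbps in flows:
        if any(ip_address(src) in net for net in nets):
            total += mbps
    return total


# Hypothetical flows as (source IP, Mbps); roughly 5% is "interesting".
flows = [("192.0.2.10", 50.0), ("198.51.100.7", 600.0), ("203.0.113.9", 380.0)]
print(mirrored_mbps(flows, ["192.0.2.0/24"]))  # 50.0
```

In this toy example the monitored port carries 1030 Mbps in total, but only 50 Mbps would reach the 1Gbps SPAN destination, leaving plenty of headroom.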

Sherpa Summary

If you haven’t had to think about scaling a SPAN session, I hope this post helped you a little. If you think I’m making a mistake by choosing to use FSPAN, please let me know in the comments.

2 thoughts on “SPAN Scaling Challenge”

Dave says:

December 15, 2015 at 7:18 pm

And why not use an aggregator solution such as Gigamon in combination with your tap or SPAN port? You can slice, dice and replicate the traffic in so many ways.

John Harrington says:
  
  December 15, 2015 at 8:04 pm
  
  Hi Dave,
  It’s true that an aggregator such as Gigamon or Netscout/OnPath would do a good job of filtering the traffic but it’s a very expensive way to solve this particular problem.
  Regards,
  John H
