My 10G switch goes up to 11

Stealing bits

Nailing down the true speed of a 10GbE link can be tricky. For a start, you need to define ‘speed’ and ‘capacity’; Ivan Pepelnjak offers a nice summary in this post. Then there are the little surprises. A former colleague of mine, Fred Westermark, first introduced me to the Ethernet interframe gap. I had never heard of it before and, to be honest, felt a bit cheated. Since when do ‘bits’ need a rest? Pfff.

Because of the mandatory interframe gap of 96 bit times between frames, you can never achieve 10Gbps of useful data transfer between two devices, even if you count protocol headers as useful data. If you’re fixed at a 10Gbps capacity, then adding headers must subtract further from the overall goodput, that is, the data transfer rate available to the upper-layer application. This is the approach taken for the frame, packet and segment headers of our traditional Ethernet/IP/TCP stack.
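To put rough numbers on this, here is a minimal sketch of the per-frame arithmetic. It assumes standard untagged Ethernet framing, counts the 8-byte preamble alongside the 12-byte (96 bit time) interframe gap, and assumes IPv4 and TCP headers with no options. The function names are made up for illustration.

```python
LINE_RATE_BPS = 10e9   # nominal 10GbE MAC rate

# Per-frame overhead in bytes.
INTERFRAME_GAP = 12    # 96 bit times of enforced idle between frames
PREAMBLE_SFD = 8       # preamble + start-of-frame delimiter (standard Ethernet)
ETH_HEADER_FCS = 18    # 14-byte header + 4-byte FCS, no VLAN tag (assumption)
IP_TCP_HEADERS = 40    # 20-byte IPv4 + 20-byte TCP, no options (assumption)


def wire_bytes(mtu):
    """Bytes of line time consumed by one frame, including gap and preamble."""
    return mtu + ETH_HEADER_FCS + PREAMBLE_SFD + INTERFRAME_GAP


def frame_rate_bps(mtu=1500):
    """Bits per second occupied by Ethernet frames (header through FCS)."""
    return LINE_RATE_BPS * (mtu + ETH_HEADER_FCS) / wire_bytes(mtu)


def tcp_goodput_bps(mtu=1500):
    """Bits per second of TCP payload left for the application."""
    return LINE_RATE_BPS * (mtu - IP_TCP_HEADERS) / wire_bytes(mtu)


if __name__ == "__main__":
    for mtu in (1500, 9000):
        print(mtu, frame_rate_bps(mtu) / 1e9, tcp_goodput_bps(mtu) / 1e9)
```

With a 1500-byte MTU this works out to roughly 9.87 Gbps of frames and about 9.49 Gbps of TCP payload; small frames fare much worse because the gap and preamble are a fixed cost per frame.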
However, there is another approach. Certain components of the data path are overclocked beyond 10Gbps, sometimes at rates approaching 11! This overclocking lets you add headers or channel coding overhead without stealing any more bits from our starting 10Gbps line rate. We don’t get any extra capacity from these features, but there are hidden benefits.
I’ve listed a couple of examples of over-clocking below.

On the backplane

[Figure: Switch Backplane, from “Gigabit Switched Backplanes” by Nick McKeown]
 
Why does a pair of Cisco Nexus FAB-1 fabric modules offer 92Gbps of capacity (46Gbps each) when they only need to handle 80Gbps from the line card? The answer is cell tax. It appears that splitting variable-length packets into fixed-length cells dramatically improves the efficiency of the backplane scheduler. These cells need headers to aid reassembly, and that adds overhead, or cell tax. [Edit: It turns out that I’m wrong here. The Nexus doesn’t do cell switching across the fabric, but it still requires the additional overhead to mark packets so that they can be routed to the correct egress port.] Once again overclocking comes to the rescue, running at a higher speed to deal with the overhead. I recommend reading this fantastic paper by Nick McKeown on backplane switching.
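The headroom in those figures is easy to quantify. This is a rough sketch using only the 92Gbps and 80Gbps numbers above; the per-packet tag size is a made-up parameter, since nothing here says how large the internal marking actually is.

```python
FABRIC_GBPS = 2 * 46     # two FAB-1 modules at 46Gbps each
LINECARD_GBPS = 80       # traffic the line card can present to the fabric


def fabric_headroom():
    """Extra fabric capacity as a fraction of line card demand."""
    return FABRIC_GBPS / LINECARD_GBPS - 1


def tag_overhead(packet_bytes, tag_bytes):
    """Fractional overhead of a hypothetical internal tag on one packet."""
    return tag_bytes / packet_bytes


def fits_in_headroom(packet_bytes, tag_bytes):
    """True if the hypothetical tag overhead is covered by the headroom."""
    return tag_overhead(packet_bytes, tag_bytes) <= fabric_headroom()
```

That is 15% of headroom. A hypothetical 8-byte tag on a 64-byte packet (12.5%) would fit inside it, while a 16-byte tag (25%) would not.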

On the LAN and Linecard

On the LAN we find another example of overclocking, although this time it’s ‘line coding’ rather than headers that needs the extra speed. 10GbE LAN uses a line code called 64b/66b, which adds an overhead of 3.125%.
According to Wikipedia:

64b/66b encoding is a line code that transforms 64-bit data to 66-bit line code to provide enough state changes to allow reasonable clock recovery and facilitate alignment of the data stream at the receiver

The 10GBASE-R line rate runs at 10.3125 Gb/s to absorb the line coding overhead, which ensures a MAC data rate of 10 Gbps once the coding overhead is removed.
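The 10.3125 figure falls straight out of the 66/64 ratio. A small sketch of that arithmetic, with illustrative function names:

```python
from fractions import Fraction


def coded_line_rate(mac_rate_gbps, data_bits=64, coded_bits=66):
    """Line rate needed to carry mac_rate_gbps after block coding."""
    return Fraction(mac_rate_gbps) * coded_bits / data_bits


def coding_overhead(data_bits=64, coded_bits=66):
    """Extra bits added by the line code, relative to the data bits."""
    return Fraction(coded_bits - data_bits, data_bits)


print(float(coded_line_rate(10)))      # 10.3125
print(float(coding_overhead()) * 100)  # 3.125
```

Using `Fraction` keeps the ratios exact, so the results match the published figures without floating-point noise.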

Across the WAN

Forward Error Correction, or FEC (note: not the same as Feck!), is a beautiful thing. FEC takes checksums a step further and allows the receiver not only to detect errors in a poor-quality signal but also to correct them. FEC is used heavily in wireless networks, but I came across it recently in a DWDM system. When your 10Gb Ethernet link is transported over a DWDM system, it has FEC inserted into its G.709 framing. Again overclocking, this time at 10.7092 Gbps.
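The 10.7092 figure matches the standard OTU2 line rate in G.709, which is 255/237 times a 9.95328 Gbps client. The sketch below reproduces that number; treating the client as 9.95328 Gbps is an assumption here, since how a particular DWDM system maps a 10GbE signal is not covered in this post.

```python
from fractions import Fraction

# Assumed client rate in Gbps (the SONET/SDH-style 10G rate used by OTU2).
CLIENT_GBPS = Fraction("9.95328")

# OTU2 multiplier from G.709; it bundles the FEC bytes and frame overhead.
OTU2_MULTIPLIER = Fraction(255, 237)


def otu2_line_rate(client_gbps=CLIENT_GBPS):
    """Line rate in Gbps after G.709 OTU2 framing and FEC are added."""
    return client_gbps * OTU2_MULTIPLIER


print(round(float(otu2_line_rate()), 4))  # 10.7092
```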

Take-aways

  • 10GbE doesn’t deliver the full 10Gbps, due to the interframe gap.
  • There is lots of overhead being added internally and secretly masked by overclocking.
  • You can survive without knowing about line-coding, but look at Nick McKeown’s paper on backplanes or if pressed for time just read the summary.
  • I was prompted to write this post because I was mentoring a colleague and couldn’t quite get my facts straight.  Hope this helps.

5 thoughts on “My 10G switch goes up to 11”

  1. Nicely written doc.
    Arista Networks is the coolest in any direction. Using industry standards since day one. Keep it cool, use Arista Networks. The Intelligent Data Center Switching Solution for SDN.

    1. Thanks for the feedback on the post. It does seem like you’re testing Arista tag lines in the rest of your comment. I like the cheesy cooling puns in #1 and #3. 🙂
