Modern top-of-rack (TOR) switches run at line rate and are non-oversubscribed: you get non-blocking port-to-port throughput within the switch ASIC at the line rate of the front-panel ports. Almost all TOR switches use a single switch ASIC; the industry demanded port density on a single ASIC, and the manufacturers delivered. The list below shows the 10Gbps port-density evolution of the Broadcom StrataXGS product line. The Intel Fulcrum ASIC evolution isn’t shown here but looks very similar.
- Scorpion: 24 x 10Gbps
- Trident: 48 x 10Gbps
- Trident+: 64 x 10Gbps
- Trident2: 108 x 10Gbps (108 x 10G MACs – can handle 1.2Tbps using some ports at 40G).
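The port counts above translate directly into aggregate switching capacity. A quick back-of-the-envelope sketch (figures taken from the list above, not from a datasheet):

```python
# Aggregate full-duplex switching capacity implied by the 10G port counts
# listed above. Numbers come from the list, not a Broadcom datasheet.
asics = {
    "Scorpion": 24,
    "Trident": 48,
    "Trident+": 64,
    "Trident2": 108,  # 10G MACs; ~1.2 Tbps when some ports run at 40G
}

for name, ports in asics.items():
    print(f"{name}: {ports} x 10G = {ports * 10 / 1000:.2f} Tbps")
```

Running this shows the roughly doubling-per-generation trend, from 0.24 Tbps on Scorpion up to just over 1 Tbps of 10G MAC capacity on Trident2.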
Arista and Merchant Silicon
Arista Networks’ strategy is to use best-of-breed merchant-silicon switch ASICs in their switches rather than developing custom ASICs in-house. Using merchant silicon solves a raft of problems for Arista. The downside is that Arista is limited by the port count of the current generation of ASICs. But you don’t stop innovating just because you’re using standardised building blocks. When Arista were developing the 7100 series switches they chose the 24 x 10Gbps Intel Fulcrum FM4000 ‘Bali’ ASIC (simplified diagram to left).
For the 7148SX, Arista needed to build a 48-port non-blocking switch from 24-port ASICs. The obvious choice here is to deploy more 24-port ASICs. So… how many do you need? Deploying two 24-port ASICs does double your total port count, but you end up with two logical switches instead of one. To interconnect them you could redeploy some of the front-panel ports as inter-ASIC ports, as per the diagram below.
Okay, that looks a bit better, but it still doesn’t meet requirements. You have only 36 front-panel ports and you need 48. Worse still, you have 3:1 oversubscription on your interconnect: each ASIC has 18 front-panel ports funnelling through just 6 interconnect links. This solution won’t work, so let’s look at the final solution.
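The two-ASIC port math above can be sketched in a few lines. The split of 6 interconnect ports per ASIC is my assumption to match the 36-port figure; the point is that any split either starves the front panel or oversubscribes the interconnect:

```python
# Two-ASIC design: split each 24-port ASIC between front-panel ports and
# inter-ASIC links, then check port count and oversubscription.
# (Hypothetical sketch, not Arista's actual design tooling.)
ASIC_PORTS = 24
interconnect_per_asic = 6  # ports redeployed for the inter-ASIC trunk

front_panel = 2 * (ASIC_PORTS - interconnect_per_asic)
oversub = (ASIC_PORTS - interconnect_per_asic) / interconnect_per_asic

print(front_panel)  # 36 front-panel ports, short of the 48 required
print(oversub)      # 3.0, i.e. 3:1 oversubscription on the interconnect
```

To make the interconnect non-blocking you would need 12 interconnect ports per ASIC, leaving only 24 front-panel ports in total, which is no better than a single ASIC.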
Whoah, you need six 24-port ASICs to double the front-panel port count in a non-blocking switch. I was a little surprised that so many ASICs were needed. Note that the architecture has changed too. Arista have used a linecard-and-fabric architecture borrowed from chassis switches, but have collapsed that architecture into a single standalone switch. In order to simplify inter-ASIC routing, the front-facing ASICs will most likely add a destination-ASIC tag onto the packet to simplify the fabric-ASIC lookup logic. There are many other multi-ASIC challenges here, such as keeping the L3 and L2 tables synced in the front-facing ASICs. Although this seems like overkill for a single-RU box, it is a great strategy if you have fabric programming experience in your software team.
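The six-ASIC layout works out if you read it as a collapsed Clos fabric: four front-facing ASICs and two fabric ASICs (my reading of the design; the split is an assumption consistent with six chips and 48 ports):

```python
# Collapsed leaf-and-fabric (Clos) sketch: 4 front-facing ASICs + 2 fabric
# ASICs, all 24-port chips. Assumed topology, not a verified Arista schematic.
ASIC_PORTS = 24
front_asics, fabric_asics = 4, 2

# Each front-facing ASIC splits its ports evenly: half to the front panel,
# half as uplinks into the fabric ASICs.
front_ports_per_asic = ASIC_PORTS // 2  # 12
uplinks_per_asic = ASIC_PORTS // 2      # 12

front_panel = front_asics * front_ports_per_asic  # 48
# Non-blocking: uplink bandwidth equals front-panel bandwidth per ASIC.
assert uplinks_per_asic == front_ports_per_asic

# Each fabric ASIC terminates an equal share of all uplinks.
fabric_ports_used = front_asics * uplinks_per_asic // fabric_asics  # 24
assert fabric_ports_used <= ASIC_PORTS

print(front_panel)                  # 48 non-blocking front-panel ports
print(front_asics + fabric_asics)   # 6 ASICs in total
```

Each fabric ASIC is exactly full at 24 ports, which is why you cannot shave the design down to five chips.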
Arista doesn’t need to use this particular trick any more, but I love the approach of innovating beyond fixed constraints. This approach is not unique to Arista. In the Cisco/Insieme Nexus 9000, they have divided the resources of the line-card and fabric Trident2 ASICs in very different and clever ways to get the most from those small table sizes. We’re going to see a whole lot more of this trend. Innovation doesn’t die in the age of merchant silicon, it just takes new forms.
This assumes perfect traffic distribution and no incast, but it is still very different from planned oversubscription.