Is CPU or ASIC responsible for forwarding?
I received the question below from reader Ned as a comment on my 24-port ASIC post and thought that the discussion was worth a post of its own.
…Would you be able to speak a bit about the actual physical path or packet flow a packet takes inside the switch itself and how does the hardware forwarding take place within the switch and asic. When does packet get sent to the Asic. Is it happen on ingress or on egress? When does packet get analyzed by CPU or control plane. If the CPU never sees the actual packet how does asic know where to forward the packet and does that mean the packets stay within asic itself and is that what is meant to be hardware forwarding. Is Asic = dataplane. Tx
I like this question because it captures a lot of my early assumptions and concerns about data and control plane separation. My response assumes a single-stage modern switch-on-chip ASIC without backplane or fabric.
Population of the forwarding tables
There is so much going on in a router that it can get very confusing. Thankfully, the programming of the forwarding tables is (mostly) done before routable packets arrive. For example, an OSPF LinkStateUpdate (LSU) could announce the availability of a new route. The LSU will arrive at the ASIC on an ingress interface just like all data-plane packets. The destination IP will be that of a router IP or an alias (e.g. the OSPF multicast group 224.0.0.5), so the ASIC will forward the entire frame to the CPU for processing. This ASIC-to-CPU interface is normally a PCIe bus. The CPU will run its routing process and, if the route is accepted, it will program the ASIC forwarding tables with this new route.
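To make the split concrete, here is a minimal sketch in Python of the two roles described above. It is only an illustration, not how any real ASIC or SDK works: the names `LOCAL_ADDRESSES`, `asic_fib`, `asic_receive` and `cpu_program_route` are made up, and 10.0.0.1 is an assumed router interface address.

```python
import ipaddress

# Hypothetical addresses that belong to the router itself: an assumed
# interface IP plus the OSPF AllSPFRouters multicast group.
LOCAL_ADDRESSES = {
    ipaddress.ip_address("10.0.0.1"),
    ipaddress.ip_address("224.0.0.5"),
}

# Stand-in for the hardware forwarding table: prefix -> (next hop, egress port).
asic_fib = {}


def asic_receive(dst_ip):
    """ASIC side: punt packets addressed to the router, forward everything else."""
    if ipaddress.ip_address(dst_ip) in LOCAL_ADDRESSES:
        return "punt"      # whole frame goes to the CPU (e.g. over PCIe)
    return "forward"       # stays in the hardware forwarding path


def cpu_program_route(prefix, next_hop, port):
    """CPU side: once the routing process accepts a route, install it in the ASIC table."""
    asic_fib[ipaddress.ip_network(prefix)] = (ipaddress.ip_address(next_hop), port)
```

In this toy model an OSPF packet sent to 224.0.0.5 is punted, the CPU calls `cpu_program_route("192.0.2.0/24", "10.0.0.2", 3)`, and later packets for 192.0.2.0/24 never need to touch the CPU.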
When people talk about control-plane and data-plane separation they really mean, “the CPU no longer has to do a prefix lookup for each arriving data-plane packet”. I would call that hardware-based forwarding, rather than control and data plane separation. Note that the control-plane packets still need to transit the ASIC on their way to the CPU.
A data-plane packet arrives
In this context, a data-plane packet is any packet that requires routing to a destination other than the CPU. When these packets arrive at the switch they are placed in one or more packet buffers along with a unique ID. The packet headers and the buffer ID are then copied and processed by the ASIC. The ASIC will consult the forwarding tables and find a match for the prefix we learned from OSPF earlier. These forwarding tables are normally stored in TCAM or reduced-latency DRAM (RLDRAM) for ridiculously fast lookup times. Once a next-hop port and IP address have been chosen, the router needs to find the next hop’s MAC address, so it consults another table.
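The lookup itself is a longest-prefix match. The sketch below shows the idea in plain Python, assuming a small made-up table (`fib`) of documentation prefixes. A TCAM compares all entries in parallel in hardware, whereas this loop walks them one by one, so treat it as a model of the result rather than of the mechanism.

```python
import ipaddress


def longest_prefix_match(fib, dst_ip):
    """Return the entry for the most specific prefix containing dst_ip, or None."""
    dst = ipaddress.ip_address(dst_ip)
    best_prefix, best_entry = None, None
    for prefix, entry in fib.items():
        if dst in prefix and (best_prefix is None or prefix.prefixlen > best_prefix.prefixlen):
            best_prefix, best_entry = prefix, entry
    return best_entry


# Hypothetical forwarding table: prefix -> (next-hop IP, egress port).
fib = {
    ipaddress.ip_network("0.0.0.0/0"): ("10.0.0.254", 1),
    ipaddress.ip_network("192.0.2.0/24"): ("10.0.0.2", 3),
    ipaddress.ip_network("192.0.2.128/25"): ("10.0.0.3", 4),
}
```

A packet for 192.0.2.200 matches all three entries, and the /25 wins because it is the most specific.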
Cisco calls this an ‘adjacency table’, and it is populated by the ARP process. ARP entries are aggressively aged out of routers by design, so it’s likely that the required MAC address is not present for the chosen next hop. The ASIC has to go begging to the CPU for help. How degrading... “What’s that, ASIC, you need help? Not so independent now, are ya?” (This is how I imagine CPUs talk to ASICs.)
Cisco calls this an ARP ‘glean’. The CPU will send an ARP request (via the ASIC) and process the response, then program the adjacency table. This works well (until you have millions of ARP gleans), but again it’s not pure hardware forwarding. The CPU is ‘assisting’ the hardware forwarding process in real time.
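Here is a rough model of that glean behaviour, written as a sketch rather than a description of any vendor’s implementation. The `Cpu` and `Asic` classes are hypothetical, the `neighbours` dictionary stands in for real ARP replies, and the resolution is synchronous for simplicity (a real switch would drop or queue the packet while ARP completes).

```python
class Cpu:
    """Slow path: resolves next-hop MACs on behalf of the ASIC."""

    def __init__(self, neighbours):
        self.neighbours = neighbours  # ip -> mac, stands in for ARP replies
        self.gleans = 0               # how many times the ASIC had to ask

    def resolve(self, ip):
        self.gleans += 1
        return self.neighbours.get(ip)


class Asic:
    """Fast path: answers from the adjacency table, punts on a miss."""

    def __init__(self, cpu):
        self.cpu = cpu
        self.adjacency = {}           # ip -> mac, programmed by the CPU

    def next_hop_mac(self, ip):
        mac = self.adjacency.get(ip)
        if mac is None:               # adjacency miss: the ARP 'glean'
            mac = self.cpu.resolve(ip)
            if mac is not None:
                self.adjacency[ip] = mac
        return mac
```

The first lookup for a next hop costs one trip to the CPU, and every later lookup for the same next hop is answered from the adjacency table until the entry ages out (ageing is not modelled here).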
Finally, we have enough information to construct a shiny new header and attach it to the previously buffered payload. The temporary buffer information is discarded, and the full frame is queued for transmission out of the egress port.
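The buffer-then-rewrite step can be sketched as follows. This is an illustrative model only: `Frame`, `packet_buffer`, `ingress` and `egress` are made-up names, and the TTL decrement is standard IP routing behaviour that I have added for completeness rather than something covered above.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Frame:
    dst_mac: str
    src_mac: str
    ttl: int
    payload: bytes


packet_buffer = {}   # buffer ID -> buffered frame
_buffer_ids = count(1)


def ingress(frame):
    """Store the arriving frame and hand back the unique buffer ID."""
    buffer_id = next(_buffer_ids)
    packet_buffer[buffer_id] = frame
    return buffer_id


def egress(buffer_id, next_hop_mac, router_mac):
    """Build the new header around the buffered payload and free the buffer."""
    frame = packet_buffer.pop(buffer_id)
    return Frame(dst_mac=next_hop_mac, src_mac=router_mac,
                 ttl=frame.ttl - 1, payload=frame.payload)
```

Only the header fields change between `ingress` and `egress`, while the payload is carried through untouched.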
Here’s my quick summary:
- Control plane traffic, e.g. OSPF, LACP, etc., must transit the ASIC before reaching the CPU.
- Control plane packets are sent to the CPU in their entirety, but data-plane packets have their payload buffered while a copy of the header is processed by the ASIC.
- The CPU processes routing updates and programs the routing table into the ASIC route lookup tables.
- The ASIC handles the forwarding decisions, however…
- The ASIC may need to ask the CPU for help when it’s missing an entry in its ARP table for the chosen next-hop.
I hope this clarifies things rather than muddying the waters further. Let me know your thoughts in the comments.
18 thoughts on “Is CPU or ASIC responsible for forwarding?”
Hi John- tx for fast reply. I have few more Qs. Is dataplane packet ever physically moving from ingress side port to asic to cpu and then from cpu it goes back to asic and then egress side. If it is always in storage buffer and never moves to and back from cpu as in explanation above then is it stored in memory on egress side of switch then how does packet get stored on egress side when it does not know who is egress interface when it gets packet first time. Tx
In the switch-on-chip ASIC example above, the packet buffer is one large bank of memory. Because all ports (ingress and egress) share the same buffer, there’s no need to ‘move’ the packet anywhere, i.e. when the egress port wants to transmit the frame, it has access to the buffer and knows exactly where to find it using the unique ID. The buffer is small (say, 9 MB for Broadcom Trident), so maximum buffer limits need to be set per port for ingress and egress to stop any one port from consuming all the shared memory. What I don’t know is how the memory manager keeps track of a single packet as its headers move through the pipeline, i.e. at what point a buffer queued for egress starts counting against the egress port’s memory allowance instead of the ingress port’s.
In fabric or stacked switches, the entire frame is switched from the ingress line card to the egress card, but the CPU is still not in the forwarding path. In older datacenter chassis switches, the CPU would get involved in the forwarding decision (centralised forwarding). But even here the SUP/CPU is processing headers and then instructing the ingress line card to send the full frame across the backplane to a specific egress card.
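As a rough illustration of the per-port limits, here is a toy admission check in Python. The `SharedBuffer` class and its numbers are hypothetical, and it only tracks one allowance per port, so it deliberately sidesteps the ingress-versus-egress accounting question I raised above.

```python
class SharedBuffer:
    """One shared pool of buffer memory with a cap on each port's share."""

    def __init__(self, total_bytes, per_port_limit):
        self.total_bytes = total_bytes
        self.per_port_limit = per_port_limit
        self.used = 0
        self.used_by_port = {}

    def admit(self, port, size):
        """Accept the packet only if both the pool and the port have room."""
        port_used = self.used_by_port.get(port, 0)
        if self.used + size > self.total_bytes:
            return False   # shared pool exhausted
        if port_used + size > self.per_port_limit:
            return False   # this port has used up its allowance
        self.used += size
        self.used_by_port[port] = port_used + size
        return True

    def release(self, port, size):
        """Return buffer space once the frame has been transmitted."""
        self.used -= size
        self.used_by_port[port] -= size
```

With a 3000-byte pool and a 2000-byte per-port cap, a single port can hold one 1500-byte frame but not two, which leaves room for other ports even when one of them is congested.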
Hope this helps,
When we are speaking about chassis and centralized forwarding we can see two approaches.
1. If we have a classic line card (without a fabric connection to the backplane), the entire packet is flooded on the DBUS, from where the supervisor (PFC) can take it, make a forwarding decision and send the result back on the RBUS, from where the egress line card takes it and forwards the packet to another device.
2. If we have centralized forwarding with a fabric, the packet from the port ASIC is forwarded to the CFC on the ingress line card. The packet header is stripped and sent on the DBUS to the supervisor (PFC), where the decision is made. The PFC sends the forwarding decision on the RBUS, where the ingress line card receives it. The result and the packet are sent over the fabric and finally reach the egress line card.
Great clarifications, thank you.
There is an iMMU [ Ingress Memory Management Unit ] and an eMMU [ Egress Memory Management Unit ], which are used to track per-interface memory utilization.
In modern gear each interface has its own maximum queue depth, as opposed to chipsets like Trident [ 9 MB ] and Trident+ [ 12 MB ], where the memory is shared.
There are two types of memory in the device. One is search-based memory [ TCAM memory ], where there is LEM, LPM and a shared programmable hash table called UFT [ Unified Forwarding Table ] or SMT [ Shared Memory Table ] in the Arista 7500E or 7280QR-C36, or FFT [ Flexible Forwarding Table ] in the Cisco 9500. The other is packet buffer memory, which stores incoming data-plane traffic in an on-chip or off-chip buffer, depending on the volume of traffic and the buffer size. On-chip buffers are around 12-16 MB and off-chip buffers are around 4 GB.
Modern chipsets use off-chip packet buffer memory [ PBM ] and external TCAM for search or lookup memory.
Regarding the CPU, there is a CPU on the line card as well, in addition to the ASIC [ Application-Specific Integrated Circuit ]. The line card CPU offloads work by programming the ASIC. There are agents, daemons or managers such as a routing daemon or forwarding daemon. In Arista these are called the RIB agent and the Ethernet Bridging agent [ EBra ].
We also run line-card-only protocols such as BFD and OAM, which don’t run on the supervisor module’s CPU.
The line card ASIC does all the forwarding work. In ToR-type gear such as the 3064 or 7280QR-C36 there is no line card CPU, and the ASIC does that work. In a modular chassis we have an ASIC as well as a line card CPU, where we run line-card-only protocols and offload ASIC programming from the supervisor CPU.
Nice. I didn’t think carefully about the ‘ARP’ issue. Keep it going.
Thanks for the comment, hope you’re keeping well.
To add some context, ARP punts only happen when there’s no current ARP entry in the adjacency table. Once a next-hop MAC has been resolved, that ARP entry sits in the table for anywhere from a few minutes to a few hours, depending on the ARP expiration timer. So while ARP punts happen fairly often, they typically comprise a minuscule percentage of the overall traffic volume.
One of the edge cases is where you have a sender who continues to send a high volume of traffic toward a host which is no longer on the network. In this instance the CPU will be frequently interrupted to ask the same ‘who has IP X?’ question, ad infinitum. This is really bad because it could push you to hit an ARP rate limit, where legitimate ARPs for active hosts are impacted. Features like ‘negative arp’ or ‘ARP Glean Throttle’ in Cisco parlance are needed to cache a null entry against that IP in hardware to prevent the repetitive questions.
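The null-entry idea can be sketched like this. It is a simplified model under my own assumptions, not Cisco’s implementation: `GleanThrottle`, `hold_seconds` and the explicit `now` argument are all made up for the example, and in real hardware the entry would live in the ASIC rather than in a Python dictionary.

```python
class GleanThrottle:
    """Suppress repeated gleans for next hops that recently failed to resolve."""

    def __init__(self, hold_seconds):
        self.hold_seconds = hold_seconds
        self.negative = {}   # ip -> time until which gleans are suppressed

    def should_punt(self, ip, now):
        """Return True if a miss for this IP may be punted to the CPU."""
        expires = self.negative.get(ip)
        if expires is not None and now < expires:
            return False                 # null entry still valid: drop in hardware
        self.negative.pop(ip, None)      # expired or never recorded
        return True

    def record_failure(self, ip, now):
        """Called when an ARP request for this IP went unanswered."""
        self.negative[ip] = now + self.hold_seconds
```

With `hold_seconds=30`, a host that has left the network costs the CPU at most one glean every 30 seconds, no matter how much traffic is still being sent toward it.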
Really cool explanation, especially the begging part 🙂
I don’t normally give hardware ‘talking roles’ but I was a bit tired. Glad you liked it and thanks for the comment.
One thing I never considered is whether there is unified clocking when you have multiple ASICs talking to each other. My guess is, no… I’d expect each ASIC to have its own clock and the bus between them is clocked on the rx side and detected on the tx side. I’d guess that there is no good reason to have a single clock for multiple ASICs. Just a guess.
I haven’t really thought that through either. But, like you, I doubt that there is any global clocking sync when there’s so much clock recovery taking place.
One additional comment is that some platforms like the Cisco ASR9K have dedicated CPUs per line card in addition to the CPU on the central controller card. The punt from the Network Processor to the CPU still happens but ARP and other functions like ICMP, BFD, Netflow etc. are handled on the linecard-local CPU.
MAN, well Explained, seriously best explanation for this topic … Thanks again!
Thanks Ahmed, much appreciated.
Awesome explanation! Thanks a ton.
ARP glean part is very clear and helped a lot, thank you very much for this wonderful blog!