Ternary Content Addressable Memory, or TCAM, is a critical component of a modern router. It is a powerful, fast hardware lookup engine for IP prefixes. It is also complex, expensive, and power hungry. Not surprisingly, there never seems to be enough of it on whatever system you use.
TCAM has historically been used to perform hardware table lookups for access-list, NetFlow, and QoS tables in routers and switches. Most implementations from your favorite brand-name vendors use one or more external TCAM chips to perform these lookups.
Modern switch-on-chip ASICs such as the Broadcom Trident or Trident+ also use TCAM, but in these designs the TCAM normally sits on-chip. Having TCAM internal to the ASIC burns a lot of transistors, so TCAM size is limited. This keeps costs low for the ASIC manufacturer, but it also introduces engineering constraints for the router which can catch you out if you don't pay attention.
Commodity ASICs also use TCAM within the forwarding engine for Longest Prefix Match routing lookups. The rest of this post describes how TCAM works and its operation in the forwarding engine.
A quick look at CAM
Before we dive into TCAM, let's take a look at how CAM works. Content Addressable Memory (CAM) allows you to input a search 'word' (a binary IP address in our example) and search the entire memory in a single operation, returning one or more matches. By contrast, standard RAM takes a memory address as input and returns the cell's contents at that address.
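As a rough software analogy (the names here are illustrative, and where real CAM compares every cell in parallel in hardware, this Python loop is sequential):

```python
# RAM: address in, contents out -- one location per read.
ram = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
print(ram[1])  # reads a single cell: "10.0.0.2"

# CAM: content in, address(es) out. Conceptually every cell is
# compared against the search word in the same clock cycle.
def cam_search(cam, word):
    return [addr for addr, stored in enumerate(cam) if stored == word]

print(cam_search(ram, "10.0.0.3"))  # [2] -- the address holding that content
```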
CAM is a special type of RAM which combines storage cells with comparison logic gates. This makes CAM much more complex and expensive to produce than standard RAM. CAM performs 'exact' matches on input words and excels at lookups for one-to-one mappings, like IP-to-MAC and MAC-to-interface tables.
However, CAM isn't well suited to prefix lookups, as it can only perform exact matches. This causes headaches when you want to look up an IP address against a prefix. Your OS could expand every subnet into its individual IP addresses before programming the CAM, but entering all these exact matches would quickly exhaust your limited CAM capacity and negate any resource savings.
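A quick back-of-the-envelope sketch of that blowup, using real IPv4 prefix lengths (the function name is illustrative):

```python
def exact_entries_needed(prefix_len, addr_bits=32):
    """CAM entries required to cover one prefix using only exact matches."""
    return 2 ** (addr_bits - prefix_len)

print(exact_entries_needed(24))  # 256 entries just for one /24
print(exact_entries_needed(8))   # 16777216 entries for a single /8
```

A single short prefix can swallow more entries than most CAMs hold in total, which is why exact-match expansion is a non-starter.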
Ternary Content Addressable Memory (TCAM) is a more flexible type of CAM that allows you to store a full prefix and mask.
The 'Ternary' in TCAM refers to the three states which can exist in a TCAM cell. Regular (binary) CAM stores two states, '1' and '0'. TCAM adds a 'don't-care' state, and requires twice the number of logic gates to handle this more complex state and the 3-way decision it implies.
Let's work through a simplified example: a 4-entry TCAM, with each entry storing a 4-bit word. We'll use 4-bit pseudo-IP addresses. The subnet bits are programmed into the TCAM and the host bits are programmed as 'X', since we 'don't care' about matching them. The last entry is all X's, representing a default route. We'll look up pseudo-IP 1011 for routing in this example.
The diagram below shows 4 cells programmed with the 4-bit words:
The match lines are all precharged to a 'high' voltage before the lookup begins. The search line drivers then encode the search word onto the search lines as '1's and '0's, and each cell compares its stored value with the corresponding bit of the search word. It is this need to precharge every match line, wasting most of that energy on non-matches, that makes TCAM so power hungry.
A single bit miss will pull down the entire match line, and the result is 0 volts out (no match). A 'don't-care' bit in a cell will always match, i.e. it never pulls its match line down. The sense amps on the right signal a 'match' when they sense a high voltage.
So the result is multiple matches. When the encoder sees multiple matches, it picks the first match line (line 00 in this case) and outputs an encoded version of that line number. The encoder's output can then be used to address another forwarding table and complete the forwarding task. So the final outcome is that the TCAM/encoder duo 'picked' the longest prefix from the three matches it found. Cool!
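The behaviour above can be sketched in a few lines of illustrative Python (a sequential stand-in for what the hardware does in parallel; the entry values assume the example table described earlier, longest prefix first):

```python
TCAM = ["101X", "10XX", "0110", "XXXX"]  # pre-sorted entries; X = don't-care

def matching_lines(search_word):
    """A cell matches when every stored bit is 'X' or equals the search bit."""
    return [line for line, entry in enumerate(TCAM)
            if all(bit in ("X", s) for bit, s in zip(entry, search_word))]

matches = matching_lines("1011")
print(matches)       # [0, 1, 3] -- three match lines stay high
print(min(matches))  # 0 -- the encoder outputs the lowest matching line number
```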
But… the encoder didn't know that 101/3 was the longest, only that it was the match with the lowest match line number; the first match. If you review the original prefixes you'll see the deck was stacked. That is, the prefixes were pre-sorted, with the longest prefixes programmed into the TCAM first and the default programmed in last.
CAM still has a role in network devices for fast exact-match lookups, but isn’t well suited to prefix-lookups. TCAM excels at prefix lookups and the encoder plays an important role in choosing a single match.
The crucial point, however, is that TCAM and the encoder can only perform a 'first-match' against a given prefix. But if you pre-sort the forwarding table by prefix length before TCAM programming, the resultant lookup becomes a Longest Prefix Match (LPM).
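Here is a minimal sketch of that pre-sort-then-first-match idea, using Python's standard ipaddress module and a made-up forwarding table:

```python
import ipaddress

# Hypothetical forwarding table. The OS sorts by descending prefix
# length BEFORE programming the TCAM, so first match == longest match.
routes = sorted(
    [ipaddress.ip_network(p) for p in
     ["0.0.0.0/0", "10.1.0.0/16", "10.1.2.0/24", "10.0.0.0/8"]],
    key=lambda n: n.prefixlen, reverse=True)

def lookup(addr):
    """Walk the pre-sorted table and stop at the first hit, like the encoder."""
    for net in routes:
        if ipaddress.ip_address(addr) in net:
            return net
    return None

print(lookup("10.1.2.9"))   # 10.1.2.0/24 -- the longest covering prefix
print(lookup("192.0.2.1"))  # 0.0.0.0/0 -- falls through to the default route
```

Without the sort, the same first-match walk would happily return 10.0.0.0/8 for 10.1.2.9 if the /8 happened to be programmed first.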
I want to learn more about how the sorting requirement impacts router performance. Does the router OS pack all the sorted prefixes tightly into the table with no free slots? What would happen if there was a flapping /30 in your network? Would that mean that all prefixes from /30 and shorter have to be constantly re-sorted and re-inserted into the TCAM? That seems a little inefficient to me, but carving the TCAM into fixed banks seems inflexible too.
Does any reader know how this forwarding-table TCAM sorting/programming problem is sorted in the real world?
I borrowed heavily from: http://www.pagiamtzis.com/cam/camintro/ when preparing this post. It goes into a lot more detail on TCAM, and is worth bookmarking.