NetworkSherpa

OSPF – Lingering LSAs from unreachable routers

When a single-homed router is isolated by link failure, the LSAs it had previously originated can live for up to 60 minutes in the OSPF LSDB of the surviving routers. This may not be what you were expecting, and can cause a lot of confusion when troubleshooting OSPF. In this post we’ll look at why LSAs from an isolated router linger and how OSPF still knows how to ‘do the right thing’.

Test topology for an isolated router

I’m using a router-on-a-stick topology in our example network. In a real-world deployment you might have R1 deployed in a remote site, connected to R2, which is located in the campus core.  I’m using Ethernet links and point-to-point OSPF adjacencies to simplify testing and verification.
Before we start breaking things we should baseline the type-1 LSAs of both routers.  We will look at R2’s LSDB and examine the type-1 LSAs for both R1 and R2.

R2#sh ip ospf database router 1.1.1.1
OSPF Router with ID (2.2.2.2) (Process ID 1)
Router Link States (Area 0)
LS age: 76
Options: (No TOS-capability, DC)
LS Type: Router Links
Link State ID: 1.1.1.1
Advertising Router: 1.1.1.1
LS Seq Number: 80000004
Checksum: 0xE628
Length: 60
Number of Links: 3
Link connected to: a Stub Network
(Link ID) Network/subnet number: 1.1.1.1
(Link Data) Network Mask: 255.255.255.255
Number of TOS metrics: 0
TOS 0 Metrics: 1
Link connected to: another Router (point-to-point)
(Link ID) Neighboring Router ID: 2.2.2.2
(Link Data) Router Interface address: 192.168.12.1
Number of TOS metrics: 0
TOS 0 Metrics: 1
Link connected to: a Stub Network
(Link ID) Network/subnet number: 192.168.12.0
(Link Data) Network Mask: 255.255.255.0
Number of TOS metrics: 0
TOS 0 Metrics: 1

Both R1 and R2 router LSAs look normal for a point-to-point interconnect: a single stub link for the loopback prefix, plus a stub link and a point-to-point link to describe the R1-R2 prefix and link. Note that the sequence numbers of both LSAs happen to be the same, 80000004.

R2#sh ip ospf database router 2.2.2.2
OSPF Router with ID (2.2.2.2) (Process ID 1)
Router Link States (Area 0)
LS age: 333
Options: (No TOS-capability, DC)
LS Type: Router Links
Link State ID: 2.2.2.2
Advertising Router: 2.2.2.2
LS Seq Number: 80000004
Checksum: 0x8481
Length: 60
Number of Links: 3
Link connected to: a Stub Network
(Link ID) Network/subnet number: 2.2.2.2
(Link Data) Network Mask: 255.255.255.255
Number of TOS metrics: 0
TOS 0 Metrics: 1
Link connected to: another Router (point-to-point)
(Link ID) Neighboring Router ID: 1.1.1.1
(Link Data) Router Interface address: 192.168.12.2
Number of TOS metrics: 0
TOS 0 Metrics: 1
Link connected to: a Stub Network
(Link ID) Network/subnet number: 192.168.12.0
(Link Data) Network Mask: 255.255.255.0
Number of TOS metrics: 0
TOS 0 Metrics: 1
R2#

Lastly, we’ll baseline the routing table.  We have a route to R1’s loopback via OSPF.

R2#sh ip route | i 1.1.1.1
 O 1.1.1.1 [110/2] via 192.168.12.1, 00:07:43, GigabitEthernet1/0

And we have a connected route to describe our R1-R2 link.

R2#sh ip route | i 192.168.12.0
 C 192.168.12.0/24 is directly connected, GigabitEthernet1/0

Link failure & unreachable router

Okay, let’s break our R1-R2 link, wait for OSPF to time out the adjacency and see what happens.  Assuming R1 is now isolated (i.e. not reachable), we’ll troubleshoot everything by looking at R2’s OSPF LSDB and routing table.

R2#conf t
R2(config)#int gi1/0
R2(config-if)#shutdown

Thankfully R1’s loopback prefix of 1.1.1.1/32 has been removed from the routing table.

R2#sh ip route | i 1.1.1.1
R2#

When we look at the LSDB we see that R2 keeps R1’s type-1 LSA.  Look at the R1 LSA below and you should notice that it now bears the message ‘Adv Router is not-reachable’.

R2#sh ip ospf database router 1.1.1.1
OSPF Router with ID (2.2.2.2) (Process ID 1)
Router Link States (Area 0)
Adv Router is not-reachable
LS age: 130
Options: (No TOS-capability, DC)
LS Type: Router Links
Link State ID: 1.1.1.1
Advertising Router: 1.1.1.1
LS Seq Number: 80000004
Checksum: 0xEA26
Length: 60
Number of Links: 3
Link connected to: a Stub Network
(Link ID) Network/subnet number: 1.1.1.1
(Link Data) Network Mask: 255.255.255.255
Number of TOS metrics: 0
TOS 0 Metrics: 1
Link connected to: another Router (point-to-point)
(Link ID) Neighboring Router ID: 2.2.2.2
(Link Data) Router Interface address: 192.168.12.1
Number of TOS metrics: 0
TOS 0 Metrics: 1
Link connected to: a Stub Network
(Link ID) Network/subnet number: 192.168.12.0
(Link Data) Network Mask: 255.255.255.0
Number of TOS metrics: 0
TOS 0 Metrics: 1

If you’re troubleshooting, then ‘Adv Router is not-reachable’ is the smoking gun.  But remember that this is an internal flag added by Cisco.  In R2’s LSDB, the R1 type-1 LSA is completely unchanged and still includes the R1-R2 point-to-point and stub-network links.
The LSA sequence number hasn’t even been modified.  R1 never had a chance to send an updated type-1 LSA to R2, and R2 did not update R1’s type-1 LSA in its database when the R1-R2 link failed.
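
If you collect this output from many routers, a small script can pick out the flagged LSAs for you. The following is only a rough sketch: the function name unreachable_advertisers is made up for illustration, and it assumes the text layout shown above, where the ‘Adv Router is not-reachable’ line appears before the ‘Advertising Router’ line of the LSA it describes.

```python
def unreachable_advertisers(show_output: str) -> list[str]:
    """Return the advertising router IDs of LSAs flagged as not-reachable.

    Assumes the layout of 'show ip ospf database router' seen in this post:
    the flag line precedes the 'Advertising Router:' line of the same LSA.
    """
    flagged = []
    pending = False  # True between a flag line and the next 'Advertising Router:' line
    for line in show_output.splitlines():
        text = line.strip()
        if text.lower() == "adv router is not-reachable":
            pending = True
        elif text.startswith("Advertising Router:"):
            if pending:
                flagged.append(text.split(":", 1)[1].strip())
            pending = False  # the flag applies to one LSA only
    return flagged


# Trimmed-down sample based on the output in this post.
sample = """\
  Adv Router is not-reachable
  LS age: 130
  Link State ID: 1.1.1.1
  Advertising Router: 1.1.1.1

  LS age: 632
  Link State ID: 2.2.2.2
  Advertising Router: 2.2.2.2
"""

print(unreachable_advertisers(sample))  # ['1.1.1.1']
```

Because the flag is Cisco-specific, other vendors’ output would need a different match string.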

Sleight of hand

If the R1 type-1 LSA is unchanged after the link failure, then how does the R2 OSPF router know enough to mark R1 as unreachable?  We have been focussing on R1’s type-1 LSA but the answer lies in the changes made to R2’s type-1 LSA.
Let’s have a look:

R2#sh ip ospf database router 2.2.2.2
            OSPF Router with ID (2.2.2.2) (Process ID 1)
		Router Link States (Area 0)
  LS age: 632
  Options: (No TOS-capability, DC)
  LS Type: Router Links
  Link State ID: 2.2.2.2
  Advertising Router: 2.2.2.2
  LS Seq Number: 80000005
  Checksum: 0xA474
  Length: 36
  Number of Links: 1
    Link connected to: a Stub Network
     (Link ID) Network/subnet number: 2.2.2.2
     (Link Data) Network Mask: 255.255.255.255
      Number of TOS metrics: 0
       TOS 0 Metrics: 1

We see that the R2 type-1 LSA has dropped its stub and point-to-point links for R1-R2. It has also incremented its sequence number by 1 to reflect this change.  Well that’s cool, but the R1 type-1 includes the failed link and the R2 type-1 omits the failed link.  So how does OSPF know who to believe?

The topology graph

It took me a little digging to find this, but the OSPF RFC holds the answer.  Step 2 of section 16.1 of RFC 2328 describes this part of the SPF algorithm.  You can read ‘vertex’ here as ‘OSPF router’.

Call the vertex just added to the tree vertex V.  Examine the LSA associated with vertex V. This is a lookup in the Area A’s link state database based on the Vertex ID.

In any case, each link described by the LSA gives the cost to an adjacent vertex. For each described link, (say it joins vertex V to vertex W):

Look up the vertex W’s LSA (router-LSA or network-LSA) in Area A’s link state database. If the LSA does not exist, or its LS age is equal to MaxAge, or it does not have a link back to vertex V, examine the next link in V’s LSA.

Okay, so that may be a little hard to read, but let’s focus on the last condition: ‘or it does not have a link back to vertex V’.  R1 can have a link to R2, but unless R2 has a link back to R1 all bets are off; the link is ignored by SPF and if there is no alternative link then that router is removed from the SPF tree.
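
To make the back-link check concrete, here is a minimal sketch of it in Python. It is not the full RFC 2328 calculation: the LSDB is a toy dictionary of point-to-point links only (router ID mapped to neighbor and cost), the MaxAge test and stub networks are left out, and the name reachable_routers is my own invention for this example.

```python
import heapq


def reachable_routers(lsdb, root):
    """Dijkstra over toy router LSAs, honouring the back-link check.

    lsdb maps router ID -> {neighbor router ID: cost}.
    Returns {router ID: cost from root} for every router placed on the tree.
    """
    dist = {root: 0}
    heap = [(0, root)]
    on_tree = set()
    while heap:
        cost_v, v = heapq.heappop(heap)
        if v in on_tree:
            continue
        on_tree.add(v)
        for w, link_cost in lsdb.get(v, {}).items():
            w_lsa = lsdb.get(w)
            # Skip the link if W's LSA is missing or has no link back to V.
            if w_lsa is None or v not in w_lsa:
                continue
            new_cost = cost_v + link_cost
            if new_cost < dist.get(w, float("inf")):
                dist[w] = new_cost
                heapq.heappush(heap, (new_cost, w))
    return {router: dist[router] for router in on_tree}


# LSDB before the failure: both LSAs describe the R1-R2 link.
before = {"1.1.1.1": {"2.2.2.2": 1}, "2.2.2.2": {"1.1.1.1": 1}}
# LSDB after the failure: R1's stale LSA still lists R2, R2's new LSA does not list R1.
after = {"1.1.1.1": {"2.2.2.2": 1}, "2.2.2.2": {}}

print(reachable_routers(before, "2.2.2.2"))  # {'2.2.2.2': 0, '1.1.1.1': 1}
print(reachable_routers(after, "2.2.2.2"))   # {'2.2.2.2': 0}
print(reachable_routers(after, "1.1.1.1"))   # {'1.1.1.1': 0}
```

The last call runs SPF over the same database with R1 as the root. R1’s stale LSA still claims a link to R2, but the link is skipped because R2’s LSA has no link back.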
Here’s another way to look at it if you’re a visual thinker.

 

R2 is not allowed to flush the R1 LSA

It would certainly be easier to troubleshoot if R2 flushed R1’s LSA as soon as it realised that R1 was unreachable.  This is referred to as ‘premature aging’ in the RFC. However, section 14.1 of RFC 2328 prohibits this.

A router may only prematurely age its own self-originated LSAs. The router may not prematurely age LSAs that have been originated by other routers.

If the R1-R2 link stays down, then R1’s type-1 LSA will remain in R2’s database for up to one hour, until its LS age reaches MaxAge.  Seeing these LSAs in the LSDB can be confusing for the first-time troubleshooter, but once you learn to expect this behavior you can avoid jumping to incorrect conclusions.
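
The arithmetic is simple enough to sketch. The constants below are the MaxAge and LSRefreshTime values from RFC 2328; the function names are hypothetical, and the sketch ignores DoNotAge LSAs and any vendor-specific timers.

```python
MAX_AGE = 3600          # seconds; RFC 2328 MaxAge (1 hour)
LS_REFRESH_TIME = 1800  # seconds; RFC 2328 LSRefreshTime (30 minutes)


def seconds_until_maxage(ls_age):
    """How much longer a stale LSA can linger before it reaches MaxAge."""
    return max(0, MAX_AGE - ls_age)


def may_flush(lsa_adv_router, local_router_id):
    """Premature aging is only permitted for self-originated LSAs."""
    return lsa_adv_router == local_router_id


# R1's LSA showed 'LS age: 130' on R2 shortly after the link failure.
print(seconds_until_maxage(130))          # 3470
# R2 (2.2.2.2) may not flush an LSA originated by R1 (1.1.1.1).
print(may_flush("1.1.1.1", "2.2.2.2"))    # False
# A router refreshes its own LSAs every LSRefreshTime, so a stale LSA is at
# most about 1800 seconds old when its originator is cut off.
print(seconds_until_maxage(LS_REFRESH_TIME))  # 1800
```

In other words, depending on how recently R1 last refreshed its LSA, you should expect it to linger for somewhere between roughly 30 and 60 minutes.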

Sherpa Summary