OSPF – Lingering LSAs from unreachable routers


When a single-homed router is isolated by link failure, the LSAs it had previously originated can live for up to 60 minutes in the OSPF LSDB of the surviving routers. This may not be what you were expecting, and can cause a lot of confusion when troubleshooting OSPF. In this post we’ll look at why LSAs from an isolated router linger and how OSPF still knows how to ‘do the right thing’.

BranchHQ_goodlink

Test topology for an isolated router

I’m using a router-on-a-stick topology in our example network. In a real-world deployment you might have R1 deployed at a remote site, connected to R2, which is located in the campus core. I’m using Ethernet links and point-to-point OSPF adjacencies to simplify testing and verification.
Before we start breaking things we should baseline the type-1 LSAs of both routers. We will look at R2’s LSDB and examine the type-1 LSAs for both R1 and R2.

R2#sh ip ospf database router 1.1.1.1
OSPF Router with ID (2.2.2.2) (Process ID 1)
Router Link States (Area 0)
LS age: 76
Options: (No TOS-capability, DC)
LS Type: Router Links
Link State ID: 1.1.1.1
Advertising Router: 1.1.1.1
LS Seq Number: 80000004
Checksum: 0xE628
Length: 60
Number of Links: 3
Link connected to: a Stub Network
(Link ID) Network/subnet number: 1.1.1.1
(Link Data) Network Mask: 255.255.255.255
Number of TOS metrics: 0
TOS 0 Metrics: 1
Link connected to: another Router (point-to-point)
(Link ID) Neighboring Router ID: 2.2.2.2
(Link Data) Router Interface address: 192.168.12.1
Number of TOS metrics: 0
TOS 0 Metrics: 1
Link connected to: a Stub Network
(Link ID) Network/subnet number: 192.168.12.0
(Link Data) Network Mask: 255.255.255.0
Number of TOS metrics: 0
TOS 0 Metrics: 1

The router LSAs of both R1 and R2 look normal for a point-to-point interconnect: a single stub link for the loopback prefix, plus a stub link and a point-to-point link to describe the R1-R2 prefix and adjacency. Note that the sequence numbers of both LSAs happen to be the same, 80000004.

R2#sh ip ospf database router 2.2.2.2
OSPF Router with ID (2.2.2.2) (Process ID 1)
Router Link States (Area 0)
LS age: 333
Options: (No TOS-capability, DC)
LS Type: Router Links
Link State ID: 2.2.2.2
Advertising Router: 2.2.2.2
LS Seq Number: 80000004
Checksum: 0x8481
Length: 60
Number of Links: 3
Link connected to: a Stub Network
(Link ID) Network/subnet number: 2.2.2.2
(Link Data) Network Mask: 255.255.255.255
Number of TOS metrics: 0
TOS 0 Metrics: 1
Link connected to: another Router (point-to-point)
(Link ID) Neighboring Router ID: 1.1.1.1
(Link Data) Router Interface address: 192.168.12.2
Number of TOS metrics: 0
TOS 0 Metrics: 1
Link connected to: a Stub Network
(Link ID) Network/subnet number: 192.168.12.0
(Link Data) Network Mask: 255.255.255.0
Number of TOS metrics: 0
TOS 0 Metrics: 1
R2#

Lastly, we’ll baseline the routing table.  We have a route to R1’s loopback via OSPF.

R2#sh ip route | i 1.1.1.1
 O 1.1.1.1 [110/2] via 192.168.12.1, 00:07:43, GigabitEthernet1/0

And we have a connected route to describe our R1-R2 link.

R2#sh ip route | i 192.168.12.0
 C 192.168.12.0/24 is directly connected, GigabitEthernet1/0

Link failure & unreachable router

Okay, let’s break our R1-R2 link, wait for OSPF to time out the adjacency and see what happens. Since R1 is now isolated (i.e. not reachable), we’ll do all of our troubleshooting from R2, looking at its OSPF LSDB and routing table.

R2#conf t
R2(config)#int gi1/0
R2(config-if)#shutdown

BranchHQ_badlink

Thankfully R1’s loopback prefix of 1.1.1.1/32 has been removed from the routing table.

R2#sh ip route | i 1.1.1.1
R2#

When we look at the LSDB we see that R2 keeps R1’s type-1 LSA. Look at the R1 LSA below and you should notice that it now bears the message ‘Adv Router is not-reachable’.

R2#sh ip ospf database router 1.1.1.1
OSPF Router with ID (2.2.2.2) (Process ID 1)
Router Link States (Area 0)
Adv Router is not-reachable
LS age: 130
Options: (No TOS-capability, DC)
LS Type: Router Links
Link State ID: 1.1.1.1
Advertising Router: 1.1.1.1
LS Seq Number: 80000004
Checksum: 0xEA26
Length: 60
Number of Links: 3
Link connected to: a Stub Network
(Link ID) Network/subnet number: 1.1.1.1
(Link Data) Network Mask: 255.255.255.255
Number of TOS metrics: 0
TOS 0 Metrics: 1
Link connected to: another Router (point-to-point)
(Link ID) Neighboring Router ID: 2.2.2.2
(Link Data) Router Interface address: 192.168.12.1
Number of TOS metrics: 0
TOS 0 Metrics: 1
Link connected to: a Stub Network
(Link ID) Network/subnet number: 192.168.12.0
(Link Data) Network Mask: 255.255.255.0
Number of TOS metrics: 0
TOS 0 Metrics: 1

If you’re troubleshooting, then ‘Adv Router is not-reachable’ is the smoking gun. But remember that this is an internal flag added by Cisco. In R2’s LSDB, the R1 type-1 LSA is completely unchanged and still includes the R1-R2 point-to-point and stub-network links.
The LSA sequence number hasn’t even been modified. R1 never had a chance to send an updated type-1 LSA to R2, and R2 did not update R1’s type-1 LSA in its database when the R1-R2 link failed.
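If you want to spot these stale LSAs without reading every record by eye, the flag can be picked out of captured CLI text. The following is only a rough sketch: it assumes the output layout shown in this post (the flag line appears before the ‘LS age’ line of the LSA it belongs to), and the function name `find_unreachable_lsas` and the `SAMPLE` text are made up for illustration. Other IOS versions may format the output differently.

```python
def find_unreachable_lsas(show_output: str) -> list[str]:
    """Return the advertising router IDs of LSAs flagged as not-reachable.

    Assumes the layout seen in this post: the flag line precedes the
    'Advertising Router:' line of the same LSA.
    """
    unreachable = []
    flagged = False
    for raw in show_output.splitlines():
        line = raw.strip()
        if line.lower() == "adv router is not-reachable":
            flagged = True
        elif line.startswith("Advertising Router:"):
            router_id = line.split(":", 1)[1].strip()
            if flagged:
                unreachable.append(router_id)
            # The flag only applies to the LSA it was printed with.
            flagged = False
    return unreachable


# Hypothetical capture, trimmed from the outputs in this post.
SAMPLE = """\
Router Link States (Area 0)
Adv Router is not-reachable
LS age: 130
Link State ID: 1.1.1.1
Advertising Router: 1.1.1.1
LS Seq Number: 80000004
LS age: 632
Link State ID: 2.2.2.2
Advertising Router: 2.2.2.2
LS Seq Number: 80000005
"""

print(find_unreachable_lsas(SAMPLE))
```

With the sample above, only R1’s router ID is reported, because R2’s own LSA carries no flag.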

Sleight of hand

If the R1 type-1 LSA is unchanged after the link failure, then how does R2 know enough to mark R1 as unreachable? We have been focussing on R1’s type-1 LSA, but the answer lies in the changes made to R2’s type-1 LSA.
Let’s have a look:

R2#sh ip ospf database router 2.2.2.2
            OSPF Router with ID (2.2.2.2) (Process ID 1)
		Router Link States (Area 0)
  LS age: 632
  Options: (No TOS-capability, DC)
  LS Type: Router Links
  Link State ID: 2.2.2.2
  Advertising Router: 2.2.2.2
  LS Seq Number: 80000005
  Checksum: 0xA474
  Length: 36
  Number of Links: 1
    Link connected to: a Stub Network
     (Link ID) Network/subnet number: 2.2.2.2
     (Link Data) Network Mask: 255.255.255.255
      Number of TOS metrics: 0
       TOS 0 Metrics: 1

We see that the R2 type-1 LSA has dropped its stub and point-to-point links for R1-R2. It has also incremented its sequence number by 1 to reflect this change. Well that’s cool, but the R1 type-1 includes the failed link and the R2 type-1 omits the failed link. So how does OSPF know who to believe?
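As a side note on those sequence numbers: RFC 2328 treats the LS sequence number as a signed 32-bit integer that starts at 0x80000001 and counts upward, which is why 80000005 is ‘newer’ than 80000004. A small sketch of that comparison follows. The helper names are my own, and it deliberately ignores the checksum and LS age tie-breakers that the RFC applies when two sequence numbers are equal.

```python
# 0x80000001 interpreted as a signed 32-bit integer (the RFC's InitialSequenceNumber).
INITIAL_SEQUENCE_NUMBER = -0x7FFFFFFF


def seq_to_signed(seq_hex: str) -> int:
    """Convert the hex string shown by the CLI into a signed 32-bit value."""
    value = int(seq_hex, 16)
    return value - (1 << 32) if value >= 0x80000000 else value


def is_newer(candidate: str, current: str) -> bool:
    """True if 'candidate' is a more recent instance, by sequence number only."""
    return seq_to_signed(candidate) > seq_to_signed(current)


def instance_count(seq_hex: str) -> int:
    """How many instances (including periodic refreshes) since the initial one."""
    return seq_to_signed(seq_hex) - INITIAL_SEQUENCE_NUMBER + 1


print(is_newer("80000005", "80000004"))
print(instance_count("80000004"))
```

So R2’s post-failure LSA (80000005) replaces its earlier instance everywhere it is flooded, while R1’s LSA stays frozen at 80000004.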

The topology graph

It took me a little digging to find this, but the OSPF RFC holds the answer. Section 16.1 of RFC 2328 describes the SPF algorithm, and step 2 of that section is the part we care about. You can read ‘vertex’ here as ‘OSPF router’.

Call the vertex just added to the tree vertex V.  Examine the LSA associated with vertex V. This is a lookup in the Area A’s link state database based on the Vertex ID.

In any case, each link described by the LSA gives the cost to an adjacent vertex. For each described link, (say it joins vertex V to vertex W):

Look up the vertex W’s LSA (router-LSA or network-LSA) in Area A’s link state database. If the LSA does not exist, or its LS age is equal to MaxAge, or it does not have a link back to vertex V, examine the next link in V’s LSA.

Okay, so that may be a little hard to read, but let’s focus on the final condition: ‘or it does not have a link back to vertex V’. R1 can have a link to R2, but unless R2 has a link back to R1 all bets are off; the link is ignored by SPF, and if there is no alternative path then that router is left out of the SPF tree.
Here’s another way to look at it if you’re a visual thinker.
IsolatedRouter
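The back-link rule from the RFC text quoted above can also be sketched in a few lines of code. This is a simplified illustration, not the real SPF: it uses a plain breadth-first walk instead of Dijkstra, ignores link costs, stub networks, network-LSAs and the MaxAge check, and models each router LSA as nothing more than a set of neighbor router IDs. The function and variable names are invented for the example.

```python
from collections import deque


def reachable_routers(lsdb: dict[str, set[str]], root: str) -> set[str]:
    """Walk the LSDB from 'root', honouring the two-way (back-link) check.

    lsdb maps an advertising router ID to the neighbor router IDs listed
    as point-to-point links in its router LSA.
    """
    reached = {root}
    queue = deque([root])
    while queue:
        v = queue.popleft()
        for w in lsdb.get(v, set()):
            w_links = lsdb.get(w)
            # Quoted rule: skip the link if W's LSA does not exist
            # or does not have a link back to vertex V.
            if w_links is None or v not in w_links:
                continue
            if w not in reached:
                reached.add(w)
                queue.append(w)
    return reached


# R2's LSDB before the failure: both LSAs point at each other.
before = {"1.1.1.1": {"2.2.2.2"}, "2.2.2.2": {"1.1.1.1"}}

# After the failure: R1's stale LSA still lists R2, R2's new LSA lists nobody.
after = {"1.1.1.1": {"2.2.2.2"}, "2.2.2.2": set()}

print(reachable_routers(before, "2.2.2.2"))
print(reachable_routers(after, "2.2.2.2"))
```

After the failure, R1’s LSA is still sitting in the dictionary, but the walk from R2 never reaches it. Even a walk that starts at R1 stops immediately, because R2’s LSA no longer links back.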
 

R2 is not allowed to flush the R1 LSA

It would certainly be easier to troubleshoot if R2 flushed R1’s LSA as soon as it realised that R1 was unreachable.  This is referred to as ‘premature aging’ in the RFC. However, section 14.1 of RFC 2328 prohibits this.

A router may only prematurely age its own self-originated LSAs. The router may not prematurely age LSAs that have been originated by other routers.

If the R1-R2 link stays down, then R1’s type-1 LSA remains in R2’s database for up to one hour, until its LS age reaches MaxAge. Seeing these LSAs in the LSDB can be confusing for the first-time troubleshooter, but once you learn to expect this behavior you can avoid jumping to incorrect conclusions.
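To put a number on ‘up to one hour’, here is a back-of-the-envelope calculation. It assumes the RFC 2328 architectural constants (MaxAge of 3600 seconds and LSRefreshTime of 1800 seconds) and ignores DoNotAge LSAs and transmission delay; the function name is just for illustration.

```python
MAX_AGE = 3600          # seconds, RFC 2328 MaxAge (1 hour)
LS_REFRESH_TIME = 1800  # seconds, RFC 2328 LSRefreshTime (30 minutes)


def seconds_until_flush(ls_age: int) -> int:
    """Seconds before a stale LSA with this LS age reaches MaxAge."""
    return max(MAX_AGE - ls_age, 0)


# R1's LSA showed 'LS age: 130' shortly after the link failure.
remaining = seconds_until_flush(130)
minutes, seconds = divmod(remaining, 60)
print(f"{minutes} min {seconds} s")

# The originator normally refreshes at LSRefreshTime, so a healthy LSA is
# rarely much older than 1800 s when its router is cut off.
print(seconds_until_flush(LS_REFRESH_TIME) // 60, "minutes at the earliest")
```

In other words, because a healthy router refreshes its LSAs every 30 minutes, a stale LSA will typically linger for somewhere between 30 and 60 minutes after the originator is isolated.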

Sherpa Summary

  • Only the originating router is permitted to prematurely age (flush) an LSA.
  • Sometimes the originating router dies or is isolated before it can prematurely age its own LSAs.
  • In that case the LSAs from an isolated router stay in the LSDB of surviving routers until the LSA reaches MaxAge (up to 60 minutes), or the originating router becomes reachable again.
  • Remember that the type-1 LSA from the isolated router is stale, and may describe links which are no longer present. It will be given the internal ‘not-reachable’ status by Cisco routers, and the not-reachable message will be visible when you examine the LSDB.
  • The surviving adjacent router will update its own type-1 LSA, removing the link to the isolated neighbor, and will flood this updated type-1 LSA to all other routers within the area.
  • All surviving routers in the area have identical information and reach the same SPF decision: that R1 is not reachable because there is no back-link from R2 to R1.

4 thoughts on “OSPF – Lingering LSAs from unreachable routers”

  1. Hey John, thanks for pointing out that section of RFC2328. It explains the behavior I mentioned in the comment to your previous post.
    I’d instinctively known that when LSAs disagree, then the links are ignored, but hadn’t seen the chapter-and-verse citation before.
    Thanks!

    1. Thanks Chris,
      I’ve been poking through RFC2328 quite a bit recently, and I’m finding that pulling out the supporting text and publishing it gives me a definitive reference to return to.
      I’m preparing a post on that BCAST/P2P adjacency issue but I’m seeing some weird behaviour. For example, the BCAST interface becomes a DROTHER if it has the lower IP address. I’m also seeing a unicast hello from P2P to BCAST interface just before hitting 2-WAY state that I wasn’t expecting. Fun and games.

  2. But when ever the Nbr link goes down the surviving router should flood Router LSA and trigger a complete SPF , which will remove the routes learned from isolated non reachable router.

    1. Hi Johnson,
      I agree. The aim of the blog post is to highlight that whilst the routes are flushed the Learned LSAs are not. OSPF is giving exactly the outcome you would expect, but the LSA’s which remain on R2 needed an explanation.
      /John H
