Open Shortest Path First Introduction
Thomas Bellman
https://www.nsc.liu.se/~bellman/ospf-introduction-20181004.txt


What is a routing protocol?
===========================

Automatically distribute routing information between routers in a
network.

Always route packets the "best" way, even when the network topology
changes (links cut, routers dying).

New networks need only be configured on the routers where they are
attached, not on every router in the network.


Example topology
================

   NetA          NetH
       \        /
        R1----R6
       /        \
NetB  /          \  NetG
    \/            \/
    R2            R5
    /\            /\
NetC  \          /  NetF
       \        /
        R3----R4
       /        \
   NetD          NetE


Static configuration
====================

   NetA          NetH       Config on R1:
       \        /
        R1----R6            R1# route add NetB gw R2
       /        \           R1# route add NetC gw R2
NetB  /          \  NetG    R1# route add NetD gw R2
    \/            \/        R1# route add NetE gw R6
    R2            R5        R1# route add NetF gw R6
    /\            /\        R1# route add NetG gw R6
NetC  \          /  NetF    R1# route add NetH gw R6
       \        /
        R3----R4
       /        \
   NetD          NetE


Static configuration
====================

   NetA          NetH       Configs to reach NetA:
       \        /
        R1----R6            R2# route add NetA gw R1
       /        \           R3# route add NetA gw R2
NetB  /          \  NetG    R4# route add NetA gw R3
    \/            \/        R5# route add NetA gw R6
    R2            R5        R6# route add NetA gw R1
    /\            /\
NetC  \          /  NetF
       \        /
        R3----R4
       /        \
   NetD          NetE


Static configuration
====================

Does not scale.  Even a small handful of routers and networks becomes
bothersome to manage.  Large networks have thousands of routers, and
can add and remove networks and connections many times per day.

And importantly, if R2 goes down, R3 and R4 lose connectivity to NetA,
and require reconfiguration to go the other way around the ring.


Routing protocol
================

Routers tell their neighbours about directly connected networks, and
they in turn tell their neighbours, and so on, until every router
knows how to reach every network.

R3 gets information about NetA from both R2 and R4, and can select
the "best" route.


Routing vs bridging - Resiliency
================================

* Broadcast domain = failure domain.
  Keep failure domains small.

* Loop resistance.  BUM flooding in Ethernet amplifies a loop, while IP
  drops packets with unknown destinations.

* Loop resilience.  IP has a TTL/Hop Count, while Ethernet will loop
  packets forever.

* Spanning Tree problems:
  - STP ignores VLANs; it can block all the links where a VLAN exists.
  - STP on links to hosts => slow to recover after a host starts.
  - STP can take 30s or more until reachability is fully restored,
    due to MAC caches.
  - Active-Passive.  Problems with blocked links are not noticed until
    the active link fails.


Routing vs bridging - Bandwidth
===============================

* Spanning Tree blocks all paths except one to each destination.
  Only a single active uplink from each leaf switch.

* IP handles Equal Cost Multi-Pathing fine, and allows you to build
  spine-and-leaf (folded Clos) networks with lots of bandwidth.


Routing vs bridging - Visibility
================================

* Traceroute allows you to find out the path packets take, and see
  where e.g. packet drops happen.

* Ethernet tracing tools exist, but only work for pure layer 2 paths;
  any L3 hop will stop them.


Routing vs bridging - TRILL and SPB
===================================

TRILL and SPB solve some of the problems with Spanning Tree, but
generally not as well as IP, and are not widely available.


IGP vs EGP
==========

Protocols classified based on usage:

* Interior Gateway Protocols
  - Used within an organization
  - Routers trust each other fully
  - Routing based only on costs (link speeds, ...)

* Exterior Gateway Protocols
  - Used between organizations
  - Routers don't trust each other
  - Complex policies (economics, politics)

IGPs: RIP, OSPF, IS-IS, EIGRP
EGPs: BGP

Not a hard distinction.  E.g., some use BGP as IGP (common in large
data centers).


Distance-Vector vs Link-State
=============================

Technical difference between protocols, based on how they distribute
routing information.

Distance-Vector: RIP, EIGRP, BGP
Link-State:      OSPF, IS-IS

(BGP is actually "Path-Vector".)
(And there are extensions to BGP to distribute link state information.)


Distance-Vector protocols
=========================

"Routing by rumour"

    R1 -> R2 : NetA at distance 1
    R1 -> R6 : NetA at distance 1
    R2 & R6 calculate their best route to NetA
    R2 -> R3 : NetA at distance 2
    R6 -> R5 : NetA at distance 2
    R3 & R5 calculate their best route to NetA
    R3 -> R4 : NetA at distance 3
    R5 -> R4 : NetA at distance 3

Each router tells its neighbours *its* (limited) view of reality.


Distance-Vector (cont)
======================

Each router knows only its immediate neighbours; everything beyond
them is a black box.

Problem: Each router has to calculate its own best way to NetA before
it can tell its neighbours; this delays the propagation of changes.

Problem: If R1 goes down, R2 will realize this (link down, no periodic
hellos).  But R3 does not, and will tell R2 that it has a route to
NetA with distance 3 (even though that route is via R2).  This leads
to routing loops.

Various ways around this.  Not for this presentation.


Link-state protocols
====================

* R1 -> R2,R6 : I (R1) exist.
              : I (R1) have NetA locally connected, cost 5

* R2 -> R1    : I (R2) exist.
              : I (R2) have NetB locally connected, cost 5
              : I (R2) have NetC locally connected, cost 5

* R2 -> R3    : I (R2) exist.
              : I (R2) have NetB locally connected, cost 5
              : I (R2) have NetC locally connected, cost 5
              : R1 exists.
              : R1 has NetA locally connected, cost 5
              : I (R2) have direct connection to R1, cost 10


Link-state protocols (cont)
===========================

* R3 -> R4    : I (R3) exist.
              : I (R3) have NetD locally connected, cost 5
              : R1 exists.
              : R2 exists.
              : R2 has direct connection to R1, cost 10
              : R1 has NetA locally connected, cost 5
              : R2 has NetB locally connected, cost 5
              : R2 has NetC locally connected, cost 5
              : I (R3) have direct connection to R2, cost 10


Link-state protocols
====================

All routers build up a view of the full topology of the entire network.

Routers pass on updates to their neighbours without waiting for their
own calculation of best next-hops.  Quicker propagation.

Less risk of routing loops, as all routers know the full topology of
the network.  But they can actually form (temporary) loops when links
are *added*...


Link-state protocols (cont)
===========================

Full topology information at each router => requires more memory, and
makes calculating the best next-hop more complicated.

Processing the full topology in a large network can take a long time;
incremental algorithms mitigate this.

Flaps are distributed throughout the network; do routers in Stockholm
care about flapping links in the Kuala Lumpur office?

All routers get the same view of the network.  Thus it is not possible
to implement policy, or to filter traffic.


Aside: Loopbacks
================

Routers have many IP addresses, and those can change as connections
are added/removed during the lifetime of the router.  Which is the
canonical address to use for ssh, SNMP, etc.?

Also: When link is lost on a port on a router, IP addresses configured
on that port disappear.  (They return when the link comes back.)
This makes it even more difficult to have a canonical address.

Solution: Add an IP address to the loopback interface, and advertise
it into the IGP.  The loopback interface is always up (as long as the
router is).


OSPF version 2 and version 3
============================

OSPF v2 is IPv4 *only*.  IPv4 addresses are embedded everywhere in the
protocol.

OSPF v3 is nominally protocol agnostic.  Standards exist for doing
both IPv4 and IPv6 in OSPF v3.

But:
  - Many only implement the IPv6 parts
  - Needs to be run on top of IPv6
  - No interop between v2 and v3

In practice:
  - Use OSPF v2 for IPv4
  - Use OSPF v3 for IPv6

OSPF v3 uses IPv6 link-local addresses for all communication (except
virtual links).


Router-id
=========

Routers in OSPF are identified by a 32-bit number.  It must be unique
within the AS, but has no other semantics.

Usually written as a dotted quad, i.e. like an IPv4 address.  Even in
OSPF v3 for IPv6.  732303975 is written as 43.166.18.103.

Conventionally, the loopback IPv4 address is used as the router-id.
Most implementations do this by default.

Strange effects if two routers have the same router-id.  Routes
appear, disappear, appear again, and so on.


Adjacency
=========

Routers regularly broadcast (multicast) HELLO packets on links.

A router receiving a HELLO from another router sends back a HELLO of
its own.  HELLO packets list the known neighbours on that link; a
router seeing itself in the HELLO from a neighbour treats that as an
ACK.

Once the adjacency is established, the routers exchange "link state
advertisements" until each knows that the other has everything it
knows.


Important parameters
====================

* Hello interval
  How often HELLO packets are sent.  Default 10s.

* Dead interval
  Time with no HELLOs received from a neighbour until it is considered
  dead.  Default 40s or 4×hello interval (varies between
  implementations).

* Authentication.

* Area id (described later).

Adjacency is only established if routers have the same hello
interval, the same dead interval and the same area.


Link State Advertisements
=========================

Several types of LSAs.  Types differ between OSPFv2 and OSPFv3.

* Router
  - Neighbouring routers
  - OSPFv2: Locally connected network prefixes

* Intra Area Prefix (OSPFv3 only)
  - Locally connected network prefixes

* External
  - Routes leading out from the OSPF AS; static routes, routes learned
    from BGP, RIP, etc.

* Network Summary (v2) / Inter Area Prefix (v3)
  - Network prefixes from other areas

And several more.
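The router-id notation from the Router-id slide and the HELLO parameter rule from the Important parameters slide can be illustrated in a few lines of Python. This is a minimal sketch, not how any router implements it. The names `router_id_to_dotted`, `dotted_to_router_id`, `HelloParams` and `can_form_adjacency` are hypothetical, and real HELLO processing also checks other fields (e.g. authentication). The interval values are taken from the Junos, ProCurve and Dell config examples later in this document. Those examples configure different links, so comparing the Dell values against the Junos values is purely hypothetical.

```python
import ipaddress
from dataclasses import dataclass


def router_id_to_dotted(router_id: int) -> str:
    """Write a 32-bit router-id as a dotted quad (notation only, no IPv4 meaning)."""
    return str(ipaddress.IPv4Address(router_id))


def dotted_to_router_id(dotted: str) -> int:
    """Parse a dotted-quad router-id (or area id) back into its 32-bit number."""
    return int(ipaddress.IPv4Address(dotted))


@dataclass(frozen=True)
class HelloParams:
    """Hypothetical subset of what a HELLO carries for one interface."""
    hello_interval: int  # seconds
    dead_interval: int   # seconds
    area_id: str         # dotted quad, e.g. "203.0.113.0"


def can_form_adjacency(a: HelloParams, b: HelloParams) -> bool:
    """Adjacency requires the same hello interval, dead interval and area."""
    return (
        a.hello_interval == b.hello_interval
        and a.dead_interval == b.dead_interval
        and a.area_id == b.area_id
    )


# Interval values borrowed from the config examples (hello 5/dead 20 vs 1/4).
junos = HelloParams(hello_interval=5, dead_interval=20, area_id="203.0.113.0")
procurve = HelloParams(hello_interval=5, dead_interval=20, area_id="203.0.113.0")
dell = HelloParams(hello_interval=1, dead_interval=4, area_id="203.0.113.0")

print(router_id_to_dotted(732303975))       # 43.166.18.103
print(can_form_adjacency(junos, procurve))  # True
print(can_form_adjacency(junos, dell))      # False
```

A Junos/Dell mismatch like this is the typical "no adjacency" case. Both sides send HELLOs, but each discards the other's, so neither ever sees itself listed.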

Flooding
========

LSAs learned from one router are passed on to all other neighbouring
OSPF routers, which in turn pass them on to their neighbours, and so
on, until all routers have all LSAs.

LSAs are stored in a non-persistent "database" on each router, the
"Link State DataBase".


Link State Advertisements (cont)
================================

LSAs are identified by a 32-bit LSA id, the router-id of the
advertising (originating) router, and a sequence number.

When something changes (new/lost adjacency, new or lost local network,
cost change, etc.), the router will send out a new LSA with the same
id, but a higher sequence number.

The lifetime of an LSA is 1 hour (3600 seconds).  It must be
re-advertised (with a higher seq.no) by the originating router before
then.

In OSPF v2, the LSA id doubles as the IP address for what is announced
in the LSA.  In OSPF v3, the LSA id does not have any addressing
semantics; announced addresses have a separate field in the LSAs.


Shortest Path First
===================

When LSAs have been learned, routers calculate the full topology of
the network, and the best route through the network for each prefix
(using Dijkstra's algorithm).

The "shortest" path is the path where the sum of costs is smallest.
Costs are configured per interface, or derived automatically from
interface bandwidth.

All routers will reach the same conclusion.  They *must* do so, or
routing will break.

SPF complexity scales as O(L + R * log R), where L is the number of
connections between routers, and R is the number of routers.


Passive interfaces
==================

Prefixes on interfaces where you run OSPF are automatically announced
as locally connected (in Router LSA or Intra Area Prefix LSA).

For interfaces where you *don't* want to run the OSPF protocol (no
HELLOs, no adjacencies) but still want the prefix announced, declare
the interface as "passive" in the OSPF config.

An alternative is to *redistribute* connected networks into OSPF, but
then they will show up as AS External LSAs.
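The SPF calculation from the Shortest Path First slide can be sketched in Python and run on the example topology from the start of this document. The sketch makes several assumptions. Every router-to-router link costs 10 and every locally connected network costs 5, which are the numbers used in the link-state examples. `spf` and the dict-based graph are hypothetical, since real routers compute from their Link State DataBase. The sketch keeps the set of first hops, so equal-cost paths (ECMP) survive.

```python
import heapq


def spf(graph, source):
    """Dijkstra from `source`.

    `graph` maps a node to {neighbour: cost}.  Returns
    {destination: (total_cost, frozenset_of_first_hops)}; more than one
    first hop means equal-cost multi-path (ECMP).
    """
    dist = {source: 0}
    first_hops = {source: frozenset()}
    heap = [(0, source)]
    done = set()
    while heap:
        cost, node = heapq.heappop(heap)
        if node in done:
            continue  # stale heap entry
        done.add(node)
        for nbr, link_cost in graph.get(node, {}).items():
            new_cost = cost + link_cost
            # Leaving the source, the first hop is the neighbour itself;
            # further out, it is inherited from the node we came through.
            hops = frozenset([nbr]) if node == source else first_hops[node]
            if nbr not in dist or new_cost < dist[nbr]:
                dist[nbr] = new_cost
                first_hops[nbr] = hops
                heapq.heappush(heap, (new_cost, nbr))
            elif new_cost == dist[nbr]:
                first_hops[nbr] = first_hops[nbr] | hops  # equal cost: keep both
    return {n: (dist[n], first_hops[n]) for n in dist}


# Example topology: ring R1-R2-R3-R4-R5-R6-R1, router links cost 10 (assumed).
ring = ["R1", "R2", "R3", "R4", "R5", "R6"]
graph = {r: {} for r in ring}
for a, b in zip(ring, ring[1:] + ring[:1]):
    graph[a][b] = 10
    graph[b][a] = 10

# Locally connected networks are leaves with cost 5; they never transit traffic.
stubs = {"R1": ["NetA"], "R2": ["NetB", "NetC"], "R3": ["NetD"],
         "R4": ["NetE"], "R5": ["NetF", "NetG"], "R6": ["NetH"]}
for router, nets in stubs.items():
    for net in nets:
        graph[router][net] = 5

routes = spf(graph, "R4")
for dest in ("NetA", "NetG"):
    cost, hops = routes[dest]
    print(dest, cost, sorted(hops))
# NetA 35 ['R3', 'R5']
# NetG 15 ['R5']
```

From R4, NetA is equally far in both directions around the ring, so both R3 and R5 are kept as next-hops. From R3, the same code gives NetA at cost 25 via R2 only.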

Designated Router
=================

Optimization for many routers connected to a single virtual 10Base5
coax cable (aka switch):

  R1  R2  R3  R4  R5  R6  R7  R8  R9  R10
  |   |   |   |   |   |   |   |   |   |
--o---o---o---o---o---o---o---o---o---o--

One router is elected as Designated Router (DR), one as Backup DR
(BDR).  Adjacencies are formed only with the DR and BDR.  The DR
creates a Network LSA to represent this network.

Configurable priority for DR election.  Highest router-id wins in case
of ties.  No preemption of an already elected DR and BDR.


Designated Router (cont)
========================

Not a common network design these days, as router ports are cheap.
(But sometimes still useful.)

OSPF still defaults to doing this on Ethernet.  The initial election
takes dead-interval time, making it slow to form adjacencies after a
link is restored.

Explicitly configuring the OSPF interface as point-to-point avoids
this.  Recommended!

WARNING: Mismatching link types lead to an adjacency being
established, and LSAs exchanged, but routing not working...  Can be
difficult to diagnose.  'show ospf interface XXX' shows a DR elected,
but no BDR.


Scaling and Areas
=================

SPF calculation is "expensive".  Flooding of LSAs in a densely
connected network (e.g. a Clos network) can also limit scaling.

With modern routers, 100-300 routers is not a problem, but 1000 is
probably pushing the limit.

OSPF's solution: Split the network into "areas".
  - SPF is only calculated within an area.
  - Router LSAs, Intra Area Prefix LSAs, and some others, are only
    flooded within an area.

OSPF acts as Link-State within an area, but as Distance-Vector between
areas.


Areas
=====

An area consists of a number of *networks*, with routers connecting
them.

A router with interfaces in multiple areas is an Area Border Router
(ABR).

An ABR injects Network Summary LSAs into an area based on LSAs in the
other areas it connects to.

AS-external LSAs are flooded across area borders.


Area-id
=======

Areas are identified by a 32-bit number.
Area id is usually written as a dotted quad, like an IPv4 address, but
it has no address semantics.

Area id is included in HELLO; adjacencies are not formed if the area
id differs.


Backbone Area
=============

Area 0 (0.0.0.0) is special: the backbone area.

Areas *must* connect to the backbone area, and must *not* connect to
other areas, to avoid loops.

Area 0 must not be partitioned, or things will break.  Other areas can
be partitioned with no ill effects.  (You can even have all
non-backbone areas share the same id; as long as they don't share an
ABR they will be separate.  Stupid design to do so, though.)


Areas (cont)
============

Intra-area routes are selected before inter-area routes for the same
prefix (and inter-area routes before AS-external routes).


Stub areas
==========

* Stub area
  - AS-external routes are not flooded into a stub area; they are
    summarized as a default route.
  - Can't have ASBRs (AS Boundary Routers, which inject AS-external
    routes) of its own.

* Totally stubby area
  - A stub area where inter-area routes are also summarized as a
    default route.

* Not-So-Stubby-Area (NSSA)
  - AS-external routes from the rest of the AS are not flooded into
    it; they are summarized as a default route.
  - But can have ASBRs of its own.

All interfaces in an area must be configured to have the same area
type, making it difficult to convert.


OSPF Disadvantages
==================

* No policies or filtering; all routers are expected to forward any
  packet.
  - Although filtering between areas is possible, at least in some
    implementations.  Not easy to get right, though.

* Not implemented, or may require a license, on smaller (1 Gbit/s)
  switches, e.g. many ProCurve switches, several Juniper EX switches,
  Mellanox, Cisco SG switches.
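The selection order from the "Areas (cont)" slide works like a sort key. The path type is compared first, and cost only breaks ties within the same type. The Python sketch below illustrates this. The `Route` type, `best_routes` function and example costs are hypothetical, and real implementations have further rules not covered in this document. Equal-best candidates are all returned, in keeping with ECMP.

```python
from dataclasses import dataclass

# Lower rank wins: intra-area before inter-area before AS-external.
PATH_TYPE_RANK = {"intra-area": 0, "inter-area": 1, "external": 2}


@dataclass(frozen=True)
class Route:
    prefix: str
    path_type: str  # one of the keys of PATH_TYPE_RANK
    cost: int
    next_hop: str


def best_routes(candidates):
    """Return every candidate tied for best (more than one = ECMP)."""
    def key(r):
        # Path type dominates; cost is only compared within the same type.
        return (PATH_TYPE_RANK[r.path_type], r.cost)
    best = min(key(r) for r in candidates)
    return [r for r in candidates if key(r) == best]


candidates = [
    Route("192.0.2.0/24", "inter-area", 20, "R6"),
    Route("192.0.2.0/24", "external", 1, "R5"),
    Route("192.0.2.0/24", "intra-area", 500, "R2"),
]
print([r.next_hop for r in best_routes(candidates)])  # ['R2']
```

The intra-area route wins even though its cost (500) is far higher than the inter-area (20) or external (1) alternatives.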

Junos config example (interfaces)
=================================

    interfaces {
        lo0 {
            unit 0 {
                family inet {
                    address 198.51.100.17/32;
                }
            }
        }
        xe-0/0/7 {
            unit 0 {
                family inet {
                    address 192.0.2.161/30;
                }
            }
        }
    }


Junos config example (router-id)
================================

    routing-options {
        router-id 198.51.100.17;
    }


Junos config example (OSPF v2 protocol)
=======================================

    protocols {
        ospf {
            area 203.0.113.0 {
                interface xe-0/0/7.0 {
                    hello-interval 5;
                    dead-interval 20;
                    interface-type p2p;
                    metric 300;
                    authentication {
                        md5 1 key "SECRET";
                    }
                }
                interface lo0.0 {
                    passive;
                }
            }
        }
    }


Junos config example (OSPF v3 protocol)
=======================================

    protocols {
        ospf3 {
            area 203.0.113.0 {
                interface xe-0/0/7.0 {
                    hello-interval 5;
                    dead-interval 20;
                    interface-type p2p;
                    metric 300;
                }
                interface lo0.0 {
                    passive;
                }
                interface xe-0/3/19.0;
            }
        }
    }


HP ProCurve config example
==========================

    key-chain "xyzzy"
    key-chain "xyzzy" key 1 key-string "SECRET"
    ip router-id 198.51.100.18
    router ospf
       area 203.0.113.0
       enable
    interface loopback 1
       ip address 198.51.100.18
       ip ospf 198.51.100.18 area 203.0.113.0
    vlan 3705
       ip address 192.0.2.162 255.255.255.252
       ip ospf 192.0.2.162 area 203.0.113.0
       ip ospf 192.0.2.162 cost 300
       ip ospf 192.0.2.162 hello-interval 5
       ip ospf 192.0.2.162 dead-interval 20
       ip ospf 192.0.2.162 network-type point-to-point
       ip ospf 192.0.2.162 md5-auth-key-chain "xyzzy"


Dell DNOS 9 config example
==========================

    router ospf 1
     network 192.0.2.0/24 area 203.0.113.0
     network 198.51.100.0/24 area 203.0.113.0
     passive-interface default
     no passive-interface Vlan 3706
     fast-converge 1

    interface Loopback 0
     ip address 198.51.100.19/32
     no shutdown

    interface Vlan 3706
     ip address 192.0.2.164/31
     ip ospf cost 300
     ip ospf hello-interval 1
     ip ospf dead-interval 4
     ip ospf network point-to-point
     no shutdown


Virtual Routing and Forwarding
==============================

Split your router into multiple virtual routers, each with its own
address space and routing tables (RIB and FIB).

Separate networks that should not communicate with each other,
e.g. the management network from the normal Internet.

Bind L3 interfaces to a VRF; traffic arriving on a specific interface
will be routed according to the FIB of the VRF it is bound to.

You need one OSPF (or BGP, RIP, etc.) instance in each VRF on the
router.


VRF (cont)
==========

The traffic needs to be separated on the wire as well.  Nothing in the
IP packets themselves says which VRF they belong to (unlike VLANs on
the Ethernet layer).

Some encapsulations:
  - physically separate cables
  - VLAN tagging
  - MPLS tunnels
  - GRE tunnels
  - IPSEC tunnels


Subinterfaces
=============

Share a physical port between VRFs by using VLAN (802.1Q) tagging.

The port is still in layer 3 mode, *not* bridging.  Ethernet BUM
packets (Broadcast, Unknown unicast, Multicast) are thus not forwarded
to other ports.

You can use the same .1Q id on several ports without forming a
contiguous VLAN.  You can thus use the same .1Q tag for all logical
links belonging to the same VRF, without needing to allocate a
separate VLAN id for every logical link.

Supported in e.g. Junos, Comware, Cisco IOS, Cisco NX-OS.  But not in
e.g. HP ProCurve, Dell DNOS, Extreme.


Subinterfaces (Junos)
=====================

    interfaces {
        lo0 {
            unit 0 {
                family inet {
                    address ...;
                }
            }
            unit 9 {
                family inet {
                    address ...;
                }
            }
        }
        xe-0/0/7 {
            vlan-tagging;
            unit 1 {
                vlan-id 1;
                family inet { ... }
            }
            unit 9 {
                vlan-id 9;
                family inet { ... }
            }
        }
        xe-0/0/8 {
            vlan-tagging;
            unit 1 {
                vlan-id 1;
                family inet { ... }
            }
            unit 9 {
                vlan-id 9;
                family inet { ... }
            }
        }
    }


Subinterfaces (Junos)
=====================

    routing-instances {
        LHC-OPN {
            instance-type virtual-router;
            interface lo0.9;
            interface xe-0/0/7.9;
            interface xe-0/0/8.9;
            protocols {
                ospf {
                    area 203.0.113.0 {
                        interface lo0.9 {
                            passive;
                        }
                        interface xe-0/0/7.9 { ... }
                        interface xe-0/0/8.9 { ... }
                    }
                }
            }
        }
    }


Subinterfaces (ComWare)
=======================

    ip vpn-instance lhc-opn
     description VRF for LHC-OPN.

    interface FortyGigE1/1/7
     port link-mode route

    interface FortyGigE1/1/7.1
     ip address ...
     ospf ...
     ...

    interface FortyGigE1/1/7.9
     ip binding vpn-instance lhc-opn
     ip address ...
     ospf ...
     ...


Unnumbered links
================

* IPv4: "Borrow" the address from another interface (typically
  loopback) as a /32.

* IPv6: Use only the link-local address on the interface.

Let the routing protocol discover the other end, and automatically set
up a host route for it.

No need to allocate a /31 or /127 for every routing link, add it to
DNS, and configure it on the routers.  Downlinks from all spine
routers are identical.  All uplinks on each leaf router are identical.


Unnumbered links
================

    interfaces {
        xe-0/0/7 {
            unit 0 {
                family inet {
                    unnumbered-address lo0.0;
                }
                family inet6;
            }
        }
    }

Not everyone supports this, but at least Junos, Cisco IOS, Cisco
NX-OS, HPE Comware, Cumulus and Arista EOS do.  "Some restrictions may
apply."

Not supported in: HP ProCurve, Dell DNOS, Extreme.


RIB vs FIB
==========

Many network operating systems differentiate between the Routing
Information Base and the Forwarding Information Base.

The RIB is used as a communication channel between routing protocols
in the router.  Each protocol calculates routes, and inserts the best
ones into the RIB.  The same prefix can exist several times in the
RIB, from multiple sources.  Lots of metadata.

The FIB is the routing table that is actually used for forwarding
packets (e.g. programmed into hardware).  For every prefix in the RIB,
one route is selected as the best and copied to the FIB.


Redistributing from/to other protocols
======================================

When a router receives routes from some other protocol (e.g. BGP, or
static routes), you may need to *redistribute* those into OSPF, so
other routers in the OSPF AS know how to reach them.

Common use-case: default route from ISP via BGP.

Alternatively, you can redistribute routes from OSPF to e.g. BGP.

Be careful!  Limit which routes get redistributed.  It is rare to
redistribute *all* routes from one protocol to another.
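The RIB-to-FIB step from the RIB vs FIB slide can be sketched in Python. Each prefix may have several RIB entries from different protocols, and exactly one next-hop per prefix ends up in the FIB. The per-protocol preference numbers are an assumption made for this example. Real network operating systems have their own defaults, called "administrative distance" on Cisco and "preference" on Junos. `build_fib` and the tuple layout are hypothetical.

```python
from collections import defaultdict

# Illustrative per-protocol preferences (lower wins); assumed values,
# not any particular vendor's defaults.
PROTOCOL_PREFERENCE = {"connected": 0, "static": 5, "ospf": 10, "bgp": 170}


def build_fib(rib):
    """Pick one route per prefix from RIB entries.

    `rib` is a list of (prefix, protocol, metric, next_hop) tuples; the
    same prefix may appear several times, from different protocols.
    """
    by_prefix = defaultdict(list)
    for entry in rib:
        by_prefix[entry[0]].append(entry)
    fib = {}
    for prefix, entries in by_prefix.items():
        # Protocol preference first, then the protocol's own metric.
        _, _, _, next_hop = min(
            entries, key=lambda e: (PROTOCOL_PREFERENCE[e[1]], e[2]))
        fib[prefix] = next_hop
    return fib


rib = [
    ("0.0.0.0/0", "bgp", 0, "192.0.2.1"),
    ("0.0.0.0/0", "static", 0, "192.0.2.162"),
    ("198.51.100.18/32", "ospf", 300, "192.0.2.162"),
]
print(build_fib(rib))
# {'0.0.0.0/0': '192.0.2.162', '198.51.100.18/32': '192.0.2.162'}
```

Both default routes stay in the RIB. Only the static one is copied to the FIB, because its assumed preference (5) beats BGP's (170).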

Reading more
============

"OSPF and IS-IS -- Choosing an IGP for Large-Scale Networks"
Jeff Doyle
Addison-Wesley, ISBN 978-0-321-16879-4