Open Shortest Path First


                 Introduction


       Thomas Bellman <bellman@nsc.liu.se>


https://www.nsc.liu.se/~bellman/ospf-introduction-20181004.txt


What is a routing protocol?
===========================

Automatically distribute routing information
between routers in a network.

Always route packets the "best" way, even when
the network topology changes (links cut, routers
dying).

New networks need only be configured where they
are attached, not on every router in the network.


Example topology
================

             NetA         NetH
               \           /
                R1-------R6
               /           \
              /             \
       NetB  /               \  NetG
          \ /                 \ /
           R2                 R5
          / \                 / \
       NetC  \               /  NetF
              \             /
               \           /
                R3-------R4
               /           \
             NetD         NetE


Static configuration
====================

    NetA      NetH         Config on R1:
      \        /
       R1----R6            R1# route add NetB gw R2
NetB  /        \  NetG     R1# route add NetC gw R2
   \ /          \ /        R1# route add NetD gw R2
    R2          R5         R1# route add NetE gw R6
   / \          / \        R1# route add NetF gw R6
NetC  \        /  NetF     R1# route add NetG gw R6
       R3----R4            R1# route add NetH gw R6
      /        \
    NetD      NetE


Static configuration
====================

    NetA      NetH         Configs to reach NetA:
      \        /
       R1----R6            R2# route add NetA gw R1
NetB  /        \  NetG     R3# route add NetA gw R2
   \ /          \ /        R4# route add NetA gw R3
    R2          R5         R5# route add NetA gw R6
   / \          / \        R6# route add NetA gw R1
NetC  \        /  NetF
       R3----R4
      /        \
    NetD      NetE


Static configuration
====================

Does not scale.  Even a small handful of routers
and networks becomes bothersome to manage.

Large networks have thousands of routers, and can
add and remove networks and connections many times
per day.

And importantly, if R2 goes down, R3 and R4 lose
connectivity to NetA, and require reconfiguration
to go the other way around the ring.
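
To get a feel for the numbers, here is a small
Python sketch that generates the static routes
every router in the example ring would need.  The
"route add" lines use the pseudo-syntax from the
slides above; the helper names are made up for
this illustration.  Ties between equally long
paths (e.g. R1 to NetE) are broken arbitrarily,
so a gateway may differ from the hand-written
config.

```python
from collections import deque

# Ring from the "Example topology" slide.
LINKS = [("R1", "R2"), ("R2", "R3"), ("R3", "R4"),
         ("R4", "R5"), ("R5", "R6"), ("R6", "R1")]
NETS = {"R1": ["NetA"], "R2": ["NetB", "NetC"],
        "R3": ["NetD"], "R4": ["NetE"],
        "R5": ["NetF", "NetG"], "R6": ["NetH"]}

def neighbours(router):
    """Routers directly linked to `router`."""
    return ([b for a, b in LINKS if a == router] +
            [a for a, b in LINKS if b == router])

def next_hops(source):
    """Breadth-first search from `source`.
    Returns {other_router: first hop towards it}."""
    first = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for n in neighbours(node):
            if n not in first:
                first[n] = n if node == source else first[node]
                queue.append(n)
    del first[source]
    return first

def static_config(router):
    """Static route lines needed on `router`."""
    lines = []
    for target, gw in sorted(next_hops(router).items()):
        for net in NETS[target]:
            lines.append(f"{router}# route add {net} gw {gw}")
    return lines

total = sum(len(static_config(r)) for r in NETS)
print("\n".join(static_config("R1")))
print("static routes in the whole network:", total)
```

Six routers and eight networks already need 40
static route lines, and every new network adds
one more line on each of the other five routers.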


Routing protocol
================

Routers tell their neighbours about directly
connected networks, and they in turn tell their
neighbours, and so on, until every router knows
how to reach every network.  R3 gets information
about NetA from both R2 and R4, and can select the
"best" route.


Routing vs bridging - Resiliency
================================

* Broadcast domain = failure domain.
  Keep failure domains small.
* Loop resistance.  BUM flooding in Ethernet
  amplifies a loop, while IP drops packets with
  unknown destinations.
* Loop resilience.  IP has a TTL/Hop Limit, while
  Ethernet will loop packets forever.
* Spanning Tree problems:
  - STP ignores VLANs; can block all the links
    where a VLAN exists.
  - STP on links to hosts => slow to recover
    after host starts.
  - STP can take 30s or more until reachability
    fully restored, due to MAC caches.
  - Active-Passive.  Problems with blocked links
    are not noticed until the active link fails.


Routing vs bridging - Bandwidth
===============================

* Spanning Tree blocks all paths except one to
  each destination.  Only a single active uplink
  from each leaf switch.
* IP handles Equal Cost Multi-Pathing fine, and
  allows you to build spine-and-leaf (folded Clos)
  networks with lots of bandwidth.


Routing vs bridging - Visibility
================================

* Traceroute allows you to find out the path
  packets take, and see where e.g. packet drops
  happen.
* Ethernet tracing tools exist, but only work for
  pure layer 2 paths; any L3 hop will stop those.


Routing vs bridging - TRILL and SPB
===================================

TRILL and SPB solve some of the problems with
Spanning Tree, but generally not as well as IP,
and are not widely available.


IGP vs EGP
==========
Protocols classified based on usage

* Interior Gateway Protocols
  - Used within an organization
  - Routers trust each other fully
  - Routing based only on costs (link speeds, ...)

* Exterior Gateway Protocols
  - Used between organizations
  - Routers don't trust each other
  - Complex policies (economics, politics)

IGPs:  RIP, OSPF, IS-IS, EIGRP
EGPs:  BGP

Not a hard distinction.  E.g., some use BGP as
an IGP (common in large data centers).


Distance-Vector vs Link-State
=============================

Technical difference between protocols, based on
how they distribute routing information.

Distance-Vector:  RIP, EIGRP, BGP
Link-State:       OSPF, IS-IS

(BGP is actually "Path-Vector".)
(And there are extensions to BGP to distribute
link state information.)


Distance-Vector protocols
=========================
"Routing by rumour"

  R1 -> R2 : NetA at distance 1
  R1 -> R6 : NetA at distance 1

  R2 & R6 calculate their best route to NetA

  R2 -> R3 : NetA at distance 2
  R6 -> R5 : NetA at distance 2

  R3 & R5 calculate their best route to NetA

  R3 -> R4 : NetA at distance 3
  R5 -> R4 : NetA at distance 3

Each router tells its neighbours *its* (limited)
view of reality.
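
The exchange above can be sketched in Python as
follows.  This is a toy model, not RIP or any
real protocol: hop count is the only metric, the
origin advertises distance 1 as on the slide, and
the function names are invented for the example.

```python
# Ring from the "Example topology" slide.
LINKS = [("R1", "R2"), ("R2", "R3"), ("R3", "R4"),
         ("R4", "R5"), ("R5", "R6"), ("R6", "R1")]

def neighbours(router):
    """Routers directly linked to `router`."""
    return ([b for a, b in LINKS if a == router] +
            [a for a, b in LINKS if b == router])

def distance_vector(origin):
    """Return {router: (distance, next_hop)} for a
    network attached to `origin`."""
    routers = {r for link in LINKS for r in link}
    # The origin reaches its own network directly.
    table = {origin: (1, None)}
    changed = True
    while changed:                # repeat until stable
        changed = False
        for r in sorted(routers):
            for n in neighbours(r):
                if n not in table:
                    continue      # n has nothing to tell yet
                dist = table[n][0] + 1
                if r not in table or dist < table[r][0]:
                    table[r] = (dist, n)
                    changed = True
    return table

routes = distance_vector("R1")
for r in sorted(routes):
    print(r, routes[r])
```

R4 ends up at distance 4, having heard about NetA
from both R3 and R5.  It never learns what lies
beyond those two neighbours.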


Distance-Vector (cont)
===============
Each router knows only its immediate neighbours;
everything beyond them is a black box.

Problem:
Each router has to calculate its own best way
to NetA before it can tell its neighbours; delay
in propagation of changes.

Problem:
If R1 goes down, R2 will realize this (link
down, no periodic hellos).  But R3 does not,
and will tell R2 that it has a route to NetA
with distance 3 (even though that route is
via R2).  This leads to routing loops.
Various ways around this.  Not for this
presentation.


Link-state protocols
====================

* R1 -> R2,R6
  : I (R1) exist.
  : I (R1) have NetA locally connected, cost 5

* R2 -> R1
  : I (R2) exist.
  : I (R2) have NetB locally connected, cost 5
  : I (R2) have NetC locally connected, cost 5

* R2 -> R3
  : I (R2) exist.
  : I (R2) have NetB locally connected, cost 5
  : I (R2) have NetC locally connected, cost 5
  : R1 exists.
  : R1 has NetA locally connected, cost 5
  : I (R2) have direct connection to R1, cost 10


Link-state protocols (cont.)
====================

* R3 -> R4
  : I (R3) exist.
  : I (R3) have NetD locally connected, cost 5
  : R1 exists.
  : R2 exists.
  : R2 has direct connection to R1, cost 10
  : R1 has NetA locally connected, cost 5
  : R2 has NetB locally connected, cost 5
  : R2 has NetC locally connected, cost 5
  : I (R3) have direct connection to R2, cost 10


Link-state protocols
====================
All routers build up a view of the full topology
of the entire network.

Routers pass on updates to their neighbours without
waiting for their own calculation of best
next-hops.  Quicker propagation.

Less risk of routing loops, as all routers know
the full topology of the network.

But can actually form (temporary) loops when links
are *added*...


Link-state protocols (cont)
====================
Full topology information at each router =>
requires more memory, and gives complicated
calculation of best next-hop.

Processing full topology in a large network can
take a long time; incremental algorithms mitigate.

Flaps are distributed throughout the network; do
routers in Stockholm care about flapping links in
the Kuala Lumpur office?

All routers get the same view of the network.
Thus not possible to implement policy, or to
filter traffic.


Aside: Loopbacks
================
Routers have many IP addresses, and those can
change as connections are added/removed during
lifetime of the router.  Which is the canonical
address to use for ssh, SNMP, etc.?

Also: When link is lost on a port on a router,
IP addresses configured on that port disappear.
(Returns when link comes back.)  Makes it even
more difficult to have a canonical address.

Solution: Add an IP address to the loopback
interface, and advertise into IGP.  Loopback
interface is always up (as long as router is).


OSPF version 2 and version 3
============================
OSPF v2 is IPv4 *only*.  IPv4 addresses are
embedded everywhere in the protocol.

OSPF v3 is nominally protocol agnostic.
Standards exist for doing both IPv4 and IPv6
in OSPF v3.  But:
- Many only implement the IPv6 parts
- Needs to be run on top of IPv6
- No interop between v2 and v3

In practice:
- Use OSPF v2 for IPv4
- Use OSPF v3 for IPv6

OSPF v3 uses IPv6 link-local addresses for all
communication (except virtual links).


Router-id
=========
Routers in OSPF are identified by a 32-bit
number.  Must be unique within the AS, but
has no other semantics.

Usually written as a dotted quad, i.e. like
an IPv4 address.  Even in OSPF v3 for IPv6.
732303975 is written as 43.166.18.103.
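
The conversion is the same as for IPv4 addresses,
so Python's standard ipaddress module can do it.
A small sketch; the two function names are just
for this example.

```python
import ipaddress

def router_id_to_dotted(number):
    """32-bit integer -> dotted quad string."""
    return str(ipaddress.IPv4Address(number))

def dotted_to_router_id(text):
    """Dotted quad string -> 32-bit integer."""
    return int(ipaddress.IPv4Address(text))

print(router_id_to_dotted(732303975))        # 43.166.18.103
print(dotted_to_router_id("43.166.18.103"))  # 732303975
```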

Conventionally use loopback IPv4 address as
router-id.  Most implementations do this by
default.

Strange effects if two routers have the same
router-id.  Routes appear, disappear, appear
again, and so on.


Adjacency
=========
Routers regularly broadcast (multicast) HELLO
packets on links.

Receives HELLO from other router, sends back
HELLO of its own.

HELLO packets list known neighbours on that link.
Seeing itself in the HELLO from a neighbour acts
as ACK.

Establish adjacency, then exchange "link state
advertisements", until each router knows that the
other router knows everything it knows.
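
A minimal sketch of the "seeing itself in the
HELLO" handshake.  The class and field names are
invented for illustration, and real HELLO packets
carry more than this (intervals, area id, DR and
BDR, ...).

```python
class Router:
    def __init__(self, router_id):
        self.router_id = router_id
        self.seen = set()     # neighbours heard on the link
        self.two_way = set()  # neighbours that have heard us

    def make_hello(self):
        # A HELLO lists every neighbour heard so far.
        return {"from": self.router_id,
                "neighbours": sorted(self.seen)}

    def receive_hello(self, hello):
        sender = hello["from"]
        self.seen.add(sender)
        # Our own id in the list acts as the ACK.
        if self.router_id in hello["neighbours"]:
            self.two_way.add(sender)

a = Router("198.51.100.17")
b = Router("198.51.100.18")
b.receive_hello(a.make_hello())  # B hears A; A has not heard B
a.receive_hello(b.make_hello())  # A sees itself listed by B
b.receive_hello(a.make_hello())  # B sees itself listed by A
print(a.two_way, b.two_way)
```

After three HELLOs both sides know that the link
works in both directions, and can go on to
exchange LSAs.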


Important parameters
====================
* Hello interval
  How often HELLO packets are sent.  Default 10s.
* Dead interval
  Time with no HELLOs received from a neighbour
  until considered dead.  Default 40s or 4×hello
  interval (varies between implementations).
* Authentication.
* Area id (described later).

Adjacency is only established if routers have the
same hello interval, same dead interval and same
area.
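
When an adjacency refuses to come up, comparing
these parameters on both ends is a good first
step.  The sketch below does that comparison; the
HelloParams class is hypothetical, and the values
are taken from the Junos, ProCurve and Dell
config examples later in this presentation.

```python
from dataclasses import dataclass, fields

@dataclass(frozen=True)
class HelloParams:
    hello_interval: int   # seconds
    dead_interval: int    # seconds
    area_id: str          # dotted quad

def mismatches(a, b):
    """Names of the parameters that differ."""
    return [f.name for f in fields(a)
            if getattr(a, f.name) != getattr(b, f.name)]

junos = HelloParams(5, 20, "203.0.113.0")
procurve = HelloParams(5, 20, "203.0.113.0")
dell = HelloParams(1, 4, "203.0.113.0")

print(mismatches(junos, procurve))  # empty: adjacency can form
print(mismatches(junos, dell))      # both intervals differ
```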


Link State Advertisements
=========================
Several types of LSAs.  Types differ between
OSPFv2 and OSPFv3.

* Router
  - Neighbouring routers
  - OSPFv2: Locally connected network prefixes
* Intra Area Prefix (OSPFv3 only)
  - Locally connected network prefixes
* External
  - Routes leading out from the OSPF AS; static
    routes, routes learned from BGP, RIP, etc.
* Network Summary (v2) / Inter Area Prefix (v3)
  - Network prefixes from other areas

And several more.


Flooding
========
LSAs learned from one router are passed on to
all other neighbouring OSPF routers, which in
turn pass them on to their neighbours, and so on,
until all routers have all LSAs.

LSAs are stored in a non-persistent "database"
on each router, the "Link State DataBase".
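
A toy model of flooding on the example ring, in
Python.  It ignores acknowledgements, timers and
retransmission, which real OSPF has; the LSA key
and the function names are made up for the
sketch.

```python
from collections import deque

# Ring from the "Example topology" slide.
LINKS = [("R1", "R2"), ("R2", "R3"), ("R3", "R4"),
         ("R4", "R5"), ("R5", "R6"), ("R6", "R1")]
ROUTERS = sorted({r for link in LINKS for r in link})

def neighbours(router):
    """Routers directly linked to `router`."""
    return ([b for a, b in LINKS if a == router] +
            [a for a, b in LINKS if b == router])

def flood(lsdb, origin, key, seq):
    """Flood LSA `key` with sequence number `seq`
    from `origin`.  `lsdb` maps router -> {key: seq}.
    Returns the number of LSA copies sent on links."""
    lsdb[origin][key] = seq
    queue = deque((origin, n) for n in neighbours(origin))
    sent = 0
    while queue:
        sender, receiver = queue.popleft()
        sent += 1
        if lsdb[receiver].get(key, -1) >= seq:
            continue           # already known: flooding stops
        lsdb[receiver][key] = seq
        for n in neighbours(receiver):
            if n != sender:    # don't send it straight back
                queue.append((receiver, n))
    return sent

lsdb = {r: {} for r in ROUTERS}
copies = flood(lsdb, "R1", ("router", "R1"), 1)
print(copies, "copies sent;",
      sum(1 for r in ROUTERS if lsdb[r]), "routers reached")
```

The flood stops by itself where the two
directions around the ring meet, because a router
that already has the LSA does not pass it on
again.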


Link State Advertisements (cont)
=========================
LSAs are identified by a 32-bit LSA id, router-id
of advertising (originating) router, and sequence
number.

When something changes (new/lost adjacency, new
or lost local network, cost change, etc.), router
will send out new LSA with same id, but higher
sequence number.

Lifetime of an LSA is 1 hour (3600 seconds).  Must
be re-advertised (with higher seq.no) before that
by originating router.

In OSPF v2, LSA id doubles as IP address for what
is announced in the LSA.  In OSPF v3, the LSA id
does not have any addressing semantics; announced
addresses have a separate field in the LSAs.
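
How a router might decide whether to accept an
incoming LSA, as a Python sketch.  The Lsdb class
is hypothetical.  It also simplifies ageing: real
OSPF carries an age field inside each LSA, while
this sketch just remembers when the LSA was
installed.

```python
MAX_AGE = 3600  # seconds, the LSA lifetime given above

class Lsdb:
    def __init__(self):
        # (lsa_id, adv_router) -> (seq, installed_at, body)
        self.entries = {}

    def install(self, lsa_id, adv_router, seq, body, now):
        """Store the LSA if it is newer than what we
        have.  Returns True if it was installed (and
        so should be flooded onwards)."""
        key = (lsa_id, adv_router)
        old = self.entries.get(key)
        if old is not None and old[0] >= seq:
            return False      # same or older instance
        self.entries[key] = (seq, now, body)
        return True

    def expire(self, now):
        """Drop LSAs that were never refreshed."""
        dead = [key for key, entry in self.entries.items()
                if now - entry[1] >= MAX_AGE]
        for key in dead:
            del self.entries[key]
        return dead

db = Lsdb()
rid = "198.51.100.17"
print(db.install(rid, rid, 1, "cost 300", now=0))     # True
print(db.install(rid, rid, 1, "cost 300", now=10))    # False
print(db.install(rid, rid, 2, "cost 100", now=1800))  # True
```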


Shortest Path First
===================
When LSAs have been learned, routers calculate
full topology of network, and best route through
the network for each prefix (using Dijkstra's
algorithm).

"Shortest" path is path where sum of costs is
the smallest.  Costs are configured per interface,
or derived automatically from interface bandwidth.

All routers will reach the same conclusion.
*Must* do so, or routing will break.

SPF complexity scales as O(L + R * log R), where
L is number of connections between routers, and R
is number of routers.
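
A compact Dijkstra in Python, run on the example
ring.  Link cost 10 is taken from the link-state
slides; the cost of 40 on the R6-R1 link is an
assumption added here to show that the cheapest
path need not be the one with the fewest hops.

```python
import heapq

# Link costs; R6-R1 is deliberately expensive.
COSTS = {("R1", "R2"): 10, ("R2", "R3"): 10,
         ("R3", "R4"): 10, ("R4", "R5"): 10,
         ("R5", "R6"): 10, ("R6", "R1"): 40}

def build_graph(costs):
    """{router: {neighbour: cost}}, both directions."""
    graph = {}
    for (a, b), cost in costs.items():
        graph.setdefault(a, {})[b] = cost
        graph.setdefault(b, {})[a] = cost
    return graph

def spf(graph, source):
    """Dijkstra from `source`.
    Returns {router: (total_cost, first_hop)}."""
    best = {}
    heap = [(0, source, None)]
    while heap:
        cost, node, first_hop = heapq.heappop(heap)
        if node in best:
            continue   # already settled via a cheaper path
        best[node] = (cost, first_hop)
        for nbr, link_cost in graph[node].items():
            if nbr not in best:
                hop = nbr if node == source else first_hop
                heapq.heappush(heap, (cost + link_cost, nbr, hop))
    return best

graph = build_graph(COSTS)
for router, (cost, hop) in sorted(spf(graph, "R5").items()):
    print(f"R5 -> {router}: cost {cost}, next hop {hop}")
```

From R5, the path to R1 via R6 is two hops but
costs 50, so SPF picks the four-hop path via R4
at cost 40.  Adding the cost of the destination
network itself does not change the choice, since
it is the same on every path.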


Passive interfaces
==================
Prefixes on interfaces where you run OSPF are
automatically announced as locally connected
(in Router LSA or Intra Area Prefix LSA).

For prefixes on interfaces where you *don't*
want to run OSPF, declare the interface as
"passive" in the OSPF config.

An alternative is to *redistribute* connected
networks to OSPF, but then they will show up
as AS External LSAs.


Designated Router
=================
Optimization for many routers connected to a
single virtual 10Base5 coax cable (aka switch):

   R1  R2  R3  R4  R5  R6  R7  R8  R9  R10
    |   |   |   |   |   |   |   |   |   |
  --o---o---o---o---o---o---o---o---o---o--

One router is elected as Designated Router, one
as Backup DR.  Adjacencies formed only with DR
and BDR.

DR creates a Network LSA to represent this.

Configurable priority for DR election.  Highest
router-id wins in case of ties.  No preemption
of already elected DR and BDR.
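
A simplified sketch of the election rules above,
in Python.  The real election in the OSPF
specification has more steps and states; this
only models priority, the router-id tie-break,
and the lack of preemption.  That priority 0
means "never DR or BDR" comes from the OSPF
specification, not from this slide.

```python
import ipaddress

def rank(router):
    """Sort key: priority first, then router-id."""
    priority, router_id = router
    return (priority, int(ipaddress.IPv4Address(router_id)))

def elect(routers, dr=None, bdr=None):
    """routers: list of (priority, router_id) tuples.
    dr, bdr: the currently elected routers, if any.
    Returns the new (dr, bdr)."""
    eligible = [r for r in routers if r[0] > 0]
    if dr not in eligible:
        dr = None
    if bdr not in eligible:
        bdr = None
    if dr is None and bdr is not None:
        dr, bdr = bdr, None   # the BDR takes over
    rest = sorted((r for r in eligible if r not in (dr, bdr)),
                  key=rank, reverse=True)
    if dr is None and rest:
        dr = rest.pop(0)
    if bdr is None and rest:
        bdr = rest.pop(0)
    return dr, bdr

lan = [(1, "198.51.100.17"), (1, "198.51.100.18"),
       (1, "198.51.100.19")]
dr, bdr = elect(lan)
print("initial:", dr, bdr)

lan.append((5, "198.51.100.20"))   # higher priority joins
print("no preemption:", elect(lan, dr, bdr))

lan.remove(dr)                     # the DR dies
print("after DR loss:", elect(lan, dr, bdr))
```

The router-ids are compared as numbers, not as
strings, so that e.g. 10.0.0.1 ranks above
9.0.0.1.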


Designated Router (cont)
=================
Not a common network design these days, as router
ports are cheap.  (But sometimes still useful.)

OSPF still defaults to doing this on Ethernet.
Initial election takes dead-interval time, making
it slow to form adjacencies after link restored.

Explicitly configuring OSPF interface as
point-to-point avoids this.  Recommended!


WARNING: Mismatching link types lead to adjacency
being established, and LSAs exchanged, but routing
not working...  Can be difficult to diagnose.
'show ospf interface XXX' shows DR elected, but
no BDR.


Scaling and Areas
=================
SPF calculation is "expensive".  Flooding of LSAs
in a densely connected network (e.g. Clos network)
can also limit scaling.

With modern routers, 100-300 routers is not a
problem, but 1000 is probably pushing the limit.

OSPF's solution: Split network into "areas".
- SPF is only calculated within an area.
- Router LSAs, Intra Area Prefix LSAs, and some
  others, are only flooded within an area.

OSPF acts as Link-State within an area, but as
Distance-Vector between areas.


Areas
=====
An area consists of a number of *networks*, with
routers connecting them.

A router with interfaces in multiple areas is an
Area Border Router (ABR).  ABR injects Network
Summary LSAs into an area based on LSAs in other
areas it connects to.

AS-external LSAs are flooded across area borders.


Area-id
=======
Areas are identified by a 32-bit number.  Area id
is usually written as a dotted quad, like an IPv4
address, but there is no semantics to it.

Area id is included in HELLO; adjacencies are not
formed if area id differs.


Backbone Area
=============
Area 0 (0.0.0.0) is special: the backbone area.

Areas *must* connect to the backbone area, and
must *not* connect to other areas, to avoid
loops.

Area 0 must not be partitioned, or things will
break.

Other areas can be partitioned with no ill
effects.  (Can even have all non-backbone areas
share the same id; as long as they don't share
an ABR they will be separate.  Stupid design to
do so, though.)


Areas (cont)
=====
Intra-area routes are selected before inter-area
routes for the same prefix (and inter-area routes
before AS-external routes).
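
The ordering can be expressed as a sort key where
route type comes before cost.  A Python sketch;
the dictionary layout and the example costs are
invented for the illustration.

```python
# Lower number = preferred route type.
TYPE_RANK = {"intra-area": 0, "inter-area": 1, "external": 2}

def best_route(candidates):
    """Pick among routes for the same prefix: route
    type first, cost only as a tie-breaker."""
    return min(candidates,
               key=lambda r: (TYPE_RANK[r["type"]], r["cost"]))

candidates = [
    {"type": "inter-area", "cost": 20, "via": "R2"},
    {"type": "intra-area", "cost": 500, "via": "R6"},
    {"type": "external", "cost": 1, "via": "R2"},
]
print(best_route(candidates))
```

The intra-area route wins even though its cost is
by far the highest of the three.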


Stub areas
==========
* Stub area
  - AS-external routes flooded into stub area are
    summarized as a default route.
  - Can't have ASBRs of its own.

* Totally stubby area
  - A stub area where inter-area routes are also
    summarized as a default route.

* Not-So-Stubby-Area (NSSA)
  - AS-external routes flooded into it are
    summarized as a default route.
  - But can have ASBRs of its own.

All interfaces in an area must be configured to
have the same area type, making it difficult to
convert.


OSPF Disadvantages
==================

* No policies or filtering; all routers are
  expected to forward any packet.
  - Although filtering between areas is possible,
    at least in some implementations.  Not easy
    to get right, though.

* Not implemented or may require license on
  smaller (1 Gbit/s) switches, e.g. many ProCurve
  switches, several Juniper EX switches, Mellanox,
  Cisco SG switches.


Junos config example (interfaces)
====================

  interfaces {
     lo0 {
        unit 0 {
           family inet {
              address 198.51.100.17/32;
           }
        }
     }
     xe-0/0/7 {
        unit 0 {
           family inet {
              address 192.0.2.161/30;
           }
        }
     }
  }


Junos config example (router-id)
====================

  routing-options {
     router-id 198.51.100.17;
  }


Junos config example (OSPF v2 protocol)
====================

  protocols {
     ospf {
        area 203.0.113.0 {
           interface xe-0/0/7.0 {
              hello-interval 5;
              dead-interval 20;
              interface-type p2p;
              metric 300;
              authentication {
                 md5 1 key "SECRET";
              }
           }
           interface lo0.0 {
              passive;
           }
        }
     }
  }


Junos config example (OSPF v3 protocol)
====================

  protocols {
     ospf3 {
        area 203.0.113.0 {
           interface xe-0/0/7.0 {
              hello-interval 5;
              dead-interval 20;
              interface-type p2p;
              metric 300;
           }
           interface lo0.0 {
              passive;
           }
           interface xe-0/3/19.0;
        }
     }
  }


HP ProCurve config example
==========================

  key-chain "xyzzy"
  key-chain "xyzzy" key 1 key-string "SECRET"
  ip router-id 198.51.100.18
  router ospf
     area 203.0.113.0
     enable
  interface loopback 1
     ip address 198.51.100.18
     ip ospf 198.51.100.18 area 203.0.113.0
  vlan 3705
     ip address 192.0.2.162 255.255.255.252
     ip ospf 192.0.2.162 area 203.0.113.0
     ip ospf 192.0.2.162 cost 300
     ip ospf 192.0.2.162 hello-interval 5
     ip ospf 192.0.2.162 dead-interval 20
     ip ospf 192.0.2.162 network-type point-to-point
     ip ospf 192.0.2.162 md5-auth-key-chain "xyzzy"


Dell DNOS 9 config example
==========================

  router ospf 1
     network 192.0.2.0/24 area 203.0.113.0
     network 198.51.100.0/24 area 203.0.113.0
     passive-interface default
     no passive-interface Vlan 3706
     fast-converge 1
  interface Loopback 0
     ip address 198.51.100.19/32
     no shutdown
  interface Vlan 3706
     ip address 192.0.2.164/31
     ip ospf cost 300
     ip ospf hello-interval 1
     ip ospf dead-interval 4
     ip ospf network point-to-point
     no shutdown


Virtual Routing and Forwarding
==============================
Split your router into multiple virtual routers,
each with its own address space and routing tables
(RIB and FIB).

Separate networks that should not communicate
with each other, e.g. management network from
normal Internet.

Bind L3 interfaces to a VRF; traffic arriving
on a specific interface will be routed according
to FIB in the VRF it is bound to.
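
In other words, the incoming interface selects
the table, and only then is the destination
looked up.  A Python sketch; the interface names,
VRF names and prefixes are illustrative, and both
VRFs deliberately use the same prefix.

```python
import ipaddress

# Interface -> VRF binding.
VRF_OF = {"xe-0/0/7.1": "default",
          "xe-0/0/7.9": "LHC-OPN"}

# One FIB per VRF: prefix -> outgoing interface.
FIBS = {
    "default": {"198.51.100.0/24": "xe-0/0/8.1"},
    "LHC-OPN": {"198.51.100.0/24": "xe-0/0/8.9"},
}

def forward(in_interface, destination):
    """Return (vrf, outgoing interface or None)."""
    vrf = VRF_OF[in_interface]
    addr = ipaddress.ip_address(destination)
    best = None
    for prefix, out_if in FIBS[vrf].items():
        net = ipaddress.ip_network(prefix)
        # Longest matching prefix wins.
        if addr in net and (best is None or net.prefixlen > best[0]):
            best = (net.prefixlen, out_if)
    return (vrf, best[1] if best else None)

print(forward("xe-0/0/7.1", "198.51.100.18"))
print(forward("xe-0/0/7.9", "198.51.100.18"))
```

The same destination address leaves on different
interfaces depending on where the packet came in.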

Need one OSPF (or BGP, RIP, etc.) instance in
each VRF on the router.


VRF (cont)
==========
Need to separate the traffic on the wire as
well.

Nothing in the IP packets themselves saying
which VRF they belong to (unlike VLANs on the
Ethernet layer).

Some encapsulations:
- physically separate cables
- VLAN tagging
- MPLS tunnels
- GRE tunnels
- IPsec tunnels


Subinterfaces
=============
Share physical port between VRFs by using VLAN
(802.1Q) tagging.

Port is still in layer 3 mode, *not* bridging.
Ethernet BUM packets (Broadcast, Unknown unicast,
Multicast) thus not forwarded to other ports.

Can use same .1Q id on several ports, without
forming a contiguous VLAN.  Can thus use same
.1Q tag for all logical links belonging to the
same VRF, not needing to allocate separate
VLAN id for every logical link.

Supported in e.g. Junos, Comware, Cisco IOS,
Cisco NX-OS.
But not in e.g. HP ProCurve, Dell DNOS, Extreme.


Subinterfaces (Junos)
=====================

  interfaces {
     lo0 {
        unit 0 { family inet { address ...; } }
        unit 9 { family inet { address ...; } }
     }
     xe-0/0/7 {
        vlan-tagging;
        unit 1 { vlan-id 1; family inet { ... } }
        unit 9 { vlan-id 9; family inet { ... } }
     }
     xe-0/0/8 {
        vlan-tagging;
        unit 1 { vlan-id 1; family inet { ... } }
        unit 9 { vlan-id 9; family inet { ... } }
     }
  }


Subinterfaces (Junos)
=====================

  routing-instances {
     LHC-OPN {
        instance-type virtual-router;
        interface lo0.9;
        interface xe-0/0/7.9;
        interface xe-0/0/8.9;
        protocols {
           ospf {
              area 203.0.113.0 {
                 interface lo0.9 { passive; }
                 interface xe-0/0/7.9 { ... }
                 interface xe-0/0/8.9 { ... }
              }
           }
        }
     }
  }


Subinterfaces (ComWare)
=======================

  ip vpn-instance lhc-opn
     description VRF for LHC-OPN.
  interface FortyGigE1/1/7
     port link-mode route
  interface FortyGigE1/1/7.1
     ip address ...
     ospf ...
     ...
  interface FortyGigE1/1/7.9
     ip binding vpn-instance lhc-opn
     ip address ...
     ospf ...
     ...


Unnumbered links
================

* IPv4: "Borrow" address from another interface
  (typically loopback) as a /32.
* IPv6: Use only link-local address on interface.

Let routing protocol discover the other end, and
automatically set up host route for it.

No need to allocate a /31 or /127 for every
routing link, add it to DNS, and configure it on
the routers.
Downlinks from all spine routers identical.
All uplinks on each leaf router identical.


Unnumbered links
================

  interfaces {
     xe-0/0/7 {
        unit 0 {
           family inet {
              unnumbered-address lo0.0;
           }
           family inet6;
        }
     }
  }

Not everyone supports this, but at least Junos,
Cisco IOS, Cisco NXOS, HPE Comware, Cumulus, and
Arista EOS do.  "Some restrictions may apply."

Not supported in: HP ProCurve, Dell DNOS, Extreme.


RIB vs FIB
==========
Many network operating systems differentiate
between Routing Information Base and Forwarding
Information Base.

RIB is used as communication channel between
routing protocols in router.  Each protocol
calculates routes, and inserts the best into RIB.
Same prefix can exist several times in RIB, from
multiple sources.  Lots of metadata.

FIB is the routing table that is actually used
for forwarding packets (e.g. programmed into
hardware).  For every prefix in RIB, one route
is selected as the best and copied to FIB.
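
A Python sketch of that selection, followed by a
longest-prefix-match lookup in the resulting FIB.
How the "best" source is chosen differs between
vendors; the preference numbers below are
hypothetical example values (lower wins), as are
the routes and next-hops.

```python
import ipaddress

# Hypothetical per-source preference; lower wins.
PREFERENCE = {"connected": 0, "static": 5,
              "ospf": 10, "bgp": 170}

# RIB: the same prefix may appear several times.
RIB = [
    ("192.0.2.160/30", "connected", "xe-0/0/7.0"),
    ("198.51.100.0/24", "bgp", "192.0.2.170"),
    ("198.51.100.0/24", "ospf", "192.0.2.162"),
    ("0.0.0.0/0", "bgp", "192.0.2.170"),
]

def build_fib(rib):
    """Keep one route per prefix: the most preferred."""
    fib = {}
    for prefix, source, next_hop in rib:
        net = ipaddress.ip_network(prefix)
        old = fib.get(net)
        if old is None or PREFERENCE[source] < PREFERENCE[old[0]]:
            fib[net] = (source, next_hop)
    return fib

def lookup(fib, address):
    """Longest-prefix match; returns next-hop or None."""
    addr = ipaddress.ip_address(address)
    matches = [net for net in fib if addr in net]
    if not matches:
        return None
    best = max(matches, key=lambda net: net.prefixlen)
    return fib[best][1]

fib = build_fib(RIB)
print(len(RIB), "RIB entries ->", len(fib), "FIB entries")
print(lookup(fib, "198.51.100.18"))  # via the OSPF route
print(lookup(fib, "203.0.113.5"))    # via the default route
```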


Redistributing from/to other protocols
======================================

When a router receives routes from some other
protocol (e.g. BGP, or static routes), you may
need to *redistribute* those into OSPF, so other
routers in the OSPF AS know how to reach those.

Common use-case: default route from ISP via BGP.

Alternatively, you can redistribute routes from
OSPF to e.g. BGP.

Be careful!  Limit which routes get
redistributed.  Rare to redistribute *all* routes
from one protocol to another.


Reading more
============

"OSPF and IS-IS -- Choosing an IGP for Large-Scale
 Networks"
Jeff Doyle
Addison-Wesley, ISBN 978-0-321-16879-5