WHITE PAPERS
from the files of
Networking Unlimited, Inc.

WebQuery@NetworkingUnlimited.com
14 Dogwood Lane, Tenafly, NJ 07670
Phone: +1 201 568-7810

Using BGP to Trigger Multiple Levels of Dial Backup on Cisco Routers

Copyright © 1999, Networking Unlimited, Inc. All Rights Reserved.

Cisco does not directly support dial backup of dial backup links, such as using ISDN to back up a frame relay link and then using analog modems to back up the ISDN. This paper shows how BGP can be used to detect backup failure and automatically force the modem link to be established only in the event that specific alternative routes are not available. Theoretically, this approach could be used to provide any number of levels of dial backup.

Background

One critical requirement for successful use of dial on demand routing (DDR) for backup purposes is having "interesting" traffic present at the router with the backup line to force the dial. When there is only one dial link, it is usually easy to configure the routers so that any normal traffic, which frequently includes the syslog reporting that the primary link has gone down, will force the link up. But if we want to provide a second dial backup just for insurance, it is no longer automatic that there will be appropriate traffic to trigger the dialing.

The easy solution is to use IOS 12.0 and dialer-watch. Simply define two unique loopback port addresses at the target site and define filters to advertise them both over the primary link, one of them over the preferred dial link, and neither of them over the secondary dial link. Then set up a dialer-watch on the preferred dial link for the address only advertised over the primary link and a dialer-watch on the secondary dial link for the address advertised over both the primary and the preferred dial links. This is a trivial extension to the normal use of dialer-watch.
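As a rough sketch of that arrangement on the remote routers, assume two hypothetical target loopbacks: 10.0.0.21, advertised only over the primary link, and 10.0.0.22, advertised over both the primary and the preferred dial links. The watch-list numbers are arbitrary, the phone numbers and the chat script name are borrowed from the example later in this paper, and each interface still needs its usual DDR and PPP settings.

! Router with the preferred (ISDN) dial link: dial when 10.0.0.21 disappears
interface BRI0
 dialer map ip 10.0.0.21 name Target broadcast 18775551234
 dialer watch-group 1
!
dialer watch-list 1 ip 10.0.0.21 255.255.255.255
!
! Router with the secondary (modem) dial link: dial when 10.0.0.22 disappears
interface Async1
 dialer map ip 10.0.0.22 name Target modem-script courier broadcast 18885551212
 dialer watch-group 2
!
dialer watch-list 2 ip 10.0.0.22 255.255.255.255

Since 10.0.0.22 remains in the routing table as long as either the primary or the preferred dial link is delivering it, the modem should dial only when both have failed.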

On the other hand, there are many networks where the use of IOS 12.0 is not an option. Whether that is because of an unwillingness to put an LD IOS into production use or simply because the routers already in the field do not have the flash and RAM resources required is immaterial. Even though not in a position to use dialer-watch, there can still be a need to use two levels of backup. This can be done on any IOS that supports dial on demand routing with a little extra effort.

One way is to set up a pair of loopback addresses on the target router(s) with restricted advertising the same as when using dialer-watch to do the job. But instead of dialer-watch, use a host system at the remote site to periodically ping the two uniquely defined loopback addresses. Appropriately defined floating static routes could then be used to route the pings to provide the required triggers for standard DDR. Should the primary link go down, both dial links would attempt to restore the connectivity. Once the preferred dial link came up, the pings for the secondary backup link would travel over the preferred link and the secondary dial link would drop. Alternatively both links could be configured to stay up for additional bandwidth availability. The problem with this approach, of course, is that the router network is now dependent upon additional hardware and software to keep the network functional, not to mention the management challenges if the hosts and the routers happen to be controlled by different management chains.

Of course, what we really want is for the routers themselves to generate the required traffic. While it is possible to telnet into a router and have it "ping" any destination using virtually any network protocol supported by the router, there is no provision in any Cisco IOS to automatically do so as part of the router configuration. However, we can get the same effect by using the keep alive packets that BGP neighbors exchange over their TCP connection.

BGP Driven DDR

The configuration example which follows is adapted from a production configuration in use by a client since 1998. There are two routers at the site to provide router redundancy. The router Primary has the primary frame relay link being protected. The router Secondary has the preferred ISDN link to be used for backup. The second level of backup is an asynchronous modem on the auxiliary port of router Primary. Both IP and IPX are supported on all links.

While in the production configuration, each of these links is served by a separate router at the target location, for this example the required configuration is shown as all on one router and only the configuration for the one remote site is included. (Obviously, we would not normally use an ISDN PRI line to back up a single ISDN BRI site.) Conversely, we show the asynchronous modem support on the auxiliary port of the Target router. This is only suitable if the probability of using the "emergency backup" is low, as it provides only a single backup link for multiple potential users. In the network where this approach was originally applied, we started out using the auxiliary port, but found the router-to-router asynchronous support capability so useful for other purposes that a dedicated 2511 access server was installed.

The use of IPXWAN for DDR of IPX is not required for this application. It was used in the original design for other reasons. Similarly, the IP links could be numbered or unnumbered. In this application dedicated links were assigned IP addresses so they could be monitored by NetView/6000 while DDR links were configured as unnumbered so they would not confuse NetView/6000 monitoring.

So let's look at the actual IOS 11.2 configurations required. Note that lines in the examples ending with "\" include the next line as part of the current line and should be united (without the backslash) if copied. All phone numbers, user names, passwords, and IP and IPX addresses have been changed to protect the client.

BGP Driven DDR--Primary Router

hostname Primary105
!
username Target password yzzyx
!
ipx routing 0000.0c01.0105
ipx internal-network A10105
chat-script courier ABORT ERROR ABORT "NO " ABORT BUSY "" "at" "" "at&f" OK \
  "atl1m1&b1&h1&r2&c1&d3&m4&k1s0=2" OK "at dt \T" TIMEOUT 90 CONNECT \c
chat-script target-login TIMEOUT 10 Username-\r-Username SiteBackup "Password:" \
  Insecure Target "ppp 10.208.100.25"
!
interface Loopback0
 ip address 10.201.5.1 255.255.255.255
!
interface Ethernet0
 ip address 10.101.5.1 255.255.255.0
 ip access-group 104 in
 ipx network A00105 encapsulation SAP
!
interface Serial0
 no ip address
 encapsulation frame-relay
 priority-group 1
!
interface Serial0.17 point-to-point
 ip address 10.201.5.6 255.255.255.252
 bandwidth 56
 ipx delay 7
 ipx network B00105
 frame-relay interface-dlci 17
!
interface Async1
 ip address 10.2.1.5 255.255.0.0
 encapsulation ppp
 bandwidth 10
 async default routing
 async mode interactive
 ipx ipxwan 0 unnumbered Primary105
 ipx delay 30
 dialer in-band
 dialer idle-timeout 300
 dialer map ipx 0.0000.1234.0000 name Target broadcast 18885551212
 dialer map ip 10.2.255.254 name Target modem-script courier system-script \
  target-login broadcast 18885551212
 dialer hold-queue 10 timeout 60
 dialer-group 1
 ppp authentication chap
!
router eigrp 1
 network 10.0.0.0
!
! Use frame relay if available
ip route 0.0.0.0 0.0.0.0 10.0.0.1 150
! Otherwise use ISDN if available
ip route 0.0.0.0 0.0.0.0 10.0.0.3 155
! If all else fails, try Asynchronous dial
ip route 0.0.0.0 0.0.0.0 10.2.255.254 160
! Support ISDN Test Address
ip route 10.0.0.2 255.255.255.255 10.101.5.2
! Asynchronous dial address
ip route 10.2.0.0 255.255.0.0 Async1
access-list 102 deny eigrp any any
access-list 102 permit ip any any
access-list 902 deny rip
access-list 902 deny sap
access-list 902 permit any
priority-list 1 protocol ipx high
priority-list 1 protocol ip low
!
dialer-list 1 protocol ip list 102
dialer-list 1 protocol ipx list 902
!
line aux 0
 password mumble
 script dialer courier
 login
 modem InOut
 transport input all
 rxspeed 38400
 txspeed 38400
 flowcontrol hardware
!
end

Each pair of routers can have a unique shared secret. Here, Target and Primary105 use the shared password yzzyx while Target and Backup105 use the password xyzzy. The "ipx routing" command turns on IPX so that Novell NetWare can be supported on the network. The IPX internal network assignment is critical, as it becomes part of the IPX host address when identifying this router using IPXWAN. The priority-group command on the frame relay link acknowledges the high sensitivity of NetWare to routing delays. Depending upon the actual mix of applications being supported, fancier queuing strategies would probably be more appropriate, and we do not even try to illustrate here any optimizations for the emergency asynchronous link. Other commands useful when supporting IPX, such as all the filters appropriate for keeping the link from coming up unnecessarily, suppressing advertisement of unnecessary addresses and services, and spacing out RIP and SAP updates over slow WAN links, are not shown.

This example should be considered the minimum to get ISDN and asynchronous dial support of IP and IPX, not a fully optimized configuration ready for implementation. Nor is it adequate for an IPX only environment. For example, if there is no outbound IP traffic present at the Primary router when the frame relay link dies, it will never attempt to bring up the asynchronous link. Indeed, as you can see from the dialer maps, IPX traffic is incapable of bringing up the dial link on this router as only the IP dialer map has the necessary chat scripts specified. Also be aware if using IPX RIP for Novell routing that even though a dial link comes up quickly, IPX will not "see it" until RIP ages out the old route, which can take several minutes.

The weighting of the floating static routes is critical to successful operation. The static routes must have a higher distance specified than that used by the interior routing protocol. The other key is the filtering on the Target router of locally defined loopback addresses so that only certain loopback targets are advertised over any specific link. If multiple Target routers are used, all that is required is to put the appropriate loopbacks on the appropriate targets. There is no problem with defining the same target addresses on multiple routers, as long as they are filtered out of any direct advertisements such as over a common Ethernet (hence the advertisement filter on the Ethernet port in the example Target router).

Under normal conditions, all traffic flows via frame relay. In a hub and spokes configuration, filtering the hub so that only the target loopbacks escape can greatly reduce the routing traffic overhead to the spokes and minimize the routing table size each spoke must maintain. When the frame relay link goes down, the 10.0.0.1 target is no longer reachable and the router looks for an alternative default route. Since the target address 10.0.0.3 is only advertised over ISDN, it will not be available and the only usable target is 10.2.255.254 via the asynchronous link. Under the rules of dial on demand routing, dial links are always connected (whether physically connected or not) so the floating static route through the asynchronous port is activated and the first non-local packet to arrive initiates a call.
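One way to confirm this sequence on a test bench is to watch the routing table and the dialer from the exec prompt of the Primary router while the frame relay link is disconnected. The commands below are standard IOS exec commands; their output is omitted here because it varies by release.

! Which default route is installed right now, and through which target?
show ip route 0.0.0.0
! Are the frame relay and ISDN targets still being learned?
show ip route 10.0.0.1
show ip route 10.0.0.3
! State of the dial interface, its idle timer, and the reason for the last call
show dialer
! Log dialing activity as it happens
debug dialer events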

If you think about it, this is just standard single router dial backup as already discussed in Chapter 3: "Dial Backup for Permanent Links." The only difference is the extra floating static route for when ISDN comes to life, and even that is not required if the network is small enough that all destinations are advertised over the ISDN dial link. The fancy configuration work is all done on the Backup router, which can no longer depend upon the Primary router to send it traffic to bring up the backup link. So let's take a look at the Backup router.

BGP Driven DDR--Backup Router

hostname Backup105
!
enable password 7 13041E000D000B3D
!
username Target password xyzzy
ipx routing 0000.0c02.0105
ipx internal-network A20105
isdn switch-type basic-ni1
!
interface Loopback0
 ip address 10.201.5.2 255.255.255.255
!
interface Ethernet0
 ip address 10.101.5.2 255.255.255.0
 ipx network A00105 encapsulation SAP
!
interface BRI0
 ip unnumbered Loopback0
 encapsulation ppp
 bandwidth 20
 no keepalive
 ipx ipxwan 0 unnumbered Backup105
 ipx delay 30
 isdn spid1 80055512120101
 isdn spid2 80055512130101
 dialer idle-timeout 170
 dialer map ipx 0.0000.1234.0000 name Target speed 56 broadcast 18775551234
 dialer map ip 10.1.255.254 name Target speed 56 broadcast 18775551234
 dialer hold-queue 10
 dialer-group 1
 ppp authentication chap
!
router eigrp 1
 network 10.0.0.0
!
router bgp 99
 no synchronization
 network 10.201.5.2 mask 255.255.255.255
 timers bgp 10 120
 neighbor 10.0.0.10 remote-as 25
 neighbor 10.0.0.10 ebgp-multihop 255
 neighbor 10.0.0.10 update-source Loopback0
!
ip route 0.0.0.0 0.0.0.0 10.0.0.1 150
ip route 0.0.0.0 0.0.0.0 10.0.0.3 154
ip route 0.0.0.0 0.0.0.0 10.0.0.4 157
ip route 0.0.0.0 0.0.0.0 10.0.0.2 160
ip route 10.0.0.2 255.255.255.255 BRI0
ip route 10.0.0.10 255.255.255.255 10.0.0.1
ip route 10.0.0.10 255.255.255.255 10.0.0.2 190
ip route 10.2.255.254 255.255.255.255 10.101.5.1
access-list 102 deny eigrp any any
access-list 102 permit ip any any
access-list 902 deny rip
access-list 902 deny sap
access-list 902 permit any
priority-list 1 protocol ipx high
priority-list 1 protocol ip low
!
dialer-list 1 protocol ip list 102
dialer-list 1 protocol ipx list 902
!
scheduler interval 500
end

The backup router is standard IP and IPX DDR except for the extra floating static routes and BGP configuration. Let's look at the floating static routes first. Normally, all traffic is routed via frame relay, based on advertisements received from the primary router or the default route to the address 10.0.0.1, which can only be learned from the frame relay connection. If the frame relay link fails, these routes will age out (unless poison reverse kills them first), and the router will look for an alternative route to external destinations. If the ISDN link is up and functional, the default route to address 10.0.0.3 takes over and all traffic is carried over the ISDN link until such time as the frame relay link returns to service and the preferred default route to 10.0.0.1 can be used again.

If the ISDN link is not up (which would be the normal case in the initial phase of a frame relay failure), the default route would drop through to the default route to 10.0.0.4, the route through the asynchronous port on the primary router. Since ISDN calls normally complete in a few seconds compared to the 30 to 60 seconds required for an analog modem to connect, this path will probably not be available initially, and the default route would drop to the ISDN BRI reachable 10.0.0.2, bringing up the ISDN link.

Since we can see that the ISDN link would normally come up after a frame relay failure, why do we need to burden the routers with BGP? The answer, like most of the answers to network protocol questions, is to cover the situation where things go wrong. Most likely is the situation where there is no outbound traffic to force the link up at the backup router, while the primary router immediately starts dialing the analog link to send the syslog event reporting the frame relay link down to the syslog server at the network management center. This is particularly likely if the primary router is configured as the default gateway for all users to maximize efficiency under normal (frame relay up) conditions. Yes, we could use Hot Standby Router Protocol to move the active router based on what links are up (and that is a good idea, although we don't show it in this example), but even that will not guarantee that traffic will be present at the backup router to force up the link if all users just happen to be idle. Remember the first rule of protocol design--if a failure mode is possible, the question is not "Will it occur," but rather "When it occurs, can it be handled." That is what we are using BGP for in this example.

Note that BGP will only establish a connection to a destination which is explicitly in the routing tables, and the peer address, 10.0.0.10, is not advertised over any links from the target router. Therefore, the only way we can establish a BGP peering is via static routes, of which we define only two. The preferred route is via the frame relay target address, so that when frame relay is up, BGP will exchange keep alive packets using the primary link. However, whenever the primary route disappears, we immediately fall back to the floating static route through the "always available" target address used for ISDN (that is why we need two ISDN target addresses, one to be used only when ISDN is up, the other to use at any time to bring ISDN up). As a result, even if we are currently carrying all production traffic through the asynchronous link on the primary router, the secondary router will continuously try to bring up the ISDN link due solely to BGP's exchange of keep alive messages. While some people may consider it a benefit and others may consider it a defect, be aware that once the link is declared down by BGP, the ISDN retries will be based on attempts to reestablish a TCP connection, not the routine exchange of keep alive packets, and retries will only occur periodically rather than continuously.
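The interplay of the two static routes with the BGP timers is easier to see when the relevant lines from the Backup router are pulled together. What follows is an annotated excerpt of the configuration above, not additional configuration, and the comments are our reading of how the pieces interact.

! Keepalive every 10 seconds, hold time 120 seconds
router bgp 99
 timers bgp 10 120
 neighbor 10.0.0.10 remote-as 25
 neighbor 10.0.0.10 ebgp-multihop 255
 neighbor 10.0.0.10 update-source Loopback0
!
! Preferred path to the peer, usable only while 10.0.0.1 is learned over frame relay
ip route 10.0.0.10 255.255.255.255 10.0.0.1
! Fallback path, since 10.0.0.2 is always reachable through the dial interface
ip route 10.0.0.10 255.255.255.255 10.0.0.2 190
ip route 10.0.0.2 255.255.255.255 BRI0
!
! BGP is ordinary IP traffic, so list 102 treats it as interesting
access-list 102 deny eigrp any any
access-list 102 permit ip any any
dialer-list 1 protocol ip list 102

With a 10 second keep alive interval and a 170 second idle timeout on BRI0, keep alive packets crossing the ISDN link should keep resetting the idle timer, so the call would be expected to stay up until the frame relay route to 10.0.0.10 returns.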

One other convenience feature which is easy to overlook is the definition of a static route to the asynchronous activation address. This allows testing of the asynchronous port by pinging the activation address from any system on the LAN rather than requiring logging into the primary router. We include a similar static route for the ISDN activation address on the primary router for exactly the same reason (simplifying ISDN testing).

BGP Driven DDR--Target Router

hostname Target
!
username Primary105 password yzzyx
username Backup105 password xyzzy
! Define other locations and passwords here...
ipx routing 0000.0C00.F001
ipx internal-network 1234
isdn switch-type primary-dms100
chat-script courier ABORT ERROR ABORT "NO " ABORT BUSY "" "at" "" "at&f" OK \
  "atl1m1&b1&h1&r2&c1&d3&m4&k1s0=2" OK "at dt \T" TIMEOUT 90 CONNECT \c
!
controller T1 0
 framing esf
 linecode b8zs
 pri-group timeslots 1-24
!
interface Loopback0
 description Unique Address for Network Management
 ip address 192.168.0.1 255.255.255.255
!
interface Loopback1
 description Primary target for frame relay users
 ip address 10.0.0.1 255.255.255.255
!
interface Loopback2
 description IP Address for ISDN Unnumbered
 ip address 10.0.0.2 255.255.255.255
!
interface Loopback3
 description Routing target for ISDN users
 ip address 10.0.0.3 255.255.255.255
!
interface Loopback4
 description Routing target for asynchronous dial users
 ip address 10.0.0.4 255.255.255.255
!
interface Loopback10
 description Routing target for BGP keepalives
 ip address 10.0.0.10 255.255.255.255
!
interface Ethernet0
 ip address 10.1.255.254 255.255.0.0
 ipx network AA0000 encapsulation SAP
!
interface Serial0
 description Frame Relay (T1)
 no ip address
 encapsulation frame-relay
 priority-group 1
!
! Subinterfaces to support other sites go here
!
interface Serial0.105 point-to-point
 description Site 105 Example
 ip address 10.201.5.5 255.255.255.252
 bandwidth 56
 ipx delay 7
 ipx network B00105
 frame-relay interface-dlci 105
!
interface Serial0:23
 ip unnumbered Loopback2
 encapsulation ppp
 bandwidth 20
 no keepalive
 ipx ipxwan 0 unnumbered Target
 ipx delay 20
 dialer idle-timeout 300
 dialer map ipx 0.00A2.0105.0000 name Backup105 speed 56 broadcast
 dialer map ip 10.201.5.2 name Backup105 speed 56 broadcast
! Dialer maps to support other sites go here
 dialer-group 1
 ppp authentication chap
!
interface Async1
 ip address 10.2.255.254 255.255.0.0
 encapsulation ppp
 bandwidth 10
 ipx ipxwan 0 unnumbered Target
 ipx delay 30
 async default routing
 async mode interactive
 peer default ip address 10.2.99.99
 dialer in-band
 dialer idle-timeout 300
 dialer map ipx 0.00A1.0105.0000 name Primary105 broadcast
 dialer map ip 10.2.1.5 name Primary105 broadcast
! Dialer maps to support other sites go here
 dialer-group 1
 ppp authentication chap
 pulse-time 3
!
router eigrp 1
 network 10.0.0.0
 distribute-list 10 out Async1
 distribute-list 11 out Ethernet0
 distribute-list 12 out Serial0.105
 distribute-list 13 out Serial0:23
 no auto-summary
!
router bgp 25
 no synchronization
 network 10.0.0.10 mask 255.255.255.255
 neighbor 10.201.5.2 remote-as 99
 neighbor 10.201.5.2 ebgp-multihop 255
 neighbor 10.201.5.2 update-source Loopback10
!
ip classless
access-list 10 permit 10.0.0.4 0.0.0.0
access-list 10 deny 10.0.0.0 0.0.0.255
access-list 10 permit any
access-list 11 deny 10.0.0.0 0.0.0.255
access-list 11 permit any
access-list 12 permit 10.0.0.1 0.0.0.0
access-list 12 deny 10.0.0.0 0.0.0.255
access-list 12 permit any
access-list 13 permit 10.0.0.2 0.0.0.0
access-list 13 permit 10.0.0.3 0.0.0.0
access-list 13 deny 10.0.0.0 0.0.0.255
access-list 13 permit any
priority-list 1 protocol ipx high
priority-list 1 protocol ip low
!
dialer-list 1 protocol ipx permit
dialer-list 1 protocol ip permit
!
line aux 0
 script startup courier
 script reset courier
 login local
 modem InOut
 transport input all
 stopbits 1
 rxspeed 38400
 txspeed 38400
 flowcontrol hardware
!
end

Before we discuss the individual functions configured on the target router, keep in mind that normally these functions would be distributed across two or more routers to eliminate the target router as a single point of failure. Since targets will often be serving more than one remote, it is quite possible that all functions will be on any single target router, but their use would be distributed across multiple remotes. It is also possible to make multiple target routers look and behave like a single router for dial backup purposes, further enhancing redundancy without adding significant complexity to the remote router configurations.

Note the advertisement filter on the local Ethernet port in addition to those on the links to the remotes. This filter allows us to reuse the same loopback addresses on all routers in a target cluster without them confusing each other. Just use care when setting up this filter to not block a loopback address used to identify the real router to network management.

The advertisement filters on the access lines are set up to allow all "real" addresses to be advertised. This is appropriate for a mesh configuration, but there are better ways to handle large hub and spoke networks where exchanging full routing tables will consume unnecessary bandwidth. The goal should be to advertise as few addresses as possible while maintaining maximum capability to alternate route. Along the same lines, if using BGP for routing as well as for dial backup support, keep the two uses of BGP in different autonomous systems so that the useless routes learned as a side effect of the dial backup support do not escape and pollute the routing tables used for routing production traffic. While it is theoretically possible to combine the two functions, the potential for introducing mistakes is just too high and there are far more productive investments to be made of the design expertise and time required.
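For a hub and spoke network, a tighter filter might advertise nothing but the loopback targets and let the floating static default routes on each spoke do the rest. The following is a sketch only, using hypothetical access list numbers 22 and 23 in place of lists 12 and 13 in the example, and it assumes the spokes need no specific routes from the hub.

router eigrp 1
 distribute-list 22 out Serial0.105
 distribute-list 23 out Serial0:23
!
! Frame relay: advertise only the primary target
access-list 22 permit 10.0.0.1 0.0.0.0
! ISDN: advertise only the dial activation address and the ISDN routing target
access-list 23 permit 10.0.0.2 0.0.0.0
access-list 23 permit 10.0.0.3 0.0.0.0
! Everything else is dropped by the implicit deny at the end of each list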

Since the asynchronous dial backup is for emergency use only, it is handled in this configuration by the auxiliary port on the router rather than an asynchronous dial interface. This limits the maximum data rate between router and modem to 38400 bps, which will impact the effectiveness of modem based compression and eliminate any real benefit to using an ISDN modem to provide 56K dial access support.

Expanding to arbitrary connectivity

The approach described in the preceding example to provide two dial backup links for one permanent link across two routers can be easily expanded to provide any number of dial backup links behind any number of permanent links on any number of routers. This can include sharing of the same dial backup links across multiple permanent lines, which is impossible when using the backup interface command approach.

Since the capability is standard Dial on Demand Routing (DDR), dial backup is not limited to asynchronous or ISDN channel speeds. Depending upon the release of IOS in use, multilink PPP can be used to provide arbitrary pipe sizes in the dial backup scheme. This may require the use of multi-chassis multilink PPP at the target locations, depending upon the degree of clustering required.
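As a sketch of what that might look like on the BRI of the Backup router, assuming an IOS release with multilink PPP support and a Target router configured to match, the second B channel can be added on demand. The load threshold of 1 (the lowest value, which brings up the second channel at almost any load) is our illustrative choice and is not part of the production configuration.

! Bundle both B channels, adding the second when load in either
! direction reaches the threshold
interface BRI0
 encapsulation ppp
 ppp multilink
 dialer load-threshold 1 either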

The key to success is to map out what connectivity is available and what backups should be active under what conditions. Once the priorities are identified, the appropriate loopback addresses can be defined at the required target locations, and the floating static routes required sorted by distance. In addition to checking that operation will be what you want under the conditions you are trying to protect against, you should go further to look for any ways that you can cause an undesirable set of connections. Then fix your design by including the "bad scenario" in the set of protected conditions. Remember that KISS also applies to network design. The simpler the design, the easier it is to prove that it will work. Conversely, the more complex the design, the harder it is to detect or identify conditions under which it will fail.



Copyright 1999-2000 © Networking Unlimited Inc. All rights reserved.