White Paper: Redundant DLSw for Ethernet Attached Devices

WHITE PAPERS
from the files of
Networking Unlimited, Inc.

WebQuery@NetworkingUnlimited.com
14 Dogwood Lane, Tenafly, NJ 07670
Phone: +1 201 568-7810

Cisco 11.2 IOS Configuration for Redundant DLSw Connecting Ethernet Attached Devices

Data Link Switching (DLSw) can provide excellent connectivity for IBM SNA applications. However, when the SNA devices are Ethernet rather than token ring attached so that source routing is no longer end-to-end, configuring redundant DLSw peers will result in an unstable network. While Cisco has introduced redundant Ethernet capability in IOS 12.0, this paper presents a DLSw peering configuration that will work with any IOS release starting with 11.2 to provide hot-standby capability and eliminate single points of failure, without introducing the switch compatibility challenges and subsequent manual configuration needs of the Cisco capability.

Background

Data Link Switching (DLSw) is a popular technique for providing connectivity between various IBM SNA and NetBIOS products. These protocols are designed to communicate over an extended LAN, which creates major performance problems in a routed network. By terminating the LAN sessions at the router and using TCP/IP or other WAN oriented protocols to actually carry the data content of the IBM protocols, DLSw not only eliminates the overhead of the chatty broadcast and keepalive traffic associated with the LAN protocols, but also allows transparent utilization of redundant paths through the WAN, simplifying provision of robust link fallback strategies.

The challenge arises when reliability requirements demand providing redundant routers on each LAN. For example, consider Figure 1 where we have routers A and B at a remote site communicating via redundant links to routers X and Y at the data center. TCP/IP and IPX have no problems taking full advantage of both available links, and should a link or a router fail, the remaining link and its routers will automatically assume the full traffic burden. Unless the remaining link has insufficient bandwidth available, the users would never even realize that a link or a router was down.

Figure 1: Example redundant router architecture

When extending such a redundant network to also support DLSw, such as to support remote 5250 terminal access to a central AS/400 minicomputer, the natural response is to duplicate the router redundancy and provide DLSw peerings between the routers at each end of each WAN link so that router A peers with router X and router B peers with router Y. This is the example used in most of the Cisco documentation to show how to implement high reliability DLSw.

Token Ring versus Ethernet

The sharp reader will notice that in all the Cisco examples of redundant DLSw, the LAN is always a token ring LAN. This is neither accidental, nor a reflection that the traditional preferred LAN in IBM shops is token ring. One fundamental operational difference between Ethernet and token ring is the support of source route bridging in token ring attached devices. Source routing allows extended token ring LANs to support multiple bridged paths connecting users on the network. DLSw takes advantage of this capability of token ring and rather than emulating a locally attached endpoint on the LAN, the DLSw endpoint pretends it is behind a bridge.

The result is that an SNA network using DLSw is topologically equivalent to a bridged network where each DLSw peer relationship is replaced by a remote bridge. In an extended token ring network, there is no problem with having parallel bridged paths between two token rings. Indeed, it is recommended, as it improves reliability and allows for sharing of the links.

Ethernet end systems, on the other hand, do not support source route bridging. They only support transparent bridging, which provides only a single possible path between source and destination. (Transparent spanning tree bridges support redundant links and bridges by automatically disabling duplicate paths until the bridged LAN topology is a minimal spanning tree.) Consequently, if redundant DLSw peers are configured to an Ethernet LAN, the end systems on the Ethernet will see the remote end system as two independent systems with the same MAC address rather than as a single end system accessible via two different paths.

Unlike true "dumb" bridges, however, redundant DLSw peers connected to Ethernet LANs, whether the other end of the peerings is Ethernet or token ring, will not immediately flood the network with looping frames. Nor will the DLSw peers automatically prune the connectivity down to a single path configuration as will occur with transparent spanning tree bridges. Indeed, as many novices have discovered to their undoing, redundant DLSw peers between Ethernet LANs will actually work... but only for a while. The configuration is unstable, and the SNA services will gradually degrade until finally, all SNA communications grind to a halt. Since the degradation increases with increasing load, this frequently winds up as a failure during peak usage periods, which can be harmful to the career of the designer.

Backup DLSw Peers

Starting with IOS 11.2, the DLSw backup peer command can be used to provide redundancy for Ethernet attached devices. (Note that the backup peer command is also in earlier IOS releases; it is just not useful for this application until 11.2.) The backup peer command allows the designer to configure DLSw peers which are only activated when another peer cannot be reached.

For example, assume the router LAN interfaces have addresses A = 10.1.0.1, B = 10.1.0.2, X = 10.99.0.1, and Y = 10.99.0.2. A redundant configuration for token ring LANs would then look like Example 1, and the equivalent Ethernet configuration using backup peers would look like Example 2:

! Remote A (token ring)
!
source-bridge ring-group 10
dlsw local-peer peer-id 10.1.0.1
dlsw remote-peer 0 tcp 10.99.0.1
. . .
interface tokenring 0
ip address 10.1.0.1 255.255.0.0
ring-speed 16
source-bridge active 25 1 10

! Remote B (token ring)
!
source-bridge ring-group 10
dlsw local-peer peer-id 10.1.0.2
dlsw remote-peer 0 tcp 10.99.0.2
. . .
interface tokenring 0
ip address 10.1.0.2 255.255.0.0
ring-speed 16
source-bridge active 25 2 10

! Remote X (token ring)
!
source-bridge ring-group 12
dlsw local-peer peer-id 10.99.0.1
dlsw remote-peer 0 tcp 10.1.0.1
. . .
interface tokenring 0
ip address 10.99.0.1 255.255.0.0
ring-speed 16
source-bridge active 5 1 12

! Remote Y (token ring)
!
source-bridge ring-group 12
dlsw local-peer peer-id 10.99.0.2
dlsw remote-peer 0 tcp 10.1.0.2
. . .
interface tokenring 0
ip address 10.99.0.2 255.255.0.0
ring-speed 16
source-bridge active 5 2 12

Example 1: Redundant DLSw with Token Ring LANs

! Remote A (Ethernet)
!
source-bridge ring-group 10
dlsw local-peer peer-id 10.1.0.1
dlsw remote-peer 0 tcp 10.99.0.1 lf 1500
! Backup peering to Y, activated only while primary peer 10.99.0.1 is unreachable
dlsw remote-peer 0 tcp 10.99.0.2 lf 1500 backup-peer 10.99.0.1 linger 0
dlsw bridge-group 5
. . .
interface ethernet 0
ip address 10.1.0.1 255.255.0.0
bridge-group 5
. . .
bridge 5 protocol ieee

! Remote B (Ethernet)
!
source-bridge ring-group 10
dlsw local-peer peer-id 10.1.0.2 passive
! Passive: waits for X to open its backup peering if A fails
dlsw remote-peer 0 tcp 10.99.0.1 lf 1500
dlsw bridge-group 5
. . .
interface ethernet 0
ip address 10.1.0.2 255.255.0.0
bridge-group 5
. . .
bridge 5 protocol ieee

! Remote X (Ethernet)
!
source-bridge ring-group 12
dlsw local-peer peer-id 10.99.0.1
dlsw remote-peer 0 tcp 10.1.0.1 lf 1500
! Backup peering to B, activated only while primary peer 10.1.0.1 is unreachable
dlsw remote-peer 0 tcp 10.1.0.2 lf 1500 backup-peer 10.1.0.1 linger 0
dlsw bridge-group 5
. . .
interface ethernet 0
ip address 10.99.0.1 255.255.0.0
bridge-group 5
. . .
bridge 5 protocol ieee

! Remote Y (Ethernet)
!
source-bridge ring-group 12
dlsw local-peer peer-id 10.99.0.2 passive
! Passive: waits for A to open its backup peering if X fails
dlsw remote-peer 0 tcp 10.1.0.1 lf 1500
dlsw bridge-group 5
. . .
interface ethernet 0
ip address 10.99.0.2 255.255.0.0
bridge-group 5
. . .
bridge 5 protocol ieee

Example 2: Redundant DLSw with Ethernet LANs

A few comments on this implementation are in order. The DLSw peers are configured to match the LAN interfaces rather than a loopback address. This is optional for token ring, but strongly recommended for the Ethernet version unless there is more than one LAN interface configured to support DLSw. If the Ethernet interface goes down, you want the DLSw peering to fail; otherwise the remote end will not bring up its backup peer to replace it.

The specification of a maximum frame size of 1500 octets may not be required, but it does no harm. It forces the remote end to negotiate a maximum frame size suitable for Ethernet. This is unnecessary if the remote end is itself on Ethernet, but it will prevent problems if the remote end is later converted to token ring, or bridged to a token ring whose end systems do not negotiate maximum frame size correctly.

The bridge group specifications are locally significant only. They do not need to be consistent between routers, but must be consistent on any single router. The same is true of the ring group numbers. The exception is the ring number and bridge number in the source-bridge statement on each token ring interface, which must be consistent across the extended LAN because the end systems use them end-to-end as part of source routing.

The linger parameter should be set to zero to ensure that there is never more than one active path between the two LANs. This has the undesirable side effect of killing any open sessions when the primary peering returns to life, but there is no way around the problem as many controllers will keep a session alive forever if they can. It may also require network management to disable DLSw on the primary peer if an unstable link or interface results in flapping between primary and backup DLSw peerings. Fortunately, the "dlsw disable" command makes it easy to turn DLSw on and off without deleting the existing configuration.
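As a rough sketch of that last point, DLSw could be taken out of service on a flapping primary router and restored once the link has settled. Router A from Example 2 is assumed here purely for illustration. Only the "dlsw disable" command itself comes from the discussion above; using its "no" form to re-enable DLSw is the usual IOS convention and should be checked against the release in use.

```
! Router A (primary): take DLSw out of service while the link is unstable
!
dlsw disable
!
! Router A: restore DLSw once the link is stable again
!
no dlsw disable
```

Because linger is zero, re-enabling DLSw on the primary router drops any sessions then running over the backup peering, so it is probably best done during a quiet period.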

The passive parameter on the local peer definitions of the backup routers prevents them from continuously trying to bring up the peering when it is not needed. Unfortunately, in 11.2 the passive parameter applies to all peers defined on the router and cannot be applied to only specific remote peers. This does not affect this simple illustration, but does make it more difficult to load balance DLSw peerings when a central site is supporting multiple remote sites.

How It Works

Token ring redundancy works by taking advantage of source routing on the end systems. If an end system finds that its route (which will take it to one DLSw peer or the other on the local LAN) is no longer valid, it will automatically switch to the other route. Speed of response to loss of a DLSw peer depends upon the sophistication of the source route bridging algorithm on the end systems, as does the degree of load sharing, if any.

Ethernet redundancy works by enabling an alternate peer relationship should the peer in operation fail. Speed of response is typically much slower, as it normally depends upon the transport level timeout of the TCP connection between peers and attempts to reestablish the primary TCP connection before switching over to the backup peering. Even more important, active SNA connections will be broken and must reestablish themselves. In the example illustrated in Figure 1, the terminals will lock and log out while the DLSw peering switches over to the backup. This is not transparent to the users, but the recovery is automatic other than the need for users to log back in.

How the redundancy works can be seen by considering what happens when any particular link or router fails.

If the link between router A and router X fails, the TCP/IP traffic supporting the DLSw peering active between A and X goes across the LAN at each end to routers B and Y to use the good link remaining between the two sites.

If the link between router B and router Y fails, there is no impact on DLSw traffic, unless routing weights are configured to shift traffic to the B to Y link, in which case the primary DLSw peering should be between B and Y rather than between A and X. If both links are required to support high traffic levels, DLSw is probably not going to work well as it tends to be sensitive to excessive delays.
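If the primary peering is moved to B and Y as suggested, the roles of the four routers are simply swapped. The following is a minimal sketch of the DLSw statements only, assuming the same addresses as before and assuming that each passive router must define the remote peer that will open the backup peering to it. The ring-group, "dlsw bridge-group", interface, and bridge statements are unchanged and omitted.

```
! Remote B (Ethernet), now primary
dlsw local-peer peer-id 10.1.0.2
dlsw remote-peer 0 tcp 10.99.0.2 lf 1500
dlsw remote-peer 0 tcp 10.99.0.1 lf 1500 backup-peer 10.99.0.2 linger 0
!
! Remote A (Ethernet), now the passive backup for Y
dlsw local-peer peer-id 10.1.0.1 passive
dlsw remote-peer 0 tcp 10.99.0.2 lf 1500
!
! Remote Y (Ethernet), now primary
dlsw local-peer peer-id 10.99.0.2
dlsw remote-peer 0 tcp 10.1.0.2 lf 1500
dlsw remote-peer 0 tcp 10.1.0.1 lf 1500 backup-peer 10.1.0.2 linger 0
!
! Remote X (Ethernet), now the passive backup for B
dlsw local-peer peer-id 10.99.0.1 passive
dlsw remote-peer 0 tcp 10.1.0.2 lf 1500
```

As in the original arrangement, only one peering between the two LANs should be active at any time.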

If router A fails, the DLSw peer on router X times out and router X establishes its backup peering with router B, routing the TCP/IP packets required to support the DLSw peer relationship through router Y and across the B to Y WAN link. SNA sessions will stall, then fail once DLSw recognizes the loss of the remote peer. Typically, by the time the stall is recognized, the backup peering is up and any attempt to restore SNA connectivity, whether automatic or manual, will succeed.

If router B fails, nobody cares. Except, of course, network management needs to identify the failure so that repair efforts can be dispatched to get the backup equipment back on line before it is needed. The same applies to router Y.

If router X fails, the DLSw peer on router A times out and router A establishes its backup peering with router Y, analogous to the way that router X recovers from failure of router A.
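One way to confirm which peering is carrying traffic after any of these failures is to inspect the DLSw state on the surviving routers. The commands below are standard IOS exec commands, but their output varies by release and is not reproduced here. Treat this as a sketch of where to look rather than a complete procedure.

```
! List configured peers and their current connection state
show dlsw peers
!
! List the MAC addresses and NetBIOS names learned, and the peer used to reach them
show dlsw reachability
```

After router A fails, for example, one would expect router X to report its peering to 10.1.0.1 as disconnected and its backup peering to 10.1.0.2 as connected.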