Khaled Elmeleegy, Enhancing Ethernet's Reliability

Ethernet is pervasive. This is due in part to its ease of use and its low cost-to-performance ratio. Moreover, Ethernet is self-healing in the event of equipment failure or removal. Unfortunately, its routing protocol -- the Rapid Spanning Tree Protocol (RSTP) -- is known to suffer from a classic count-to-infinity problem. However, the causes and implications of this problem are neither documented nor understood. In this talk, I will identify the exact conditions under which the count-to-infinity problem manifests itself. Then, I will show how it slows down the reconfiguration of Ethernet's spanning tree. Beyond count-to-infinity, Ethernet is also known to suffer from a second type of problem: forwarding loops. A forwarding loop can cause packet loss and duplication, which in some cases may persist indefinitely.

To address these problems in Ethernet, I will present two solutions. First, I will introduce EtherFuse, a new device that can be inserted into an existing Ethernet to speed up the reconfiguration of the spanning tree. It can also detect and then break forwarding loops in Ethernet networks. EtherFuse is backward compatible and requires no change to the existing hardware, software, or protocols. I will describe a prototype EtherFuse implementation and experimentally demonstrate its effectiveness. Second, I will present a simple yet effective modification to RSTP called RSTP with Epochs. This solution guarantees that the spanning tree converges in at most one round-trip time across the network and eliminates the possibility of a count-to-infinity-induced forwarding loop, while maintaining backward compatibility with RSTP.