In our recent posts about Hurricane Sandy, we analyzed the impacts of the super storm on Internet connectivity in the northeastern US.
However, in addition to knocking out power and Internet connectivity in a significant part of the New York metropolitan area, Sandy also had a surprising impact on Internet traffic worldwide: traffic that neither originated from nor was destined for areas affected by the storm.
From locations around the globe as varied as Chile, Sweden and India,
some Internet traffic was forced onto alternate paths to avoid failures at critical transit points in the NYC area.
We’ll take a look at some examples in what follows.
Survivability And Resilience of the Internet
The Internet was originally designed for survivability and resilience in the event of the loss of a critical node,
utilizing such technologies as packet switching and dynamic routing.
While we certainly cannot avoid outages at the edges of the Internet
(e.g., individual data centers), the Internet as a whole is a highly survivable system.
Our earlier blogs on Sandy focused on outages at Internet endpoints, such as those hosting web servers, residential networks or smartphone users.
Here we look at what happens when a piece of the Internet's core disappears, and routers and their operators must react to keep traffic flowing.
Examples of Survivability and Resilience
In our initial blog about the impacts of Hurricane Sandy, we stated that:
As a result of outages [in New York City], we’ve observed Internet traffic shift away from the city as carriers scramble for alternative paths.
We provide a few clear examples of this phenomenon below. In each of the examples, a large provider shifted traffic away from its New York City infrastructure as the storm was battering the city. Also, in each example, Internet traffic continued to flow through the affected provider.
Thus, these fail-overs occur internally within each provider's network and hence are not visible in global routing data.
But we are able to observe these changes using active measurements in the form of traceroutes.
Our examples should be viewed as evidence of well-engineered networks continuing to successfully carry traffic as critical equipment temporarily went offline.
In the following graphics, each dot represents the latency (y-axis) of a traceroute measurement over time (x-axis). We have colored each dot based on which major cities the measurement traversed — specifically when they crossed through New York City and when they went via an alternative path.
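The classification described above can be sketched in code. The following is a minimal, illustrative example of how one might parse traceroute output and tag each measurement by the cities its path traversed, using airport/city codes embedded in router hostnames. The city codes, hostnames and regular expression here are assumptions for illustration; real provider naming schemes vary widely, and this is not Renesys's actual pipeline.

```python
import re

# Map common airport/city codes found in router hostnames to cities.
# These codes are illustrative; real providers use varied naming schemes.
CITY_CODES = {
    "nyc": "New York", "lon": "London", "iad": "Washington DC",
    "par": "Paris", "fra": "Frankfurt", "ash": "Ashburn",
}

# Matches one traceroute hop line: hop number, hostname, IP, RTT in ms.
HOP_RE = re.compile(r"^\s*\d+\s+(\S+)\s+\(([\d.]+)\)\s+([\d.]+) ms")

def classify_traceroute(output):
    """Parse traceroute text; return the cities the path traversed and
    the round-trip latency (ms) to the final responding hop."""
    cities, last_rtt = [], None
    for line in output.splitlines():
        m = HOP_RE.match(line)
        if not m:
            continue
        hostname, _ip, rtt = m.groups()
        last_rtt = float(rtt)
        for code, city in CITY_CODES.items():
            if code in hostname.lower() and city not in cities:
                cities.append(city)
    return cities, last_rtt

# Hypothetical traceroute excerpt: a Chicago-to-Europe path routed
# through Washington DC and Paris rather than New York City.
sample = """\
 5  ae-1.r20.chcgil.example.net (198.51.100.5)  12.3 ms
 6  ae-2.r21.iad01.example.net (198.51.100.6)  31.8 ms
 7  ae-3.r22.par04.example.net (198.51.100.7)  110.4 ms
"""
cities, rtt = classify_traceroute(sample)
print(cities, rtt)  # ['Washington DC', 'Paris'] 110.4
```

Running many such measurements over time, and coloring each (timestamp, latency) point by the cities its path crossed, yields scatter plots like those discussed below.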
Level 3 (AS 3356)
The graphic on the right illustrates the shift in traffic heading from Chicago over Level 3's backbone to networks in Europe. Just before midnight UTC on October 30th, or in the early evening last Monday in New York City, we see traffic stop traversing Level 3's New York City-to-London hop and instead get folded into the ongoing traffic flowing across their Washington, DC-to-Paris hop.
Despite the temporary loss of Level 3's New York City infrastructure, international Internet traffic continued to flow through the provider with roughly the same latencies.
Verizon (AS 701/AS 702)
For Internet traffic carried by Verizon heading to and through the United States from Europe, we can also observe traffic avoiding the stricken New York City area. In this example, traffic from London heading west switched briefly to Washington DC. The traffic shift resulted in a very minor change in the distribution of overall latencies and was likely imperceptible to end users.
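One way to quantify how minor such a shift is would be to compare latency distributions grouped by transit city. The sketch below does this with `statistics.median` over hypothetical samples; the timestamps and latency values are invented for illustration and are not the measured Verizon data.

```python
from statistics import median

# Hypothetical latency samples (ms) for London-to-US traceroutes,
# labeled by the transit city each path crossed.
samples = [
    ("2012-10-29T20:00", "New York", 74.1),
    ("2012-10-29T22:00", "New York", 75.3),
    ("2012-10-30T01:00", "Washington DC", 78.9),
    ("2012-10-30T03:00", "Washington DC", 77.6),
    ("2012-10-31T12:00", "New York", 74.8),
]

def latency_by_path(samples):
    """Median latency per transit city, to compare alternative paths."""
    groups = {}
    for _ts, city, ms in samples:
        groups.setdefault(city, []).append(ms)
    return {city: median(values) for city, values in groups.items()}

print(latency_by_path(samples))
# {'New York': 74.8, 'Washington DC': 78.25}
```

A difference of a few milliseconds between medians, as in this toy data, is the kind of change end users would not notice.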
Tata (AS 6453)
In this example, Tata’s backbone network rerouted traffic bound for India from the east coast to the west coast of the US. This resulted in a small latency penalty,
which lasted only a couple of hours before traffic was again flowing through New York City.
Hurricane Electric (AS 6939)
The graphic on the right shows the distribution of latencies on Internet traffic originating from Stockholm, Sweden and traversing Hurricane Electric. Dots in the lower band primarily represent latency measurements to networks in North America, while those in the upper band depict networks in East Asia. You can think of the space between the bands as the Pacific Ocean. The Paris-to-Ashburn link took over as the London-to-New York City link dropped away. Latencies experienced a marginal increase, but traffic continued to flow and returned to normal after almost 24 hours.
NTT (AS 2914)
The graphic on the right illustrates latencies from Santiago, Chile to destinations around the world, as traffic traveled across NTT’s backbone network through New York City. Normally this traffic goes across a primary New York City-to-London link.
However, just after midnight UTC on Oct 30th (or 8pm Eastern Daylight Time), this
link disappeared from our observations. It was primarily replaced with an Ashburn, VA-to-Frankfurt link and supplemented by a new and different New York City-to-London link. Of course, there is no direct cable from Ashburn-to-Frankfurt, but any intermediate hops are not visible in traceroutes. Nonetheless, similar to other Internet service providers, traffic clearly shifted away from New York City during the worst part of the storm.
In each of these examples, the automatic fail-overs that kept international Internet traffic flowing seamlessly were due entirely to excellent engineering, provisioning and configuration on the part of these providers.
So even with the loss of a core hub as critical as New York City, the continued flow of Internet traffic is a testament to the skill and hard work of the network engineers at these and other Internet providers.
Congratulations on a job well done!