Recently in Engineering Category
Yesterday, Google users across the Internet twittered, blogged and emailed that Google search and mail were unusable. That afternoon, on Google's official blog, Urs Hoelzle reported that Google had "direct[ed] some [...] web traffic through Asia".
A couple of months ago, we discussed how a small Czech provider ended up causing global Internet mayhem by tickling a Cisco bug via a rather ridiculous routing announcement. While it's easy to fault the instigator of this meltdown, ultimate responsibility belongs with the vendors of poorly tested code. If we've learned anything in decades of software engineering, it is that you can't assume anything about user input. If you don't check that input for validity, you are not just being careless; you are creating a time bomb that will eventually go off. Another such bomb went off on Sunday, 3 May 2009, taking out a large swath of the Internet. We recount the sorry tale here.
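To make the input-validation point concrete, here is a minimal Python sketch of a defensive parser for the body of a BGP AS_PATH attribute. The layout it assumes (a sequence of segments, each with a one-octet type, a one-octet ASN count, then the ASNs) comes from RFC 4271. It is not a reconstruction of the vendor bug behind either incident; `parse_as_path` and `MalformedASPath` are hypothetical names, and rejecting zero-length segments is a conservative choice of this sketch. The point is only that every length field is checked against the bytes actually present before it is trusted.

```python
import struct

AS_SET = 1       # RFC 4271 path segment type values
AS_SEQUENCE = 2


class MalformedASPath(ValueError):
    """Raised when an AS_PATH attribute body fails validation (hypothetical name)."""


def parse_as_path(data: bytes, asn_size: int = 2) -> list:
    """Parse an AS_PATH attribute body into (segment_type, [asn, ...]) tuples.

    Every length field is checked against the bytes actually present before
    it is used, so truncated or inconsistent input raises MalformedASPath
    instead of being trusted. asn_size is 2 for classic BGP, 4 for 4-octet ASNs.
    """
    if asn_size not in (2, 4):
        raise ValueError("asn_size must be 2 or 4")
    fmt = "!H" if asn_size == 2 else "!I"
    segments = []
    offset = 0
    while offset < len(data):
        # Each segment starts with a one-octet type and a one-octet ASN count.
        if len(data) - offset < 2:
            raise MalformedASPath("truncated segment header")
        seg_type, seg_len = data[offset], data[offset + 1]
        offset += 2
        if seg_type not in (AS_SET, AS_SEQUENCE):
            raise MalformedASPath(f"unknown segment type {seg_type}")
        if seg_len == 0:
            # Conservative choice for this sketch: treat empty segments as bad input.
            raise MalformedASPath("empty segment")
        needed = seg_len * asn_size
        if len(data) - offset < needed:
            raise MalformedASPath("segment claims more ASNs than the data holds")
        asns = [struct.unpack_from(fmt, data, offset + i * asn_size)[0]
                for i in range(seg_len)]
        offset += needed
        segments.append((seg_type, asns))
    return segments


good = bytes([AS_SEQUENCE, 2]) + struct.pack("!HH", 29113, 47868)
print(parse_as_path(good))  # [(2, [29113, 47868])]

bad = bytes([AS_SEQUENCE, 3]) + struct.pack("!HH", 29113, 47868)
try:
    parse_as_path(bad)
except MalformedASPath as err:
    print("rejected:", err)
```

Fed a segment that claims three ASNs but carries only two, the parser raises instead of reading past the end of the buffer.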
Since Renesys maintains large quantities of data on the Internet going back many years, we sometimes get the question: If you guys are watching the entire 'net, why don't you just warn people when things break? My response is generally along the lines of: Sure, we can do that. Simply tell us the correct state of the Internet at each moment in time and we'll alert you to any operational differences we observe. This is generally met with silence.
Renesys can tell you a lot about the current state of the Internet, but absolutely no one can tell you the correct state. That is because no one is in charge, and so there is no central authoritative source of information. Think of the Internet as a highway system where anyone can buy a car and simply start driving: no need to register the car, attach a license plate, buy insurance or get a driver's license. You don't even have to show an ID or be sober. Just pay some fees, buy some equipment, hook up and go. The barrier to entry really is that low.
Obviously, this arrangement can cause some problems. When Pakistan hijacked YouTube last year by announcing YouTube IP space, how was anyone to know that this particular announcement, out of the hundreds of thousands of routing announcements seen on the Internet, was incorrect? Okay, sure, you couldn't get your videos, but maybe YouTube had just opened a data center in Karachi and the problem was internal to them? Without some way of checking the authenticity of routes, the routers that direct traffic on the Internet simply believe what they are told. And if the best route to YouTube appears to be via Pakistan, then they are all going to use it, no questions asked. This is not a new problem, and this blog explores an old and largely failed attempt to address it. We then compare countries with respect to their routing hygiene.
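A small Python sketch shows both halves of the problem. Routers forward on the most specific matching prefix, so a more-specific announcement wins no matter who sent it. The prefixes and ASNs below (YouTube's 208.65.152.0/22 from AS 36561 and the 208.65.153.0/24 more-specific from AS 17557) are the ones widely reported for the 2008 incident and should be read as illustrative. The `AUTHORIZED` table is a hypothetical stand-in for any trusted record of who may originate which address space; it does not model a particular registry.

```python
import ipaddress

# (prefix, origin AS) pairs as a router might hear them. Illustrative values
# based on public reports of the February 2008 incident.
routes = [
    (ipaddress.ip_network("208.65.152.0/22"), 36561),  # YouTube's announcement
    (ipaddress.ip_network("208.65.153.0/24"), 17557),  # more-specific from Pakistan Telecom
]

# Hypothetical registry: which origin ASes may announce space inside each block.
AUTHORIZED = {ipaddress.ip_network("208.65.152.0/22"): {36561}}


def best_route(dest, table):
    """Longest-prefix match: among routes covering dest, the most specific wins."""
    addr = ipaddress.ip_address(dest)
    matches = [(net, origin) for net, origin in table if addr in net]
    return max(matches, key=lambda route: route[0].prefixlen, default=None)


def origin_is_authorized(net, origin, registry):
    """Accept a route only if a registered block covering it lists this origin."""
    return any(net.subnet_of(block) and origin in origins
               for block, origins in registry.items())


dest = "208.65.153.238"
print(best_route(dest, routes)[1])  # 17557: the bogus /24 attracts the traffic

checked = [(net, origin) for net, origin in routes
           if origin_is_authorized(net, origin, AUTHORIZED)]
print(best_route(dest, checked)[1])  # 36561: with the check, only the /22 remains
```

Without the origin check, nothing in the routing data itself distinguishes the hijack from a legitimate new announcement; the check only helps if the registry it consults is complete and trustworthy.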
This post is a follow-up to our blog post last week about a small Czech provider briefly causing global Internet mayhem via a single errant routing announcement. In this incident, SuproNet (AS 47868) announced its one prefix, 94.125.216.0/21, to its backup provider, Sloane Park Property Trust (AS 29113), with an extremely long AS path. We've gotten more feedback about this entry than any other in recent memory, so we thought we'd try to answer some of the questions posed both here and elsewhere, as well as clarify exactly what went on. The questions we try to address include:
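As background on why an extremely long AS path is an edge case at all: in RFC 4271, an AS_PATH is carried as segments whose ASN count is a single octet, so one AS_SEQUENCE holds at most 255 ASNs, and a router prepending to a full segment should start a new one. The minimal Python sketch below models that rule. The 250-fold prepend and the documentation-range ASNs used for later hops are hypothetical, and this is not a claim about the router bug that SuproNet's announcement actually triggered; it only shows how an over-prepended path crosses the 255 boundary after a few ordinary hops.

```python
AS_SEQUENCE = 2
MAX_SEGMENT_ASNS = 255  # the segment length field is a single octet


def prepend_asn(segments, asn):
    """Prepend asn to a list of (type, [asns]) segments, following RFC 4271.

    If the leftmost segment is an AS_SEQUENCE with room, the ASN joins it;
    otherwise (full sequence, AS_SET first, or empty path) a new leftmost
    AS_SEQUENCE segment is started.
    """
    if (segments and segments[0][0] == AS_SEQUENCE
            and len(segments[0][1]) < MAX_SEGMENT_ASNS):
        seg_type, asns = segments[0]
        return [(seg_type, [asn] + asns)] + segments[1:]
    return [(AS_SEQUENCE, [asn])] + segments


path = []
for _ in range(250):  # hypothetical origin prepending its own ASN 250 times
    path = prepend_asn(path, 47868)
# The backup provider, then documentation-range ASNs standing in for later hops.
for upstream in (29113, 64500, 64501, 64502, 64503, 64504):
    path = prepend_asn(path, upstream)

print(sum(len(asns) for _, asns in path))  # 256
print([len(asns) for _, asns in path])     # [1, 255]
```

Every router along the way sees a perfectly legal announcement; the question is whether each implementation handles the multi-segment case (and paths this long generally) correctly.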
- How could anyone be this dumb?
- Why did this cascade throughout the planet?
- Can you provide more details about the impact and its spread?
- How do we prevent this from happening again?
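One commonly discussed mitigation for the last question is an import filter that refuses routes with absurdly long AS paths. Here is a minimal Python sketch; the threshold of 50 is a hypothetical policy choice rather than a recommendation, and the 250-fold prepend in the sample data is illustrative, not the exact path SuproNet sent. The length calculation follows the RFC 4271 best-path rule that an AS_SET counts as a single hop.

```python
AS_SET = 1
AS_SEQUENCE = 2
MAX_AS_PATH_LEN = 50  # hypothetical policy threshold; tune per network


def as_path_length(segments):
    """AS path length as counted in best-path selection (RFC 4271, 9.1.2.2):
    each ASN in an AS_SEQUENCE counts once, and an AS_SET counts as one."""
    length = 0
    for seg_type, asns in segments:
        if seg_type == AS_SEQUENCE:
            length += len(asns)
        elif seg_type == AS_SET:
            length += 1
        else:
            raise ValueError(f"unsupported segment type {seg_type}")
    return length


def filter_routes(routes, limit=MAX_AS_PATH_LEN):
    """Import policy: split (prefix, segments) pairs into accepted and rejected."""
    accepted, rejected = [], []
    for prefix, segments in routes:
        target = accepted if as_path_length(segments) <= limit else rejected
        target.append(prefix)
    return accepted, rejected


routes = [
    ("192.0.2.0/24", [(AS_SEQUENCE, [64500, 64501])]),
    ("94.125.216.0/21", [(AS_SEQUENCE, [29113] + [47868] * 250)]),  # hypothetical count
]
accepted, rejected = filter_routes(routes)
print(accepted)  # ['192.0.2.0/24']
print(rejected)  # ['94.125.216.0/21']
```

A filter like this protects the network that applies it, but it is a band-aid: routers that never see the route because a neighbor dropped it are shielded, while the underlying handling bug remains in the code.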
Last August at DEFCON, Alex Pilosov and Tony Kapela presented a talk entitled Stealing the Internet: An Internet Scale Man-In-The-Middle Attack, which illustrated a technique for misdirecting specific Internet traffic via carefully constructed BGP routing messages. Using this approach, an attacker can redirect the incoming traffic of any victim through his own site for further inspection or alteration before ultimately passing it on to the victim. Furthermore, the attack can be carried out in a way that is largely transparent to the victim. Since this talk, Renesys staff have been repeatedly asked "So are people using this technique today?" That is, are people currently "stealing the Internet", and if so, who is attacking whom? Given the volume of routing data that Renesys has at our disposal and the number of tools we have to slice and dice it, we thought this would be a relatively straightforward question to answer. We were wrong.
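The property that makes this attack possible is ordinary BGP loop prevention: a router discards any route whose AS path already contains its own ASN. In our reading of the technique presented in the talk, the attacker's bogus announcement lists the ASes on a chosen return path, so exactly those networks reject the hijack and keep their legitimate route to the victim, leaving the attacker a clean path over which to forward intercepted traffic. The Python sketch below uses documentation-range ASNs (RFC 5398) with hypothetical roles, and it models only the loop check, not route selection.

```python
ATTACKER = 64510
RETURN_PATH = [64500, 64501]  # transit ASes the attacker forwards intercepted traffic through
VICTIM = 64502                # legitimate origin of the targeted prefix

# The bogus more-specific lists the return-path ASes (and the real origin)
# after the attacker's own ASN.
hijack_path = [ATTACKER] + RETURN_PATH + [VICTIM]


def accepts(local_asn, as_path):
    """Standard BGP loop prevention: reject a route that already contains our ASN."""
    return local_asn not in as_path


def networks_accepting(as_path, candidate_asns):
    """Return the candidate ASes that would install a route with this path."""
    return [asn for asn in candidate_asns if accepts(asn, as_path)]


neighbors = [64500, 64501, 64502, 64503, 64504]
print(networks_accepting(hijack_path, neighbors))  # [64503, 64504]
```

Everyone outside the return path believes the hijack, while the ASes on it ignore it and keep delivering traffic to the real origin, which is why the victim may see little beyond slightly higher latency.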
Although we ultimately succeeded in answering the question and in developing a general Man-In-The-Middle (MITM) detection algorithm for the global Internet, we ended up writing a lot of code over the course of several months and burning through endless CPU cycles looking for evidence of attacks. Our results were presented this week at Black Hat, and the complete presentation can be found here. In this post, we'll cover some of the highlights from the presentation.
This weekend, John Markoff wrote an interesting piece for the New York Times entitled Do We Need a New Internet? While his emphasis was largely on security, or rather the lack thereof, the central point Markoff makes is that the Internet may be so hopelessly broken that it could be better to start over, rather than continue to apply band-aids. As if to emphasize this point, SuproNet, a local Czech provider, single-handedly caused a global Internet meltdown for upwards of an hour today. SuproNet accomplished this feat by sending out a rather unusual routing update, one which a lot of routers did not handle very well. The result was Internet bedlam.
The end of the year is approaching, and the season seems to be a harbinger of Internet disasters. Four years ago (on 24 Dec. 2004), TTNet significantly disrupted Internet traffic by leaking over 100,000 networks, which were routed globally for about an hour. Two years ago (on 26 Dec. 2006), large earthquakes hit the Luzon Strait, south of Taiwan, severing several underwater cables and wreaking havoc on communications in the region. Last year there was a small delay: on 30 Jan. 2008, more underwater cables were severed in the Mediterranean, severely disrupting communications in the Middle East, Africa, and the Indian subcontinent.
Calamity returned to its customary end-of-year schedule this year when, early today (19 Dec. 2008), several communications cables were severed, affecting traffic in the Middle East and the Indian subcontinent. According to a press release by France Telecom, three major cables were damaged: Sea-Me-We 4 at 7:28 UTC, Sea-Me-We 3 at 7:33 UTC, and FLAG FEA at 8:06 UTC. It appears that SMW3 was only partially cut and SMW4 was completely cut, while FLAG FEA was "observed down" with no further information given. The cut appears to lie between Sicily and Tunisia, in a section that is the responsibility of Egypt Telecom. The cause of the cuts remains unclear. It seems that ships have been deployed to repair the damaged cables, but no ETA has been given.
