Recently in Engineering Category

Staring Into The Gorge: Router Exploits


I'm writing this blog entry from the campground at Vermont's beautiful Quechee Gorge, where I took the kids after work. Yes, Renesys is located smack in the middle of some of the nicest hiking, camping, and climbing on earth. No, you shouldn't move here; Northern New England has enough out-of-staters already, thanks. Unless, that is, you are an unusually talented web developer, have worked as a peering coordinator, or know the Internet transit industry inside-out, in which case you should send me your CV, posthaste. Thanks, --jim





Here We Go Again.

Imagine an innocent BGP message, sent from a random small network service provider's border router somewhere in the world. It contains a payload that is unusual, but strictly speaking, conformant to protocol. Most of the routers in the world, when faced with such a message, pass it along. But a few have a bug that makes them drop sessions abruptly and reopen them, flooding their neighbors with full-table session resets every time they hear the offending message. The miracle of global BGP ensures that every vulnerable router on earth gets a peek at the offending message in under 30 seconds. The global routing infrastructure rings like a bell, as BGP update rates spike by orders of magnitude in the blink of an eye. Links congest. Small routing hardware falls over and dies. It takes hours for things to return to normal.

Internet connectivity is a good thing. Many of us depend on it for everything from our livelihoods to our entertainment. However, the Internet is very fragile, and even The New York Times is worried about it. But they're primarily concerned with overloads that can occur when everyone on the planet does the same thing at roughly the same time, such as surfing for news about Michael Jackson. Unfortunately, we will never avoid all such scenarios. Physical systems are designed around average and typical peak loads, not around extremely high loads associated with very unlikely events. Who would pay for that?

And this applies to other complex systems besides the Internet. I was in India during 9/11 and, for two days, I could not make a traditional phone call to the US. Why? Everyone in India knows someone in NYC, and they all picked up the phone at the same time to check in on them. The circuits were so overloaded, I couldn't even get the friendly "Your call cannot be completed as dialed" message.

No system is ever going to be engineered for insanely high loads. If everyone in your town decides to take a shortcut through your neighborhood to avoid an accident on the highway, you are going to have trouble getting out of your driveway. But rather than give up and wait it out, there is something you can do in advance and at reasonable cost: build a second driveway to a different street on the other side of your house, one that isn't fed by the same access roads from the highway. This blog is about building such redundancy into your Internet connectivity, so you aren't disconnected by a single failure. And while it's good that the New York Times and various governments are watching the problem, if your business depends on the Internet, you're largely on your own to audit and verify that you are buying a sufficient level of redundancy for your budget. A lot of fragility problems could be solved by more informed consumers performing the necessary due diligence.

Strange Changes in Iranian Transit

Many media sources have reported outages in Iranian mobile networks and Internet services in the wake of Friday's controversial elections. We took a look at the state of Iranian Internet transit, as seen in the aggregated global routing tables, and found that the story is not as clear-cut as has been reported.

Across the Internet, yesterday, Google users twittered, blogged and emailed that Google search and mail were not usable. And, yesterday afternoon, on Google's official blog, Urs Hoelzle reported that Google "direct[ed] some [...] web traffic through Asia".

A couple of months ago, we discussed how a small Czech provider ended up causing global Internet mayhem by tickling a Cisco bug via a rather ridiculous routing announcement. While it's easy to fault the instigator of this meltdown, ultimate responsibility belongs with the vendors of poorly tested code. If we've learned anything in decades of software engineering, it is that you can't assume anything about user input. If you don't check that input for validity, you are not just being careless, you are creating a time bomb that will eventually go off. Another such bomb went off on Sunday, 3 May 2009, taking out a large swath of the Internet. We recount the sorry tale here.

Since Renesys maintains large quantities of data on the Internet going back many years, we sometimes get the question: If you guys are watching the entire 'net, why don't you just warn people when things break? My response is generally along the lines of: Sure we can do that. Simply tell us the correct state of the Internet at each moment in time and we'll alert you to any operational differences we observe. This is generally met with silence.

Renesys can tell you a lot about the current state of the Internet, but absolutely no one can tell you the correct state. And that is because no one is in charge, and so there is no central authoritative source of information. Think of the Internet as a highway system where anyone can buy a car and simply start driving: no need to register the car, attach a license plate, buy insurance or get a driver's license. You don't even have to show an ID or be sober. Just pay some fees, buy some equipment, hook up and go. The barrier to entry really is that low.

Obviously, this arrangement can cause some problems. When Pakistan hijacked YouTube last year by announcing YouTube IP space, out of the hundreds of thousands of routing announcements seen on the Internet, how was anyone to know this particular one was incorrect? Okay sure, you couldn't get your videos, but maybe YouTube had just opened a data center in Karachi and the problem was internal to them? Without some way of checking the authenticity of routes, the routers that direct traffic on the Internet simply believe what they are told. And if the best route to YouTube appears to be via Pakistan, then they are all going to use it, no questions asked. This is not a new problem, and this blog explores an old and largely failed attempt to address it. We then compare the differences between countries with respect to their routing hygiene.
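To make "checking the authenticity of routes" concrete, here is a minimal sketch of an origin check against a table of expected originators. Everything in it is an assumption for illustration: the `EXPECTED_ORIGINS` table, the function name `check_origin`, and the prefixes and AS numbers (taken from the ranges reserved for documentation). It is not a description of any particular registry or router feature, only of the general idea that an announcement can be compared against what someone has registered.

```python
import ipaddress
from typing import Sequence

# Hypothetical table of registered prefixes and the AS numbers expected to
# originate them. These are documentation-range values, not real registry data.
EXPECTED_ORIGINS = {
    ipaddress.ip_network("192.0.2.0/24"): {64500},
    ipaddress.ip_network("198.51.100.0/24"): {64501, 64502},
}

def check_origin(prefix: str, as_path: Sequence[int]) -> str:
    """Classify an announcement as 'ok', 'suspect', or 'unknown'."""
    if not as_path:
        raise ValueError("as_path must contain at least one AS number")
    announced = ipaddress.ip_network(prefix)
    origin = as_path[-1]  # the last AS in the path is the originator
    for registered, origins in EXPECTED_ORIGINS.items():
        # Covered means equal to, or more specific than, a registered prefix.
        if announced.version == registered.version and announced.subnet_of(registered):
            return "ok" if origin in origins else "suspect"
    return "unknown"  # nothing registered: no basis for a verdict

print(check_origin("192.0.2.0/24", [64510, 64500]))
print(check_origin("192.0.2.128/25", [64510, 64511]))
print(check_origin("203.0.113.0/24", [64500]))
```

The "unknown" case is the crux of the problem described above: a check like this is only as good as the table behind it, and prefixes that nobody has registered cannot be judged either way.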

This post is a follow-up to our blog last week about a small Czech provider briefly causing global Internet mayhem via a single errant routing announcement. In this incident, SuproNet (AS 47868) announced its one prefix, 94.125.216.0/21, to its backup provider, Sloane Park Property Trust (AS 29113), with an extremely long AS path. We've gotten more feedback about this entry than any other in recent memory, so we thought we'd try to answer some of the questions that were posed both here and elsewhere, as well as provide some clarification about exactly what went on. The questions we try to address include:

  • How could anyone be this dumb?
  • Why did this cascade throughout the planet?
  • Can you provide more details about the impact and its spread?
  • How do we prevent this from happening again?
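As an illustration of the last question, one safeguard that is often discussed is to treat the AS path as untrusted input and refuse announcements whose path is implausibly long. The sketch below is not taken from the post: the threshold `MAX_AS_PATH_LEN`, the function names, and the prepend count in the example are all hypothetical, chosen only to show the shape of such a check.

```python
from typing import Sequence

# Hypothetical cutoff chosen for illustration; not a recommended value.
MAX_AS_PATH_LEN = 50

def accept_as_path(as_path: Sequence[int], max_len: int = MAX_AS_PATH_LEN) -> bool:
    """Return True if the path is non-empty and no longer than max_len."""
    return 0 < len(as_path) <= max_len

def longest_prepend_run(as_path: Sequence[int]) -> int:
    """Length of the longest run of one AS repeated back to back."""
    longest = run = 0
    previous = None
    for asn in as_path:
        run = run + 1 if asn == previous else 1
        longest = max(longest, run)
        previous = asn
    return longest

# Illustrative path: AS 29113 followed by AS 47868 repeated an arbitrary 200 times.
long_path = [29113] + [47868] * 200
print(accept_as_path(long_path), longest_prepend_run(long_path))
```

The second helper is there because a long path built from one AS repeated many times looks different from a long path through many distinct networks, and an operator might want to log the two cases separately.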

To Catch a Thief


Last August at DEFCON, Alex Pilosov and Tony Kapela presented a talk entitled Stealing the Internet: An Internet Scale Man-In-The-Middle Attack, which illustrated a technique for misdirecting specific Internet traffic via carefully constructed BGP routing messages. Using this approach, an attacker can redirect the incoming traffic of any victim through his own site for further inspection or alteration before ultimately passing it on to the victim. Furthermore, the attack can be carried out in a way that is largely transparent to the victim. Since this talk, Renesys staff have been repeatedly asked "So are people using this technique today?" That is, are people currently "stealing the Internet", and if so, who is attacking whom? Given the volume of routing data that Renesys has at our disposal and the number of tools we have to slice and dice it, we thought this would be a relatively straightforward question to answer. We were wrong.

Although we ultimately succeeded in answering the question and in developing a general Man-In-The-Middle (MITM) detection algorithm for the global Internet, we ended up writing a lot of code over the course of several months and burning through endless CPU cycles looking for attack evidence. Our results were presented this week at Black Hat and the complete presentation can be found here. In this blog, we'll hit on some of the highlights from the presentation.

About the Renesys Blog

Our weblog is written by a variety of Renesys employees. They run the gamut from senior execs and engineers to sales guys. Anyone who has something to say that could be informative or of interest to our customers and visitors says it here.

About this Archive

This page is an archive of recent entries in the Engineering category.
