August 2009 Archives

Staring Into The Gorge: Router Exploits

| 5 Comments

gorge.jpgI'm writing this blog entry from the campground at Vermont's beautiful Quechee Gorge, where I took the kids after work. Yes, Renesys is located smack in the middle of some of the nicest hiking, camping, and climbing on earth. No, you shouldn't move here, Northern New England has enough out-of-staters already, thanks. Unless, that is, you are an unusually talented web developer, have worked as a peering coordinator, or know the Internet transit industry inside-out, in which case you should send me your CV, posthaste. thanks, --jim





Here We Go Again.

Imagine an innocent BGP message, sent from a random small network service provider's border router somewhere in the world. It contains a payload that is unusual, but strictly speaking, conformant to protocol. Most of the routers in the world, when faced with such a message, pass it along. But a few have a bug that makes them drop sessions abruptly and reopen them, flooding their neighbors with full-table session resets every time they hear the offending message. The miracle of global BGP ensures that every vulnerable router on earth gets a peek at the offending message in under 30 seconds. The global routing infrastructure rings like a bell, as BGP update rates spike by orders of magnitude in the blink of an eye. Links congest. Small routing hardware falls over and dies. It takes hours for things to return to normal.

Internet connectivity is a good thing. Many of us depend on it for everything from our livelihoods to our entertainment. However, the Internet is very fragile and even the The New York Times is worried about it. But they're primarily concerned with overloads that can occur when everyone on the planet does the same thing at roughly the same time, such as surfing for news about Michael Jackson. Unfortunately, we will never avoid all such scenarios. Physical systems are designed around average and typical peak loads, not around extremely high loads associated with very unlikely events. Who would pay for that?

And this applies to other complex systems besides the Internet. I was in India during 9/11 and, for two days, I could not make a traditional phone call to the US. Why? Everyone in India knows someone in NYC, and they all picked up the phone at the same time to check in on them. The circuits were so overloaded, I couldn't even get the friendly "Your call cannot be completed as dialed" message.

No system is ever going to be engineered for insanely high loads. If everyone in your town decides to take a shortcut through your neighborhood to avoid an accident on the highway, you are going to have trouble getting out of your driveway. But rather than give up and wait it out, there is something you can do in advance and at reasonable cost: build a second driveway to a different street on the other side of your house, one that isn't fed by the same access roads from the highway. This blog is about building such redundancy into your Internet connectivity, so you aren't disconnected by a single failure. And while it's good that the New York Times and various governments are watching the problem, if your business depends on the Internet, you're largely on your own to audit and verify that you are buying a sufficient level of redundancy for your budget. A lot of fragility problems could be solved by more informed consumers performing the necessary due diligence.

About the Renesys Blog

Our weblog is written by a variety of Renesys employees. They run the gamut from senior execs and engineers to sales guys. Anyone who has something to say that could be informative or of interest to our customers and visitors, says it here.

About this Archive

This page is an archive of entries from August 2009 listed from newest to oldest.

June 2009 is the previous archive.

November 2009 is the next archive.

Find recent content on the main index or look in the archives to find all content.

Archives

Pages