War story: RPKI is working as intended

03/12/2024

Written by Job Snijders, Principal Software Engineer

Originally published in Fastly blog

To be upfront, this really is a story about something that turned out to be no problem at all. But sometimes boring stories deserve to be told. To provide context for this one, we have to go back to February 2008. Back then, through no fault of its own, one of the world’s most popular video-sharing platforms suffered a disastrous multi-hour outage, interrupting millions of video viewings. The impact was so significant that even mainstream media reported extensively on what was essentially an arcane routing incident. Nowadays, however, we’re hearing less and less about incidents like these, even though the Internet is bigger than ever. Three weeks ago Fastly was the target of a BGP hijack, similar to what happened in 2008, but this time barely anyone noticed. Why is that? Something has changed. In this article, I’ll delve into one of the Internet’s most remarkable, yet untold, success stories.

A crash course on how Internet routing works

At its core, the Internet is a backbone spanning hundreds of thousands of interconnected routers managed by roughly 85,000 organizations to deliver data to millions of digital destinations. To establish what part of the Internet is attached where (that is, in what direction to send data packets to reach a given Internet destination, an IP address), all these routers exchange messages with each other using an industry-standard protocol called BGP. The totality of this whooshing exchange of routing information is often referred to as the global Internet routing system.

Internet Map by The Opte Project – Originally from the English Wikipedia, CC BY 2.5, https://commons.wikimedia.org/w/index.php?curid=1538544

One of the key factors for routers to decide which of many paths to use for sending data is the Longest Prefix Match (LPM) algorithm. In a nutshell: more detailed information about a destination is preferred over less granular information. Think of punching into your car’s navigation system your destination’s street and city versus inputting only the city name. Both approaches will bring you closer to your destination, but of course, being more specific is likely to result in a better route. Put differently, the Internet would not work without LPM.
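To make the LPM idea concrete, here is a minimal sketch in Python. It is not how a real router implements forwarding (routers use specialized lookup structures, not a linear scan), and the function name, the routing-table layout, and the nexthop labels are assumptions made for illustration. The prefixes come from address space reserved for documentation.

```python
import ipaddress


def longest_prefix_match(routes, destination):
    """Return (network, nexthop) for the most specific route covering
    `destination`, or None when no route covers it.

    `routes` is a hypothetical routing table: a dict that maps prefix
    strings to nexthop labels.
    """
    addr = ipaddress.ip_address(destination)
    best = None
    for prefix, nexthop in routes.items():
        net = ipaddress.ip_network(prefix)
        # Only compare like with like (IPv4 vs. IPv6) and skip
        # prefixes that do not contain the destination at all.
        if net.version != addr.version or addr not in net:
            continue
        # A longer prefix length means a more specific route, which wins.
        if best is None or net.prefixlen > best[0].prefixlen:
            best = (net, nexthop)
    return best


# Illustrative table: one covering /24 and one more specific /25.
routes = {
    "203.0.113.0/24": "nexthop-A",
    "203.0.113.0/25": "nexthop-B",
}

match_low = longest_prefix_match(routes, "203.0.113.10")
match_high = longest_prefix_match(routes, "203.0.113.200")
match_none = longest_prefix_match(routes, "198.51.100.1")
```

In this sketch, 203.0.113.10 is covered by both routes, so the /25 is chosen, while 203.0.113.200 falls outside the /25 and follows the /24. The same preference for the more specific route is what allows an unauthorized, more specific announcement to attract traffic.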

A major contributor to the Internet’s amazing year-to-year growth is that basically anyone can easily connect to it and almost immediately start sending and receiving data. You hook your router up to neighboring routers from other organizations and then use BGP to send a message into the routing system. In doing so, you tell the Internet that your IP addresses are now reachable via a specified “nexthop”. The corollary is that the most obvious vulnerability in the routing system is unauthorized origination of routes to IP addresses. More on that thorny aspect in the next section!

What happened in 2008?

A large nation-state’s incumbent telecommunications operator was instructed to censor a popular video-sharing platform within its national borders. Of the various mechanisms to block access to a particular internet service, BGP is one of the simpler (albeit blunter) ways to blackhole undesired traffic. In the course of normal network operations, not every BGP message is intended or expected to be distributed into the global system. A network operator might intend for some BGP messages to only be distributed to its own routers for its own private purposes, constraining the scope to its own administrative domain.

Unfortunately, due to a configuration mistake, the BGP messages intended to comply with the country’s censorship order were also passed on to adjacent networks outside of the country, who, in turn, distributed them to their adjacent networks, and so on. In the blink of an eye, routers around the world received BGP messages that a specific set of the video platform’s IP addresses (remember the LPM algorithm!) were now being served from infrastructure in Pakistan. As that wasn’t at all where the video platform was actually attached, Internet data packets ended up being dropped on the floor, globally disrupting this video platform’s online presence. RIPE NCC did a good write-up on the technical details, and NY Times, CNET, Ars Technica, and NBC News also covered the incident.

Fast forward to 2024

A very similar routing incident happened to Fastly just last week, but this time around no headlines were made. While this incident would’ve severely affected Fastly a few years ago, this time the impact was negligible. What gives? While the specific players and motivations differ from the famous 2008 incident, at its heart, the technical details were the same. In this more recent case, the state incumbent of another large nation generated BGP messages hijacking some of Fastly’s IP address space for the purpose of disrupting Internet traffic. What makes now different from then?

RPKI improves the routing system’s reliability

The big difference between 2008 and 2024 is that nowadays the Internet industry uses a cryptographically verifiable mechanism called RPKI to assess plausibility of BGP messages in a fully automated fashion. The RPKI is a distributed database through which networks can publish their routing intentions in Route Origin Authorizations (ROAs), in turn enabling other networks to validate BGP messages against this database using a service called Route Origin Validation (ROV). By rejecting messages that fail this validation, the RPKI-invalid routes can be kept out of circulation, limiting their ability to cause disruption.
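The validation step can be sketched in a few lines of Python. This is a simplified illustration of the origin-validation outcomes described in RFC 6811 (valid, invalid, not-found), not the code of any real RPKI validator or router. The `ROA` class, the `validate_origin` function, and the sample prefixes and AS numbers (taken from ranges reserved for documentation) are assumptions made for this example.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class ROA:
    """Hypothetical, simplified view of one validated ROA entry."""
    prefix: str       # address block the holder has authorized
    max_length: int   # longest prefix length allowed in BGP
    origin_asn: int   # AS authorized to originate the prefix


def validate_origin(roas, route_prefix, origin_asn):
    """Classify a BGP route as 'valid', 'invalid', or 'not-found'."""
    route = ipaddress.ip_network(route_prefix)
    covered = False
    for roa in roas:
        roa_net = ipaddress.ip_network(roa.prefix)
        # A ROA is relevant only if its prefix covers the route's prefix.
        if route.version != roa_net.version or not route.subnet_of(roa_net):
            continue
        covered = True
        # The route matches when the origin AS is the authorized one and
        # the announcement is not more specific than max_length allows.
        if roa.origin_asn == origin_asn and route.prefixlen <= roa.max_length:
            return "valid"
    # Covered by at least one ROA but matched by none: invalid.
    # Covered by no ROA at all: not-found.
    return "invalid" if covered else "not-found"


roas = [ROA(prefix="203.0.113.0/24", max_length=24, origin_asn=64496)]

legit = validate_origin(roas, "203.0.113.0/24", 64496)
wrong_origin = validate_origin(roas, "203.0.113.0/24", 64511)
too_specific = validate_origin(roas, "203.0.113.0/25", 64496)
uncovered = validate_origin(roas, "198.51.100.0/24", 64511)
```

In this sketch, an announcement from the wrong origin AS, or one more specific than the ROA permits, comes out as invalid, and a network performing ROV can then reject it. A prefix with no covering ROA comes out as not-found, which is why publishing ROAs matters for this protection to apply.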

The views expressed by the authors of this blog are their own and do not necessarily reflect the views of LACNIC.
