class: center, middle

# How to Catch when Proxies Lie

## Verifying the Physical Locations of Network Proxies with Active Geolocation
.authors[ .agroup[
Zachary Weinberg
·
Nicolas Christin
·
Vyas Sekar
.affil[Carnegie Mellon University] ] .agroup[
Shinyoung Cho
.affil[SUNY Stony Brook] ] .agroup[
Phillipa Gill
.affil[UMass Amherst] ] ]
invisible
italic
and
bold
text to force fonts to load
.institutions[
]

???

Hello everyone. I’m going to talk about what you can do when you suspect your VPN servers aren’t where the VPN company says they are. I’m a PhD student at Carnegie Mellon’s CyLab. This is joint work with two of the CyLab faculty, Nicolas Christin and Vyas Sekar, and also with Shinyoung Cho at SUNY Stony Brook and Phillipa Gill at the University of Massachusetts.

---

# Implausible claims
???

This is a verbatim quote from a major commercial VPN service’s website. They claim to have servers in “190+ countries” and this is their list just for Asia and the Pacific. I marked in red several countries that seem really unlikely: North Korea for obvious political reasons, and the rest of them because they’re tiny islands with fewer than 5000 inhabitants. Once you notice this, you wonder, do we have any reason to believe _any_ of these locations are true?

(By the way, whenever I say “country” in this talk, I mean a region with its own ISO 3166 country code. That includes both sovereign states and dependent territories.)

---

# Implausible claims, audited
Claim: 218 countries · No more than 40 true countries

???

To spoil my own punch line, this map shows all the countries where that service said they have servers, and which of those claims are true. Green is true, orange is false, light tan is not claimed in the first place. They said they have 218 countries, but the servers are really in fewer than 40 countries.

The rest of this talk is about how we know that. I’m going to show you how you _can_ locate a server anywhere in the world, without trusting operator claims or IP-to-location databases. Then I’m going to show you how to apply that technique to VPN servers specifically, and then I’ll come back to this, and the same for six other providers, and what it means.

---

# Active geolocation
Same principle as GPS, but use packet round-trip time (RTT)
CBG: linear estimate of maximum packet travel distance
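The disk-intersection idea can be sketched in a few lines of Python. This is a toy illustration, not the paper’s code: the 100 km/ms calibration constant and the handful of candidate points stand in for CBG’s fitted per-landmark speed line and a proper map grid.

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_KM = 6371.0
# Assumed calibration constant: packets cover at most ~100 km per ms of
# RTT (an RTT is a round trip, so this is about half of two-thirds the
# speed of light in fiber).  Real CBG fits this slope per landmark.
KM_PER_MS = 100.0

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))

def cbg_feasible(landmarks, candidates):
    """Keep the candidate points lying inside every landmark's disk.

    landmarks:  list of ((lat, lon), rtt_ms) pairs
    candidates: list of (lat, lon) grid points to test
    """
    return [p for p in candidates
            if all(haversine_km(p, loc) <= rtt * KM_PER_MS
                   for loc, rtt in landmarks)]
```

With landmarks in Paris, London, and Copenhagen and plausible RTTs, a candidate in Brussels survives all three disks while Madrid does not.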
???

How can we find out where proxy servers really are, without trusting any information that could be faked? The basic idea is called active geolocation. It works on the same principle as the Global Positioning System, but instead of radio waves we use ping packets. People have been studying how to do this for more than twenty years; one of the simplest techniques is called CBG, Constraint-Based Geolocation.

It goes like this: We have _landmark_ hosts in, say, France, the UK, and Denmark; we ping the _target_ host from each; we assume the relationship between travel time and travel distance is linear, and we find out it can be only so many kilometers from each; we draw disks on the map and we find out it’s gotta be in Belgium. Or maybe a couple places in southeastern England. We’re assuming it’s not on a disused anti-aircraft platform in this wedge of the North Sea here.

The problem is, radio waves travel in straight lines at a constant velocity, but packets don’t. There are always routing delays, and also “circuitous” routes, major detours from the great-circle distance—often packets get routed from Australia to Japan by way of California, because that’s how the peering goes. That’s 21 thousand kilometers’ worth of extra latency.

On the right I’ve plotted the relationship between delay and distance for pings from one of the landmarks we used to all of the others, and you can see there _is_ a relationship, but it’s messy. The black line is CBG’s linear speed estimate; it’s as steep as possible without going above any of the points. There are much fancier models in the literature.

---

# (Quasi-)Octant
Minimum as well as maximum distance

Piecewise-linear travel time estimate, using convex hull of points
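The ring constraint can be sketched as follows, assuming the piecewise-linear min/max envelopes have already been fitted. The breakpoint lists in the test are invented for illustration; the real Octant derives them from the convex hull of the calibration scatterplot.

```python
import bisect

def interp(breakpoints, x):
    """Piecewise-linear interpolation over sorted (x, y) breakpoints,
    clamped at both ends."""
    xs = [bx for bx, _ in breakpoints]
    if x <= xs[0]:
        return float(breakpoints[0][1])
    if x >= xs[-1]:
        return float(breakpoints[-1][1])
    i = bisect.bisect_right(xs, x)
    (x0, y0), (x1, y1) = breakpoints[i - 1], breakpoints[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

def octant_ring(min_curve, max_curve, rtt_ms):
    """The [min, max] distance ring one landmark's RTT implies.

    A candidate point is feasible only if its distance to the landmark
    falls inside this ring; intersecting rings from all landmarks gives
    the prediction region (rings instead of CBG's disks).
    """
    return interp(min_curve, rtt_ms), interp(max_curve, rtt_ms)
```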
???

For instance, Octant uses piecewise-linear models, based on the convex hull of the points on the scatterplot, to estimate the minimum as well as the maximum travel distance. Rings instead of disks. In this example, that lets it rule out England.

Octant also does things with hop-by-hop travel times for some additional accuracy, but we had to remove that part of the algorithm because we couldn’t collect traceroutes through most of the VPN servers; they black-hole ICMP time-exceeded packets. I’m going to be calling our implementation “Quasi-Octant” from now on because of that.

---

# Spotter
Probabilistic combination of Gaussian rings

Cubic polynomial estimates of μ and σ
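Spotter’s combination step can be sketched like this. The coefficients in the test are invented; the real system fits μ and σ by cubic regression on calibration data and evaluates the product over a full map grid.

```python
from math import exp, pi, sqrt

def poly3(c, x):
    """Evaluate the cubic c[0] + c[1]*x + c[2]*x^2 + c[3]*x^3."""
    return c[0] + c[1] * x + c[2] * x ** 2 + c[3] * x ** 3

def gauss_pdf(x, mu, sigma):
    """Density of a Normal(mu, sigma) distribution at x."""
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))

def spotter_score(per_landmark, mu_coeffs, sigma_coeffs):
    """Joint likelihood of one candidate point on the map.

    per_landmark: (distance_km_to_candidate, rtt_ms) pairs, one per
    landmark.  Each landmark contributes a Gaussian ring whose mean and
    spread are cubic polynomials of the RTT; Spotter multiplies the
    densities, treating the landmarks as independent.
    """
    score = 1.0
    for dist, rtt in per_landmark:
        mu = poly3(mu_coeffs, rtt)
        sigma = max(poly3(sigma_coeffs, rtt), 1.0)  # avoid zero width
        score *= gauss_pdf(dist, mu, sigma)
    return score
```

A candidate whose distance matches the RTT-implied mean scores higher than one far off the ring.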
???

And Spotter draws probability density functions instead of flat shapes on the map, based on cubic polynomial regression on the delay-distance relationship. Comes out basically the same in this example.

The papers describing the fancier algorithms often compare back to CBG and claim some percentage reduction in the uncertainty of the estimate, over the same test set. But the catch is they’re all tested on North America or Europe, and often only on PlanetLab nodes, which may have better connectivity than average for that area. There are reports that Octant’s minimum distance estimates are unsound in China, because its network is always congested, so it’s not safe to say “this packet must have traveled at least this distance.” And hardly anyone has tested active geolocation on hosts that could be anywhere in the world.

---

# Testing active geolocation around the world
RIPE Atlas anchors and stable probes

Global population density as of 2015
https://atlas.ripe.net
GPWv4, CIESIN/SEDAC
http://sedac.ciesin.columbia.edu/data/set/gpw-v4-population-density-rev10/maps
???

So we did that test. We measured the accuracy of CBG, Quasi-Octant, Spotter, and a hybrid—cubic regression but geometric intersection—on test targets all around the world.

We used landmarks from the RIPE Atlas measurement constellation. They have two classes of measurement hosts, anchors and probes. Anchors work better as landmarks, mainly because they’re guaranteed to have stable IP addresses, but you can also use probes if you’re careful. There are about 300 overall. We don’t use them all for every measurement, but that’s just a performance hack.

Their coverage outside of Europe could be better. For comparison, on the right is an estimate of world population density as of 2015. Even if you scale that by Internet access, there’s still a huge discrepancy. There are measurement constellations with more hosts in North America, like CAIDA Ark, but I haven’t found one with many more hosts in Latin America, Africa, or Asia. But there’s enough worldwide coverage to make this worth trying, at least.

---

# Testing active geolocation around the world
RIPE Atlas anchors and stable probes

Crowdsourced test hosts (40 volunteer, 150 MTurk)
???

We calibrate all our algorithms on ping times from landmarks to landmarks, so we need a second set of hosts to be testing targets. We crowdsourced these: 40 from volunteers, 150 paid workers from Amazon’s Mechanical Turk micro-task service.

I was complaining about RIPE Atlas not having enough hosts in Latin America, Africa, and Asia, but it’s hard for a researcher based in the USA to get volunteers from there, too. Mechanical Turk lets you request workers from specific countries, which we used to prevent India and the USA from consuming my whole budget for this, but in many countries we didn’t get any workers at all. But, again, there’s enough to tell us something.

---

# Measuring RTT with a Web app
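The command-line measurement we use later for the VPN servers works on the same principle as the Web app: time a TCP handshake. Here is a minimal sketch in Python, not our actual tool; the `connect()` call returns when the SYN/SYN-ACK exchange completes, so its elapsed time approximates a single RTT plus local stack overhead.

```python
import socket
import time

def tcp_rtt_ms(host, port=443, timeout=2.0):
    """Approximate one round-trip time by timing a TCP handshake.

    Returns the elapsed connect() time in milliseconds, or None if the
    host is unreachable or refuses the connection.  A browser doing the
    same thing cannot tell whether it measured one round trip or two,
    which is the uncertainty discussed in the notes below.
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            pass  # connection established: handshake complete
    except OSError:
        return None
    return (time.monotonic() - start) * 1000.0
```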
???

We couldn’t measure round-trip times with ordinary ping packets, because the proxies we ultimately want to investigate are behind aggressive ingress filters. Also, we couldn’t ask our volunteers or MTurk workers to download, compile, and run a command-line program, so we had to do those measurements with a Web application, which has lots of restrictions on how it can access the network.

The short version is, we have to use TCP handshakes, on a well-known port, and when we are using a Web application we can’t be sure whether we’re measuring one round trip or two, which means the distance estimates can unpredictably come out twice as big as they should be. If the browser’s running on Windows it can even be three or four round trips; I don’t know why.

But this is a useful problem to have, because it’s giving us unpredictable extra latency, which is the same thing we have to worry about because of congested regional networks and circuitous routes.

---

# Algorithm comparison
Algorithms using minimum distance estimates do not cope with extra latency

???

So we didn’t try to compensate for the extra round trips at all. We tested unmodified CBG, Quasi-Octant, Spotter, and Hybrid on the crowdsourced data, and here’s how well they did.

The most important criterion is on the left. We want the true location always to be _inside_ the prediction region. None of the algorithms managed that, and the left plot shows how badly they failed: how far away was the edge of the prediction from the true location? Turns out the simplest algorithm, CBG, is least likely to fail this way.

Then we look at _why_ they’re failing with the other two plots, and what we find is, the problem is minimum distance estimates. Quasi-Octant, Spotter, and Hybrid are all failing because they assume the packets must have gotten some distance away in a given time, but that’s not true because of all the systematic errors.

---

# Avoiding underestimation
Underestimating travel distance can cause empty prediction

Underestimates observed for ~1% of all disks
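The two changes to CBG described in the notes can be sketched as a toy version: the satellite floor of 20,000 km in 240 ms comes from the talk, while the grid-based “largest agreeing subset” search below is a simplification of the actual geometric computation.

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_KM = 6371.0
# Change 1: a packet could go ~20,000 km via a communications satellite
# in about 240 ms, so no fitted speed may be slower than this floor.
FLOOR_KM_PER_MS = 20000.0 / 240.0

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))

def modified_cbg(disks, candidates):
    """disks: ((lat, lon), rtt_ms, km_per_ms) per landmark.

    Change 2: instead of requiring a point inside *all* disks, keep the
    candidate points covered by the largest number of disks, which
    discards inconsistent (underestimated) disks automatically.
    """
    radii = [(loc, rtt * max(speed, FLOOR_KM_PER_MS))
             for loc, rtt, speed in disks]
    best, region = -1, []
    for p in candidates:
        n = sum(haversine_km(p, loc) <= r for loc, r in radii)
        if n > best:
            best, region = n, [p]
        elif n == best:
            region.append(p)
    return region
```

In the test, a third landmark with an underestimated (too-slow) speed fails to cover the true location, but the majority of disks still agree on it.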
???

CBG only fails when some of its disks _underestimate_ the distance. Here’s the example from the beginning again: if we add this pink disk, which is an underestimate, we get no prediction region at all, because there’s no overlap among all four disks. If it were just a little bit bigger, but still too small, we’d think the server had to be farther to the southeast than it really is.

Since we know the true locations of all the crowdsourced test hosts, we can calculate how often underestimates happen. It comes out to about 1% of all the disks. Anything to the left of this red line.

So we make two changes to CBG. We know it’s physically possible for a packet to go twenty thousand kilometers in 240 milliseconds, using a communications satellite, so we say that CBG’s speed estimate has to be at least fast enough to allow that. And, if there’s no overlap among all the disks, we discard down to the largest subset that does have an overlap. In this example, there are two possibilities, so we take the bigger overlap, which means throwing out the pink disk and giving the same answer we originally had.

We retested, and sure enough, those two changes eliminated all of the misses. So we used this modified CBG for the main study, geolocating VPN proxies.

---

# Seven VPN providers
.caption[ VPN commercial landscape data collected by [VPN.com](https://www.vpn.com) ]

???

I’m not going to name the VPN companies we tested, because there are many more companies we haven’t tested. I don’t want you to think the companies in this study are unusually misleading about their advertised server locations. I suspect this is an industry-wide problem, and if we tested all of the companies we could find, we’d discover at least some falsehoods for most of them.

What I will tell you is that this slide shows 157 VPN providers, with the set of countries that each advertises servers in, and the lettered ones are the ones we tested. The data comes from the comparison site VPN dot com. Providers A through E are all in the top 20 by number of countries advertised; F and G are much more typical.

---

# Location databases agree with providers…
???

We checked the providers’ claims against five major IP-to-location databases, and you can see that the databases mostly agree with them. Eighty percent agreement or better for most. IP2Location and IPInfo are down around 50% for provider A, which is curious, but it could just mean they’re out of date.

We’ve all heard that IP-to-location databases are notoriously full of errors, but also, a lot of the sources they use could be faked pretty easily: whois, address registry allocations, airport codes in routers’ DNS names, that sort of thing.

So, suppose a VPN company has a way to fake server locations in IP-to-location databases. Most of their customers, what they probably want is for websites to _think_ they’re surfing from Ruritania. They want to watch Ruritanian TV, but the website will only stream TV to people in the country. And the website enforces that by looking up client addresses in one of these databases, so if the company fakes its server locations, it can give its customers what they want, without needing to have servers in lots of different countries. Saves money. Economically rational.

But maybe you’re not subscribing to VPNs to watch TV. Maybe you have a reason why you really need your packets routed through Ruritania. Then a faked server location is no good to you.

---

# Measurement through VPN servers
Cannot measure _A_

Can measure _B_ and _C_

_A_ = _B_ − 0.49 _C_
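The calibration against the few pingable servers amounts to a one-parameter least-squares fit; here is a sketch with synthetic numbers, not our measurements.

```python
def fit_k(samples):
    """Least-squares fit of k in A = B - k*C.

    samples: (A, B, C) triples from servers that answer pings, so A was
    measured directly.  Minimizing sum((B - A - k*C)^2) over k gives
    the closed form k = sum(C*(B - A)) / sum(C^2); ideally k = 0.5,
    and the talk's regression found 0.49.
    """
    num = sum(c * (b - a) for a, b, c in samples)
    den = sum(c * c for _, _, c in samples)
    return num / den

def estimate_rtt_a(b, c, k=0.49):
    """Estimate the server-to-landmark RTT from client-side B and C."""
    return b - k * c
```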
???

I need to go over a couple more technical wrinkles before we get to the results. We’re not using the Web app for the VPN servers; we’re using a command-line program that can reliably measure a single round-trip time. But it’s the wrong round trip.

To geolocate VPN servers, we need the round-trip time between the server and each landmark. That’s _A_ on the left diagram. But we can’t measure that directly, because we can’t run code on the server itself, and most of them don’t respond to pings. We _can_ measure _B_, the round-trip time from our client _through_ the server to each landmark, and also _C_, the round-trip time from our client through the server, back to the client, and around again.

In an ideal world, _A_ would be equal to _B_ minus half of _C_, because _C_ goes back and forth between the client and the server twice. A few of the servers _can_ be pinged, so we use those to check this equation, and it holds up: linear regression says 0.49 _C_, with R-squared greater than 0.99.

---

# Disambiguation with external knowledge
All these targets belong to the same AS and /24

All the data centers inside the oval are in Chile
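The same-AS-and-/24 heuristic amounts to a majority vote among servers assumed to be co-located. A sketch follows; the IP addresses (from the documentation range) and the tie-breaking rules are illustrative, not our actual pipeline.

```python
from collections import Counter

def disambiguate(predictions):
    """predictions: {server_ip: set of candidate country codes} for a
    group of servers sharing one AS and /24, assumed co-located.

    If some members' prediction regions stay inside a single country,
    treat the border-crossing regions as noise and assign the whole
    group the country that the unambiguous members agree on most often.
    """
    votes = Counter()
    for countries in predictions.values():
        if len(countries) == 1:
            votes[next(iter(countries))] += 1
    if votes:
        return votes.most_common(1)[0][0]
    # No unambiguous member: fall back to a country common to all
    # regions (alphabetical tie-break, purely for determinism).
    common = set.intersection(*predictions.values())
    return min(common) if common else None
```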
???

Also, sometimes CBG by itself gives us an ambiguous answer, but we can resolve it with outside information. For instance, if we have a group of servers whose IP addresses all belong to the same Autonomous System and the same /24, probably they’re all in the same location. If we get some prediction regions that cross a national border, and some that don’t, for a group like that, we assume the crossings are a mistake. In this example on the left, we say all these servers are in Canada and not the USA. This is often useful when a big city is near a border, like Toronto here, and for small countries like Singapore or even Belgium, where the prediction has to be really tight not to cross into any neighbors.

Also, we’re geolocating _servers_. Servers live in data centers. For instance, all the data centers inside the prediction region on the right are in Chile, not Argentina, so we can assume that this proxy is in Chile.

Incidentally, all the providers use DNS-based load balancing, so we look up all the IP addresses for their servers and test each one independently. Those 20 servers on the left correspond to five DNS names, all belonging to one provider.

---

# Provider A
Claim: 218 countries · No more than 40 true countries

???

So now we come back to this slide I showed you at the beginning. Provider A claims to have VPN servers in 218 countries. Almost every ISO country code there is; they’re missing a few places in Africa and South America. Nobody we tested says they have a server in Antarctica, by the way.

The green countries are the ones where they really do have servers, and the orange countries are the ones where they said they do, but they don’t. Almost nothing is really in South America, or Africa, or Central Asia, or Oceania. Fewer than advertised in several other places.

And this isn’t just a matter of its being difficult to operate servers in certain locations. There would be no problem getting hosting in Norway, or New Zealand, or Egypt, or Argentina, but they don’t; conversely, getting hosting in Russia is a hassle, but they do.

I can’t show it to you on this map, but there is very little relationship between the claimed location and the actual location. Claimed locations from all over the world turn out to be concentrated into data centers in Florida, the UK, and the Czech Republic.

---

# Provider B
Claim: 109 countries · No more than 30 true countries

???

Provider B isn’t making claims quite as grandiose as A’s, but there are still quite a lot of lies, especially relating to South America, Africa, and Central Asia.

---

# Provider C
Claim: 84 countries · No more than 50 true countries

???

I’m going to go quickly through the rest of these; the overall patterns are much the same. I can’t show it on this map, but this provider had servers that were supposed to be in the USA but that we measure as being in Saudi Arabia, Iran, and China, which is precisely backward from what you would expect. If you’re going to go to the trouble of getting data center space in those countries, why wouldn’t you advertise it?

---

# Provider D
Claim: 52 countries · No more than 45 true countries

???

This provider’s servers are slow and overloaded, which makes all the predictions come out more uncertain. It’s possible that they have fewer countries than this map says they do.

---

# Provider E
Claim: 53 countries · No more than 35 true countries

???

Nobody seems to want to put servers in southeastern Europe, which seems odd to me. But hey, at least these people aren’t lying about Italy!

---

# Provider F
Claim: 19 countries · Could be as many as 31 countries

???

This provider also has slow, overloaded servers producing position uncertainty and rendering us unable to tell how much lying is going on. But at least we know that we don’t know.

---

# Provider G
Claim: 20 countries · No more than 18 true countries

???

The servers this provider said were in France and Italy are actually in Germany. A hundred years ago, someone might have started a war over that.

---

# Summary
Dishonest claims are more likely to occur in the “long tail” of countries.

???

To sum up: provider claims are fully credible for a little less than half of the tested IP addresses, and _could_ be true for nearly two-thirds. But which countries account for the bulk of the credible claims? The USA, Australia, the UK, the Netherlands, Germany, Canada, France, and so on. The places where bandwidth is cheap and business is easy to do. The dishonesty happens in the long tail of countries — not by population, but by ease of access to hosting. There are some odd exceptions; I don’t know why they tend to lie about Sweden and tell the truth about Russia.

---

# Either we’re wrong or the databases are
???

Now let’s look again at the provider claims and the location databases, adding some rows at the bottom for how much _we_ agree with the claims. Depending on how much we give the providers the benefit of the doubt, we agree with their claims anywhere from 30% to 90% of the time. Except for provider D, we always agree less than the databases do. (The maps I showed match the “generous” row.) Either the databases are wrong or I’m wrong, and I’m pretty sure I’m not wrong.

---

# Questions raised

* Is other research using VPNs invalidated?
* How easy is it to fake IP-to-location records?
* What if the VPN actively interferes with these measurements?
* What do people think they’re buying?
* Should Web apps be able to measure precise network timing?

???

Obviously this is not the last word on VPN server locations; there are plenty of ways our results could be improved. I want to end, though, with some questions raised by just the work so far.

We started this project because we weren’t seeing known cases of Internet censorship through Provider A’s VPNs. I wonder if anyone else may have done measurement studies using these providers and didn’t measure what they thought they were measuring.

How easy _is_ it to tamper with IP-to-location databases, the way we think they’re doing? We know there are tons of _errors_ in IP-to-location databases, but I haven’t seen anyone looking for active falsification.

Some studies say that if the target delays its responses to pings, it can foul all the distance estimates. With the measurements we’re doing, the VPN could be even more aggressive than that, and respond early to some of our SYNs. Is there anything we could do to prevent that kind of interference?
I can think of one way, but it needs a custom protocol and synchronized clocks on all the landmarks…

On a policy note, to know if this is a clear or a fuzzy case of false advertising, we need to understand what VPN customers think they’re buying: whether it’s just access to Ruritanian streaming TV, or whether they truly expect their packets to get routed through Ruritania.

And finally, remember I said we’d built a Web application that runs an active geolocation measurement? That could be used by a malicious website to locate a human without their permission. Maybe Web apps shouldn’t be allowed to measure precise network timings.