StegoTorus:
A Camouflage Proxy for the
Tor Anonymity System

Zachary Weinberg · Jeffrey Wang
Vinod Yegneswaran · Linda Briesemeister
Steven Cheung · Frank Wang · Dan Boneh

Carnegie Mellon University: CyLab · SRI International · Stanford University · The Tor Project

Problem

Directly connecting users from Iran

Data from the Tor Project — https://metrics.torproject.org

Tor and Censorship

The Onion Router (Tor) conceals the source and destination of traffic by relaying it through a chain of proxies.
It does not conceal that you are using Tor.

Tor and Censorship

Any of these routers could detect the use of Tor.

Tor and Censorship

The censor, an adversary in this position,
is motivated to prevent Tor from being used.

StegoTorus

Disguise Tor traffic as an innocuous cover protocol
so the censor does not detect and block it.

StegoTorus’ Mission

Protect bulk traffic from deep-packet inspection and blockade

… hide it, steganographically, in common Internet protocols

… obscure packet contents, size, and timing

… maintain Tor’s anonymity properties and performance

About steganography

Hide a hiddentext in a covertext; an adversary shouldn’t be able to tell that the hiddentext is present

The covertext conforms to some standard file format
(in our case, a standard TCP protocol)

State of the art is weaker than for cryptography

DEFIANCE: the larger system

Limits on the Censor

StegoTorus Architecture

Chopping

The Tor stream is not a good hidden protocol as is

Chopping reformats the stream to solve these issues

Challenge 1: avoiding plaintext headers

Initial Handshake

Challenge 2: Short cover connections

Steganography

Implemented:

Planned:

Arbitrary encrypted packet stream

17 03 01 00 80 XX XX XX  XX XX XX XX XX XX XX XX
XX XX XX XX XX XX XX XX  XX XX XX XX XX XX XX XX
XX XX XX XX XX XX XX XX  XX XX XX XX XX XX XX XX
XX XX XX XX XX XX XX XX  XX XX XX XX XX XX XX XX
XX XX XX XX XX XX XX XX  XX XX XX XX XX XX XX XX
XX XX XX XX XX XX XX XX  XX XX XX XX XX XX XX XX
XX XX XX XX XX XX XX XX  XX XX XX XX XX XX XX XX
MM MM MM MM MM MM MM MM  MM MM MM MM MM MM MM MM
MM MM MM MM PP

On the wire, a TLS 1.0 application-data record with 107 bytes of payload (XX), a 20-byte MAC (MM), and padding (PP) out to a 16-byte cipher block looks like the dump above.

Replace XX, MM, PP bytes with chopper output (adversary can’t check the MAC)

Works from labeled packet captures of TLS streams

Can be adapted to any TCP protocol carrying encrypted data
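The framing arithmetic behind this module can be sketched in a few lines. This is a minimal illustration, not the StegoTorus implementation: the function names are hypothetical, and it assumes TLS 1.0 CBC framing with no explicit IV, where the two-byte length field covers payload, MAC, and padding together.

```python
import struct

def tls10_cbc_fragment_len(payload_len: int, mac_len: int = 20,
                           block_len: int = 16) -> int:
    """Smallest TLS 1.0 CBC fragment holding payload, MAC, and padding."""
    unpadded = payload_len + mac_len + 1  # +1 for the padding-length byte
    return -(-unpadded // block_len) * block_len  # round up to a whole block

def wrap_as_application_data(chopper_output: bytes) -> bytes:
    """Prefix opaque bytes with a TLS 1.0 application-data record header.

    Hypothetical helper: the body stands in for the XX, MM, and PP bytes,
    which an observer cannot tell apart from real ciphertext.
    """
    if len(chopper_output) > 2**14 + 2048:  # TLS limit on ciphertext length
        raise ValueError("fragment too long for one TLS record")
    # 0x17 = application data, 0x03 0x01 = TLS 1.0, then big-endian length.
    header = struct.pack("!BBBH", 0x17, 0x03, 0x01, len(chopper_output))
    return header + chopper_output
```

For the record shown on this slide, `tls10_cbc_fragment_len(107)` gives 128, so a cover record of that shape has room for 128 bytes of chopper output.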

HTTP client → server

GET /<uri> HTTP/1.1
Accept: text/html,application/xhtml+xml,
    application/xml;q=0.9,*/*;q=0.8
Accept-Encoding: gzip, deflate
Accept-Language: en-us,en;q=0.5
Connection: keep-alive
Host: <hostname>
User-Agent: Mozilla/5.0 (X11; Linux x86_64;
    rv:10.0) Gecko/20100101 Firefox/10.0
Cookie: <cookie>

Not much room to smuggle arbitrary data

We use URIs and cookies, base64’d
(todo: make them look less fishy)

Hostname impractical (has to be in the DNS)
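A rough sketch of the URI-and-cookie idea follows. Everything specific here is an assumption made for illustration: the function names, the `session` cookie name, the `example.com` host, the 30-byte split point, and the choice of the URL-safe base64 alphabet (the slide says only "base64").

```python
import base64

COOKIE_PREFIX = "Cookie: session="  # hypothetical cookie name

def _b64(data: bytes) -> str:
    # URL-safe alphabet, padding stripped so no '=' appears in the URI.
    return base64.urlsafe_b64encode(data).decode("ascii").rstrip("=")

def _unb64(text: str) -> bytes:
    # Restore the padding that _b64 stripped.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))

def encode_request(hidden: bytes, host: str = "example.com",
                   uri_bytes: int = 30) -> str:
    """Split the hiddentext between the request URI and one cookie."""
    lines = [f"GET /{_b64(hidden[:uri_bytes])} HTTP/1.1",
             f"Host: {host}",
             "Connection: keep-alive"]
    if hidden[uri_bytes:]:
        lines.append(COOKIE_PREFIX + _b64(hidden[uri_bytes:]))
    return "\r\n".join(lines) + "\r\n\r\n"

def decode_request(request: str) -> bytes:
    """Recover the hiddentext from the URI and the cookie, in that order."""
    lines = request.split("\r\n")
    hidden = _unb64(lines[0].split(" ")[1][1:])  # drop the leading '/'
    for line in lines[1:]:
        if line.startswith(COOKIE_PREFIX):
            hidden += _unb64(line[len(COOKIE_PREFIX):])
    return hidden
```

Long random-looking URIs and cookies are exactly what the "less fishy" todo refers to; the sketch makes no attempt to address that.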

HTTP server → client: JavaScript

(function(a,b){function cy(a){return f.isWindow(a)?a:
a.nodeType===9?a.defaultView||a.parentWindow:!1}funct
ion cv(a){if(!ck[a]){var b=c.body,d=f("<"+a+">").appe
ndTo(b),e=d.css("display");d.remove();if(e==="none"||
e===""){cl||(cl=c.createElement("iframe"),cl.frameBor
der=cl.width=cl.height=0),b.appendChild(cl); if(!cm||
!cl.createElement)cm=(cl.contentWindow||cl.contentDoc
ument).document,cm.write((c.compatMode==="CSS1Compat"
?"<!doctype html>":"")+"<html><body>"),cm.close();d=c
m.createElement(a),cm.body.appendChild(d),e=f.css(d,"
display"),b.removeChild(cl)}ck[a]=e}return ck[a]}func
tion cu(a,b){var c={};f.each(cq.concat.apply([],cq.sl
ice(0,b)),function(){c[this]=a});return c}function ct
(){cr=b}function cs(){setTimeout(ct,0);return cr=f.no
w()}function cj(){try{return new a.ActiveXObject("Mic
rosoft.XMLHTTP")}catch(b){}}function ci(){try{return
new a.XMLHttpRequest}catch(b){}}

JavaScript on the wire looks like the excerpt above

Overwrite identifiers with hiddentext, encoded in modified base64

HTTP server → client: JavaScript

(function(a,b){function cy(a){return f.iFBEg__S(a)?a:
a.nLL5K5Wi===9?a.db9pYVlj2x_||a.pjgALQ96LcyO:!1}funct
ion cr(a){if(!cQ[a]){var b=c.bc3B,d=f("<"+a+">").axb3
G9Tt(b),e=d.cXk("dXKHE2w");d.rIASMb();if(e==="n2Ee"||
e===""){cO||(c5=c.cN1DbOy6nqtEC("iuuLEs"),cO.fa61AM_r
8jS=cR.woPoZ=cW.hhWBrU=0),b.aJGbdVaYlk8(cC); if(!c2||
!c9.c1fwWKhvnD6_c)c$=(cZ.cZH$L2wDJNHLw||cN.cvmE_b4U5S
gSSuD).d_ZhSZRx,cQ.wWbjY((c.cAa6p$s6IC==="CiQeit2Lzj"
?"<!dEotywD hP3E>":"")+"<hy3a><b1aC>"),c9.cL84t();d=c
4.co6tjDiP3gw0_(a),cg.bzgO.aatDzQr5Wjd(d),e=f.czd(d,"
d7ZOzw0"),b.rhXN3BFJBW9(cf)}ch[a]=e}return cY[a]}func
tion cS(a,b){var c={};f.eR2T(cu.cKRRLv.aWpza([],c_.sT
zv9(0,b)),function(){c[this]=a});return c}function cd
(){cX=b}function cx(){s2pX1jv7ka(ci,0);return cA=f.ne
d()}function cf(){try{return new a.A_i8qX_4HizjJ("MEf
DhOFVY.XhXvKkJ")}catch(b){}}function cw(){try{return
new a.XrcSdu8P4nzNod}catch(b){}}

With identifiers overwritten, the same JavaScript on the wire looks like the excerpt above

Preserve JS keywords

Match hiddentext to covertext length

Roughly 4x expansion
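The identifier-overwriting step might look roughly like this. It is a simplified sketch under stated assumptions, not the StegoTorus encoder: the 64-symbol alphabet (base64 with `_` and `$` in place of `+` and `/`), the partial keyword list, the rule of keeping each word's first character, and the names `embed` and `extract` are all illustrative. The receiver is assumed to know the hiddentext length.

```python
import re

# Assumed "modified base64" alphabet: only characters legal in a JS identifier.
ALPHABET = ("ABCDEFGHIJKLMNOPQRSTUVWXYZ"
            "abcdefghijklmnopqrstuvwxyz"
            "0123456789_$")
# Partial keyword list, enough for the illustration.
KEYWORDS = {"break", "case", "catch", "continue", "default", "delete", "do",
            "else", "false", "finally", "for", "function", "if", "in",
            "instanceof", "new", "null", "return", "switch", "this", "throw",
            "true", "try", "typeof", "var", "void", "while", "with"}
WORD = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")

def _sextets(data: bytes) -> list:
    """Split bytes into 6-bit values, zero-padding the final one."""
    bits = "".join(f"{b:08b}" for b in data)
    bits += "0" * (-len(bits) % 6)
    return [int(bits[i:i + 6], 2) for i in range(0, len(bits), 6)]

def embed(cover: str, hidden: bytes) -> str:
    """Overwrite identifier characters (after the first) with hiddentext."""
    symbols, pos = _sextets(hidden), 0

    def replace(match):
        nonlocal pos
        word = match.group(0)
        if word in KEYWORDS or len(word) < 2:
            return word  # keywords and one-letter names are preserved
        take = symbols[pos:pos + len(word) - 1]
        stego = word[0] + "".join(ALPHABET[s] for s in take) + word[1 + len(take):]
        if stego in KEYWORDS:
            return stego  # decoder will skip it, so consume no symbols
        pos += len(take)
        return stego

    result = WORD.sub(replace, cover)
    if pos < len(symbols):
        raise ValueError("cover text too small for hiddentext")
    return result

def extract(stego: str, n_bytes: int) -> bytes:
    """Read n_bytes of hiddentext back out of the overwritten identifiers."""
    symbols = []
    for match in WORD.finditer(stego):
        word = match.group(0)
        if word not in KEYWORDS and len(word) >= 2:
            symbols.extend(ALPHABET.index(ch) for ch in word[1:])
    bits = "".join(f"{s:06b}" for s in symbols)[:8 * n_bytes]
    if len(bits) < 8 * n_bytes:
        raise ValueError("stegotext shorter than requested hiddentext")
    return bytes(int(bits[i:i + 8], 2) for i in range(0, 8 * n_bytes, 8))
```

Because every replacement has the same length and character class as the word it overwrites, the output has the same size and token boundaries as the cover. The result is no longer working JavaScript, which matches the excerpt shown on the slide.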

Performance

HTTP steganography adds a great deal of overhead, but remains usable for casual web surfing (4x faster than dial-up)

What does “ordinary” traffic look like?

One day of traffic through a backbone router in Chicago, 2011

Data source: http://www.caida.org/data/passive/passive_2011_dataset.xml

Picking Tor streams out of the background

The first 20 packets of 64,000 port-443 TCP flows, binned by size

Traffic Analysis Resistance

StegoTorus-HTTP is much more like CAIDA-port-80 traffic than either is like Tor traffic

StegoTorus-HTTP is still distinguishable from true HTTP

Fingerprinting

If censors can’t block all use of Tor, perhaps they will try to extract information instead

This sliding-window classifier requires only TCP payload sizes; after offline training, it runs in constant time per packet and constant memory per stream

\[
\log \Pr\bigl[\{u_i\},\{d_i\}\ \text{is Facebook}\bigr]
  = \sum_{i=1}^{n} \log \Pr[U_i = u_i] + \sum_{i=1}^{n} \log \Pr[D_i = d_i]
\]
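The score above is a sum of per-position log-probabilities for upstream and downstream payload sizes, which can be sketched as follows. The binning scheme, the add-one smoothing, and all names are assumptions made for illustration; the slide does not say how the per-position distributions were estimated. The sketch scores a single window of n packets and leaves the sliding out.

```python
import math
from collections import Counter

BIN_WIDTH = 100  # bytes per size bin (assumed)
NUM_BINS = 16    # sizes of 1500 bytes and up share the last bin (assumed)

def _bin(size: int) -> int:
    return min(size // BIN_WIDTH, NUM_BINS - 1)

class DirectionModel:
    """Per-position histogram of payload sizes for one traffic direction."""

    def __init__(self, n: int):
        self.counts = [Counter() for _ in range(n)]
        self.totals = [0] * n

    def train(self, sizes):
        """Offline training: record the size seen at each window position."""
        for i, size in enumerate(sizes[:len(self.counts)]):
            self.counts[i][_bin(size)] += 1
            self.totals[i] += 1

    def log_prob(self, sizes) -> float:
        """Sum over i of log Pr[X_i = x_i], with add-one smoothing."""
        total = 0.0
        for i, size in enumerate(sizes[:len(self.counts)]):
            p = (self.counts[i][_bin(size)] + 1) / (self.totals[i] + NUM_BINS)
            total += math.log(p)
        return total

def log_likelihood(up_model, down_model, up_sizes, down_sizes) -> float:
    """Log-probability that one window of sizes matches the trained site."""
    return up_model.log_prob(up_sizes) + down_model.log_prob(down_sizes)
```

Scoring touches one histogram cell per packet, which is consistent with the constant per-packet cost claimed on the slide; deciding whether a score means "is Facebook" would additionally need a threshold or a comparison against models for other sites.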

Fingerprinting

StegoTorus defeats a classifier trained on Tor
(todo: train on StegoTorus instead)

We are skeptical whether this attack scales to the global Internet

What’s Next

Still to do:

Help wanted!

Questions?

zackw@cmu.edu

https://gitweb.torproject.org/stegotorus.git
https://github.com/zackw/stegotorus/