We’ve updated our data-gathering tools to collect datasets over pluggable transports for Tor. The steps below are specific to basket2, but they should be straightforward to adapt to any other pluggable transport supported by Tor. We primarily use Go for our tools and Docker containers for ease of deployment.
First, download our repository using Go:
go get github.com/pylls/thebasketcase
The repository contains the Go executables and Docker images we’ll need. We’re going to run a basket2 bridge, then prepare Tor Browser, launch a gatherserver that collects our dataset, and finally run several gatherclients that each execute Tor Browser, connect to our basket2 bridge, visit sites provided by the gatherserver, and then send the results back. Let’s get started!
Run a basket2 bridge
The basket2 bridge needs to run on a machine reachable by the gatherclients we’ll run later. Get and build our basket2 fork:
go get github.com/pylls/basket2proxy
In a terminal, navigate to the basket2 Docker image folder at thebasketcase/docker/basket2 and copy the basket2proxy executable here (created by go get github.com/pylls/basket2proxy and placed in your $GOPATH/bin folder). Next, build the Docker image:
docker build -t pulls/basket2bridge .
Run a container with the bridge, spawning an interactive shell:
docker run -p <IP>:<Port>:11111 -i -t pulls/basket2bridge bash
In the container, start Tor, then read the bridge fingerprint and the generated bridge line:
root@d07b96f4057c:/# service tor start
[ ok ] Starting tor daemon...done.
root@d07b96f4057c:/# cd /var/lib/tor/
root@85252b2e7a49:/var/lib/tor# cat fingerprint
Unnamed 5DD80B4AC2F718F1D8CACDAD1FD88644950A52B6
root@85252b2e7a49:/var/lib/tor# cat pt_state/basket2_bridgeline.txt
# basket2 torrc client bridge line
#
# This file is an automatically generated bridge line based on
# the current basket2proxy configuration. EDITING IT WILL HAVE
# NO EFFECT.
#
# Before distributing this Bridge, edit the placeholder fields
# to contain the actual values:
# <IP ADDRESS> - The public IP address of your obfs4 bridge.
# <PORT> - The TCP/IP port of your obfs4 bridge.
# <FINGERPRINT> - The bridge's fingerprint.
Bridge basket2 <IP ADDRESS>:<PORT> <FINGERPRINT> basket2params=0:0001:QiNZ5eqnrzPOXv4NyQ3Og5UntIpClPX6GC4c4Cq/I0Y
Assuming the server IP and port above were 192.168.60.184:11111, and the fingerprint and basket2_bridgeline are as shown above, the following is the bridgeline we’ll use later to connect to the bridge:
Bridge basket2 192.168.60.184:11111 5DD80B4AC2F718F1D8CACDAD1FD88644950A52B6 basket2params=0:0001:QiNZ5eqnrzPOXv4NyQ3Og5UntIpClPX6GC4c4Cq/I0Y
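If you set up several bridges, filling in the placeholders by hand gets tedious. The following is a minimal sketch of the substitution in Go. It is not part of our tools; the function name fillBridgeLine and its signature are made up for illustration, and only the three placeholder strings come from the generated file above.

```go
package main

import (
	"fmt"
	"strings"
)

// fillBridgeLine replaces the three placeholders that basket2proxy leaves in
// basket2_bridgeline.txt. The name and signature are illustrative only.
func fillBridgeLine(template, ip string, port int, fingerprint string) string {
	r := strings.NewReplacer(
		"<IP ADDRESS>", ip,
		"<PORT>", fmt.Sprint(port),
		"<FINGERPRINT>", fingerprint,
	)
	return r.Replace(template)
}

func main() {
	// The uncommented line from pt_state/basket2_bridgeline.txt.
	template := "Bridge basket2 <IP ADDRESS>:<PORT> <FINGERPRINT> " +
		"basket2params=0:0001:QiNZ5eqnrzPOXv4NyQ3Og5UntIpClPX6GC4c4Cq/I0Y"
	fmt.Println(fillBridgeLine(template, "192.168.60.184", 11111,
		"5DD80B4AC2F718F1D8CACDAD1FD88644950A52B6"))
}
```

With the values from this post, the program prints the same bridgeline as shown above.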
Prepare Tor Browser
This is by far the most annoying step. Follow our guide on how to configure Tor Browser to generate less network noise. Note that versions after Tor Browser 6.0.6 might require more work. Use the basket2 bridgeline from above. If you get stuck on that part, you can find an example torrc for basket2 further below.
Next, create a modified tor binary as described in our DefecTor experiments. When creating the modified tor binary we used a Debian Jessie VM as a build machine. Remember to copy src/or/tor to Browser/TorBrowser/tor. Make sure to check that Tor Browser still works after making all the changes and replacing the tor binary.
Run gatherserver
The gatherserver binary should already be available in your $GOPATH/bin folder. Note that all captured traffic traces will be stored on the machine running the gatherserver, so make sure it has ample disk space. Copy the gatherserver binary to a working directory. Create a torrc file with the following contents:
LogTimeGranularity 1
UseBridges 1
Bridge basket2 192.168.60.184:11111 5DD80B4AC2F718F1D8CACDAD1FD88644950A52B6 basket2params=0:0001:QiNZ5eqnrzPOXv4NyQ3Og5UntIpClPX6GC4c4Cq/I0Y
ClientTransportPlugin basket2 exec ./TorBrowser/Tor/PluggableTransports/basket2proxy -enableLogging=true -logLevel DEBUG -paddingMethods {{.Method}}
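The {{.Method}} placeholder in the last line has the shape of a Go text/template action, so the torrc acts as a template that gets one padding method filled in per run. The following is only a minimal sketch of how such a substitution works with the standard library, not the code from our tools. The type torrcParams, the function renderTorrc and the method name "example-method" are hypothetical; only the field name Method is taken from the placeholder.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// torrcParams is a hypothetical type; only the field name Method is taken
// from the {{.Method}} placeholder in the torrc.
type torrcParams struct {
	Method string
}

// renderTorrc fills the placeholder in a torrc template for one padding method.
func renderTorrc(torrc, method string) (string, error) {
	tmpl, err := template.New("torrc").Parse(torrc)
	if err != nil {
		return "", err
	}
	var out strings.Builder
	if err := tmpl.Execute(&out, torrcParams{Method: method}); err != nil {
		return "", err
	}
	return out.String(), nil
}

func main() {
	line := "ClientTransportPlugin basket2 exec ./TorBrowser/Tor/PluggableTransports/basket2proxy " +
		"-enableLogging=true -logLevel DEBUG -paddingMethods {{.Method}}"
	// "example-method" is a stand-in, not a real basket2 padding method name.
	rendered, err := renderTorrc(line, "example-method")
	if err != nil {
		fmt.Println("template error:", err)
		return
	}
	fmt.Println(rendered)
}
```

The practical consequence is that you should leave {{.Method}} in the torrc untouched rather than replacing it with a fixed method name.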
Replace the third line of the torrc with the bridgeline from your basket2 server. Download a fresh Alexa top-1m file from Amazon and unzip it. To see all options, run:
./gatherserver -h
For example, if you run:
./gatherserver -monitored 100 -samples 100 -unmonitored 10000 top-1m.csv
the server will distribute work to all connecting gatherclients (which we run next) that results in 100 monitored sites with 100 samples each, taken from Alexa ranks [1, 100], and 10,000 randomly selected unmonitored sites from Alexa ranks (100, 1000000]. The data will be collected for all default basket2 methods and stored in data/<method>/, with one sub-folder per method. The server can be freely restarted and will resume based on what work is already on disk.
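To make the split between monitored and unmonitored sites concrete, here is a minimal sketch of one way to select them from an Alexa-style list. This is an illustration under our own assumptions, not the gatherserver's actual code: the function names parseSites and pickSites and the seed parameter are invented, and the input is assumed to have one "rank,domain" row per line, sorted by rank.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"math/rand"
	"strings"
)

// parseSites returns the domains from Alexa-style CSV data ("rank,domain"
// per line), in file order.
func parseSites(data string) ([]string, error) {
	rows, err := csv.NewReader(strings.NewReader(data)).ReadAll()
	if err != nil {
		return nil, err
	}
	sites := make([]string, 0, len(rows))
	for _, row := range rows {
		if len(row) < 2 {
			return nil, fmt.Errorf("malformed row: %v", row)
		}
		sites = append(sites, row[1])
	}
	return sites, nil
}

// pickSites returns the top `monitored` sites and a random sample of
// `unmonitored` sites drawn from the remaining, lower-ranked sites.
func pickSites(sites []string, monitored, unmonitored int, seed int64) ([]string, []string, error) {
	if monitored < 0 || unmonitored < 0 || monitored+unmonitored > len(sites) {
		return nil, nil, fmt.Errorf("want %d monitored + %d unmonitored sites, have %d",
			monitored, unmonitored, len(sites))
	}
	rest := append([]string(nil), sites[monitored:]...) // copy before shuffling
	rng := rand.New(rand.NewSource(seed))
	rng.Shuffle(len(rest), func(i, j int) { rest[i], rest[j] = rest[j], rest[i] })
	return sites[:monitored], rest[:unmonitored], nil
}

func main() {
	// Tiny stand-in for top-1m.csv; the real file has one million rows.
	data := "1,a.example\n2,b.example\n3,c.example\n4,d.example\n5,e.example\n"
	sites, err := parseSites(data)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	monitored, unmonitored, err := pickSites(sites, 2, 2, 1)
	if err != nil {
		fmt.Println("selection error:", err)
		return
	}
	fmt.Println("monitored:", monitored)
	fmt.Println("unmonitored:", unmonitored)
}
```

With the options from the example command, each monitored site is visited 100 times and each unmonitored site once, which gives 100 × 100 + 10,000 = 20,000 samples per method.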
Run gatherclients
In a terminal, navigate to the gatherclient Docker image folder at thebasketcase/docker/gatherclient. Copy the Tor Browser directory that you modified earlier into it, along with the gatherclient binary from $GOPATH/bin. Next, build the Docker image:
docker build -t pulls/gatherclient .
If you’re lazy (like me), then you can use the run.sh and clean.sh scripts to run and clean up Docker containers with gatherclients in them. In run.sh:
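#!/bin/bash
# Start $1 gatherclient containers in the background.
# The C-style for loop is a bash feature, hence the bash shebang.
for ((n=0; n<$1; n++)); do
  docker run --privileged -d pulls/gatherclient ./gatherclient <IP>:55555
done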
Replace <IP> with the IP address of your gatherserver (which listens on port 55555). Then, to run 10 clients:
./run.sh 10
If you look at the terminal output of the gatherserver, you should see the workers count increasing and eventually (after a warmup browse) data being gathered. On failure, gatherclient will write errors (including forwarded stdout and stderr output from tor) to stdout. You can inspect the containers with docker ps, docker logs, docker stats, etc. By far the most annoying errors for us to debug have been related to Tor Browser and tor not starting properly. By default, each gatherclient will attempt to browse to a site five times before giving up, which delays errors in the Docker log.
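The retry behaviour explains why errors show up late. The following is a minimal sketch of such a bounded retry loop, written only to illustrate the idea; it is not the gatherclient source, and the names visitWithRetries and errNotReady are invented for this example.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotReady is a stand-in error for "Tor Browser or tor did not start".
var errNotReady = errors.New("tor or Tor Browser not ready")

// visitWithRetries calls visit up to maxAttempts times and returns nil on the
// first success. An error is only reported after the last attempt has failed,
// which is why failures appear late in the log.
func visitWithRetries(site string, maxAttempts int, visit func(site string) error) error {
	if maxAttempts < 1 {
		return errors.New("maxAttempts must be at least 1")
	}
	var lastErr error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		if lastErr = visit(site); lastErr == nil {
			return nil
		}
	}
	return fmt.Errorf("giving up on %s after %d attempts: %w", site, maxAttempts, lastErr)
}

func main() {
	calls := 0
	// A simulated visit that fails twice before succeeding.
	flaky := func(site string) error {
		calls++
		if calls < 3 {
			return errNotReady
		}
		return nil
	}
	err := visitWithRetries("example.com", 5, flaky)
	fmt.Println("calls:", calls, "error:", err)
}
```

If a browse takes a while to time out, five failed attempts add up, so give a misbehaving container some time before concluding that its log is clean.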
Summary
Once the gatherserver is done, we have 120,000 samples in total: 20,000 samples per method (100 × 100 monitored plus 10,000 unmonitored) in six sub-folders located at data/<method>/.
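Before moving on, it can be worth checking that every method folder is complete. The following is a minimal sketch that counts the files below data/ per method folder. It assumes that each sample ends up as one regular file somewhere below data/<method>/, which may not match the exact on-disk layout, and the helper names countByMethod and listFiles are made up for this example.

```go
package main

import (
	"fmt"
	"io/fs"
	"path/filepath"
	"strings"
)

// countByMethod tallies paths of the form "<method>/<file>" (relative to the
// data directory) by their first path element.
func countByMethod(relPaths []string) map[string]int {
	counts := make(map[string]int)
	for _, p := range relPaths {
		parts := strings.Split(filepath.ToSlash(p), "/")
		if len(parts) < 2 {
			continue // a file directly in data/, not inside a method folder
		}
		counts[parts[0]]++
	}
	return counts
}

// listFiles returns all regular files below root as paths relative to root.
func listFiles(root string) ([]string, error) {
	var files []string
	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.Type().IsRegular() {
			rel, err := filepath.Rel(root, path)
			if err != nil {
				return err
			}
			files = append(files, rel)
		}
		return nil
	})
	return files, err
}

func main() {
	files, err := listFiles("data")
	if err != nil {
		fmt.Println("cannot read data directory:", err)
		return
	}
	total := 0
	for method, n := range countByMethod(files) {
		fmt.Printf("%s: %d files\n", method, n)
		total += n
	}
	fmt.Println("total:", total)
}
```

Run it from the gatherserver's working directory. Under the one-file-per-sample assumption, each method should report 20,000 files for the example command above.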
The next post will cover how we use our analysis tools on the data.