For our work on the effect of DNS on Tor’s anonymity, we collected a large DNS dataset with five samples of each of the one million most popular websites on the Internet, according to the Alexa top list of April 15th 2016. The data was collected with Tor Browser 5.5.4 using tools from the DefecTor toolset.

Download the data

We make the following files available:

  • PCAPs: alexa1mx5.tar.gz (7.4 GiB)
    SHA-256: 100b2081ca194571206ba02d88459982baf7b0584b3dd3246c0c0413048ddb5e
  • Extracted text files: alexa1mx5-extracted.tar.gz (590 MiB)
    SHA-256: 7361a816f24b34b1f8d9f26e9fa5a403622ce3b4b401a101f4b41cf1d6705ffc
  • Alexa top 1,000,000 file: top-1m.csv (22 MiB)
    SHA-256: 65f8d31a61164825900d50296de35bfbeaac405c9227abf5680ff61c404aa933
  • IPv4 addresses for Cloudflare: ips-v4 (0.2 KiB)
    SHA-256: 3a69b705b18bd630e748165183a8158220b755fa9026b7db967cd9769410e606

How the data was collected

Our collection method uses a fresh copy of Tor Browser for each site visit, without using tor. In other words, we configured Tor Browser not to use the Tor network but to connect directly from our university network. We did this to avoid issues such as Cloudflare CAPTCHAs and IP blacklists that contain Tor exit relays. Please note that the data was collected with Tor Browser 5.5.4; newer versions might require further modifications, e.g., to prevent unwanted network traffic or even to run Tor Browser in a container.

First, install the relevant tools using Go:

go get github.com/pylls/defector/cmd/{server,tbdnsw}
go get github.com/pylls/defector/cmd/extractdns

Download an Alexa top-sites file (such as top-1m.csv) and run:

server -f data -s 5 -t 30 -o .pcap top-1m.csv

The server will instruct workers to collect a total of five samples of each site in top-1m.csv, using up to 30 seconds per site visit, and store the results in the data folder with the suffix .pcap. By default, the server listens on port 55555 on all interfaces.

Download a fresh copy of Tor Browser and extract it. Open the folder Browser/TorBrowser/Data/Browser/profile.default/preferences/ and append the following to the end of extension-overrides.js:

user_pref("app.update.enabled", false);
user_pref("extensions.torlauncher.prompt_at_startup", false);
user_pref("extensions.torlauncher.start_tor", false);
user_pref("datareporting.healthreport.nextDataSubmissionTime", "1559373924100");
user_pref("datareporting.policy.firstRunTime", "1559287524100");
user_pref("extensions.torbutton.lastUpdateCheck", "1559287542.7");
user_pref("extensions.torbutton.show_slider_notification", false);
user_pref("extensions.torbutton.updateNeeded", false);
user_pref("extensions.torbutton.versioncheck_url", "");
user_pref("extensions.torbutton.versioncheck_enabled", false);
user_pref("network.proxy.proxy_over_tls", false);
user_pref("network.proxy.socks", "");
user_pref("network.proxy.socks_port", 0);
user_pref("network.proxy.socks_remote_dns", false);

Launch Tor Browser and follow this guide from Mozilla. Next, download the latest release of dumb-init as a Debian package (the Dockerfile below expects a file matching dumb-init*_amd64.deb). We need a minimal init system to clean up the many processes we will be creating inside the Docker container. Copy the following into a new file named Dockerfile (based on work by Jess Frazelle):

FROM debian:jessie
MAINTAINER Tobias Pulls <tobias.pulls@kau.se>

RUN apt-get update && apt-get install -y \
    xvfb \
    libpcap-dev \
    libasound2 \
    libdbus-glib-1-2 \
    libgtk2.0-0 \
    libxrender1 \
    libxt6 \
    xz-utils \
    xauth \
    psmisc \
    --no-install-recommends

COPY dumb-init*_amd64.deb /
RUN dpkg -i dumb-init*.deb
RUN rm dumb-init*.deb && apt-get clean && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*

ENV HOME /home/user
ENV LANG C.UTF-8

# create user (start-tor-browser.sh prevents us from running as root)
RUN useradd --create-home --home-dir $HOME user

COPY tbdnsw $HOME/
COPY tor-browser_en-US $HOME/tor-browser_en-US

RUN chown -R user:user $HOME \
    && chmod +x $HOME/tbdnsw \
    && setcap 'CAP_NET_RAW+eip CAP_NET_ADMIN+eip' $HOME/tbdnsw

WORKDIR $HOME
USER user
ENTRYPOINT ["dumb-init", "--"]

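Place the tbdnsw binary, the extracted tor-browser_en-US folder, and the dumb-init .deb package next to the Dockerfile, since the COPY instructions above expect them there. Then build the Docker image and start a worker, where <IP:port> is the address of the server:

docker build -t pulls/worker .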
docker run --privileged -d pulls/worker ./tbdnsw <IP:port>
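Since the server hands out visits to the workers that connect to it, several workers can run side by side. The following Go sketch, a hypothetical helper rather than part of DefecTor, starts a given number of workers with exactly the docker run command above, after checking that the server address has the IP:port form.

```go
package main

import (
	"fmt"
	"net"
	"os"
	"os/exec"
	"strconv"
)

// workerArgs builds the arguments for one `docker run` of a worker,
// mirroring the command above, after validating the IP:port argument.
func workerArgs(image, server string) ([]string, error) {
	host, port, err := net.SplitHostPort(server)
	if err != nil {
		return nil, err
	}
	if net.ParseIP(host) == nil {
		return nil, fmt.Errorf("%q is not an IP address", host)
	}
	if n, err := strconv.Atoi(port); err != nil || n < 1 || n > 65535 {
		return nil, fmt.Errorf("invalid port %q", port)
	}
	return []string{"run", "--privileged", "-d", image, "./tbdnsw", server}, nil
}

func main() {
	if len(os.Args) != 3 {
		fmt.Fprintln(os.Stderr, "usage: startworkers <IP:port> <count>")
		os.Exit(2)
	}
	count, err := strconv.Atoi(os.Args[2])
	if err != nil || count < 1 {
		fmt.Fprintln(os.Stderr, "count must be a positive integer")
		os.Exit(2)
	}
	args, err := workerArgs("pulls/worker", os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	for i := 0; i < count; i++ {
		// docker run -d prints the ID of the new container.
		out, err := exec.Command("docker", args...).CombinedOutput()
		if err != nil {
			fmt.Fprintf(os.Stderr, "worker %d: %v\n%s", i, err, out)
			os.Exit(1)
		}
		fmt.Printf("worker %d: %s", i, out)
	}
}
```

With the server's default port, a call might look like startworkers 192.0.2.1:55555 8, where the address and worker count are placeholders for your own setup.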

Finally, to extract the DNS data from the resulting PCAPs, use the extractdns tool:

extractdns -o results/ data/

Here, data/ is the folder in which the server stored the captures, and results/ is the folder in which the extracted data will be stored.

Wrapping up

Our DNS data collection largely mirrors how we created the DefecTor website fingerprinting (WF) dataset.
Note that when we collected this dataset, we ran the server in five rounds, starting with one sample per site and increasing the number of samples by one in each round. This way, visits to the same site were spread out over time instead of happening back to back.
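The five rounds can be scripted. The following Go sketch, with a hypothetical helper roundArgs, runs the server once per round with -s set to the round number and otherwise the same flags as earlier. It assumes that the server binary is on the PATH and that it exits once the requested number of samples per site has been collected; both are assumptions on our part.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strconv"
	"strings"
)

// roundArgs returns the server arguments for round r (1-based). The
// target sample count equals the round number, so each round adds one
// sample per site on top of the previous rounds.
func roundArgs(r int) []string {
	return []string{"-f", "data", "-s", strconv.Itoa(r), "-t", "30", "-o", ".pcap", "top-1m.csv"}
}

func main() {
	const rounds = 5
	for r := 1; r <= rounds; r++ {
		args := roundArgs(r)
		fmt.Printf("round %d: server %s\n", r, strings.Join(args, " "))
		cmd := exec.Command("server", args...)
		cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
		if err := cmd.Run(); err != nil {
			fmt.Fprintf(os.Stderr, "round %d failed: %v\n", r, err)
			os.Exit(1)
		}
	}
}
```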