For our work on the effect of DNS on Tor’s anonymity, we collected a sizable DNS dataset with five samples from each of the Alexa top one million most popular websites on the Internet on April 15th, 2016. We described the data collection in a prior post. Let’s look at some fun statistics from the dataset.
Domains and requests
To begin with, we parsed 1,000,000 sites with 5 samples each, for a total of 60,828,453 DNS requests and 2,540,941 domains. The number of DNS requests sent per site had a mean of 12.2, std 11.2, median 10, min 1, and max 397.
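The five figures reported for each distribution in this post can be reproduced from a plain list of per-site values. The following is only a sketch, not the code of the dnsstats tool: the helper name `summarize` and the sample values are made up for illustration, and it assumes the population standard deviation, which the post does not specify.

```python
import statistics


def summarize(values):
    """Return mean, std, median, min, and max for a non-empty list of numbers."""
    return {
        "mean": statistics.fmean(values),
        # Assumption: population standard deviation (not the sample one).
        "std": statistics.pstdev(values),
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
    }


# Made-up request counts for four sites, purely for illustration.
requests_per_site = [1, 10, 12, 25]
summary = summarize(requests_per_site)
print(summary)
```

The same helper applies unchanged to the per-site unique domain counts and to the TTL values discussed further down.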
We found 2,260,534 unique domains (that is, domains that were only resolved on one site), with a per-site mean of 2.3, std 1.8, median 2, min 0, and max 363.
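The definition of a unique domain above can be sketched as a two-pass computation: first record which sites resolved each domain, then keep the domains seen on exactly one site. The input layout (a dict from site to the set of domains resolved while visiting it) and all names below are assumptions for illustration, not the format of the actual dataset.

```python
from collections import defaultdict


def unique_domains_per_site(site_domains):
    """Map each site to the set of its domains that no other site resolved."""
    # First pass: record every site on which each domain was resolved.
    sites_by_domain = defaultdict(set)
    for site, domains in site_domains.items():
        for domain in domains:
            sites_by_domain[domain].add(site)
    # Second pass: keep only the domains seen on exactly one site.
    return {
        site: {d for d in domains if len(sites_by_domain[d]) == 1}
        for site, domains in site_domains.items()
    }


# Hypothetical input: two sites that share one CDN domain.
site_domains = {
    "a.example": {"a.example", "cdn.example", "img.a.example"},
    "b.example": {"b.example", "cdn.example"},
}
unique = unique_domains_per_site(site_domains)
print(unique)
```

In this toy input, `cdn.example` is resolved on both sites and is therefore not unique to either.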
In total, there are 968,491 sites with unique domains (96.8% of all sites). If we look at the fraction of sites with unique domains for different Alexa ranks, we see that the more popular a site is, the less likely it is that it has a unique domain, as shown in the figure below from our Tor DNS paper.
TTLs
Shifting focus to the TTLs of the DNS records, we got a mean of 9780.0, std 42930.5, median 255, min 0, and max 604800 for all DNS records (values in seconds).
If we look at only the TTLs of the primary domains (that is, the domain of the site we visit on the Alexa list), we see a mean of 12307.5, std 28959.4, median 2501, min 0.0, and max 604800. The TTLs are significantly longer for primary domains. For unique domains, we got a mean of 12393.2, std 30081.5, median 1800.0, min 0.0, and max 604800.0. This is similar to primary domains.
Finally, for each site with unique domains, if we look at the unique domain with the minimum TTL, we get a mean of 3833.9, std 11073.6, median 60.0, min 0.0, and max 604800.0. The median is scarily low, implying that for close to half of the Alexa top one million there is at least one unique domain with a TTL at or below 60 seconds.
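The step from a median of 60 to "close to half" of the sites can be made concrete with a small sketch: take the minimum TTL over each site's unique domains, then look at the median and at the share of sites at or below 60 seconds. The data layout and the TTL values are invented for illustration. Sites without unique domains are skipped, which is why the result covers slightly less than half of all sites rather than exactly half.

```python
import statistics


def min_unique_ttl_per_site(unique_ttls):
    """Map each site to the lowest TTL among its unique domains.

    Sites without any unique domain are left out of the result.
    """
    return {
        site: min(ttls.values())
        for site, ttls in unique_ttls.items()
        if ttls
    }


# Hypothetical TTLs (seconds) for the unique domains of four sites.
unique_ttls = {
    "a.example": {"a.example": 3600, "img.a.example": 30},
    "b.example": {"b.example": 60},
    "c.example": {"c.example": 86400},
    "d.example": {},  # no unique domains, so it is skipped
}

min_ttls = min_unique_ttl_per_site(unique_ttls)
median_min_ttl = statistics.median(min_ttls.values())
# Share of sites (with unique domains) whose lowest unique TTL is <= 60 s.
share_short = sum(1 for ttl in min_ttls.values() if ttl <= 60) / len(min_ttls)
print(median_min_ttl, share_short)
```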
Scary big players
Want to browse the Internet without Google finding out? I got bad news for you. If we look at the top 5 most frequently requested domains in our dataset:
Turns out clients1.google.com is requested on about 75% of the Alexa top one million sites. If we make a keyword-based search for a number of big players, we find the following:
Big players all over the place. Cloudflare is actually bigger than it looks. If we look at the IP addresses returned when resolving domains and compare them to the reported IP addresses of Cloudflare at the time, we find that the primary domains of 64,388 sites (6.44% of all sites) use Cloudflare directly, and that at least one domain uses Cloudflare on 258,082 sites (25.81% of all sites). Fun times.
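The comparison of resolved IP addresses against a provider's published ranges can be sketched with Python's standard `ipaddress` module. The ranges below are reserved documentation blocks used as placeholders, not Cloudflare's real ranges, and the function names and data layout are assumptions for illustration rather than what dnsstats does.

```python
import ipaddress

# Placeholder ranges (reserved documentation blocks), NOT real provider ranges.
EXAMPLE_RANGES = [
    ipaddress.ip_network("198.51.100.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def in_ranges(ip, ranges):
    """Return True if the IP address string falls inside any of the networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in ranges)


def classify_site(primary, domain_ips, ranges):
    """Return (primary domain in ranges, any domain on the site in ranges)."""
    direct = any(in_ranges(ip, ranges) for ip in domain_ips.get(primary, []))
    anywhere = any(
        in_ranges(ip, ranges) for ips in domain_ips.values() for ip in ips
    )
    return direct, anywhere


# Hypothetical resolutions: site -> domain -> returned IP addresses.
resolved = {
    "a.example": {"a.example": ["198.51.100.7"], "cdn.example": ["192.0.2.1"]},
    "b.example": {"b.example": ["192.0.2.10"], "static.b.example": ["198.51.100.20"]},
    "c.example": {"c.example": ["192.0.2.44"]},
}

results = {site: classify_site(site, ips, EXAMPLE_RANGES) for site, ips in resolved.items()}
direct_count = sum(direct for direct, _ in results.values())
any_count = sum(anywhere for _, anywhere in results.values())
print(direct_count, any_count)
```

Counting both cases separately mirrors the two figures above: sites whose primary domain sits behind the provider, and sites where at least one resolved domain does.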
Wrapping up
The stats were generated using the dnsstats tool at github.com/pylls/defector/cmd/dnsstats. You can find links to the data in our prior post that documents the collection process.