For our work on the effect of DNS on Tor’s anonymity, we collected a sizable DNS dataset with five samples each from the Alexa top one million most popular websites on the Internet on April 15th, 2016. We described the data collection in a prior post. Let’s look at some fun statistics from the dataset.

Domains and requests

To begin with, we parsed 1,000,000 sites with 5 samples each, for a total of 60,828,453 DNS requests and 2,540,941 domains. The number of DNS requests sent per site had a mean of 12.2, std 11.2, median 10, min 1, and max 397. We found 2,260,534 unique domains (that is, domains that were only resolved on one site), with a per-site mean of 2.3, std 1.8, median 2, min 0, and max 363. In total, there are 968,491 sites with unique domains (96.8% of all sites). If we look at the fraction of sites with unique domains for different Alexa ranks, we see that the more popular a site is, the less likely it is to have a unique domain, as shown in the figure below from our Tor DNS paper.
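To make the definition of a unique domain concrete, here is a minimal Python sketch of how such numbers could be computed. It is not the code of our dnsstats tool: the input layout (a dict from site to the set of domains resolved when visiting it), the function names, and the toy data are assumptions made for illustration, and we use the population standard deviation here, which may differ from what the tool reports.

```python
from collections import defaultdict
from statistics import mean, median, pstdev


def unique_domains_per_site(site_domains):
    """Map each site to the domains that no other site resolved."""
    # Record on which sites each domain was seen.
    sites_per_domain = defaultdict(set)
    for site, domains in site_domains.items():
        for domain in domains:
            sites_per_domain[domain].add(site)
    # A domain is unique if it was seen on exactly one site.
    return {
        site: {d for d in domains if len(sites_per_domain[d]) == 1}
        for site, domains in site_domains.items()
    }


def summarize(values):
    """Return the mean, std, median, min, and max of a list of numbers."""
    return {
        "mean": mean(values),
        "std": pstdev(values),  # population std (an assumption)
        "median": median(values),
        "min": min(values),
        "max": max(values),
    }


# Hypothetical toy input: three sites and the domains resolved on each.
site_domains = {
    "a.example": {"a.example", "cdn.a.example", "www.google-analytics.com"},
    "b.example": {"b.example", "www.google-analytics.com"},
    "c.example": {"www.google-analytics.com"},
}

unique = unique_domains_per_site(site_domains)
counts = [len(domains) for domains in unique.values()]
stats = summarize(counts)
sites_with_unique = sum(1 for c in counts if c > 0)
```

In the toy data, the shared analytics domain is not unique to any site, so the third site ends up with zero unique domains, which is how a per-site minimum of 0 can arise.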

[Figure: dns-unique, the fraction of sites with unique domains for different Alexa ranks]

TTLs

Shifting focus to the TTLs of the DNS records, we got a mean 9780.0, std 42930.5, median 255, min 0, and max 604800 for all DNS records (values in seconds). If we look at only the TTLs of the primary domains (that is, the domain of the site we visit on the Alexa list), we see a mean 12307.5, std 28959.4, median 2501, min 0.0, and max 604800. The TTLs are significantly longer for primary domains. For unique domains, we got a mean 12393.2, std 30081.5, median 1800.0, min 0.0, and max 604800.0. This is similar to primary domains. Finally, for each site with unique domains, if we look at the unique domain with the minimum TTL, we get mean 3833.9, std 11073.6, median 60.0, min 0.0, and max 604800.0. The median is scarily low, implying that for close to half of the Alexa top one million there is at least one unique domain with a TTL at or below 60 seconds.
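The last statistic is the least obvious one, so here is a small Python sketch of it: for each site with unique domains, take the smallest TTL among them, then look at the median over sites. A median of 60 means at least half of the 968,491 sites with unique domains, that is at least 484,246 sites or roughly 48.4% of the full list, have a unique domain with a TTL of 60 seconds or less. The input layout, names, and toy values below are assumptions for illustration and not taken from dnsstats.

```python
from statistics import median


def min_unique_ttl_per_site(site_unique_ttls):
    """For each site with unique domains, return the smallest TTL seen."""
    return {
        site: min(ttls.values())
        for site, ttls in site_unique_ttls.items()
        if ttls  # skip sites without any unique domain
    }


# Hypothetical toy input: site -> {unique domain: TTL in seconds}.
site_unique_ttls = {
    "a.example": {"a.example": 3600, "cdn.a.example": 30},
    "b.example": {"b.example": 60},
    "c.example": {"c.example": 86400},
    "d.example": {},  # no unique domains, so excluded from the statistic
}

min_ttls = min_unique_ttl_per_site(site_unique_ttls)
median_min_ttl = median(min_ttls.values())
at_or_below_60 = sum(1 for ttl in min_ttls.values() if ttl <= 60)
```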

Scary big players

Want to browse the Internet without Google finding out? I’ve got bad news for you. Here are the top 5 most frequently requested domains in our dataset:

the 5 most frequently requested domains
 	 1:	 750148	 clients1.google.com (TTL mean 139.0, std 92.4, median 138.0, min 0.0, max 300.0)
 	 2:	 746129	 clients.l.google.com (TTL mean 150.1, std 86.6, median 150.0, min 0.0, max 300.0)
 	 3:	 526575	 www.google-analytics.com (TTL mean 36134.3, std 24911.3, median 34835.0, min 0.0, max 86400.0)
 	 4:	 516413	 www-google-analytics.l.google.com (TTL mean 150.4, std 86.6, median 151.0, min 0.0, max 300.0)
 	 5:	 358520	 fonts.googleapis.com (TTL mean 1689.0, std 1095.4, median 1684.0, min 0.0, max 3600.0)
the top 5 domains have 2,897,785 requests (4.76% of total)
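A ranking like this boils down to counting how often each domain was requested. The Python sketch below shows one way to do that with `collections.Counter`; the flat list of requested domains and the function name are assumptions for illustration, not how dnsstats stores its data. The last two lines re-derive the reported share from the figures in this post.

```python
from collections import Counter


def top_domains(requests, n=5):
    """Return the n most requested domains and their share of all requests."""
    counts = Counter(requests)
    top = counts.most_common(n)
    share = sum(count for _, count in top) / len(requests)
    return top, share


# Hypothetical toy input: one entry per DNS request.
requests = [
    "clients1.google.com", "clients1.google.com", "clients1.google.com",
    "fonts.googleapis.com", "fonts.googleapis.com",
    "a.example",
    "b.example",
    "c.example",
]
top, share = top_domains(requests, n=2)

# Share of the top 5 domains, using the totals reported in this post.
top5_requests = 750148 + 746129 + 526575 + 516413 + 358520
top5_percent = round(100 * top5_requests / 60828453, 2)
```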

Turns out clients1.google.com is requested on about 75% of the Alexa top one million sites. If we make a keyword-based search for a number of big players, we find the following:

Amazon stats, keywords [amazon aws s3 cloudfront ec2]
 	found on 212141 sites (21.21% of all sites)
 	63721 unique domains with 556712 requests (0.92% of total)
 	TTL mean 1915.7, std 23415.0, median 60.0, min 0.0, max 604800.0

Google stats, keywords [google doubleclick gstatic android.com 2mdn.net cc-dt.com gvt1.com gvt2.com urchin.com youtube-nocookie.com youtube.com youtubeeducation.com ytimg.com g.co goo.gl]
 	found on 774336 sites (77.43% of all sites)
 	64080 unique domains with 7481962 requests (12.30% of total)
 	TTL mean 10866.0, std 49164.7, median 194.0, min 0.0, max 604800.0

Facebook stats, keywords [facebook fbcdn]
	found on 237392 sites (23.74% of all sites)
	689 unique domains with 938144 requests (1.54% of total)
	TTL mean 941.8, std 1340.5, median 99.0, min 0.0, max 172817.0

Akamai stats, keywords [akamai edgesuite edgekey srip akadns]
	found on 338606 sites (33.86% of all sites)
	23220 unique domains with 1565730 requests (2.57% of total)
	TTL mean 3195.5, std 7329.4, median 20.0, min 0.0, max 604800.0

Cloudflare stats, keywords [cloudflare]
 	found on 110607 sites (11.06% of all sites)
 	6050 unique domains with 137647 requests (0.23% of total)
 	TTL mean 165.9, std 353.2, median 165.0, min 0.0, max 86400.0
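The keyword search behind these numbers can be sketched in a few lines of Python. We assume plain substring matching on the domain name and read "unique domains" in the output above as the number of distinct matching domains; both are assumptions for this sketch, as are the input layout and names. Note that substring matching is crude: a short keyword such as `s3` can also match domains that have nothing to do with the provider.

```python
def keyword_stats(site_requests, keywords):
    """Count sites, distinct domains, and requests matching any keyword."""
    sites = 0
    domains = set()
    matched_requests = 0
    total_requests = 0
    for requested in site_requests.values():
        total_requests += len(requested)
        # Keep every requested domain that contains at least one keyword.
        hits = [d for d in requested if any(k in d for k in keywords)]
        if hits:
            sites += 1
        domains.update(hits)
        matched_requests += len(hits)
    return {
        "sites": sites,
        "domains": len(domains),
        "requests": matched_requests,
        "share": matched_requests / total_requests,
    }


# Hypothetical toy input: site -> list of requested domains (with repeats).
site_requests = {
    "a.example": ["a.example", "d111.cloudfront.net", "d111.cloudfront.net"],
    "b.example": ["b.example", "s3.amazonaws.com"],
    "c.example": ["c.example"],
}
amazon = keyword_stats(site_requests, ["amazon", "aws", "s3", "cloudfront", "ec2"])
```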

Big players all over the place. Cloudflare is actually bigger than it looks. If we look at the IP addresses returned when resolving domains and compare them to the IP addresses Cloudflare reported at the time, we find that 64,388 primary sites (6.44% of all sites) use Cloudflare directly, and that at least one domain uses Cloudflare on 258,082 sites (25.81% of all sites). Fun times.
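Matching on IP addresses instead of keywords could look roughly like the Python sketch below, built on the standard `ipaddress` module. The network in the example is a reserved documentation range used as a placeholder, not one of Cloudflare's actual ranges, and we assume for simplicity that the primary domain of a site equals its name on the list; the input layout and function names are hypothetical as well.

```python
import ipaddress


def in_any_network(ip, networks):
    """Return True if the IP address falls inside any of the networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)


def provider_usage(site_ips, networks):
    """Count sites using the provider directly and sites using it at all."""
    direct = 0
    any_domain = 0
    for site, domain_ips in site_ips.items():
        # Domains on this site with at least one address in the ranges.
        matching = {
            domain
            for domain, ips in domain_ips.items()
            if any(in_any_network(ip, networks) for ip in ips)
        }
        if site in matching:  # assumes the primary domain equals the site name
            direct += 1
        if matching:
            any_domain += 1
    return direct, any_domain


# Placeholder range reserved for documentation, NOT a real Cloudflare range.
networks = [ipaddress.ip_network("198.51.100.0/24")]

# Hypothetical toy input: site -> {domain: resolved IP addresses}.
site_ips = {
    "a.example": {"a.example": ["198.51.100.7"], "cdn.a.example": ["192.0.2.1"]},
    "b.example": {"b.example": ["192.0.2.2"], "static.b.example": ["198.51.100.9"]},
    "c.example": {"c.example": ["203.0.113.5"]},
}
direct, any_domain = provider_usage(site_ips, networks)
```

In the toy data one site is served from the placeholder range directly, while a second site only pulls in a third-party domain hosted there, which is the same gap as between the 6.44% and 25.81% figures above.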

Wrapping up

The stats were generated using the dnsstats tool at github.com/pylls/defector/cmd/dnsstats. You can find links to the data in our prior post that documents the collection process.