At the end of gathering data in our previous post,
we ended up with a folder of collected pcaps in the format data/<method>/.
Each subfolder has a dataset of 100 monitored sites with 100 instances each
and 10,000 unmonitored sites. Next, we’ll use some processing and analysis
tools to dig deeper. Get the tools by:
Processing
First we need to extract cells from the pcaps. Cells are descriptions of pcaps in the following format:
For each line, the first value is the relative time between packets and the second value is the direction (positive for transmitting, negative for receiving). This is the same format as used by other website fingerprinting tools, like those of Wang et al. Note that, while it is called cell extraction, our basket case tools operate on the packet level. Past research shows that the difference between cells and packets has a negligible effect on WF performance [0]. If anything, operating on the packet level is more realistic.
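To make the format concrete, here is a minimal Python sketch that reads such lines and counts packets per direction. It assumes the two values on each line are separated by whitespace; the function names and the sample data are made up for illustration and are not part of the tools.

```python
from typing import List, Tuple


def parse_cells(text: str) -> List[Tuple[float, int]]:
    """Parse lines of '<time> <direction>' into (time, direction) pairs.

    Assumption: values are whitespace-separated (tab or space).
    """
    cells = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        time_str, dir_str = line.split()[:2]
        cells.append((float(time_str), int(float(dir_str))))
    return cells


def count_directions(cells: List[Tuple[float, int]]) -> Tuple[int, int]:
    """Return (sent, received): positive is transmitting, negative is receiving."""
    sent = sum(1 for _, d in cells if d > 0)
    received = sum(1 for _, d in cells if d < 0)
    return sent, received


# Hypothetical sample, not real output from the tools.
sample = "0.0 1\n0.12 -1\n0.15 -1\n"
cells = parse_cells(sample)
print(count_directions(cells))  # (1, 2)
```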
To extract the cells from our data, use the extractcells tool. The -o flag
specifies the output directory, and the -bridge flag the IP address of the
bridge the client connected to. With the bridge's IP address we can filter out
other traffic that might accidentally have ended up in the pcaps (even though
we do our best not to collect this in clients). The first argument is the
folder with the pcaps.
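The idea behind the bridge filter can be sketched in a few lines of Python. This is not how extractcells is implemented, only an illustration under stated assumptions: packets are already decoded into (timestamp, source, destination) records, the direction is written as +1 or -1, and times are offsets from the first retained packet. Whether the tool records offsets or inter-packet gaps is not specified here, and the Packet type, to_cells function, and addresses are hypothetical.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Packet(NamedTuple):
    """Hypothetical, already-decoded packet record."""
    timestamp: float
    src: str
    dst: str


def to_cells(packets: Iterable[Packet], bridge_ip: str) -> List[Tuple[float, int]]:
    """Keep only traffic to or from the bridge and emit (time, direction) pairs."""
    cells = []
    start = None
    for p in packets:
        if p.dst == bridge_ip:
            direction = 1    # client -> bridge: transmitting
        elif p.src == bridge_ip:
            direction = -1   # bridge -> client: receiving
        else:
            continue         # unrelated traffic is dropped
        if start is None:
            start = p.timestamp  # assumption: times are relative to the first kept packet
        cells.append((p.timestamp - start, direction))
    return cells


# Documentation-range addresses, made up for the example.
packets = [
    Packet(10.0, "192.0.2.10", "198.51.100.7"),  # client -> bridge
    Packet(10.1, "192.0.2.10", "203.0.113.5"),   # unrelated, filtered out
    Packet(10.5, "198.51.100.7", "192.0.2.10"),  # bridge -> client
]
print(to_cells(packets, "198.51.100.7"))  # [(0.0, 1), (0.5, -1)]
```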
With cells extracted, the next step is attack specific. In our case, we continue
using Wa-kNN with our extractfeatures tool, whose -o flag specifies the output
directory.
Analysis
With the Wa-kNN features extracted, we can move on to analysis using go-knn.
There are three mandatory flags: -sites, -instances and -open, which specify
what one would expect: the number of monitored sites, the instances per
monitored site, and the number of unmonitored (open-world) sites. The first
argument is also mandatory and it is the folder with the extracted features.
Above you also see some example output to stdout. go-knn calculates weights
in parallel, reports progress to stdout, and uses all CPU cores for testing
(which usually takes the longest on bigger datasets). Beyond printing the final
results to stdout (not shown above), four files are written to the working
directory:
100x100+10000-precision.csv: a CSV file of the precision of all subfolders (methods) for different k-values in Wa-kNN.
100x100+10000-recall.csv: a CSV file of the recall of all subfolders (methods) for different k-values in Wa-kNN.
100x100+10000.log: a complete output log (transcript) of the analysis.
100x100+10000.weights: the calculated weights with WLLCC of Wa-kNN for all subfolders and folds.
The names of the files are derived from the flags to go-knn. The log file
also contains a detailed breakdown of the different classification outcomes
we covered in an earlier update.
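As a rough illustration of the naming scheme, the sketch below rebuilds the four file names from the three flag values. The pattern <sites>x<instances>+<open> is inferred from the example names above rather than taken from the go-knn source, and output_names is a hypothetical helper.

```python
from typing import List


def output_names(sites: int, instances: int, open_world: int) -> List[str]:
    """Build the expected output file names from the flag values.

    Assumption: the shared prefix is '<sites>x<instances>+<open>'.
    """
    prefix = f"{sites}x{instances}+{open_world}"
    return [
        f"{prefix}-precision.csv",
        f"{prefix}-recall.csv",
        f"{prefix}.log",
        f"{prefix}.weights",
    ]


for name in output_names(100, 100, 10000):
    print(name)
```

A helper like this can be handy when scripting several runs with different dataset sizes and collecting the resulting CSV files afterwards.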