Browser automation for Chameleon.

- Install Chromium, chromedriver, Python 3, and Xvfb. On Ubuntu:

  ```
  sudo apt-get install chromium-browser chromium-chromedriver python3 xvfb
  ```

- Install the project's Python dependencies (documented in `requirements.txt`). You might do this with `virtualenv` and `pip` (see the sketch after this list), or maybe with Docker. Note that this is a Python 3 project.
- Make sure `chromedriver` is in your `$PATH`. It isn't there by default on Ubuntu, so we have to fix that (a quick check follows this list):

  ```
  sudo ln -s /usr/lib/chromium-browser/chromedriver /usr/local/bin/chromedriver
  ```

- If using Ubuntu 14.04, fix chromedriver's shared libraries error:

  ```
  echo "/usr/lib/chromium-browser/libs" | sudo tee --append /etc/ld.so.conf.d/chrome_lib.conf >/dev/null
  sudo ldconfig
  ```

- Finally, generate a Chameleon CRX package by following development setup steps 1 and 4 in Chameleon's checkout.
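
For the dependency step above, here is a minimal sketch using `virtualenv` and `pip`; the `venv` directory name is just an example:

```
virtualenv -p python3 venv       # create a Python 3 virtual environment
source venv/bin/activate         # activate it for this shell session
pip install -r requirements.txt  # install the project's dependencies
```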
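
And a quick sanity check that `chromedriver` is now on your `$PATH` (the exact version string will vary):

```
which chromedriver      # should print /usr/local/bin/chromedriver
chromedriver --version  # should print a ChromeDriver version string
```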

Run `./crawl.py /path/to/chameleon.crx` to perform a crawl, or `./crawl.py -h` to see the optional arguments:

```
usage: crawl.py [-h] [--headless | --no-headless] [-n {1,2,3,4,5,6,7,8}] [-q]
                [-t SECONDS] [--urls URL_FILE_PATH]
                CHAMELEON_CRX_FILE_PATH

positional arguments:
  CHAMELEON_CRX_FILE_PATH
                        path to Chameleon CRX package

optional arguments:
  -h, --help            show this help message and exit
  --headless            use a virtual display (default)
  --no-headless
  -n {1,2,3,4,5,6,7,8}  how many browsers to use in parallel (default: 4)
  -q, --quiet           turn off standard output
  -t SECONDS, --timeout SECONDS
                        how many seconds to wait for pages to finish loading
                        before timing out (default: 20)
  --urls URL_FILE_PATH  path to URL list file (default: urls.txt)
```
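
For example, to crawl with two parallel browsers, a visible (non-virtual) display, a longer page timeout, and a custom URL list (`my_urls.txt` and the CRX path are placeholders):

```
./crawl.py --no-headless -n 2 -t 30 --urls my_urls.txt /path/to/chameleon.crx
```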

Run `./view.py` and visit the displayed URL to review crawl results.

TODO:

- Crawl the Alexa Global Top 1,000,000 Sites: http://s3.amazonaws.com/alexa-static/top-1m.csv.zip (a URL list prep sketch follows this list)
- Analyze results:
  - Discover fingerprinters
  - Confirm detection of known fingerprinters
- Tweak the heuristic to minimize false negatives/positives.
- Create a minisite to chart (the growth of?) fingerprinting across the Web.
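
A sketch of preparing a URL list from the Alexa CSV for the first TODO item, assuming the crawler expects one URL per line; the 1,000-site cutoff is just an example:

```
wget http://s3.amazonaws.com/alexa-static/top-1m.csv.zip
unzip top-1m.csv.zip  # yields top-1m.csv, with lines like "1,google.com"
cut -d, -f2 top-1m.csv | head -n 1000 | sed 's|^|http://|' > urls.txt
./crawl.py --urls urls.txt /path/to/chameleon.crx
```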
License: Mozilla Public License Version 2.0