Background: This scraper written for #openscraperchallenge 2015 extracts data from the Financial Administration of the Slovak Republic, namely beneficiaries of the tax share (non-profit organizations) and converts them from PDF to CSV. Historical records and XLS files are downloaded from a third-party website (rozhodni.sk).
Requirements: Make sure you have Vagrant
and VirtualBox
installed before proceeding.
How to run: Clone this repository, CD to the directory containing the Vagrantfile and execute vagrant up
. A virtual machine will be provisioned and data scraped. The output files will be saved in the data
subdirectory. Source files (PDF) and intermediate files (*-raw.csv) are kept for debugging purposes.
How to clean up: Just run vagrant destroy
. This will only delete the virtual machine, your scraped data will remain available in the cloned repository.