Ofsted Report Scraper

Download and inspect Ofsted reports for keywords. This code will:

Download a list of schools (scrape_search_pages)
Download a list of reports associated with those schools (scrape_school_pages)
Download a subset of those reports (download_report_pdfs)
Convert .pdf reports to .txt (convert_pdfs)
Parse .txt for keywords using regular expressions (scan_reports)

Installation

git clone https://github.com/jdkram/ofsted-report-scraper
cd ofsted-report-scraper
gem install bundler
bundle install

Use

Modify task.rb - specify school types, reports types etc.
Run with ruby task.rb (or caffeinate ruby task.rb to keep machine awake for long downloads).

Please note that scrape_search_pages and scrape_school_pages don't currently handle being interrupted well as they don't record their progress.

scrape_search_pages and scrape_school_pages both sleep rand(0.1..0.6) (a random time between 0.1 and 0.6 seconds) between calls to ease the request rate on their site. download_report_pdfs sleeps for a slightly longer 1-2 seconds, for no particular reason other than this tends to be a large number of consecutive requests.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

Ofsted Report Scraper

Installation

Use

Files

README.md

Latest commit

History

README.md

File metadata and controls

Ofsted Report Scraper

Installation

Use