Ofsted Report Scraper

Download and inspect Ofsted reports for keywords. This code will:

Download a list of schools (scrape_search_pages)
Download a list of reports associated with those schools (scrape_school_pages)
Download a subset of those reports (download_report_pdfs)
Convert .pdf reports to .txt (convert_pdfs)
Parse .txt for keywords using regular expressions (scan_reports)
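The last step can be pictured with a minimal sketch like the one below. It is not the code from ofsted-report-scraper.rb: the count_keywords helper, the KEYWORDS table and the sample sentence are all made up for illustration, and only show the general idea of counting regular-expression matches in the text of a converted report.

```ruby
# Hypothetical keyword table; the real keywords are whatever you choose to scan for.
KEYWORDS = {
  "bullying"     => /\bbull(?:y|ies|ying|ied)\b/i,
  "safeguarding" => /\bsafeguarding\b/i
}.freeze

# Count how many times each pattern matches in a block of report text.
# Returns a hash of keyword name => number of matches.
def count_keywords(text, patterns = KEYWORDS)
  patterns.transform_values { |regex| text.scan(regex).length }
end

# Made-up sample text standing in for the contents of a converted .txt report.
sample = "Safeguarding is effective. Pupils say bullying is rare and bullies are dealt with."
counts = count_keywords(sample)
counts.each { |keyword, total| puts "#{keyword}: #{total}" }
```

To run it over a real file, the sample string could be swapped for File.read on one of the converted .txt reports.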

Installation

git clone https://github.com/jdkram/ofsted-report-scraper
cd ofsted-report-scraper
gem install bundler
bundle install

Use

Modify task.rb to specify school types, report types, etc.
Run with ruby task.rb (or caffeinate ruby task.rb on macOS to keep the machine awake during long downloads).

Please note that scrape_search_pages and scrape_school_pages do not currently handle interruption well, because they do not record their progress.
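If you want to work around this yourself, one possible approach is to append each finished item to a plain-text log and skip logged items on the next run. The sketch below is only an illustration of that idea: the ProgressLog class, the progress.txt file name and the school identifiers are all hypothetical and do not exist in this repository.

```ruby
require "set"
require "tmpdir"

# Hypothetical helper (not part of this repository): remembers finished items
# in a plain-text file, one key per line, so a later run can skip them.
class ProgressLog
  def initialize(path)
    @path = path
    @done = File.exist?(path) ? File.readlines(path, chomp: true).to_set : Set.new
  end

  # True if the key was recorded in this run or a previous one.
  def done?(key)
    @done.include?(key)
  end

  # Append the key to the log file straight away, so it survives an interruption.
  def record(key)
    return if done?(key)
    File.open(@path, "a") { |file| file.puts(key) }
    @done << key
  end
end

# Simulate a run that is interrupted after two of three (made-up) school IDs.
log_path = File.join(Dir.mktmpdir, "progress.txt")
first_run = ProgressLog.new(log_path)
%w[100001 100002].each { |school_id| first_run.record(school_id) }

# A fresh run reads the same log and only keeps the unfinished work.
second_run = ProgressLog.new(log_path)
remaining = %w[100001 100002 100003].reject { |school_id| second_run.done?(school_id) }
puts "Still to scrape: #{remaining.join(', ')}"
```

Writing to the log after every item costs a small file append each time, but means at most one item is repeated after an interruption.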

scrape_search_pages and scrape_school_pages both sleep rand(0.1..0.6) (a random time between 0.1 and 0.6 seconds) between calls to ease the request rate on the Ofsted site. download_report_pdfs sleeps slightly longer, 1-2 seconds, for no particular reason other than that it tends to make a large number of consecutive requests.
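The throttling pattern looks roughly like the sketch below. The each_with_delay helper and the page names are hypothetical and are not taken from ofsted-report-scraper.rb; only the sleep rand(0.1..0.6) call mirrors what is described above. For the PDF downloads a range such as 1.0..2.0 would play the same role, though the exact expression used there is an assumption.

```ruby
# Hypothetical helper: yield each item to the block, pausing for a random
# interval drawn from delay_range before every call except the first.
def each_with_delay(items, delay_range)
  items.each_with_index do |item, index|
    sleep rand(delay_range) unless index.zero?
    yield item
  end
end

# Made-up page names standing in for real requests.
visited = []
each_with_delay(%w[page-1 page-2 page-3], 0.1..0.6) do |page|
  visited << page # a real scraper would fetch the page here
end
puts "Visited #{visited.length} pages"
```

Randomising the delay, rather than sleeping for a fixed time, spreads requests out less predictably while keeping the average rate low.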
