London assembly scraper #167

Open, wants to merge 12 commits into master

ajparsons (Contributor):

This PR replaces the previous scraper to address the change to the London Mayor/Assembly website (mysociety/theyworkforyou#1687).

It also adds some config files for Docker and code linters. For the moment, the linters are restricted to the london-mayors-question folder.

The scraper talks to the London site in two places:

  • It scrapes the search results to get the slugs of all questions in a time range.
  • It fetches the details from each question page.
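The two-stage flow can be sketched roughly as follows. This is a minimal illustration, not the code in questions.py: the network calls are injected as plain callables so the shape of the pipeline is visible without HTTP, and the names `Question`, `collect_slugs`, `fetch_details`, and the "page through results until an empty page comes back" rule are all hypothetical assumptions.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Iterable, Optional


@dataclass
class Question:
    slug: str
    title: str
    answer: Optional[str]  # None while the question is still unanswered


# Hypothetical callable types: the real scraper makes HTTP requests to the London site.
SearchFn = Callable[[date, date, int], list[str]]  # (start, end, page) -> slugs on that page
DetailFn = Callable[[str], Question]               # slug -> parsed question page


def collect_slugs(search: SearchFn, start: date, end: date) -> list[str]:
    """Stage 1: page through search results (assumed) until an empty page is returned."""
    slugs: list[str] = []
    page = 0
    while True:
        batch = search(start, end, page)
        if not batch:
            break
        slugs.extend(batch)
        page += 1
    return slugs


def fetch_details(fetch: DetailFn, slugs: Iterable[str]) -> dict[str, Question]:
    """Stage 2: fetch and parse each question page, keyed by slug."""
    return {slug: fetch(slug) for slug in slugs}


# Usage with fake fetchers standing in for the real HTTP calls
pages = {0: ["q-1", "q-2"], 1: ["q-3"]}


def fake_search(start: date, end: date, page: int) -> list[str]:
    return pages.get(page, [])


def fake_detail(slug: str) -> Question:
    return Question(slug=slug, title=f"Title {slug}", answer=None)


slugs = collect_slugs(fake_search, date(2023, 4, 11), date(2023, 4, 18))
questions = fetch_details(fake_detail, slugs)
```

Keeping the search stage and the detail stage separate means the cheap listing step can be rerun often, while detail pages are only fetched for slugs that actually need them.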

Because there is no way of knowing which questions have since been answered, every question without an answer needs to be re-queried for updates.
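A minimal sketch of that re-query step, assuming (hypothetically) that each cached record is a dict whose "answer" key stays None until the question is answered. `refresh_unanswered` and the record layout are illustrative names, not the PR's actual code.

```python
from typing import Callable


def refresh_unanswered(cache: dict[str, dict], fetch: Callable[[str], dict]) -> list[str]:
    """Re-fetch every cached question with no answer yet.

    Updates the cache in place and returns the slugs that now have an answer.
    """
    pending = [slug for slug, record in cache.items() if record.get("answer") is None]
    newly_answered = []
    for slug in pending:
        updated = fetch(slug)  # hypothetical: re-download and parse the question page
        cache[slug] = updated
        if updated.get("answer") is not None:
            newly_answered.append(slug)
    return newly_answered


# Usage: q-1 is already answered, so only q-2 and q-3 are re-queried
calls: list[str] = []


def fake_fetch(slug: str) -> dict:
    calls.append(slug)
    return {"answer": "Answered." if slug == "q-2" else None}


cache = {"q-1": {"answer": "Yes."}, "q-2": {"answer": None}, "q-3": {"answer": None}}
result = refresh_unanswered(cache, fake_fetch)
```

The cost of this design is that unanswered questions are fetched on every run until an answer appears, which is the trade-off the PR description points out.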

The full update command looks something like this:

questions.py fetch-unknown-questions --last-week fetch-unstored refresh-unanswered build-xml --outdir temp/
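That command runs several subcommands in order within a single invocation. The following stdlib-only sketch shows one way such chaining can work: argv is split into (command, args) groups wherever a known command name appears. It is an illustration only. `split_chain` and `run_chain` are hypothetical, and how questions.py actually parses its arguments is not stated here (libraries such as Click offer chained subcommands via `group(chain=True)`).

```python
from typing import Callable, Mapping

# Subcommand names taken from the example command above
COMMANDS = frozenset({"fetch-unknown-questions", "fetch-unstored", "refresh-unanswered", "build-xml"})


def split_chain(argv: list[str]) -> list[tuple[str, list[str]]]:
    """Group argv into (command, args) pairs, starting a new group at each command name."""
    stages: list[tuple[str, list[str]]] = []
    for token in argv:
        if token in COMMANDS:
            stages.append((token, []))
        elif stages:
            stages[-1][1].append(token)
        else:
            raise ValueError(f"argument {token!r} appears before any command")
    return stages


def run_chain(argv: list[str], handlers: Mapping[str, Callable[[list[str]], None]]) -> None:
    """Run each stage's handler in the order the commands were given."""
    for name, args in split_chain(argv):
        handlers[name](args)


argv = "fetch-unknown-questions --last-week fetch-unstored refresh-unanswered build-xml --outdir temp/".split()
stages = split_chain(argv)

ran: list[tuple[str, list[str]]] = []
run_chain(argv, {name: (lambda args, name=name: ran.append((name, args))) for name in COMMANDS})
```

One limitation of this naive splitter: an option value that happens to equal a command name would be misread as a new stage.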

A version of that questions.py command has replaced the commented-out lines in updatedaterange-parse.

It stores intermediate files in a json_cache directory. An initial population run will be needed to catch up:

questions.py fetch-unknown-questions 2020-12-20
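A cache like json_cache can be sketched as one JSON file per question slug. The layout below is an assumption, since the PR does not describe the directory's structure: `JsonCache`, the `<slug>.json` naming, and the `unstored` helper (roughly the kind of check a step like fetch-unstored would need) are all hypothetical, and slugs are assumed to be filename-safe.

```python
import json
import tempfile
from pathlib import Path


class JsonCache:
    """Hypothetical one-file-per-question cache keyed by slug."""

    def __init__(self, root: Path) -> None:
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, slug: str) -> Path:
        return self.root / f"{slug}.json"  # assumes slugs contain no path separators

    def has(self, slug: str) -> bool:
        return self._path(slug).exists()

    def save(self, slug: str, record: dict) -> None:
        self._path(slug).write_text(json.dumps(record, indent=2), encoding="utf-8")

    def load(self, slug: str) -> dict:
        return json.loads(self._path(slug).read_text(encoding="utf-8"))

    def unstored(self, slugs: list[str]) -> list[str]:
        """Slugs found by the search that have no cached detail file yet."""
        return [slug for slug in slugs if not self.has(slug)]


# Usage in a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    cache = JsonCache(Path(tmp) / "json_cache")
    cache.save("q-1", {"slug": "q-1", "answer": None})
    missing = cache.unstored(["q-1", "q-2"])
    loaded = cache.load("q-1")
```

Persisting each question separately is what makes a long catch-up run like the one above resumable: anything already on disk is skipped on the next pass.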

There have been some updates to the top-level requirements.txt, which hopefully shouldn't cause wider problems.

Running the import for all data since 2020-12-20 seems to work fine in TWFY (TheyWorkForYou):

[Screenshot of the import results in TWFY]

ajparsons requested a review from dracos on April 18, 2023 at 12:11.