This code scrapes all of the RSS feeds for submissions on the EPA Ireland website once a day. It then generates a single small RSS feed containing all of the previous day's updates. Finally, it generates daily CSV files containing the same data.
It all runs under GitHub Actions at around 01:30 UTC every day and takes about 20 minutes to complete because of the number of RSS feeds that need to be downloaded and parsed.
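The sketch below illustrates the daily-feed step in Python. It is not this project's code. It assumes each scraped submission has already been reduced to a dict with hypothetical `title`, `link` and timezone-aware `published` keys. It keeps only the items dated the previous UTC day and serialises them as RSS 2.0 using the standard library.

```python
from datetime import date, datetime, timedelta, timezone
from email.utils import format_datetime
import xml.etree.ElementTree as ET


def build_daily_feed(items, today):
    """Return an RSS 2.0 document (str) holding only items published the day before `today`.

    Hypothetical input shape (not the project's real data model): each item is a dict
    with "title", "link" and a timezone-aware "published" datetime.
    """
    yesterday = today - timedelta(days=1)
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = "EPA Ireland updates (sketch)"
    ET.SubElement(channel, "link").text = "https://github.com/EPA-Ireland-Updates-Unofficial/epa-rss"
    ET.SubElement(channel, "description").text = f"Updates from {yesterday.isoformat()}"
    for item in items:
        # Compare calendar dates in UTC so the cut-off matches a UTC-scheduled job
        if item["published"].astimezone(timezone.utc).date() != yesterday:
            continue
        entry = ET.SubElement(channel, "item")
        ET.SubElement(entry, "title").text = item["title"]
        ET.SubElement(entry, "link").text = item["link"]
        # RSS 2.0 expects RFC 822-style dates, which email.utils.format_datetime produces
        ET.SubElement(entry, "pubDate").text = format_datetime(item["published"])
    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    # Made-up sample data: only the item from 22 Sep 2022 should appear
    sample = [
        {"title": "A", "link": "https://example.com/a",
         "published": datetime(2022, 9, 22, 10, 0, tzinfo=timezone.utc)},
        {"title": "B", "link": "https://example.com/b",
         "published": datetime(2022, 9, 21, 10, 0, tzinfo=timezone.utc)},
    ]
    print(build_daily_feed(sample, date(2022, 9, 23)))
```

Filtering on the UTC calendar date keeps the "previous day" boundary consistent with a job that is scheduled in UTC.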
To subscribe, use this URL in Feedly or a similar RSS reader: https://raw.githubusercontent.com/EPA-Ireland-Updates-Unofficial/epa-rss/main/output/daily.xml
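To read the feed from a script instead of a reader, a minimal Python sketch could look like this. It assumes `daily.xml` is RSS 2.0, with `<item>` elements that have `title` and `link` children. If the feed were Atom, the element names would differ and this parser would find nothing. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.request import urlopen

FEED_URL = "https://raw.githubusercontent.com/EPA-Ireland-Updates-Unofficial/epa-rss/main/output/daily.xml"


def parse_rss_items(xml_text):
    """Return (title, link) pairs for every <item> in an RSS 2.0 document.

    Assumes RSS 2.0 element names; missing children come back as empty strings.
    """
    root = ET.fromstring(xml_text)
    return [
        (item.findtext("title", default=""), item.findtext("link", default=""))
        for item in root.iter("item")
    ]


def fetch_daily_items(url=FEED_URL):
    """Download the daily feed (needs network access) and parse its items."""
    with urlopen(url, timeout=30) as resp:
        return parse_rss_items(resp.read())
```

Calling `fetch_daily_items()` downloads and parses the live feed. Keeping the parsing in `parse_rss_items` lets you test it on a saved copy without network access.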
The daily CSV files are all in the repo, starting from September 22nd, 2022: https://github.com/EPA-Ireland-Updates-Unofficial/epa-rss/tree/main/output/csv/daily
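Here is a minimal Python sketch for loading one of these files, assuming its first row is a header. The column names are not documented here, so the hypothetical `parse_daily_csv` helper returns whatever header it finds alongside the rows.

```python
import csv
import io


def parse_daily_csv(text):
    """Parse the text of one daily CSV file into (header, rows).

    Assumes the first row is a header; each row becomes a dict keyed by that header.
    newline="" lets the csv module handle quoted fields that contain line breaks.
    """
    reader = csv.DictReader(io.StringIO(text, newline=""))
    rows = list(reader)
    return reader.fieldnames, rows
```

Pass it the contents of any file downloaded from the folder above, for example the result of reading the file with `encoding="utf-8"`.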
If you'd like to receive an email with a link to the latest CSV each day:
- Create a GitHub account.
- Click the drop-down menu beside "Watch" in the top right of this project's page.
- Select "Custom" and tick the box beside "Issues". Then click Apply.
- You should start receiving the emails from the next day.
The latest full set of scraped data is available as a SQLite database that you can download here. Use a tool such as SQLiteStudio to browse and query it.
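If you would rather inspect the database from a script, this Python sketch lists every table with its row count. It assumes nothing about the schema, because table names are read from SQLite's built-in `sqlite_master` table. `describe_database` is a hypothetical helper, not part of this project.

```python
import sqlite3


def describe_database(conn):
    """Return {table_name: row_count} for every user table in an open SQLite connection.

    No schema is assumed: table names are discovered from sqlite_master.
    """
    names = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
        )
    ]
    counts = {}
    for name in names:
        # Quote the identifier so unusual table names are still valid SQL
        quoted = '"' + name.replace('"', '""') + '"'
        counts[name] = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
    return counts
```

For example, call `describe_database(sqlite3.connect(path))`, where `path` is wherever you saved the download, then close the connection when you are done.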
Alternatively, you can use Datasette Lite, a very cool project by Simon Willison, to browse and query all the latest data in your browser by going here.
License: Apache-2.0
Copyright Conor O'Neill 2022