Skip to content

Latest commit

 

History

History
45 lines (27 loc) · 1.83 KB

README.md

File metadata and controls

45 lines (27 loc) · 1.83 KB

OpenElections Leaflet Scraper and Parser

Project Logo

Description

This repository contains code and data for accomplishing the following:

  1. Scraping campaign leaflets data from the Open Elections leaflet archive.
  2. Sending the images for each leaflet to OpenAI's GPT4-Vision (via an API) in order to parse it into a JSON structure.
    • The JSON is structured to take the images of the leaflets and put them into an interpretable structure containing information the candidate's name, their key policies, the content they mention regarding key issues, contact details and more.
  3. Cleaning and verifying the JSON files obtained from OpenAI's API.

This work builds on the Open Elections leaflet archive (Milazzo, C., Trumm, S., Townsley, J. 2020. OpenElections Leaflet Data, 2010-2019. Nottingham, UK.) which in turn builds on data gathered through Democracy Club.

This project was put together for Campaign Lab as part of one of their Winter Hack Nights fairly quickly using some guess-work and trial-and-error combined with input from ChatGPT. All faults mine.

Table of Contents

Installation

Each of the subdirectories contain requirements.txt files for creating Python environments which will enable running the code for each part of the project.

Usage

More details to come soon!

Contributing

More details to come soon!

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contact

You can reach me on Twitter: @thicknavyrain