This repository contains code and data for accomplishing the following:
- Scraping campaign leaflets data from the Open Elections leaflet archive.
- Sending the images for each leaflet to OpenAI's GPT4-Vision (via an API) in order to parse it into a JSON structure.
- The JSON is structured to take the images of the leaflets and put them into an interpretable structure containing information the candidate's name, their key policies, the content they mention regarding key issues, contact details and more.
- Cleaning and verifying the JSON files obtained from OpenAI's API.
This work builds on the Open Elections leaflet archive (Milazzo, C., Trumm, S., Townsley, J. 2020. OpenElections Leaflet Data, 2010-2019. Nottingham, UK.) which in turn builds on data gathered through Democracy Club.
This project was put together for Campaign Lab as part of one of their Winter Hack Nights fairly quickly using some guess-work and trial-and-error combined with input from ChatGPT. All faults mine.
Each of the subdirectories contain requirements.txt
files for creating Python environments which will enable running the code for each part of the project.
More details to come soon!
More details to come soon!
This project is licensed under the MIT License - see the LICENSE file for details.
You can reach me on Twitter: @thicknavyrain