Congressional Crypto Disclosure Analysis

A system for scraping and downloading Congressional financial disclosures and analyzing them for cryptocurrency holdings.

Overview

This toolkit scrapes financial disclosures from both House and Senate websites, downloads the documents, extracts text content, and analyzes them for cryptocurrency holdings using GPT-4o.

Installation

Clone the repository
Install dependencies:

pip install -r requirements.txt
playwright install chromium

Install Tesseract OCR:

Linux: sudo apt-get install tesseract-ocr
Mac: brew install tesseract
Windows: Download installer from GitHub

Set up environment variables in .env:

CONGRESS_API_KEY=your-congress-api-key
OPENAI_API_KEY=your-openai-api-key
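Before running the pipeline, it can help to confirm that both keys are actually set. The sketch below is a minimal, standard-library-only check; the helper names parse_env_file and missing_keys are hypothetical and not part of the repository, and the project itself may load .env differently (for example through a dotenv library).

```python
# Hypothetical helpers for illustration; not taken from the repository.
REQUIRED_KEYS = ("CONGRESS_API_KEY", "OPENAI_API_KEY")


def parse_env_file(text):
    """Parse KEY=value lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]


sample = "CONGRESS_API_KEY=your-congress-api-key\nOPENAI_API_KEY=\n"
print(missing_keys(parse_env_file(sample)))  # prints ['OPENAI_API_KEY']
```

To check a real file, pass the contents of .env to parse_env_file and stop early if missing_keys returns anything.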

Usage

Run the complete pipeline:

python main.py

This runs the following steps in order:

Scrape House disclosures
Analyze House documents for crypto holdings
Scrape Senate disclosures
Analyze Senate documents for crypto holdings

Results are saved to house_disclosures_analyzed.csv and senate_disclosures_analyzed.csv.
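As a rough illustration of the sequential flow described above, the sketch below runs a list of named steps in order and stops at the first failure. The runner and the four placeholder functions are assumptions made for this example; the actual main.py and the functions exposed by the modules under src/ may look different.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[], None]]


def run_pipeline(steps: List[Step]) -> List[str]:
    """Run each step in order and return the names of the completed steps."""
    completed = []
    for name, step in steps:
        print(f"Starting: {name}")
        # An exception here propagates, so later steps never run
        # against output that an earlier step failed to produce.
        step()
        completed.append(name)
    return completed


# Placeholders standing in for the real scraping and analysis code.
def scrape_house():
    print("  (placeholder) scraping House disclosures")


def analyze_house():
    print("  (placeholder) analyzing House documents")


def scrape_senate():
    print("  (placeholder) scraping Senate disclosures")


def analyze_senate():
    print("  (placeholder) analyzing Senate documents")


STEPS: List[Step] = [
    ("Scrape House disclosures", scrape_house),
    ("Analyze House documents", analyze_house),
    ("Scrape Senate disclosures", scrape_senate),
    ("Analyze Senate documents", analyze_senate),
]

if __name__ == "__main__":
    run_pipeline(STEPS)
```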

Example Detection

Here's an example of a detected crypto holding from Rep. Mike Collins' Annual Disclosure:

{
    "found": true,
    "assets": [
        "Velodrome"
    ],
    "quotes": [
        "Velodrome  [CT] S 06/24/2024 06/24/2024 $1,001 - $15,000"
    ]
}
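A result in this shape can be post-processed with the standard library. The sketch below loads the example above and splits the quoted line into fields with a regular expression. The pattern is an assumption based on this single example, the field names (type_code, transaction_code, first_date, second_date) are neutral labels chosen for illustration, and parse_quote is a hypothetical helper; other disclosures may be laid out differently.

```python
import json
import re

# Pattern inferred from the one example quote; treat it as an assumption.
QUOTE_PATTERN = re.compile(
    r"^(?P<asset>.+?)\s+\[(?P<type_code>[A-Z]{2})\]\s+"
    r"(?P<transaction_code>[A-Z])\s+"
    r"(?P<first_date>\d{2}/\d{2}/\d{4})\s+"
    r"(?P<second_date>\d{2}/\d{2}/\d{4})\s+"
    r"\$(?P<low>[\d,]+)\s*-\s*\$(?P<high>[\d,]+)$"
)


def parse_quote(quote):
    """Split a quoted disclosure line into fields, or return None if it does not match."""
    match = QUOTE_PATTERN.match(quote.strip())
    if match is None:
        return None
    fields = match.groupdict()
    # Convert the dollar range to integers, dropping thousands separators.
    fields["low"] = int(fields["low"].replace(",", ""))
    fields["high"] = int(fields["high"].replace(",", ""))
    return fields


result = json.loads("""
{
    "found": true,
    "assets": ["Velodrome"],
    "quotes": ["Velodrome  [CT] S 06/24/2024 06/24/2024 $1,001 - $15,000"]
}
""")

if result["found"]:
    for quote in result["quotes"]:
        print(parse_quote(quote))
```

Lines that do not fit the pattern return None rather than raising, so unexpected layouts can be logged and reviewed by hand.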

Project Structure

src/
  ├── house_scrape.py      # House disclosure website scraper
  ├── house_analysis.py    # House document analyzer
  ├── senate_scrape.py     # Senate disclosure website scraper
  ├── senate_analysis.py   # Senate document analyzer
  └── config.py            # Crypto asset configurations
main.py                  # Entry point
.env                     # Environment variables

Technical Details

Playwright for web scraping
PyPDF2 for text extraction, with Tesseract OCR as a fallback
GPT-4o for crypto asset detection
Retry logic and rate limiting
Logging to the console (INFO level) and to logs.txt (DEBUG level)
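The logging and retry behavior listed above can be sketched with the standard library alone. This is one possible arrangement, not the project's actual code: the logger name "disclosures", the function names, the three-attempt default, and the exponential backoff schedule are all assumptions made for the example.

```python
import logging
import time


def configure_logging(log_path="logs.txt"):
    """Send INFO and above to the console, DEBUG and above to a log file."""
    logger = logging.getLogger("disclosures")  # hypothetical logger name
    logger.setLevel(logging.DEBUG)  # let handlers do the filtering
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls

    console = logging.StreamHandler()
    console.setLevel(logging.INFO)

    file_handler = logging.FileHandler(log_path, encoding="utf-8")
    file_handler.setLevel(logging.DEBUG)

    formatter = logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    for handler in (console, file_handler):
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger


def with_retries(func, attempts=3, base_delay=1.0, sleep=time.sleep, logger=None):
    """Call func(), retrying on any exception with exponential backoff.

    Waits base_delay, then 2 * base_delay, and so on between attempts.
    The final failure is re-raised to the caller.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception as exc:
            if logger is not None:
                logger.debug("Attempt %d of %d failed: %s", attempt, attempts, exc)
            if attempt == attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
```

A caller would typically build the logger once at startup and wrap each network request, for example with_retries(lambda: fetch(url), logger=logger), where fetch is whatever download function the scraper uses. Passing sleep as a parameter keeps the helper easy to test without real delays.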
