This repository has been archived by the owner on Sep 20, 2024. It is now read-only.

(New Office) Budget Office (Phase 1): create scraper for this office #159

Open
6 tasks
higorspinto opened this issue May 26, 2020 · 0 comments

Comments

@higorspinto
Contributor

higorspinto commented May 26, 2020

Description

The Budget Office is one of the new offices whose datasets need to be ingested into the data portal. To make this possible, we need to create a new scraper that crawls and parses the office's available webpages.

https://www2.ed.gov/about/overview/budget/index.html

Acceptance Criteria

  • We have a functional crawler that crawls through the office's webpages
  • We have a functional parser that understands the page structures and generates structured data
  • Datasets are produced when the scraper is run

Tasks

  • Identify the possible page structures in the target site
  • Write one or more parsers that cover as many of these page structures as possible
  • Test that the scraper runs correctly within the pipeline
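The parsing step above might start from something like the following minimal sketch, which uses only Python's standard library (`html.parser` and `urllib.parse.urljoin`). It assumes, without having checked the actual Budget Office pages, that datasets appear as `<a href>` links to downloadable files with extensions such as `.xls`, `.xlsx`, `.csv`, `.pdf`, or `.zip`. The names `DATASET_EXTENSIONS`, `DatasetLinkParser`, and `parse_datasets` are hypothetical, the sample HTML is illustrative rather than copied from the site, and the project's real pipeline may use a different framework entirely.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical assumption: dataset resources are linked as files with these extensions.
DATASET_EXTENSIONS = (".xls", ".xlsx", ".csv", ".pdf", ".zip")


class DatasetLinkParser(HTMLParser):
    """Collects <a href> links whose target looks like a downloadable data file."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.datasets = []
        self._current_href = None
        self._current_text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            # Case-insensitive extension check; relative links are resolved against the page URL.
            if href and href.lower().endswith(DATASET_EXTENSIONS):
                self._current_href = urljoin(self.base_url, href)
                self._current_text = []

    def handle_data(self, data):
        # Accumulate anchor text only while inside a matching link.
        if self._current_href is not None:
            self._current_text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current_href is not None:
            # Collapse runs of whitespace in the link text into single spaces.
            title = " ".join("".join(self._current_text).split())
            self.datasets.append({"title": title, "url": self._current_href})
            self._current_href = None


def parse_datasets(html, base_url):
    """Return structured dataset records (title + absolute URL) found in one page."""
    parser = DatasetLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.datasets


if __name__ == "__main__":
    # Illustrative sample markup, not taken from the real Budget Office page.
    sample = """
    <ul>
      <li><a href="fy2021/tables.xlsx">FY 2021 Budget
          Tables</a></li>
      <li><a href="/about/index.html">About</a></li>
    </ul>
    """
    base = "https://www2.ed.gov/about/overview/budget/index.html"
    for record in parse_datasets(sample, base):
        print(record)
```

Keeping the parser free of network calls means it can be tested against saved HTML for each page structure identified in the first task, while the crawler that fetches pages stays a separate component.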

Jira Card

@higorspinto higorspinto changed the title from "(New Office) Office of the General Counsel: create scraper for this office" to "(New Office) Office of the General Counsel: create scraper for this office (Phase 1)" May 27, 2020
@osahon-okungbowa osahon-okungbowa changed the title from "(New Office) Office of the General Counsel: create scraper for this office (Phase 1)" to "Budget Office (New Office): create scraper for this office" May 27, 2020
@osahon-okungbowa osahon-okungbowa changed the title from "Budget Office (New Office): create scraper for this office" to "(New Office) Budget Office (Phase 1): create scraper for this office" May 27, 2020
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant