This repository has been archived by the owner on Sep 20, 2024. It is now read-only.

Improve scraper for White House Initiatives (Phase 2) #166

Open
5 tasks
higorspinto opened this issue May 27, 2020 · 0 comments

higorspinto commented May 27, 2020

During phase 1, we created a functional scraper that crawls and parses data from this office. The scraped data was successfully ingested into the data portal.

For phase 2, we need to improve the quality of the metadata and data content of the datasets generated by the scraper.

https://sites.ed.gov/whiaiane/
https://sites.ed.gov/hispanic-initiative/
https://sites.ed.gov/whieeaa/
https://sites.ed.gov/whhbcu/

Acceptance Criteria

  • There is a marked improvement in the quality of the metadata and data content of the datasets produced by the scraper.
  • The improved datasets are visible on the data portal.
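
One way to make "marked improvement" checkable is to measure how many scraped datasets have each metadata field filled in, before and after the phase 2 changes. The sketch below is only an illustration: it assumes each dataset is a plain dict, and the field names `description` and `publisher` and the function `metadata_coverage` are hypothetical, not taken from the scraper's code.

```python
from typing import Any, Dict, Iterable, Sequence

# Hypothetical field names; the scraper's real schema may differ.
REQUIRED_FIELDS = ("description", "publisher")


def metadata_coverage(
    datasets: Iterable[Dict[str, Any]],
    fields: Sequence[str] = REQUIRED_FIELDS,
) -> Dict[str, float]:
    """Return, per field, the fraction of datasets with a non-empty value."""
    records = list(datasets)
    if not records:
        return {field: 0.0 for field in fields}
    coverage = {}
    for field in fields:
        # A value counts as filled when it is present and not blank.
        filled = sum(1 for record in records if str(record.get(field) or "").strip())
        coverage[field] = filled / len(records)
    return coverage
```

Comparing the coverage of a phase 1 run with a phase 2 run would give a rough number to attach to the first criterion.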

Tasks

  • Ensure every dataset produced has description metadata
  • Ensure every dataset has publisher metadata
  • Improve other metadata (use defaults where available)
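
The tasks above could be approached with a small post-processing step that fills missing fields from defaults. This is a minimal sketch under assumptions: datasets are represented as dicts, and `DEFAULT_METADATA`, its placeholder values, and `apply_metadata_defaults` are hypothetical names for illustration, not part of the existing scraper.

```python
from typing import Any, Dict

# Hypothetical defaults; real values would come from the project's configuration.
DEFAULT_METADATA: Dict[str, Any] = {
    "publisher": "PLACEHOLDER PUBLISHER",
    "description": "No description provided by the source page.",
}


def apply_metadata_defaults(
    dataset: Dict[str, Any],
    defaults: Dict[str, Any] = DEFAULT_METADATA,
) -> Dict[str, Any]:
    """Return a copy of `dataset` with missing or blank fields filled from `defaults`."""
    result = dict(dataset)
    for key, value in defaults.items():
        current = result.get(key)
        # Treat None and whitespace-only strings as missing.
        if current is None or (isinstance(current, str) and not current.strip()):
            result[key] = value
    return result
```

Values scraped from the page always take precedence; a default is applied only when the field is absent or blank, so the input dict is never modified.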

Jira Card

@higorspinto higorspinto self-assigned this May 27, 2020