Add import_all pipeline (#75)
* add import_all pipeline

* linting and run script

* lint

* lint

* lint please

* explicit schema, async pipeline

* lint

* lint

* avoid AttributeError

* partition requests

* bonus closure

* revert

* fix UnicodeEncodeError maybe?

* partition pages

* fix UnicodeEncodeError and better logging

* handle RecursionError

* Add ability to read manifest file

* adds the `--input_file` parameter

* Add `Reshuffle()` after reading the input file

* Allow blank values for URL while testing

* Apply suggestions from code review

* Update modules/import_all.py

Co-authored-by: Giancarlo Faranda <[email protected]>
rviscomi and giancarloaf authored Jun 30, 2022
1 parent 694a507 commit 28aa69c
Showing 6 changed files with 787 additions and 0 deletions.
10 changes: 10 additions & 0 deletions modules/constants.py
@@ -20,6 +20,16 @@
                pkg_resources.read_text("schema", "summary_requests.json")
            )
        },
        "all_pages": {
            "fields": json.loads(
                pkg_resources.read_text("schema", "all_pages.json")
            )
        },
        "all_requests": {
            "fields": json.loads(
                pkg_resources.read_text("schema", "all_requests.json")
            )
        },
    },
}
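
The hunk above registers two more table schemas by parsing JSON files into a `{"fields": [...]}` entry per table. A minimal, self-contained sketch of that pattern follows. It is an illustration, not the repository's code: the helper names `load_schemas` and `build_demo_schemas` are hypothetical, the field definitions are invented placeholders (the real contents of `all_pages.json` and `all_requests.json` are not shown in this commit view), and files are read from a temporary directory rather than through `pkg_resources.read_text`, which presumably resolves them from a packaged `schema` directory.

```python
import json
import tempfile
from pathlib import Path

# Placeholder schema files. The field names are made up for this sketch and
# are NOT the real contents of all_pages.json / all_requests.json.
SAMPLE_SCHEMAS = {
    "all_pages.json": [
        {"name": "page", "type": "STRING", "mode": "REQUIRED"},
        {"name": "payload", "type": "STRING", "mode": "NULLABLE"},
    ],
    "all_requests.json": [
        {"name": "url", "type": "STRING", "mode": "NULLABLE"},
    ],
}


def load_schemas(schema_dir, table_names):
    """Return {table: {"fields": [...]}} parsed from <table>.json in schema_dir."""
    schemas = {}
    for table in table_names:
        text = (Path(schema_dir) / f"{table}.json").read_text(encoding="utf-8")
        schemas[table] = {"fields": json.loads(text)}
    return schemas


def build_demo_schemas():
    """Write the placeholder files to a temp directory, then load them back."""
    with tempfile.TemporaryDirectory() as tmp:
        for filename, fields in SAMPLE_SCHEMAS.items():
            (Path(tmp) / filename).write_text(json.dumps(fields), encoding="utf-8")
        return load_schemas(tmp, ["all_pages", "all_requests"])


if __name__ == "__main__":
    for table, schema in build_demo_schemas().items():
        print(table, [field["name"] for field in schema["fields"]])
```

Keeping each schema in its own JSON file, as the diff does, means a new table needs only a new file plus one dictionary entry, and the field lists can be reviewed separately from the pipeline code.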
