Icebox: Migrate to new Zenodo InvenioRDM API #184

Closed
wants to merge 22 commits into from
Changes from 21 commits
34 changes: 17 additions & 17 deletions dataset_doi.yaml
@@ -1,51 +1,51 @@
 censusdp1tract:
   production_doi: 10.5281/zenodo.4127048
-  sandbox_doi: 10.5072/zenodo.1151526
+  sandbox_doi: 10.5281/zenodo.123456
 eia176:
   production_doi: 10.5281/zenodo.7682357
-  sandbox_doi: 10.5072/zenodo.1168394
+  sandbox_doi: 10.5281/zenodo.123456
 eia860:
   production_doi: 10.5281/zenodo.4127026
-  sandbox_doi: 10.5072/zenodo.672209
+  sandbox_doi: 10.5281/zenodo.123456
 eia860m:
   production_doi: 10.5281/zenodo.4281336
-  sandbox_doi: 10.5072/zenodo.692654
+  sandbox_doi: 10.5281/zenodo.8188016
 eia861:
   production_doi: 10.5281/zenodo.4127028
-  sandbox_doi: 10.5072/zenodo.672198
+  sandbox_doi: 10.5281/zenodo.823125
 eia923:
   production_doi: 10.5281/zenodo.4127039
-  sandbox_doi: 10.5072/zenodo.672220
+  sandbox_doi: 10.5281/zenodo.8172817
 eia_bulk_elec:
   production_doi: 10.5281/zenodo.7067366
-  sandbox_doi: 10.5072/zenodo.1103571
+  sandbox_doi: 10.5281/zenodo.10003268
 eiawater:
   production_doi: 10.5281/zenodo.7683135
-  sandbox_doi: 10.5072/zenodo.1161347
+  sandbox_doi: 10.5281/zenodo.123456
 epacamd_eia:
   production_doi: 10.5281/zenodo.6633769
-  sandbox_doi: 10.5072/zenodo.1072000
+  sandbox_doi: 10.5281/zenodo.7900973
 epacems:
   production_doi: 10.5281/zenodo.4127054
-  sandbox_doi: 10.5072/zenodo.1228518
+  sandbox_doi: 10.5281/zenodo.123456
 ferc1:
   production_doi: 10.5281/zenodo.4127043
-  sandbox_doi: 10.5072/zenodo.1114564
+  sandbox_doi: 10.5281/zenodo.8234737
 ferc2:
   production_doi: 10.5281/zenodo.5879542
-  sandbox_doi: 10.5072/zenodo.1096046
+  sandbox_doi: 10.5281/zenodo.8006880
 ferc6:
   production_doi: 10.5281/zenodo.7126395
-  sandbox_doi: 10.5072/zenodo.1114637
+  sandbox_doi: 10.5281/zenodo.7130140
 ferc60:
   production_doi: 10.5281/zenodo.7126434
-  sandbox_doi: 10.5072/zenodo.1114668
+  sandbox_doi: 10.5281/zenodo.123456
 ferc714:
   production_doi: 10.5281/zenodo.4127100
-  sandbox_doi: 10.5072/zenodo.1114673
+  sandbox_doi: 10.5281/zenodo.7139874
 mshamines:
   production_doi: 10.5281/zenodo.7683517
-  sandbox_doi: 10.5072/zenodo.1158828
+  sandbox_doi: 10.5281/zenodo.7683517
 phmsagas:
   production_doi: 10.5281/zenodo.7683351
-  sandbox_doi: 10.5072/zenodo.1161321
+  sandbox_doi: 10.5281/zenodo.7683351
2 changes: 2 additions & 0 deletions src/pudl_archiver/__init__.py
@@ -42,6 +42,7 @@ async def archive_datasets(
     summary_file: str | None = None,
     download_dir: str | None = None,
     auto_publish: bool = False,
+    refresh_metadata: bool = False,
 ):
     """A CLI for the PUDL Zenodo Storage system."""
     if sandbox:
@@ -88,6 +89,7 @@ async def on_request_end(session, trace_config_ctx, params):
                 dry_run=dry_run,
                 sandbox=sandbox,
                 auto_publish=auto_publish,
+                refresh_metadata=refresh_metadata,
             )

             tasks.append(orchestrator.run())
4 changes: 2 additions & 2 deletions src/pudl_archiver/archivers/eia860.py
@@ -20,9 +20,9 @@ async def get_resources(self) -> ArchiveAwaitable:
         """Download EIA-860 resources."""
         link_pattern = re.compile(r"eia860(\d{4})(ER)*.zip")
         for link in await self.get_hyperlinks(BASE_URL, link_pattern):
-            year = link_pattern.search(link).group(1)
+            year = int(link_pattern.search(link).group(1))
Member commented:

Out of scope: we should stick mypy in the pre-commit config so these sorts of type errors don't slip through in the future. This will probably mean fixing a ton of type errors in a separate PR before we can do that.
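
To illustrate the kind of type error this change fixes: `re.Match.group` returns a `str`, so a year compared against integers never matches unless it is converted first. The sketch below is only an illustration; `valid_year` and `ONLY_YEARS` are hypothetical stand-ins, not the archiver's actual implementation.

```python
import re

# Same pattern as in the eia860.py diff.
link_pattern = re.compile(r"eia860(\d{4})(ER)*.zip")

# Hypothetical stand-in for the archiver's year filter (assumed to hold ints).
ONLY_YEARS = [2021, 2022]


def valid_year(year: int) -> bool:
    """Return True if the year is in the assumed list of integer years."""
    return year in ONLY_YEARS


link = "eia8602022ER.zip"
match = link_pattern.search(link)

year_str = match.group(1)       # group() returns a str
year_int = int(match.group(1))  # the fix: convert before comparing

str_result = valid_year(year_str)  # a str never equals an int
int_result = valid_year(year_int)

print(str_result, int_result)
```

A static checker such as mypy would flag the `valid_year(year_str)` call, since the parameter is annotated as `int`; at runtime the check fails silently instead.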

             if self.valid_year(year):
-                yield self.get_year_resource(link, link_pattern.search(link))
+                yield self.get_year_resource(link, year)

     async def get_year_resource(self, link: str, year: int) -> tuple[Path, dict]:
         """Download zip file."""
5 changes: 5 additions & 0 deletions src/pudl_archiver/cli.py
@@ -65,6 +65,11 @@ def parse_main():
         help="Directory to download files to. Use tmpdir if not specified.",
         default=None,
     )
+    parser.add_argument(
+        "--refresh-metadata",
+        action="store_true",
+        help="Regenerate metadata from PUDL data source rather than existing archived metadata.",
Member commented:

This is the Zenodo metadata, like "creators", "communities", "DOI" etc?

@e-belfer (Member, Author) replied on Nov 2, 2023:


Yes, creators, title, keywords, basically all metadata. The outcome is the same as if we'd run initialize except for version and link to the pre-existing depositions. We need this because there's no other way to migrate old-type depositions, but it's also nice to have in case we ever want to update metadata for an existing archive.
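
As a minimal sketch of how a `store_true` flag like this behaves, the snippet below builds a small `argparse` parser with `--refresh-metadata` and hands the parsed value to a function. `build_parser` and `describe_run` are hypothetical names used for illustration only; the real `parse_main` and the orchestrator take more arguments than shown here.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical, trimmed-down parser containing only the new flag."""
    parser = argparse.ArgumentParser(description="Sketch of the archiver CLI flag.")
    parser.add_argument(
        "--refresh-metadata",
        action="store_true",  # False unless the flag is passed on the command line
        help="Regenerate metadata from PUDL data source rather than existing archived metadata.",
    )
    return parser


def describe_run(refresh_metadata: bool = False) -> str:
    """Hypothetical stand-in for the code that receives the flag."""
    if refresh_metadata:
        return "regenerate metadata from the PUDL data source"
    return "reuse existing archived metadata"


default_args = build_parser().parse_args([])
refresh_args = build_parser().parse_args(["--refresh-metadata"])

print(describe_run(default_args.refresh_metadata))
print(describe_run(refresh_args.refresh_metadata))
```

Because the flag defaults to `False`, existing invocations keep their current behavior and metadata is only regenerated when the flag is given explicitly.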

+    )
     return parser.parse_args()