Save datapackage #37

Merged
merged 9 commits into from
Jul 10, 2023

Contributor

@FelixMau FelixMau commented Jul 3, 2023

Saving Datapackage

To build the package, datapackage.Package is used to infer the data types.

  1. Save elements and sequences with the defined naming
  2. Build the package
  3. Add foreign keys to the descriptors
  4. Use the updated descriptors to create a new Package and save that package as JSON

The foreign keys are then added and datapackage.json is saved to the destination.
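
A minimal sketch of the four steps above, using only the standard library. It is not the code from this PR: build_descriptor is a hypothetical stand-in for the inference done by datapackage.Package and simply marks every field as a string, and the data/elements and data/sequences layout, the function names and the sample rows are all assumptions made for illustration.

```python
import csv
import json
import tempfile
from pathlib import Path


def save_csv(path, rows):
    """Step 1: write one resource (a list of dicts) to a CSV file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)


def build_descriptor(root):
    """Step 2: hypothetical stand-in for package inference (no type inference)."""
    resources = []
    for csv_path in sorted(root.glob("**/*.csv")):
        with csv_path.open(newline="") as f:
            header = next(csv.reader(f))
        resources.append(
            {
                "name": csv_path.stem,
                "path": csv_path.relative_to(root).as_posix(),
                "schema": {
                    "fields": [{"name": n, "type": "string"} for n in header]
                },
            }
        )
    return {"resources": resources}


def save_datapackage(root, elements, sequences, foreign_keys):
    # Step 1: save elements and sequences under an assumed directory layout.
    for name, rows in elements.items():
        save_csv(root / "data" / "elements" / f"{name}.csv", rows)
    for name, rows in sequences.items():
        save_csv(root / "data" / "sequences" / f"{name}.csv", rows)
    # Step 2: build the descriptor from the files on disk.
    descriptor = build_descriptor(root)
    # Step 3: add foreign keys to the matching resource schemas.
    for resource in descriptor["resources"]:
        if resource["name"] in foreign_keys:
            resource["schema"]["foreignKeys"] = foreign_keys[resource["name"]]
    # Step 4: save the updated descriptor as datapackage.json.
    target = root / "datapackage.json"
    target.write_text(json.dumps(descriptor, indent=4))
    return target


# Usage with made-up sample data.
with tempfile.TemporaryDirectory() as tmp:
    target = save_datapackage(
        Path(tmp),
        elements={
            "bus": [{"name": "electricity"}],
            "load": [{"name": "demand", "bus": "electricity"}],
        },
        sequences={"load_profile": [{"timeindex": "2023-01-01", "demand": 1}]},
        foreign_keys={
            "load": [
                {"fields": "bus", "reference": {"resource": "bus", "fields": "name"}}
            ]
        },
    )
    saved = json.loads(target.read_text())
```

Keeping step 3 as a plain dictionary update between building and saving mirrors the order described in the list; whether the real implementation needs a second Package object depends on the datapackage library and is not shown here.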

Unrelated changes:

Renamed foreign_keys to foreignKeys so that the naming matches tabular.
Changed get_foreign_keys: the bus names were actually the entries that would go into bus.csv, not the correct fields.

Contributor Author

FelixMau commented Jul 3, 2023

After updating the tests and testing, #31 and #22 may be closed.
I would like to work on the tests directly on the branch write datapackage json, and review/merge this beforehand.

@FelixMau FelixMau requested review from henhuy and nailend July 5, 2023 13:47
Contributor

henhuy commented Jul 6, 2023

I don't have much time currently... @nailend could you check? sorry

Contributor

@nailend nailend left a comment

The functionality to write the csv files seems to work fine. I also like the mock package, didn't know about this. Thanks a lot. I got a few remarks and questions. What about checking the json? This seems to be the neck breaker, and why did you decide to use the frictionless infer method?

data_adapter_oemof/build_datapackage.py (outdated, resolved)
data_adapter_oemof/build_datapackage.py (outdated, resolved)
tests/test_build_datapackage.py (outdated, resolved)
Comment on lines +178 to +182
check_if_csv_dirs_equal(
    "_files/build_datapackage_goal", "_files/build_datapackage_test"
)


Contributor

What about checking the JSON?

Contributor Author

@FelixMau FelixMau Jul 10, 2023

Yes, that is still missing, thank you! I will add it once a proper goal datapackage is established.
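
One possible shape for such a JSON check, as a hedged sketch: load both datapackage.json files and compare them as dictionaries, so that key order and indentation do not matter. The helper names load_normalized and datapackage_json_equal are hypothetical and not part of this repository, and sorting the resources by name is an assumption about what should count as equal.

```python
import json
import tempfile
from pathlib import Path


def load_normalized(path):
    """Load a descriptor and sort its resources by name (assumed to be order-insensitive)."""
    descriptor = json.loads(Path(path).read_text())
    descriptor["resources"] = sorted(
        descriptor.get("resources", []), key=lambda r: r["name"]
    )
    return descriptor


def datapackage_json_equal(goal_path, test_path):
    """Return True if both descriptors hold the same content."""
    return load_normalized(goal_path) == load_normalized(test_path)


# Usage with made-up descriptors written to a temporary directory.
goal = {
    "resources": [
        {"name": "bus", "path": "data/elements/bus.csv"},
        {"name": "load", "path": "data/elements/load.csv"},
    ]
}
reordered = {"resources": list(reversed(goal["resources"]))}
incomplete = {"resources": goal["resources"][:1]}

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "goal.json").write_text(json.dumps(goal, indent=4))
    (root / "reordered.json").write_text(json.dumps(reordered))
    (root / "incomplete.json").write_text(json.dumps(incomplete))

    same = datapackage_json_equal(root / "goal.json", root / "reordered.json")
    different = datapackage_json_equal(root / "goal.json", root / "incomplete.json")
```

In a test, the result could be wrapped in a plain assert next to the existing CSV directory check.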

Comment on lines 199 to 206
        package.infer(pattern="**/*.csv")

        # Add foreign keys from self to Package
        for i, resource in enumerate(package.descriptor["resources"]):
            if resource["name"] in self.foreignKeys.keys():
                resource["schema"].update(
                    {"foreignKeys": self.foreignKeys[resource["name"]]}
                )
Contributor

Why don't you use oemof.tabular.datapackage.building.infer_metadata?

Contributor Author

As far as I understand, the implementation of infer_metadata only allows foreign keys named "bus", "from_bus", "to_bus" and "profile". That might be sufficient, but since we tried to keep things more open and implemented a more sophisticated foreign-key approach, using it would reduce functionality considerably.

Contributor Author

oemof.tabular.datapackage.building.infer_metadata also provides a different, more general approach to deriving foreign keys, but I think it is not strict enough for our project. What do you think?
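
To illustrate the more open approach discussed in this thread, here is a small sketch that attaches foreign keys with arbitrary field names to a descriptor. A plain dictionary stands in for package.descriptor, the entries follow the Table Schema foreignKeys layout (fields plus a reference to a resource and its fields), and the helper add_foreign_keys, the resource "chp" and the field "fuel_bus" are hypothetical examples, not names taken from this PR.

```python
import copy


def add_foreign_keys(descriptor, foreign_keys):
    """Return a copy of the descriptor with foreignKeys set on matching resources."""
    updated = copy.deepcopy(descriptor)
    for resource in updated["resources"]:
        keys = foreign_keys.get(resource["name"])
        if keys:
            # Create the schema if a resource has none yet.
            resource.setdefault("schema", {})["foreignKeys"] = keys
    return updated


descriptor = {
    "resources": [
        {"name": "bus", "schema": {"fields": [{"name": "name", "type": "string"}]}},
        {
            "name": "chp",
            "schema": {
                "fields": [
                    {"name": "name", "type": "string"},
                    {"name": "fuel_bus", "type": "string"},
                    {"name": "to_bus", "type": "string"},
                ]
            },
        },
    ]
}

# The field names are not limited to a fixed set such as "bus" or "from_bus".
foreign_keys = {
    "chp": [
        {"fields": "fuel_bus", "reference": {"resource": "bus", "fields": "name"}},
        {"fields": "to_bus", "reference": {"resource": "bus", "fields": "name"}},
    ]
}

updated = add_foreign_keys(descriptor, foreign_keys)
```

Working on a deep copy keeps the inferred descriptor untouched, which fits the idea of creating a new package from the updated descriptors.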

Contributor

nailend commented Jul 6, 2023

After updating tests and testing #31 and #22 may be closed. I would like to work on the test on branch write datapackage json directly and review/merge before

I am a bit confused. This PR is going to close #22, right? Why do you want to merge it into add_get_foreign_keys or is this just to reduce the number of current changes, as long as its parent is not merged yet? Further work concerning the datapackage.json will be done on write datapackage json which is the associated branch for #31 and child of save_datapackage?

@nailend nailend mentioned this pull request Jul 6, 2023
@nailend nailend changed the base branch from add_get_foreign_keys to dev July 10, 2023 08:44
@nailend nailend changed the base branch from dev to 31-write-datapackagejson July 10, 2023 08:47
@FelixMau FelixMau linked an issue Jul 10, 2023 that may be closed by this pull request
7 tasks
@FelixMau FelixMau merged commit 859647e into 31-write-datapackagejson Jul 10, 2023
@henhuy henhuy deleted the save_datapackage branch October 6, 2023 11:06
Development

Successfully merging this pull request may close these issues.

Write Datapackage.json
3 participants