Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Features/#450 add metadata #568

Open
wants to merge 117 commits into
base: dev
Choose a base branch
from
Open

Features/#450 add metadata #568

wants to merge 117 commits into from

Conversation

fwitte
Copy link
Member

@fwitte fwitte commented Dec 13, 2021

Contribute to #450.

Before merging into dev-branch, please make sure that

  • the CHANGELOG.rst was updated.
  • new and adjusted code is formated using black and isort.
  • the Dataset-version is updated when existing datasets are adjusted.
  • the branch was merged into the continuous-integration/run-everything-over-the-weekend- branch.
  • the workflow is running successful in test mode.
  • the workflow is running successful in Everything mode.

@fwitte
Copy link
Member Author

fwitte commented Dec 13, 2021

List of data sources

Please mark, which sources we already included @AmeliaNadal, @ClaraBuettner, @IlkaCu.

  • vg250
  • BGR saline structures
    • saline structures InSpEE
    • saline structures InSpEE-DS
    • shapefiles data-bundle (license year?)
    • BGR report (license?)
  • gas demand opendata.ffe
  • mastr
  • hotmaps_industrial_sites
  • hotmaps_scen_buildings
  • osm
  • Scigrid_gas
  • Einspeiseatlas
  • pipeline_classification
  • ...

@nesnoj
Copy link
Member

nesnoj commented Dec 13, 2021

Resolve #450.

@fwitte This does not resolve #450 but rather contributes to that issue (which is great 😸), please mention #450 but do not let github close it automatically. Thank you!

IlkaCu and others added 20 commits December 13, 2021 11:55
- Split BGR data into four individual sources (report, structural data 1
  and 2 as well as data bundle containing shapefiles)
- Add ffe data for industrial gas demand
- Add pypsa requirement
- Create a pypsa network and read the component metadata
- Add comment on egon_etrago table creation with the respective
  metadata
Similar to the sources dict in metadata.py I added a contributor
dict, so full name and github handle are easily accesible. Import
the contributors to add them to the metadata. It is possible to
update the dict according to the work done in the respective
section. Might need a .copy() to not manipulate the original dict,
I am not sure however.
@fwitte
Copy link
Member Author

fwitte commented Dec 15, 2021

I added a contributors function to the metadata.py file you can import and select the names of the contributors from.

def contributors():
return {
"am": {
"title": "Aadit Malla",
"email": "https://github.com/aadit879",
},
"an": {
"title": "Amélia Nadal",
"email": "https://github.com/AmeliaNadal",
},
"cb": {
"title": "Clara Büttner",
"email": "https://github.com/ClaraBuettner",
},
"ce": {
"title": "Carlos Epia",
"email": "https://github.com/CarlosEpia",
},
"fw": {
"title": "Francesco Witte",
"email": "https://github.com/fwitte",
},
"gp": {
"title": "Guido Pleßmann",
"email": "https://github.com/gplssm",
},
"ik": {
"title": "Ilka Cußmann",
"email": "https://github.com/IlkaCu",
},
"ja": {
"title": "Jonathan Amme",
"email": "https://github.com/nesnoj",
},
"je": {
"title": "Jane Doe",
"email": "https://github.com/JaneDoe",
},
"ke": {
"title": "Katharina Esterl",
"email": "https://github.com/KathiEsterl",
},
"sg": {
"title": "Stephan Günther",
"email": "https://github.com/gnn",
},
"um": {
"title": "Ulf Müller",
"email": "https://github.com/ulfmueller",
},
}

An example usage can be found here: See comment below

co = contributors()
contributor_list = [
co['an'].update(
{"comment": "Add CH4 storage"}
),
co['fw'].update(
{"comment": "Add H2 storage"}
),
]

Please add people I forgot in the list, I looked here for that list: https://github.com/openego/eGon-data/graphs/contributors, but the list does not seem to be complete.

@fwitte fwitte self-assigned this Dec 20, 2021
gnn and others added 17 commits March 18, 2023 16:57
There where a lot of `datetime.now()` calls in the module. But `now`
isn't a member of the `datetime` module, but a member of the `datetime`
class inside the `datetime` module.
This is why I don't like modules with members named exactly like the
module. It's an accident waiting to happen.
Insert an additional blank line below the imports and remove a blank
line at the beginning of an indented block.
Having spaces, and important separating characters in general, at the
beginning of the line makes it easier to spot if one has been
accidentally missed.
While this wasn't a problem here, in the following commit I spotted at
least four instances of a missing space between words, solely by putting
the space at the beginning of continuation lines.
Use a parenthesized expression to break long string values in
dictionaries into multiple lines, i.e. use

```python
{
    "first-key": (
        "-----------------------------------------------------"
        "\nA very long string value that has"
        " to be split over multiple lines."
    ),
    "second-key": "Did you spot the second dictionary entry?",
}
```

in favour of

```python
{
    "first-key": "-----------------------------------------------------"
    "\nA very long string value that has "
    "to be split over multiple lines.",
    "second-key": "Did you spot the second dictionary entry?",
}
```

The first version separates keys and values more clearly, making it
easier to spot all keys and to read the string value as a whole. Using
the second version, it's harder to see on which line a new entry with a
new key begins, especially if one doesn't start continuation lines with
a space or other separating character.
See the message of the previous commit for details on why to put spaces
and other important separators at the beginning of continuation lines.

Note also that, while most of the lines were within the 79 character
limit, so not too long, they where reformatted because of the reasons
above. And since they had to be touched anyways, they got wrapped at 72
characters, as that's what PEP8 suggests for free flowing text like
comments and long strings.
This is the length that PEP8 states for free flowing text like comments
or long strings. Since most strings don't start at the beginning of the
line, I employed some leeway to go a few characters over in favour of
readability and sometimes I wrapped lines early for the same reason,
mostly at sentence boundaries.
See the previous two commits for details on this particular way of
wrapping strings.
The previous URL yields a 404 error, while the new one works. At least
at the time of writing this. Noticed while trying to figure out whether
the session id suffix and the "nn=" URL parameter where really
necessary.
The `os.listdir` function can handle `Path` objects just fine.
Use this to save an indentation level. The rest of the code after the
`continue` is exactly the same as before, just indented one level less.
This isn't really important now, but since the code will change to get
rid of the test altogether, it's a precursor to not mixing getting rid
of an indentation level and other code changes into one commit later on.
It's not the table that gets "uploaded", but the metadata comment. And
things don't (always) get "uploaded" to a database, but they get stored
in the database, because the database might be on the same machine.
Last but not least, there's no reason to use use an exclamation mark in
log messages. It just makes long streams of log messages harder to
read, if they aren't terminated consistently. Log levels usually are
enough to signify the importance of message.
Prefix the results of the `os.listdir(path)` call with `path` because
the call only returns a list of the names found under `path`, without
the containing directory.
Reformat them via `python -m json.tool --indent=2`, with any Python
version >= 3.9. The files where all in one line, which isn't exactly
human readable or easy to edit.
I didnt find a proper solution to set start and end
of temporal:timeseries as its 24h timeseries
of random days, not having a specific date.
"pipeline_classification": {
"title": "Technical pipeline characteristics for high pressure pipelines",
"description": "Parameters for the classification of gas pipelines, "
"the whole documentation could is available at: "
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that "could" a quote or is that a typo that should have been something else?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can anyone answer this question?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I vote for typo and not necessary.

gnn added 5 commits March 28, 2023 19:22
This doesn't change the way it's imported but now we can put data files
into the package's directory.
Get rid of the hardcoded "json_metadata" directory name and instead use
`__name__` to get the name of the current module, which now is also a
package, then use `importlib_resources.files` to get a `Path` to the
files inside the package, then use `.glob("*.json")` to get all the
files ending with ".json" below that path.

This simplifies the code because we no longer have to:

  - import `Path`, `os` or `egon.data`'s `__path__`,
  - generate the path to the files by hardcoding their directory,
  - filter for files ending in ".json" and
  - manually prefix the filenames with the path under which they where
    found.

There is one important change though: since `.glob("*.json")` returns a
generator of `Path` objects, the filename component has to be pulled out
of the `Path` via the `name` attribute before `split`ting on `"."`.

Last but not least, add a missing newline at the end of
"src/egon/data/datasets/zensus_vg250.py", courtesy of the
`end-of-file-fixer` pre-commit hook.
No need to create a new dialect instance for every file.
Comment on lines 27 to 28
"start": "2011-01-01 00:00",
"end": "2011-01-01 00:00",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't these two timestamps differ? Like go from "2011-01-01 00:00" to "2011-01-02 00:00" or something like that?

Copy link
Contributor

@nailend nailend Mar 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes thats true, should be the end of the year. Sorry about that

@khelfen khelfen mentioned this pull request Apr 21, 2023
6 tasks
@ClaraBuettner
Copy link
Contributor

What is the current status of this PR? @gnn should someone else review this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants