
Lighten the Load: A new data store for eLCI #272

Open
dt-woods opened this issue Dec 2, 2024 · 6 comments

Collaborator

dt-woods commented Dec 2, 2024

It goes without saying that the ElectricityLCI repository is getting big (>250 MB). This slows down pulls and pushes and makes the repository less user-friendly. Relying on the repository to store data (e.g., public life cycle inventories), intermediate data files (e.g., StEWI FRS bridge files), and templates is not sustainable, because the repository will only keep growing as new LCIs become available (e.g., 2020 natural gas and new nuclear).

A better way to handle large data files (e.g., >100 kB) would be to host them on a public data warehouse, where their metadata, version history, citation, and document summaries can be recorded for traceability and transparency, both of which are lacking or difficult to achieve in the repo's current organization.
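As a rough illustration of the kind of metadata such a data store could carry, the Python sketch below walks a checkout and records the path, size, and SHA-256 checksum of every file above the 100 kB example threshold. The function names `sha256_of` and `build_manifest` are hypothetical, treating 100 kB as 100,000 bytes is an assumption, and none of this is part of ElectricityLCI today.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large inventories never load fully into memory."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root, min_bytes=100_000):
    """Hypothetical helper: list files larger than min_bytes under root.

    Each entry records the POSIX-style relative path, the size in bytes,
    and a SHA-256 checksum; anything inside a .git directory is skipped.
    """
    root = Path(root)
    entries = []
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root)
        if ".git" in rel.parts or not path.is_file():
            continue
        size = path.stat().st_size
        if size > min_bytes:  # strictly greater, matching ">100 kB"
            entries.append(
                {"path": rel.as_posix(), "bytes": size, "sha256": sha256_of(path)}
            )
    return entries
```

Run from the repository root, `build_manifest(".")` would return a list of dicts that could be serialized with `json.dumps` and published alongside the hosted files, so a checksum can be compared against the exact version a given release expects.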

dt-woods mentioned this issue Dec 2, 2024
dt-woods added the data label Dec 2, 2024
Collaborator Author

dt-woods commented Dec 2, 2024

Collaborator Author

dt-woods commented Dec 2, 2024

Is this template file still valid?

Collaborator

bl-young commented Dec 3, 2024

> Is this template file still valid?

No that is not in use.

Collaborator

bl-young commented Dec 4, 2024

I do agree that files of any significant size should be hosted externally. I've also wondered whether it would make sense to split the upstream data and related processing into a separate package, like ELCI_upstream or something, to streamline and simplify things a bit.

Unfortunately, we've found that deleting large files doesn't remove them from the git history, so it does not reduce the size of the repository (an issue we've faced with fedelemflowlist). Something like this might be necessary, but I have not tried it: https://rtyley.github.io/bfg-repo-cleaner/#usage
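Before rewriting history with a tool like BFG, it may help to see which blobs actually account for the size. A minimal Python sketch, assuming `git` is on the PATH: `git rev-list --objects --all` lists every object reachable from any ref (with the path each blob was seen at), and `git cat-file --batch-check` reports each object's type and size. The helper names are hypothetical, and the 100 kB cut-off is only the example threshold from this issue.

```python
import subprocess


def history_batch_check(repo="."):
    """Return one 'type sha size path' line per object in the whole history."""
    objects = subprocess.run(
        ["git", "-C", repo, "rev-list", "--objects", "--all"],
        capture_output=True, text=True, check=True,
    ).stdout
    # %(rest) echoes whatever followed the object name on the input line,
    # i.e. the path that rev-list printed for each blob.
    return subprocess.run(
        ["git", "-C", repo, "cat-file",
         "--batch-check=%(objecttype) %(objectname) %(objectsize) %(rest)"],
        input=objects, capture_output=True, text=True, check=True,
    ).stdout


def largest_blobs(batch_output, min_bytes=100_000):
    """Parse batch-check output; keep blobs above min_bytes, largest first."""
    blobs = []
    for line in batch_output.splitlines():
        parts = line.split(" ", 3)  # the path may itself contain spaces
        if len(parts) < 3 or parts[0] != "blob":
            continue  # commits, trees, and tags are not file contents
        size = int(parts[2])
        path = parts[3] if len(parts) == 4 else ""
        if size > min_bytes:
            blobs.append((size, path, parts[1]))
    return sorted(blobs, reverse=True)
```

`largest_blobs(history_batch_check("."))[:20]` would list the twenty largest file versions ever committed, which gives a concrete list to feed into a cleanup like BFG's `--strip-blobs-bigger-than` or to decide that a fresh repository is simpler.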

Collaborator Author

dt-woods commented Dec 5, 2024

The quick and easy option is to delete the bloated old repository and create a nice, clean new repo that we don't dump data into.
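Whether the history is cleaned or a fresh repository is started, a guard against new large files could keep it from bloating again. The sketch below is a hypothetical Python pre-commit hook, assuming `git` is on the PATH and reusing the 100 kB example limit; it checks the size of each added or modified file as staged in the index (`git cat-file -s :<path>`), not as it sits in the working tree. The function names and the limit are illustrative assumptions.

```python
import subprocess
import sys

LIMIT_BYTES = 100_000  # hypothetical limit based on the ">100 kB" example


def staged_files():
    """Paths added or modified in the index, relative to the repo root."""
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=AM", "-z"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [p for p in out.split("\0") if p]


def staged_size(path):
    """Size in bytes of the staged (index) version of a file."""
    out = subprocess.run(
        ["git", "cat-file", "-s", f":{path}"],
        capture_output=True, text=True, check=True,
    ).stdout
    return int(out.strip())


def oversized(sizes, limit=LIMIT_BYTES):
    """Return the sorted paths whose size exceeds the limit."""
    return sorted(path for path, size in sizes.items() if size > limit)


def main():
    too_big = oversized({p: staged_size(p) for p in staged_files()})
    for path in too_big:
        print(f"refusing to commit {path}: over {LIMIT_BYTES} bytes; "
              "host it on the data server instead", file=sys.stderr)
    return 1 if too_big else 0
```

To use it, the script would be saved as an executable `.git/hooks/pre-commit` ending with `sys.exit(main())`; git aborts the commit whenever the hook exits non-zero.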

dt-woods added a commit to KeyLogicLCA/ElectricityLCI that referenced this issue Dec 5, 2024
dt-woods added a commit to KeyLogicLCA/ElectricityLCI that referenced this issue Dec 5, 2024
@WesIngwersen
Collaborator

Can these StEWI FRS bridge files and stewicombo files go?

I agree with moving all these static files to a data server. We can put all stewi-based outputs in our Data Commons stewi folder.
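Once static files live on a data server, the package would need to fetch them on demand. A minimal sketch, assuming each hosted file has a known URL and a published SHA-256 checksum; `fetch_cached` is a hypothetical helper and no real Data Commons URL is implied. It reuses a verified local copy when one exists, and otherwise downloads to a temporary `.part` file that is only moved into place after the checksum matches.

```python
import hashlib
import shutil
import urllib.request
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def fetch_cached(url, dest, sha256):
    """Hypothetical helper: download url to dest unless a verified copy exists.

    Raises ValueError (and deletes the partial download) if the downloaded
    bytes do not match the expected SHA-256 checksum.
    """
    dest = Path(dest)
    if dest.exists() and sha256_of(dest) == sha256:
        return dest  # cache hit: no network access needed
    dest.parent.mkdir(parents=True, exist_ok=True)
    tmp = dest.with_name(dest.name + ".part")
    with urllib.request.urlopen(url) as resp, open(tmp, "wb") as fh:
        shutil.copyfileobj(resp, fh)
    if sha256_of(tmp) != sha256:
        tmp.unlink()
        raise ValueError(f"checksum mismatch for {url}")
    tmp.replace(dest)  # move into place only after verification
    return dest
```

Pinning a checksum per file ties each ElectricityLCI release to an exact version of the hosted data, which is the traceability the issue asks for; publishing a new version on the server would then mean updating the expected checksum in the code.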
