Add new naming convention to docs #2874

bendnorman · 2023-09-20T19:33:12Z

PR Overview

This PR adds the new asset/table naming convention laid out in this design doc to the PUDL documentation.

Changes in this PR:

Adds a note to data.catalyst.coop/pudl/ and data_access.rst that encourages users to only interact with out_ tables.
Removes portions of docs/dev/data_guidelines.rst that are no longer relevant.
Describes the asset naming convention in docs/dev/naming_conventions.rst.
Replaces ETL language in docs/intro.rst with the new raw, core, and output layers.
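
The full convention lives in docs/dev/naming_conventions.rst and is not reproduced in this PR. As a rough illustration only, here is a minimal Python sketch that splits an asset name into its parts. It assumes a {layer}_{source}__{asset} pattern inferred from names like core_epacems__hourly_emissions that appear later in this thread. AssetName, LAYERS, and parse_asset_name are hypothetical helpers, not part of PUDL.

```python
from typing import NamedTuple

# Assumed layer prefixes: raw, core, and output come from this PR; both "out"
# and "output" are accepted because the thread uses ``out_`` and ``output_``.
LAYERS = ("raw", "core", "out", "output")


class AssetName(NamedTuple):
    layer: str
    source: str
    asset: str


def parse_asset_name(name: str) -> AssetName:
    """Split a name like 'core_epacems__hourly_emissions' into its parts.

    Hypothetical helper assuming the pattern {layer}_{source}__{asset}.
    """
    # Everything before the first double underscore is "{layer}_{source}".
    prefix, sep, asset = name.partition("__")
    if not sep or not asset:
        raise ValueError(f"missing '__' separator in {name!r}")
    # The layer is the text before the first single underscore.
    layer, sep, source = prefix.partition("_")
    if not sep or layer not in LAYERS or not source:
        raise ValueError(f"unrecognized layer prefix in {name!r}")
    return AssetName(layer, source, asset)
```

Raising on unknown prefixes, rather than guessing, keeps a misnamed asset from silently landing in the wrong layer.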

PR Checklist

Merge the most recent version of the branch you are merging into (probably dev).
All CI checks are passing. Run tests locally to debug failures.
Make sure you've included good docstrings.
For major data coverage & analysis changes, run data validation tests
Include unit tests for new functions and classes.
Defensive data quality/sanity checks in analyses & data processing functions.
Update the release notes and reference the PR and related issues.
Do your own explanatory review of the PR to help the reviewer understand what's going on and identify issues preemptively.

codecov · 2023-09-20T20:38:04Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (c2af359) 88.6% compared to head (53d5618) 88.6%.
Report is 7 commits behind head on rename-core-assets.

Additional details and impacted files

@@                Coverage Diff                 @@
##           rename-core-assets   #2874   +/-   ##
==================================================
  Coverage                88.6%   88.6%           
==================================================
  Files                      91      91           
  Lines                   11019   11021    +2     
==================================================
+ Hits                     9771    9773    +2     
  Misses                   1248    1248

Files	Coverage Δ
src/pudl/output/pudltabl.py	`88.7% <100.0%> (+0.2%)`	⬆️

... and 7 files with indirect coverage changes


aesharpe

High level comments:

Don't want to duplicate too much information between the intro page and the data naming conventions page.
I made suggestions to the naming conventions page before the introduction page, so anything fixed there should also be fixed on the introduction page (or de-duplicated).
Worth thinking about the purpose of each of the docs pages. Docs are starting to get a little bit like a maze! (Not your fault or due to this PR specifically). We will probably address the bulk of this if/when we get the ComDev grant, but worth thinking about a little now. Specifically, the difference between Intro and Data Access.
Need to be careful to clarify what is in pudl.sqlite and what is not. Also center pudl.sqlite as the main resource.

aesharpe · 2023-09-21T14:52:24Z

docs/data_access.rst

+PUDL's primary data output is the ``pudl.sqlite`` database. It contains a collection
+of tables that follow :ref:`PUDL's asset naming convention <asset-naming>`. Tables
+with the ``core_`` prefix are normalized tables that serve as building blocks for the
+more denormalized and easy to work with ``output_`` tables. **We recommend only working
+with ``output_`` tables.**
+


I think we should consider the fact that many users may not know what normalized and denormalized data means in this context. It might make sense to get rid of the sentence

Tables with the ``core_`` prefix are normalized tables that serve as building blocks for the more denormalized and easy to work with ``output_`` tables.

and just say

We recommend working with tables with the ``output_`` prefix as these tables contain the most complete data. For more information about the different types of tables, read through the naming conventions.

Or something like that?
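
To act on the recommendation in the diff above to work with the ``output_`` tables in ``pudl.sqlite``, a user could list those tables directly from the database. The following is a minimal sketch using Python's standard sqlite3 module. list_tables_with_prefix is a hypothetical helper, and the prefix is a parameter because this thread uses both ``out_`` and ``output_``.

```python
import sqlite3


def list_tables_with_prefix(conn: sqlite3.Connection, prefix: str) -> list[str]:
    """Return the sorted names of tables whose names start with ``prefix``.

    Hypothetical helper; filtering happens in Python rather than with SQL
    LIKE, because "_" is a LIKE wildcard and would match any character.
    """
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows if name.startswith(prefix)]
```

For example, list_tables_with_prefix(sqlite3.connect("pudl.sqlite"), "output_") would run against a local copy of the database. The file path is an assumption.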

docs/dev/naming_conventions.rst

docs/intro.rst

docs/dev/naming_conventions.rst

docs/intro.rst

bendnorman · 2023-11-01T20:14:37Z

@aesharpe I incorporated most of the changes from #2912.

I feel good about the formatting and the distinction between the data warehouse and the ETL. The flow of the docs still feels a little awkward, but I don't think that's due to the rename changes. That's an issue for another time.

aesharpe

Looks good! I think we are getting there. Just a few more comments about slimming stuff down and removing duplicate information.

docs/intro.rst

README.rst

docs/data_access.rst

aesharpe · 2023-11-03T19:26:36Z

README.rst

+    are stored in a data warehouse as a collection of SQLite and Parquet files so that
+    users can access the data without having to run any code. Learn more about how to
+    access the data `here <https://catalystcoop-pudl.readthedocs.io/en/dev/data_access.html>`__.
+
 What data is available?


See comment in intro page regarding data sources

README.rst

docs/intro.rst

README.rst

bendnorman · 2023-11-06T20:29:29Z

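I think the root issue of our docs maze is it’s not clear where the starting point is for users and contributors. Is it:

catalyst.coop/pudl,

https://data.catalyst.coop/

https://github.com/catalyst-cooperative/pudl

https://catalystcoop-pudl.readthedocs.io/en/dev/index.html#

Should the starting point be different for contributors and users?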

aesharpe · 2023-11-06T20:40:52Z

I think the root issue of our docs maze is it’s not clear where the starting point is for users and contributors. Is it:

catalyst.coop/pudl,

https://data.catalyst.coop/

https://github.com/catalyst-cooperative/pudl

https://catalystcoop-pudl.readthedocs.io/en/dev/index.html#

Should the starting point be different for contributors and users?

Yeah, this is so true! Is it possible to not have the README be the first page of the docs? That might help with the confusion if they were more separate.

bendnorman · 2023-11-06T20:46:05Z

Yes it is possible to not include the README in the docs which might be a good option.

bendnorman · 2023-11-07T02:06:00Z

Ok! I made a few changes:

Combined the "Who is PUDL for?" and "What is PUDL?" sections in README.rst
Stopped including README.rst in the docs and moved the content of intro.rst into index.rst
Moved the link to the data access page to the "Available Data" section of the index.rst page

aesharpe · 2023-11-08T14:05:04Z

Hooray @bendnorman this looks great! My only suggestion is teeny tiny and it's to add a little caveat that CEMS data is stored in parquet files and not the sqlite db because it's so big.

I'm referring to the spot on the intro (now index) page where it says:

PUDL’s clean and complete versions of these data sources are stored in the pudl.sqlite database and core_epacems__hourly_emissions.parquet files.

bendnorman · 2023-11-08T20:37:39Z

Wahoo! Made a change that explains larger datasets live in parquet files.

bendnorman added 2 commits September 20, 2023 11:03

Update contributor facing documentation with new asset naming conventions

77a16f5

Add new naming convention to user facing documentation

4d5b57d

bendnorman requested a review from aesharpe September 20, 2023 19:39

aesharpe requested changes Sep 21, 2023

Respond to first round of Austen's comments

4d256ec

bendnorman marked this pull request as draft September 26, 2023 14:20

bendnorman added 5 commits September 26, 2023 16:21

Merge branch 'rename-core-assets' into create-naming-convention-docs

ef4b5ad

Update rename-core-assets and clarify raw asset sentence

1a9028d

Restrict astroid version to avoid random autoapi error

32dc9ac

Merge branch 'rename-core-assets' into create-naming-convention-docs

797d40e

Incorporate some docs changes from #2912

33fab91

aesharpe reviewed Nov 3, 2023

Merge branch 'rename-core-assets' into create-naming-convention-docs

50e3eef

Remove README.rst from index.rst and move intro content to index

10111e4

bendnorman mentioned this pull request Nov 7, 2023

Restructure intro.rst and other pages for data warehouse #2912

Closed

Add deprecation warnings to PudlTabl and add minor naming docs updates

85c6fe3

aesharpe approved these changes Nov 8, 2023

bendnorman added 2 commits November 8, 2023 10:58

Remove PudlTabl removal data and make assn table name sources alphabetical

479ec7f

Explain why CEMS is stored as parquet

c329804

bendnorman marked this pull request as ready for review November 9, 2023 19:04

Merge pull request #3028 from catalyst-cooperative/create-renaming-release-notes

53d5618

Add naming convention change to release notes

bendnorman merged commit cb9b188 into rename-core-assets Nov 10, 2023
11 checks passed

bendnorman deleted the create-naming-convention-docs branch November 10, 2023 00:19

bendnorman mentioned this pull request Nov 15, 2023

Create documentation that explains new naming convention to users and contributors #2868

Closed

