Replies: 1 comment
See https://catalystcooperative.slack.com/archives/C02BSMJJVR8/p1681746492529679 for the Slack conversation.
Any change to the pudl codebase may affect the outputs generated by the pipeline. Such a change can be intentional (e.g. modifying the transform process, fixing bugs, adding new datasets) or erroneous (e.g. a refactoring that introduces an unwanted or unintended data change).
Many changes (esp. refactorings) should be no-ops w.r.t. the resulting data. To ensure that this is so, it would be useful to automate output data diffing. The proposal would be this:
If there are no data diffs, there's no problem. If there are data diffs and the PR is explicitly marked as "expecting data changes" (e.g. by setting a specific label on the PR), the test can still pass. If the PR introduces data changes and those are unexpected, the results of the data diff can be added to the PR comments automatically so that the author can analyze and fix the problems.
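To make the pass/fail logic concrete, here is a minimal sketch in Python using only the standard library. It assumes the outputs being compared are two SQLite databases (one built from the base branch, one from the PR), which may not match how the diff would actually be run. The function names and the `expected-data-changes` label are hypothetical and only serve as illustration.

```python
from __future__ import annotations

import hashlib
import sqlite3


def table_fingerprints(conn: sqlite3.Connection) -> dict[str, str]:
    """Map each user table to a hash of its column names and rows."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
        )
    ]
    fingerprints = {}
    for table in tables:
        cursor = conn.execute(f'SELECT * FROM "{table}"')
        columns = repr([col[0] for col in cursor.description])
        # Sort the row reprs so that row order does not affect the hash.
        rows = sorted(repr(row) for row in cursor)
        payload = "\n".join([columns, *rows]).encode()
        fingerprints[table] = hashlib.sha256(payload).hexdigest()
    return fingerprints


def diff_outputs(base: sqlite3.Connection, head: sqlite3.Connection) -> list[str]:
    """Return the sorted names of tables that were added, removed, or changed."""
    before = table_fingerprints(base)
    after = table_fingerprints(head)
    return sorted(
        table
        for table in before.keys() | after.keys()
        if before.get(table) != after.get(table)
    )


def check_passes(
    changed_tables: list[str],
    pr_labels: set[str],
    expected_label: str = "expected-data-changes",  # hypothetical label name
) -> bool:
    """Pass when nothing changed or the PR is labelled as expecting changes."""
    return not changed_tables or expected_label in pr_labels
```

A CI job could call `diff_outputs` on the two databases, pass the result and the PR's labels to `check_passes`, and post the list of changed tables as a PR comment when the check does not pass. Hashing whole tables only says which tables differ, so a real implementation would probably want a row-level or column-level report as well.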