Fix Nightly Build Failure 2023-12-28 #3196

Closed
zaneselvans opened this issue Dec 28, 2023 · 2 comments · Fixed by #3197
Labels:
ccai (Tasks related to CCAI grant for entity matching)
ferc1 (Anything having to do with FERC Form 1)
nightly-builds (Anything having to do with nightly builds or continuous deployment.)

Comments

@zaneselvans

Outputs: gs://nightly-build-outputs.catalyst.coop/2023-12-28-0627-ef8ab93-dev

New failure in the integration tests when run against all years of data.

I note that there's a discrepancy between the error message and the actual check:

>       assert ratio_correct > 0.95, "Percent of correctly matched FERC records below 85%."
E       AssertionError: Percent of correctly matched FERC records below 85%.
E       assert 0.8571428571428571 > 0.95

Which were we intending to check?

__________________________ test_classify_plants_ferc1 __________________________
[gw0] linux -- Python 3.11.7 /home/mambauser/env/bin/python3.11

mock_ferc1_plants_df =      index       base_plant_name plant_type  report_year construction_type  capacity_mw  construction_year  utility_id...            NaN                    NaN                NaN                  NaN               0

[546 rows x 15 columns]

    def test_classify_plants_ferc1(mock_ferc1_plants_df):
        """Test the FERC inter-year plant linking model."""
    
        @graph
        def _link_ids(df: pd.DataFrame):
            feature_matrix = ferc_dataframe_embedder(df)
            label_df = link_ids_cross_year(df, feature_matrix)
            return label_df
    
        mock_ferc1_plants_df["plant_id_ferc1"] = (
            _link_ids.to_job()
            .execute_in_process(input_values={"df": mock_ferc1_plants_df})
            .output_value()["record_label"]
        )
    
        # Compute percent of records assigned correctly
        correctly_matched = (
            mock_ferc1_plants_df.groupby("base_plant_name")["plant_id_ferc1"]
            .apply(lambda plant_ids: plant_ids.value_counts().iloc[0])
            .sum()
        )
        ratio_correct = correctly_matched / len(mock_ferc1_plants_df)
        logger.info(f"Percent correctly matched: {ratio_correct*100:.2f}%")
>       assert ratio_correct > 0.95, "Percent of correctly matched FERC records below 85%."
E       AssertionError: Percent of correctly matched FERC records below 85%.
E       assert 0.8571428571428571 > 0.95
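
For reference, the metric in the test counts, for each `base_plant_name`, how many records share the single most common `plant_id_ferc1`, sums those counts, and divides by the total number of records. The reported ratio of 0.857... is consistent with 468 of 546 records. Below is a minimal pure-Python sketch of the same calculation on made-up data; the function name `ratio_correctly_matched` and the toy records are hypothetical and are not part of the test suite.

```python
from collections import Counter, defaultdict


def ratio_correctly_matched(records):
    """Fraction of records whose ID is the most common ID for their plant name.

    `records` is a non-empty iterable of (base_plant_name, plant_id_ferc1) pairs.
    This mirrors the groupby / value_counts().iloc[0] / sum logic in the test.
    """
    ids_by_name = defaultdict(Counter)
    total = 0
    for base_plant_name, plant_id in records:
        ids_by_name[base_plant_name][plant_id] += 1
        total += 1
    # For each plant name, take the count of its most frequent assigned ID.
    correctly_matched = sum(
        counts.most_common(1)[0][1] for counts in ids_by_name.values()
    )
    return correctly_matched / total


# Made-up example: plant "a" has one stray ID, plant "c" is split across two IDs.
toy_records = [
    ("a", 1), ("a", 1), ("a", 2),
    ("b", 3), ("b", 3),
    ("c", 4), ("c", 5),
]
toy_ratio = ratio_correctly_matched(toy_records)  # (2 + 2 + 1) / 7
```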

Also @zschira, numba is putting out something like 16,000 lines of debugging logs when running this test, which I don't think we want to see. How can we silence it?
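
One possible way to quiet this, assuming the messages go through Python's standard `logging` module under the `numba` logger hierarchy (not confirmed in this thread), is to raise that logger's level before the test runs. The helper name `quiet_logger` is hypothetical; only standard-library calls are used.

```python
import logging


def quiet_logger(name: str = "numba", level: int = logging.WARNING) -> logging.Logger:
    """Raise the level of a named logger so lower-severity records are dropped.

    Child loggers (e.g. "numba.core.byteflow") inherit this level as long as
    they have no explicit level of their own.
    """
    target = logging.getLogger(name)
    target.setLevel(level)
    return target


# Assumption: the noisy output comes from loggers named "numba" or "numba.*".
numba_logger = quiet_logger()
```

If that assumption holds, calling this from a `conftest.py` or at the top of the test module would drop DEBUG and INFO records from numba while leaving other loggers alone.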

@zaneselvans

My guess is that @zschira caught this failure and intended to update both the error message and the numerical threshold, but only changed one of them. So maybe we just need to update the numerical threshold.
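
One way to keep the two from drifting apart again is to build the message from the same constant the assertion checks. This is only a sketch: the names `MIN_RATIO_CORRECT` and `check_ratio` are hypothetical, and the value 0.85 is the guess from this comment, not a confirmed decision.

```python
# Hypothetical constant; 0.85 is assumed here based on the existing error message.
MIN_RATIO_CORRECT = 0.85


def check_ratio(ratio_correct: float, threshold: float = MIN_RATIO_CORRECT) -> None:
    """Assert the match ratio exceeds the threshold, reporting both in the message."""
    assert ratio_correct > threshold, (
        f"Ratio of correctly matched FERC records ({ratio_correct:.2%}) "
        f"is not above {threshold:.0%}."
    )


# With the ratio from the nightly build this passes at 0.85 and would fail at 0.95.
check_ratio(0.8571428571428571)
```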

@zaneselvans

Went ahead and merged this fix in so we get another go at the nightly builds tonight...
