Indicator cdc vaccines in progress #1312

Open
wants to merge 98 commits into
base: main

Ananya-Joshi
Contributor

New PR for the fixed branch of the CDC indicator. Only the commits relevant to this indicator should be in this PR.

Ananya-Joshi and others added 23 commits October 11, 2021 17:33
Co-authored-by: Katie Mazaitis <[email protected]>
Contributor

@krivard krivard left a comment

Nits suggested; also when I run this I get the following exception:

Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/krivard/projects/covid/dev/covidcast-indicators/cdc_vaccines/delphi_cdc_vaccines/__main__.py", line 12, in <module>
    run_module(read_params())  # pragma: no cover
  File "/home/krivard/projects/covid/dev/covidcast-indicators/cdc_vaccines/delphi_cdc_vaccines/run.py", line 43, in run_module
    all_data = pull_cdcvacc_data(base_url, logger)
  File "/home/krivard/projects/covid/dev/covidcast-indicators/cdc_vaccines/delphi_cdc_vaccines/pull.py", line 85, in pull_cdcvacc_data
    df.columns = ["fips",
  File "/home/krivard/projects/covid/dev/covidcast-indicators/cdc_vaccines/env/lib/python3.8/site-packages/pandas/core/generic.py", line 5500, in __setattr__
    return object.__setattr__(self, name, value)
  File "pandas/_libs/properties.pyx", line 70, in pandas._libs.properties.AxisProperty.__set__
  File "/home/krivard/projects/covid/dev/covidcast-indicators/cdc_vaccines/env/lib/python3.8/site-packages/pandas/core/generic.py", line 766, in _set_axis
    self._mgr.set_axis(axis, labels)
  File "/home/krivard/projects/covid/dev/covidcast-indicators/cdc_vaccines/env/lib/python3.8/site-packages/pandas/core/internals/managers.py", line 216, in set_axis
    self._validate_set_axis(axis, new_labels)
  File "/home/krivard/projects/covid/dev/covidcast-indicators/cdc_vaccines/env/lib/python3.8/site-packages/pandas/core/internals/base.py", line 57, in _validate_set_axis
    raise ValueError(
ValueError: Length mismatch: Expected axis has 11 elements, new values have 10 elements

cdc_vaccines/delphi_cdc_vaccines/run.py (3 review threads, outdated and resolved)
Contributor Author

Ananya-Joshi commented Oct 12, 2021

Working on this now! It seems the CDC recently changed their base file.
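
The traceback reports 11 columns in the source file against 10 names in the positional `df.columns = [...]` assignment, so one extra upstream column is enough to break the pull. One way to make this less fragile is to select and rename columns by name rather than by position. The sketch below is not the code in `pull.py`; it uses only the standard library, and the source column names in `COLUMN_MAP` are illustrative assumptions rather than the indicator's actual list. In pandas the same idea would be selecting the needed columns and calling `df.rename(columns=...)`.

```python
import csv
import io

# Hypothetical mapping from source column names to internal names.
# These names are assumptions for illustration, not the indicator's real list.
COLUMN_MAP = {
    "FIPS": "fips",
    "Date": "timestamp",
    "Series_Complete_Yes": "cumulative_counts_tot_vaccine",
}


def read_selected_columns(text, column_map):
    """Read CSV text, keeping only the mapped columns and renaming them by name.

    Extra columns in the source are ignored; missing ones raise a clear error.
    """
    reader = csv.DictReader(io.StringIO(text))
    missing = set(column_map) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"Source file is missing expected columns: {sorted(missing)}")
    return [{new: row[old] for old, new in column_map.items()} for row in reader]


# A made-up source file with one column the mapping does not know about.
sample = (
    "Date,FIPS,Series_Complete_Yes,New_Unmapped_Column\n"
    "08/10/2021,01001,18000,x\n"
)
rows = read_selected_columns(sample, COLUMN_MAP)
print(rows)
```

With this approach a newly added upstream column is simply ignored, while a removed or renamed column fails with a message that names it.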

@Ananya-Joshi Ananya-Joshi marked this pull request as draft December 21, 2021 02:40
@Ananya-Joshi Ananya-Joshi marked this pull request as ready for review December 22, 2021 02:39
@Ananya-Joshi Ananya-Joshi marked this pull request as draft December 22, 2021 02:43
@Ananya-Joshi
Contributor Author

As a note, we will probably need to update the source if we want to include data on booster shots: https://data.cdc.gov/Vaccinations/COVID-19-Vaccinations-in-the-United-States-Jurisdi/unsk-b7fc
vs. what we are currently using: https://data.cdc.gov/Vaccinations/COVID-19-Vaccinations-in-the-United-States-County/8xkx-amqh which does not give information on boosters. Let me know what you think we should do, @krivard @dshemetov.
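
For comparing the two datasets side by side, a small helper that builds download URLs from the dataset IDs in the links above may be handy. This is a sketch that assumes data.cdc.gov exposes the usual Socrata `/resource/<id>.csv` endpoint and the `$limit` query parameter; the function name `soda_csv_url` is hypothetical and is not part of the indicator code.

```python
from urllib.parse import urlencode


def soda_csv_url(dataset_id, domain="data.cdc.gov", limit=None):
    """Build a CSV download URL for a Socrata dataset ID (assumed endpoint layout)."""
    base = f"https://{domain}/resource/{dataset_id}.csv"
    if limit is None:
        return base
    # Keep the "$" in "$limit" unescaped so the URL stays readable.
    return base + "?" + urlencode({"$limit": limit}, safe="$")


# Dataset IDs taken from the two links in the comment above.
county_url = soda_csv_url("8xkx-amqh")
jurisdiction_sample_url = soda_csv_url("unsk-b7fc", limit=5)
print(county_url)
print(jurisdiction_sample_url)
```

Fetching a handful of rows from each and comparing the headers would show which booster-related columns, if any, are present in each file.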

@Ananya-Joshi Ananya-Joshi marked this pull request as ready for review December 22, 2021 16:26
@dshemetov
Contributor

Sorry for taking so long to get to this review, @Ananya-Joshi. I am getting a test failure when I pull and test locally; see below:

test_run.py F...                                                                                                         [100%]

=========================================================== FAILURES ===========================================================
_______________________________________________ TestRun.test_output_files_exist ________________________________________________

self = <test_run.TestRun object at 0x7f2b733ee0a0>

    def test_output_files_exist(self):
        """Test that the expected output files exist."""
        run_module(self.PARAMS)
    
        csv_files = [f for f in listdir("receiving") if f.endswith(".csv")]
    
        dates = [
            "20210810",
            "20210811",
            "20210812",
            "20210813",
            "20210814",
            "20210815",
            "20210816",
            "20210817",
        ]
        geos = ["state", "hrr", "hhs", "nation", "msa"]
    
        expected_files = []
        for metric in ["cumulative_counts_tot_vaccine",
                                    "incidence_counts_tot_vaccine",
                                    "cumulative_counts_tot_vaccine_12P",
                                    "incidence_counts_tot_vaccine_12P",
                                    "cumulative_counts_tot_vaccine_18P",
                                    "incidence_counts_tot_vaccine_18P",
                                    "cumulative_counts_tot_vaccine_65P",
                                    "incidence_counts_tot_vaccine_65P",
                                    "cumulative_counts_part_vaccine",
                                    "incidence_counts_part_vaccine",
                                    "cumulative_counts_part_vaccine_12P",
                                    "incidence_counts_part_vaccine_12P",
                                    "cumulative_counts_part_vaccine_18P",
                                    "incidence_counts_part_vaccine_18P",
                                    "cumulative_counts_part_vaccine_65P",
                                    "incidence_counts_part_vaccine_65P"]:
            for date in dates:
                for geo in geos:
                    expected_files += [date + "_" + geo + "_" + metric + ".csv"]
                    if not("cumulative" in metric) and not (date in dates[:6]):
                        expected_files += [date + "_" + geo + "_" + metric + "_7dav.csv"]
    
        print(set(csv_files)-set(expected_files))
>       assert set(csv_files) == set(expected_files)
E       AssertionError: assert {'20210810_hh...12P.csv', ...} == {'20210810_hh...12P.csv', ...}
E         Extra items in the left set:
E         '20210818_state_incidence_counts_tot_vaccine_18P.csv'
E         '20210819_state_incidence_counts_tot_vaccine_65P_7dav.csv'
E         '20210819_nation_cumulative_counts_tot_vaccine_18P.csv'
E         '20210819_hhs_incidence_counts_tot_vaccine_18P.csv'
E         '20210818_hhs_incidence_counts_part_vaccine_12P.csv'
E         '20210819_hhs_cumulative_counts_part_vaccine.csv'...
E         
E         ...Full output truncated (236 lines hidden), use '-vv' to show

test_run.py:71: AssertionError
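
The extra items visible in the truncated output all carry the dates 20210818 and 20210819, which are later than the last expected date (20210817). Grouping the set difference by date prefix makes that kind of pattern obvious without reading all 236 hidden lines. The sketch below mirrors the filename logic of the test above on a reduced set of geos and metrics; the helper names `expected_filenames` and `extra_by_date` are hypothetical, and the "actual" files are made up for illustration.

```python
from collections import Counter
from itertools import product


def expected_filenames(dates, geos, metrics, first_7dav_index=6):
    """Build the expected CSV names.

    As in the test, 7-day-average files are expected only for non-cumulative
    metrics and only for dates at or after index `first_7dav_index`.
    """
    names = set()
    for metric, date, geo in product(metrics, dates, geos):
        names.add(f"{date}_{geo}_{metric}.csv")
        if "cumulative" not in metric and date not in dates[:first_7dav_index]:
            names.add(f"{date}_{geo}_{metric}_7dav.csv")
    return names


def extra_by_date(actual, expected):
    """Count unexpected files per date prefix (the text before the first underscore)."""
    return Counter(name.split("_", 1)[0] for name in set(actual) - set(expected))


dates = [f"202108{day}" for day in range(10, 18)]  # 20210810 .. 20210817
geos = ["state", "nation"]
metrics = ["cumulative_counts_tot_vaccine", "incidence_counts_tot_vaccine"]
expected = expected_filenames(dates, geos, metrics)

# Made-up "actual" output containing a few files past the last expected date.
actual = expected | {
    "20210818_state_incidence_counts_tot_vaccine.csv",
    "20210819_state_incidence_counts_tot_vaccine_7dav.csv",
    "20210819_nation_cumulative_counts_tot_vaccine.csv",
}
for date, count in sorted(extra_by_date(actual, expected).items()):
    print(date, count)
```

If every unexpected file falls on dates after the expected range, the likely fix is in the date list of the test or in the test input data rather than in the metric or geo names.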

@dshemetov
Contributor

Can you also please merge main into this when you can, so the diffs are a bit easier to read?

Contributor

@dshemetov dshemetov left a comment

I don't want to block this indicator any further. It seems fine to me; I went through the pipeline pretty closely. Just a failing test, a merge, and a small suggestion, and we're good!

cdc_vaccines/delphi_cdc_vaccines/pull.py (1 review thread, outdated and resolved)
Contributor

dshemetov commented Jan 14, 2022

Pushing a small commit to make the diff code suggestion pass test_pull.py. All tests pass now 👍

3 participants