[External] [Bug] Pandas 'host_age' KeyError in Test Profile #225

Closed
1 of 3 tasks
nrminor opened this issue Nov 29, 2024 · 1 comment
Labels
bug Something isn't working

Comments

nrminor commented Nov 29, 2024

Hi all,

The lab I work for is looking to set up tostadas for our SARS-CoV-2 uploads. Setup on one of our Ubuntu towers went as expected with both the staphb Docker container and the provided Conda environment spec. However, when we ran the test profile with `nextflow run main.nf -profile test,docker --virus`, we ran into the following Python error (I'll include it with the full Nextflow output):

Nextflow 24.10.2 is available - Please consider updating your version to it

 N E X T F L O W   ~  version 24.10.0

Launching `main.nf` [berserk_bernard] DSL2 - revision: 92b49e04b8

executor >  local (2)
[7c/06fc9f] TOSTADAS_WORKFLOW:TOSTADAS:VALIDATE_PARAMS                      [100%] 1 of 1 ✔
[1f/947a4e] TOSTADAS_WORKFLOW:TOSTADAS:METADATA_VALIDATION                  [  0%] 0 of 1
[-        ] TOSTADAS_WORKFLOW:TOSTADAS:GET_WAIT_TIME                        -
[-        ] TOSTADAS_WORKFLOW:TOSTADAS:INITIAL_SUBMISSION:SUBMISSION        -
[-        ] TOSTADAS_WORKFLOW:TOSTADAS:INITIAL_SUBMISSION:WAIT              -
[-        ] TOSTADAS_WORKFLOW:TOSTADAS:INITIAL_SUBMISSION:UPDATE_SUBMISSION -
WARN: Undocumented setting `docker.userEmulation` is not supported any more - please remove it from your config
ERROR ~ Error executing process > 'TOSTADAS_WORKFLOW:TOSTADAS:METADATA_VALIDATION'

Caused by:
executor >  local (2)
[7c/06fc9f] TOSTADAS_WORKFLOW:TOSTADAS:VALIDATE_PARAMS                      [100%] 1 of 1 ✔
[1f/947a4e] TOSTADAS_WORKFLOW:TOSTADAS:METADATA_VALIDATION                  [100%] 1 of 1, failed: 1 ✘
[-        ] TOSTADAS_WORKFLOW:TOSTADAS:GET_WAIT_TIME                        -
[-        ] TOSTADAS_WORKFLOW:TOSTADAS:INITIAL_SUBMISSION:SUBMISSION        -
[-        ] TOSTADAS_WORKFLOW:TOSTADAS:INITIAL_SUBMISSION:WAIT              -
[-        ] TOSTADAS_WORKFLOW:TOSTADAS:INITIAL_SUBMISSION:UPDATE_SUBMISSION -
WARN: Undocumented setting `docker.userEmulation` is not supported any more - please remove it from your config
ERROR ~ Error executing process > 'TOSTADAS_WORKFLOW:TOSTADAS:METADATA_VALIDATION'

Caused by:
  Process `TOSTADAS_WORKFLOW:TOSTADAS:METADATA_VALIDATION` terminated with an error exit status (1)


Command executed:

  validate_metadata.py         --meta_path metadata_template.xlsx         --output_dir .         --custom_fields_file /absolute/path/to/tostadas/assets/custom_meta_fields/example_custom_fields.json         --validate_custom_fields false

Command exit status:
  1

Command output:
  (empty)

Command error:
  Traceback (most recent call last):
    File "/opt/conda/envs/tostadas/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 3621, in get_loc
      return self._engine.get_loc(casted_key)
    File "pandas/_libs/index.pyx", line 136, in pandas._libs.index.IndexEngine.get_loc
    File "pandas/_libs/index.pyx", line 163, in pandas._libs.index.IndexEngine.get_loc
    File "pandas/_libs/hashtable_class_helper.pxi", line 5198, in pandas._libs.hashtable.PyObjectHashTable.get_item
    File "pandas/_libs/hashtable_class_helper.pxi", line 5206, in pandas._libs.hashtable.PyObjectHashTable.get_item
  KeyError: 'host_age'
  
  The above exception was the direct cause of the following exception:
  
  Traceback (most recent call last):
    File "/absolute/path/to/tostadas/bin/validate_metadata.py", line 1261, in <module>
      metadata_validation_main()
    File "/absolute/path/to/tostadas/bin/validate_metadata.py", line 52, in metadata_validation_main
      meta_to_df.run_get_meta_df()
    File "/absolute/path/to/tostadas/bin/validate_metadata.py", line 178, in run_get_meta_df
      self.final_df = self.populate_fields()
    File "/absolute/path/to/tostadas/bin/validate_metadata.py", line 197, in populate_fields
      for x in range(len(final_df[col].tolist())):
    File "/opt/conda/envs/tostadas/lib/python3.9/site-packages/pandas/core/frame.py", line 3505, in __getitem__
      indexer = self.columns.get_loc(key)
    File "/opt/conda/envs/tostadas/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 3623, in get_loc
      raise KeyError(key) from err
  KeyError: 'host_age'

Work dir:
  /absolute/path/to/tostadas/work/1f/947a4e52009dc4a7158f9a8ae804cf

Container:
  staphb/tostadas:latest

Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`

 -- Check '.nextflow.log' file for details

I'd guess that this stems from the fact that the default metadata template, assets/metadata_template.xlsx, no longer contains the "host_age" column.
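
To illustrate that guess, here is a minimal sketch of the failure mode using an in-memory DataFrame rather than the real template. The column names other than `host_age` and the `expected_columns` list are hypothetical stand-ins, not taken from `validate_metadata.py`:

```python
import pandas as pd

# Hypothetical stand-in for the template sheet; note there is no 'host_age' column.
template_df = pd.DataFrame({"sample_name": ["s1"], "host_sex": ["Female"]})

# Hypothetical list of columns the validation code expects to find.
expected_columns = ["sample_name", "host_sex", "host_age"]

# Report which expected columns are absent before indexing into the frame.
missing = [col for col in expected_columns if col not in template_df.columns]
print("Missing columns:", missing)

# Indexing a missing column raises the same kind of KeyError seen in the traceback.
try:
    template_df["host_age"].tolist()
except KeyError as err:
    print("KeyError:", err)
```

Running a check like the `missing` list against the actual template would confirm whether `host_age` is the only absent column.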

Thanks in advance for your help!

Severity

  • 1 - Most severe (a full-break in core function)
  • 2-4 - Moderate (break for a particular aspect/feature) (how integral is the broken feature?)
  • 5 - Least severe (non-functional issue, such as inconsistency / error in documentation or administrative in nature)

System details

  • OS: [e.g. iOS]: Ubuntu focal 20.04 x86_64
  • Browser [e.g. chrome, safari]: Arc Browser
  • Version [e.g. 22]: 4.0.0
  • Run environment (container, cloud service, HPC, platform, etc.): Docker containers and conda environment

Steps to Replicate

  1. Clone the repo's default branch (dev) with `git clone https://github.com/CDCgov/tostadas.git && cd tostadas`
  2. Run the docker and test profiles with `nextflow run main.nf -profile test,docker --virus`. The same error occurs with `nextflow run main.nf -profile test,conda --virus`
nrminor added the bug label Nov 29, 2024

jessicarowell (Collaborator) commented

Thanks. Yes, I forgot to update the template (59f146b). I also added some code that skips columns missing from the template when filling blank fields in mandatory columns with "Not Provided" (e2f03f4).
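
For readers following along, the skip-missing-columns idea could look roughly like the sketch below. This is not the code from e2f03f4; the function name `fill_blank_mandatory`, its parameters, and the sample column names are assumptions made for illustration:

```python
import pandas as pd


def fill_blank_mandatory(df, mandatory_columns, placeholder="Not Provided"):
    """Return a copy of df with blank cells in mandatory columns set to placeholder.

    Columns listed as mandatory but absent from df are skipped instead of
    raising a KeyError.
    """
    out = df.copy()
    for col in mandatory_columns:
        if col not in out.columns:
            # Column is not in this template: skip it rather than index into it.
            continue
        # A cell counts as blank if it is missing or only whitespace.
        blank = out[col].isna() | out[col].astype(str).str.strip().eq("")
        # Cast to object so a text placeholder can sit in a numeric column.
        out[col] = out[col].astype(object).where(~blank, placeholder)
    return out


# Hypothetical metadata with blank values and no 'host_age' column.
meta = pd.DataFrame({"sample_name": ["s1", "s2"], "host_sex": ["", None]})
filled = fill_blank_mandatory(meta, ["host_sex", "host_age"])
print(filled)
```

The membership check before indexing is what avoids the `KeyError: 'host_age'` from the traceback; whether a missing mandatory column should be skipped silently or reported is a separate design decision.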

Apologies for the delay in getting to this ticket!
