pandas.errors.UndefinedVariableError: name 'nan' is not defined #13

Open
allaigle opened this issue Oct 19, 2024 · 5 comments

Comments

@allaigle

allaigle commented Oct 19, 2024

Hi,

Thank you so much for having created such a nice workflow!!

Depending on the species I run it on, I get this error, sometimes at every resolution and sometimes at only one:

Job 49: Fix inverted contigs (if needed) in the 3D structure at resolution 10000
Reason: Missing output files: structure/10000/structure_verified_contigs.pdb; Input files updated by another job: structure/10000/structure_with_chr.pdb

python ../scripts/verify_inverted_contigs.py --run True --pdb structure/10000/structure_with_chr.pdb --fasta genome.fasta --resolution 10000 --output-pdb structure/10000/structure_verified_contigs.pdb --output-fasta sequence/10000/genome_verified_contigs.fasta >logs/verify_inverted_contigs_10000.log 2>&1 
Activating conda environment: .snakemake/conda/faf49a986fcfda1ef548d9ee40bd54ba_
[Sat Oct 19 06:10:13 2024]
Error in rule verify_inverted_contigs:
    jobid: 49
    input: structure/10000/structure_with_chr.pdb, genome.fasta
    output: structure/10000/structure_verified_contigs.pdb, sequence/10000/genome_verified_contigs.fasta
    log: logs/verify_inverted_contigs_10000.log (check log file(s) for error details)
    conda-env: /PATH/WORKDIR/.snakemake/conda/faf49a986fcfda1ef548d9ee40bd54ba_
    shell:
        python ../scripts/verify_inverted_contigs.py --run True --pdb structure/10000/structure_with_chr.pdb --fasta genome.fasta --resolution 10000 --output-pdb structure/10000/structure_verified_contigs.pdb --output-fasta sequence/10000/genome_verified_contigs.fasta >logs/verify_inverted_contigs_10000.log 2>&1 
[...]
Looking for inverted contigs into chromosome nan
Traceback (most recent call last):
  File "/PATH/WORKDIR/.snakemake/conda/faf49a986fcfda1ef548d9ee40bd54ba_/lib/python3.9/site-packages/pandas/core/computation/scope.py", line 231, in resolve
    return self.resolvers[key]
  File "/PATH/WORKDIR/.snakemake/conda/faf49a986fcfda1ef548d9ee40bd54ba_/lib/python3.9/collections/__init__.py", line 941, in __getitem__
    return self.__missing__(key)            # support subclasses that define __missing__
  File "/PATH/WORKDIR/.snakemake/conda/faf49a986fcfda1ef548d9ee40bd54ba_/lib/python3.9/collections/__init__.py", line 933, in __missing__
    raise KeyError(key)
KeyError: 'nan'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/PATH/WORKDIR/.snakemake/conda/faf49a986fcfda1ef548d9ee40bd54ba_/lib/python3.9/site-packages/pandas/core/computation/scope.py", line 242, in resolve
    return self.temps[key]
KeyError: 'nan'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/PATH/WORKDIR/../scripts/verify_inverted_contigs.py", line 377, in <module>
    INVERTED_CONTIGS = find_inverted_contigs(
  File "/PATH/WORKDIR/../scripts/verify_inverted_contigs.py", line 186, in find_inverted_contigs
    chromosome_df = structure_df.query(
  File "/PATH/WORKDIR/.snakemake/conda/faf49a986fcfda1ef548d9ee40bd54ba_/lib/python3.9/site-packages/pandas/core/frame.py", line 4823, in query
    res = self.eval(expr, **kwargs)
  File "/PATH/WORKDIR/.snakemake/conda/faf49a986fcfda1ef548d9ee40bd54ba_/lib/python3.9/site-packages/pandas/core/frame.py", line 4949, in eval
    return _eval(expr, inplace=inplace, **kwargs)
  File "/PATH/WORKDIR/.snakemake/conda/faf49a986fcfda1ef548d9ee40bd54ba_/lib/python3.9/site-packages/pandas/core/computation/eval.py", line 336, in eval
    parsed_expr = Expr(expr, engine=engine, parser=parser, env=env)
  File "/PATH/WORKDIR/.snakemake/conda/faf49a986fcfda1ef548d9ee40bd54ba_/lib/python3.9/site-packages/pandas/core/computation/expr.py", line 809, in __init__
    self.terms = self.parse()
  File "/PATH/WORKDIR/.snakemake/conda/faf49a986fcfda1ef548d9ee40bd54ba_/lib/python3.9/site-packages/pandas/core/computation/expr.py", line 828, in parse
    return self._visitor.visit(self.expr)
  File "/PATH/WORKDIR/.snakemake/conda/faf49a986fcfda1ef548d9ee40bd54ba_/lib/python3.9/site-packages/pandas/core/computation/expr.py", line 412, in visit
    return visitor(node, **kwargs)
  File "/PATH/WORKDIR/.snakemake/conda/faf49a986fcfda1ef548d9ee40bd54ba_/lib/python3.9/site-packages/pandas/core/computation/expr.py", line 418, in visit_Module
    return self.visit(expr, **kwargs)
  File "/PATH/WORKDIR/.snakemake/conda/faf49a986fcfda1ef548d9ee40bd54ba_/lib/python3.9/site-packages/pandas/core/computation/expr.py", line 412, in visit
    return visitor(node, **kwargs)
  File "/PATH/WORKDIR/.snakemake/conda/faf49a986fcfda1ef548d9ee40bd54ba_/lib/python3.9/site-packages/pandas/core/computation/expr.py", line 421, in visit_Expr
    return self.visit(node.value, **kwargs)
  File "/PATH/WORKDIR/.snakemake/conda/faf49a986fcfda1ef548d9ee40bd54ba_/lib/python3.9/site-packages/pandas/core/computation/expr.py", line 412, in visit
    return visitor(node, **kwargs)
  File "/PATH/WORKDIR/.snakemake/conda/faf49a986fcfda1ef548d9ee40bd54ba_/lib/python3.9/site-packages/pandas/core/computation/expr.py", line 719, in visit_Compare
    return self.visit(binop)
  File "/PATH/WORKDIR/.snakemake/conda/faf49a986fcfda1ef548d9ee40bd54ba_/lib/python3.9/site-packages/pandas/core/computation/expr.py", line 412, in visit
    return visitor(node, **kwargs)
  File "/PATH/WORKDIR/.snakemake/conda/faf49a986fcfda1ef548d9ee40bd54ba_/lib/python3.9/site-packages/pandas/core/computation/expr.py", line 532, in visit_BinOp
    op, op_class, left, right = self._maybe_transform_eq_ne(node)
  File "/PATH/WORKDIR/.snakemake/conda/faf49a986fcfda1ef548d9ee40bd54ba_/lib/python3.9/site-packages/pandas/core/computation/expr.py", line 454, in _maybe_transform_eq_ne
    right = self.visit(node.right, side="right")
  File "/PATH/WORKDIR/.snakemake/conda/faf49a986fcfda1ef548d9ee40bd54ba_/lib/python3.9/site-packages/pandas/core/computation/expr.py", line 412, in visit
    return visitor(node, **kwargs)
  File "/PATH/WORKDIR/.snakemake/conda/faf49a986fcfda1ef548d9ee40bd54ba_/lib/python3.9/site-packages/pandas/core/computation/expr.py", line 545, in visit_Name
    return self.term_type(node.id, self.env, **kwargs)
  File "/PATH/WORKDIR/.snakemake/conda/faf49a986fcfda1ef548d9ee40bd54ba_/lib/python3.9/site-packages/pandas/core/computation/ops.py", line 91, in __init__
    self._value = self._resolve_name()
  File "/PATH/WORKDIR/.snakemake/conda/faf49a986fcfda1ef548d9ee40bd54ba_/lib/python3.9/site-packages/pandas/core/computation/ops.py", line 115, in _resolve_name
    res = self.env.resolve(local_name, is_local=is_local)
  File "/PATH/WORKDIR/.snakemake/conda/faf49a986fcfda1ef548d9ee40bd54ba_/lib/python3.9/site-packages/pandas/core/computation/scope.py", line 244, in resolve
    raise UndefinedVariableError(key, is_local) from err
pandas.errors.UndefinedVariableError: name 'nan' is not defined
================================================================================

Exiting because a job execution failed. Look above for error message
Complete log: ../.snakemake/log/2024-10-18T101701.614686.snakemake.log

Note that when I ran this command on its own (python ../scripts/verify_inverted_contigs.py --run True --pdb structure/10000/structure_with_chr.pdb --fasta genome.fasta --resolution 10000 --output-pdb structure/10000/structure_verified_contigs.pdb --output-fasta sequence/10000/genome_verified_contigs.fasta >logs/verify_inverted_contigs_10000.log 2>&1), I got no such error.

I also tried

cd $3DGB
vim ./scripts/verify_inverted_contigs.py 
from numpy import nan # right after "import numpy as np"

but when rerunning the pipeline, I got the same issue.

Is there an "easy" way to fix it, such as adding an import statement?

Thanks!
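
A minimal sketch of why adding an import may not help, assuming the query expression at line 186 of verify_inverted_contigs.py is built by interpolating the chromosome value into the string. The traceback shows a comparison whose right-hand side is the bare name nan, but the actual expression is not shown in this thread. DataFrame.query resolves unprefixed names against the DataFrame's columns and index, not against module-level imports, so "from numpy import nan" is not seen by the parser. The structure_df table, its column names, and both helper functions below are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the structure table (column names are assumptions).
structure_df = pd.DataFrame(
    {"chromosome": [1.0, 1.0, 2.0, np.nan], "bead": [1, 2, 3, 4]}
)


def select_by_string(df, chromosome):
    # String interpolation turns NaN into the bare token "nan",
    # which query() then tries to resolve as a column or index name.
    return df.query(f"chromosome == {chromosome}")


def select_safely(df, chromosome):
    # Boolean masks avoid the expression parser entirely.
    # NaN never compares equal to itself, so it needs isna().
    if pd.isna(chromosome):
        return df[df["chromosome"].isna()]
    return df[df["chromosome"] == chromosome]


failed = False
message = ""
try:
    select_by_string(structure_df, np.nan)
except NameError as err:  # UndefinedVariableError is a subclass of NameError
    failed = True
    message = str(err)

# Alternative: iterate only over chromosomes that are actually defined.
valid_chromosomes = structure_df["chromosome"].dropna().unique()
```

Skipping the undefined rows, as valid_chromosomes does, only sidesteps the exception. If no bead is expected to lack a chromosome, the NaN values probably originate in an earlier step and are worth tracing there.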

@allaigle
Author

allaigle commented Oct 19, 2024

Also, when I don't hit pandas.errors.UndefinedVariableError: name 'nan' is not defined, I sometimes get the following error instead:

Number of beads read from structure: 1847
Number of beads deduced from sequence and HiC resolution: 1847
Traceback (most recent call last):
  File "/PATH/WORKDIR/../scripts/assign_chromosomes.py", line 196, in <module>
    assign_chromosome_number(ARGS.pdb, CHROMOSOME_LENGTH, ARGS.resolution, ARGS.output)
  File "/PATH/WORKDIR/../scripts/assign_chromosomes.py", line 185, in assign_chromosome_number
    coordinates.to_pdb(path=pdb_name_out, records=None, gz=False, append_newline=True)
  File "/PATH/WORKDIR/.snakemake/conda/faf49a986fcfda1ef548d9ee40bd54ba_/lib/python3.9/site-packages/biopandas/pdb/pandas_pdb.py", line 719, in to_pdb
    dfs[r][col["id"]] = dfs[r][col["id"]].apply(col["strf"])
  File "/PATH/WORKDIR/.snakemake/conda/faf49a986fcfda1ef548d9ee40bd54ba_/lib/python3.9/site-packages/pandas/core/series.py", line 4917, in apply
    return SeriesApply(
  File "/PATH/WORKDIR/.snakemake/conda/faf49a986fcfda1ef548d9ee40bd54ba_/lib/python3.9/site-packages/pandas/core/apply.py", line 1427, in apply
    return self.apply_standard()
  File "/PATH/WORKDIR/.snakemake/conda/faf49a986fcfda1ef548d9ee40bd54ba_/lib/python3.9/site-packages/pandas/core/apply.py", line 1507, in apply_standard
    mapped = obj._map_values(
  File "/PATH/WORKDIR/.snakemake/conda/faf49a986fcfda1ef548d9ee40bd54ba_/lib/python3.9/site-packages/pandas/core/base.py", line 921, in _map_values
    return algorithms.map_array(arr, mapper, na_action=na_action, convert=convert)
  File "/PATH/WORKDIR/.snakemake/conda/faf49a986fcfda1ef548d9ee40bd54ba_/lib/python3.9/site-packages/pandas/core/algorithms.py", line 1743, in map_array
    return lib.map_infer(values, mapper, convert=convert)
  File "lib.pyx", line 2972, in pandas._libs.lib.map_infer
  File "/PATH/WORKDIR/.snakemake/conda/faf49a986fcfda1ef548d9ee40bd54ba_/lib/python3.9/site-packages/biopandas/pdb/engines.py", line 161, in <lambda>
    if len(str(int(x))) < 3
ValueError: cannot convert float NaN to integer

Would it be possible to add something like .dropna() somewhere in scripts/assign_chromosomes.py?

Thank you for your help!
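
A minimal sketch of the .dropna() idea, assuming the NaN sits in a numeric column that is later converted with int() (the traceback shows int(x) failing inside biopandas, but not which column holds the missing value). The atoms table, its column names, and format_residue_number are hypothetical and only mimic that conversion; this is not biopandas code.

```python
import numpy as np
import pandas as pd

# Hypothetical atom table; "residue_number" is an assumed column name.
atoms = pd.DataFrame(
    {"residue_number": [1.0, 2.0, np.nan], "x_coord": [0.0, 1.5, 3.0]}
)


def format_residue_number(value):
    # Mimics the int(x) conversion seen in the traceback.
    return str(int(value))


raised = False
try:
    atoms["residue_number"].apply(format_residue_number)
except ValueError:  # "cannot convert float NaN to integer"
    raised = True

# Drop rows whose residue number is missing before formatting.
clean_atoms = atoms.dropna(subset=["residue_number"])
formatted = clean_atoms["residue_number"].apply(format_residue_number).tolist()
```

Dropping rows changes the number of beads, which the script appears to compare against the number deduced from the sequence and Hi-C resolution, so this may hide the underlying mismatch instead of resolving it.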

@allaigle
Author

This might be useful for someone: the issues arise only for species whose genome.fasta contains scaffolds. Otherwise, it works fine.

@gaellelelandais
Contributor

Thank you for your feedback and comments @allaigle. @pierrepo will take a look at your message as soon as possible. Thank you for your understanding.

@pierrepo
Contributor

Hey @allaigle

These errors could be related to your genome definition (genome.fasta). If the genome is already published, would you mind sharing a link so we can have a look?
If it is not yet published, would you mind characterizing it a bit: number of chromosomes/contigs, number of bases per chromosome/contig, etc.?

Also, could you try running the workflow with the verify_contigs option set to False in your config file (although I guess you are probably very interested in having this option activated)?

@allaigle
Author

allaigle commented Nov 9, 2024

Hi @pierrepo,

Thank you for your answer! After opening this issue, I checked the genome quality using QUAST and decided to remove all contigs/scaffolds shorter than 500 kb, which did not change the NC50 much. I then ran everything with this setting, and almost all my species worked fine, except one (for now): genome GCA_006506795.2 (Hericium erinaceus).

Note that I ran all of them with the verify_contigs option set to True, and no issue showed up after trimming.

Thank you for your help, much appreciated!
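
For anyone wanting to reproduce the trimming step, here is a minimal sketch of a length filter for FASTA records using only the Python standard library. It is not the tool used in this thread (which is not stated); the function name and the toy sequences are hypothetical.

```python
import io


def filter_fasta_by_length(lines, min_length):
    """Yield (header, sequence) for records with at least min_length bases."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                sequence = "".join(chunks)
                if len(sequence) >= min_length:
                    yield header, sequence
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    # Emit the last record of the file.
    if header is not None:
        sequence = "".join(chunks)
        if len(sequence) >= min_length:
            yield header, sequence


# Toy input: one 12-base record and one 3-base record.
example = io.StringIO(">chr1\nACGTACGT\nACGT\n>scaffold_7\nACG\n")
kept = list(filter_fasta_by_length(example, min_length=10))
```

On a real genome, the same function could be called on an open file handle with min_length=500_000 to match the 500 kb cutoff described above.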
