format of sequence column in busco_full_table.tsv #8

Open
XuanZhang-Black opened this issue Mar 6, 2024 · 1 comment

Comments

@XuanZhang-Black

Hi Alex,

I used BUSCO to evaluate my genome_assembly.fa file and, as input for syngraph, extracted the *full_table.tsv file from the results, as shown below:
0at7088 25 6419737 6528949
1at7088 28 9918437 9798722
2at7088 5 9853750 9950607
3at7088 8 16558545 16762952

When I ran syngraph, I encountered an error:
Traceback (most recent call last):
File "/home/data/t240413/software/syngraph-master/syngraph", line 7, in <module>
main()
File "/home/data/t240413/software/syngraph-master/cli/interface.py", line 39, in main
build.main(run_params)
File "/home/data/t240413/software/syngraph-master/cli/build.py", line 56, in main
markerObjs = sg.load_markerObjs(parameterObj)
File "/home/data/t240413/software/syngraph-master/source/syngraph.py", line 35, in load_markerObjs
df = pd.read_csv(infile,
File "/home/data/t240413/miniconda3/envs/myenv/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 912, in read_csv
return _read(filepath_or_buffer, kwds)
File "/home/data/t240413/miniconda3/envs/myenv/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 583, in _read
return parser.read(nrows)
File "/home/data/t240413/miniconda3/envs/myenv/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 1704, in read
) = self._engine.read( # type: ignore[attr-defined]
File "/home/data/t240413/miniconda3/envs/myenv/lib/python3.8/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 234, in read
chunks = self._reader.read_low_memory(nrows)
File "pandas/_libs/parsers.pyx", line 812, in pandas._libs.parsers.TextReader.read_low_memory
File "pandas/_libs/parsers.pyx", line 889, in pandas._libs.parsers.TextReader._read_rows
File "pandas/_libs/parsers.pyx", line 1034, in pandas._libs.parsers.TextReader._convert_column_data
File "pandas/_libs/parsers.pyx", line 1073, in pandas._libs.parsers.TextReader._convert_tokens
File "pandas/_libs/parsers.pyx", line 1192, in pandas._libs.parsers.TextReader._convert_with_dtype
ValueError: Integer column has NA values in column 2

I have checked my file, and the second column is the chromosome number. Does the sequence column have any special requirements?
Looking forward to your reply.

Best,

Xuan Zhang

@A-J-F-Mackintosh
Owner

Hi Xuan Zhang,

I suspect that you have included lines in the BUSCO tsv for genes that are missing in your assembly - these lines will have empty coordinates and cause an error. You should grep the file to only include Complete BUSCOs (you can include Fragmented ones too if you like).
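
As a rough sketch of that filtering step in Python rather than grep: this assumes the usual BUSCO full_table.tsv layout (ID, status, sequence, start, end, then further columns) and that the four-column input shown above is what syngraph should receive. The function name `filter_busco_rows` and the sample rows, including their statuses, are made up for illustration.

```python
def filter_busco_rows(lines, keep=("Complete",)):
    """Yield (busco_id, sequence, start, end) for rows whose status is in `keep`."""
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and the '#' header/comment lines BUSCO writes.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # Rows for Missing BUSCOs carry no coordinates, so they have too few
        # fields; rows with any other unwanted status are dropped as well.
        if len(fields) < 5 or fields[1] not in keep:
            continue
        yield fields[0], fields[2], fields[3], fields[4]


# Illustrative rows only (statuses are hypothetical, columns truncated to five).
sample_lines = [
    "# Busco id\tStatus\tSequence\tGene Start\tGene End",
    "0at7088\tComplete\t25\t6419737\t6528949",
    "1at7088\tMissing",
    "2at7088\tFragmented\t5\t9853750\t9950607",
]

for row in filter_busco_rows(sample_lines):
    print("\t".join(row))
```

Passing `keep=("Complete", "Fragmented")` would retain the Fragmented rows too.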

Also make sure that the file is tab-delimited.
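
One quick way to check the delimiter is to split each line on tabs and flag any line that does not produce the expected number of fields. This is only a sketch: the helper name `find_malformed_lines` is hypothetical, and the default of four fields is an assumption taken from the example rows in the question.

```python
def find_malformed_lines(lines, expected_fields=4):
    """Return (line_number, text) for lines that do not split into
    `expected_fields` tab-separated fields."""
    bad = []
    for number, line in enumerate(lines, start=1):
        stripped = line.rstrip("\n")
        # Ignore blank lines and '#' comment lines.
        if not stripped or stripped.startswith("#"):
            continue
        if len(stripped.split("\t")) != expected_fields:
            bad.append((number, stripped))
    return bad


# A space-separated line is reported; a tab-separated one passes.
check = find_malformed_lines([
    "0at7088 25 6419737 6528949\n",
    "1at7088\t28\t9918437\t9798722\n",
])
print(check)
```

To run it on a real file, pass an open file handle, for example `find_malformed_lines(open(path))`.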

Cheers,

Alex
