format of sequence column in busco_full_table.tsv #8

XuanZhang-Black · 2024-03-06T12:42:46Z

Hi Alex,

I used BUSCO to evaluate the genome_assembly.fa file and, to prepare the input file for syngraph, extracted the *full_table.tsv file as shown below:
0at7088 25 6419737 6528949
1at7088 28 9918437 9798722
2at7088 5 9853750 9950607
3at7088 8 16558545 16762952

When I ran it, I encountered an error:
Traceback (most recent call last):
File "/home/data/t240413/software/syngraph-master/syngraph", line 7, in <module>
main()
File "/home/data/t240413/software/syngraph-master/cli/interface.py", line 39, in main
build.main(run_params)
File "/home/data/t240413/software/syngraph-master/cli/build.py", line 56, in main
markerObjs = sg.load_markerObjs(parameterObj)
File "/home/data/t240413/software/syngraph-master/source/syngraph.py", line 35, in load_markerObjs
df = pd.read_csv(infile,
File "/home/data/t240413/miniconda3/envs/myenv/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 912, in read_csv
return _read(filepath_or_buffer, kwds)
File "/home/data/t240413/miniconda3/envs/myenv/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 583, in _read
return parser.read(nrows)
File "/home/data/t240413/miniconda3/envs/myenv/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 1704, in read
) = self._engine.read( # type: ignore[attr-defined]
File "/home/data/t240413/miniconda3/envs/myenv/lib/python3.8/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 234, in read
chunks = self._reader.read_low_memory(nrows)
File "pandas/_libs/parsers.pyx", line 812, in pandas._libs.parsers.TextReader.read_low_memory
File "pandas/_libs/parsers.pyx", line 889, in pandas._libs.parsers.TextReader._read_rows
File "pandas/_libs/parsers.pyx", line 1034, in pandas._libs.parsers.TextReader._convert_column_data
File "pandas/_libs/parsers.pyx", line 1073, in pandas._libs.parsers.TextReader._convert_tokens
File "pandas/_libs/parsers.pyx", line 1192, in pandas._libs.parsers.TextReader._convert_with_dtype
ValueError: Integer column has NA values in column 2

I have checked my file: the second column is the chromosome serial number. Does the sequence column have any special requirements?
Looking forward to your reply.

Best,

Xuan Zhang

A-J-F-Mackintosh · 2024-03-06T13:30:07Z

Hi Xuan Zhang,

I suspect that you have included lines in the BUSCO tsv for genes that are missing in your assembly - these lines will have empty coordinates and cause an error. You should grep the file to only include Complete BUSCOs (you can include Fragmented ones too if you like).

Also make sure that the file is tab-delimited.

Cheers,

Alex
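
For anyone hitting the same error, here is a minimal sketch of the filtering Alex describes. It is not part of syngraph. It assumes the usual BUSCO full_table.tsv layout (comment lines starting with `#`, then tab-separated columns: BUSCO id, status, sequence, gene start, gene end, ...) and that syngraph wants the four columns shown in the question (id, sequence, start, end). The function names and the file names in the usage note are hypothetical.

```python
def filter_busco_table(lines, keep_status=("Complete",)):
    """Yield (busco_id, sequence, start, end) for rows with a wanted status.

    Rows for Missing BUSCOs have no coordinates, so they are skipped here
    both by the status check and by the column-count check.
    """
    for line in lines:
        # Skip blank lines and the '#' header comments BUSCO writes.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # Need at least: id, status, sequence, start, end.
        if len(fields) < 5 or fields[1] not in keep_status:
            continue
        busco_id, _status, sequence, start, end = fields[:5]
        yield busco_id, sequence, start, end


def to_tsv(rows):
    """Join rows with tabs so the output is strictly tab-delimited."""
    return "".join("\t".join(row) + "\n" for row in rows)
```

A possible way to use it, with placeholder file names: open the BUSCO table with `open("full_table.tsv")`, pass the file handle to `filter_busco_table` (add `"Fragmented"` to `keep_status` if you want those too), and write the string returned by `to_tsv` to the file you give to syngraph.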
