KeyError #1

Open
steinbrl opened this issue Oct 14, 2021 · 8 comments

@steinbrl

Hi,

I tried to generate a consensus from a Minimap2 alignment, but sam2consensus stops with the following error message:
lars@helmut:~/Desktop/VZV/Carina_09_2021/VZV_ORF14_cmv/assembly-test/Scaffold$ python /home/lars/Desktop/NGS/sam2consensus-master/sam2consensus.py -f N -p final -i scaffold_mapping.sam

Processing file scaffold_mapping.sam:

SAM header processed, 1 references found.

0 reads processed.
Traceback (most recent call last):
  File "/home/lars/Desktop/NGS/sam2consensus-master/sam2consensus.py", line 430, in <module>
    main()
  File "/home/lars/Desktop/NGS/sam2consensus-master/sam2consensus.py", line 212, in main
    sequences[refname][pos_ref][nuc] += 1
KeyError: '*'

The .sam file seems to be correct: it imports and displays without any issues in CLC and Geneious.

Best,

Lars

@edgardomortiz
Owner

Hi Lars,

Would you mind sharing a few thousand lines of your SAM file? I haven't tried SAMs from minimap yet, so it might be something related to that. You can email me directly if you don't want to share your file here on GitHub.

Edgardo

@steinbrl
Author

Wow, very fast reply :D I can share the whole file, it is not that big. It is the product of a reference-based scaffolding step, so it contains oriented contigs.
scaffold_mapping.sam.zip

@edgardomortiz
Owner

Thanks, I will take a look and get back to you in the next few days (pretty busy with work too).

@DenisTEMPE

Hi Edgardo, hi Lars,
I'm getting the same kind of error while processing a locus on chr7 from SAM files produced by either TopHat2 or STAR.
Thank you in advance,
Denis

Processing file locus_accepted_hits.sam:

SAM header processed, 0 references found.

Traceback (most recent call last):
  File "sam2consensus.py", line 436, in <module>
    main()
  File "sam2consensus.py", line 223, in main
    sequences[refname][pos_ref][nuc] += 1
KeyError: 'chr7'

locus_accepted_hits.zip
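
The log above reports "0 references found" just before the KeyError. A minimal sketch of how that combination can arise, assuming the script keys its count tables by the reference names found in the @SQ header lines (the functions `build_containers` and `count_base` are hypothetical and are not taken from sam2consensus.py):

```python
def build_containers(header_lines):
    """Create one empty per-position count table for each @SQ header line."""
    sequences = {}
    for line in header_lines:
        if not line.startswith("@SQ"):
            continue
        # Turn "SN:chr7", "LN:8" into {"SN": "chr7", "LN": "8"}
        tags = dict(f.split(":", 1) for f in line.rstrip("\n").split("\t")[1:])
        name, length = tags["SN"], int(tags["LN"])
        sequences[name] = [
            {"A": 0, "C": 0, "G": 0, "T": 0, "N": 0, "-": 0} for _ in range(length)
        ]
    return sequences


def count_base(sequences, refname, pos_ref, nuc):
    """Same shape as the failing statement in the traceback."""
    sequences[refname][pos_ref][nuc] += 1


# With a header, the reference name is a known key and counting works.
with_header = build_containers(["@HD\tVN:1.6\n", "@SQ\tSN:chr7\tLN:8\n"])
count_base(with_header, "chr7", 3, "A")

# Without a header, no containers exist, so the first lookup fails.
no_header = build_containers([])
try:
    count_base(no_header, "chr7", 3, "A")
except KeyError as err:
    print("KeyError:", err)  # prints: KeyError: 'chr7'
```

Under the same assumption, any key that was never created, such as a `*` placeholder, would fail in the same way.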

@edgardomortiz
Owner

Sorry, it took me longer than I expected to check your files.
@steinbrl sam2consensus needs a SAM with full header information. Also, the SAM you sent doesn't have any CIGAR strings. Could you provide the command you used to create this SAM?
@DenisTEMPE your case is easier to solve. Assuming these are the first lines of the SAM you got, perhaps you are obtaining the file through samtools view, which skips the header unless told otherwise. Just add -h to your samtools view command to include the header in the output.

Edgardo
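
Both conditions mentioned in this reply (a missing header and missing CIGAR strings) can be checked before running the script. The following is a rough sketch, not part of sam2consensus; the function name `inspect_sam` and the example records are made up for illustration. It relies only on the SAM column layout, where RNAME is the third field and CIGAR the sixth, and `*` marks a missing value.

```python
import io


def inspect_sam(handle):
    """Count @SQ header lines and alignment records lacking RNAME or CIGAR."""
    report = {"sq_lines": 0, "records": 0, "no_rname": 0, "no_cigar": 0}
    for line in handle:
        if line.startswith("@"):  # header section
            if line.startswith("@SQ"):
                report["sq_lines"] += 1
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # blank or malformed line (SAM has 11 mandatory fields)
        report["records"] += 1
        if fields[2] == "*":  # RNAME column
            report["no_rname"] += 1
        if fields[5] == "*":  # CIGAR column
            report["no_cigar"] += 1
    return report


# Two made-up records and no header, for illustration only.
example = io.StringIO(
    "read1\t0\tchr7\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\n"
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n"
)
report = inspect_sam(example)
print(report)
```

For a real file, pass an open handle instead, for example `with open("locus_accepted_hits.sam") as fh: print(inspect_sam(fh))`. A `sq_lines` count of 0 suggests the header was dropped, which is what `samtools view -h` is meant to avoid.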

@DenisTEMPE

DenisTEMPE commented Nov 10, 2021 via email

@steinbrl
Author

steinbrl commented Nov 10, 2021 via email

@edgardomortiz
Owner

edgardomortiz commented Nov 10, 2021

Hi @DenisTEMPE ,

After a few tests I found the reason for the failure. My script creates empty containers for the reference sequences, which are filled with nucleotide counts as the SAM is processed; these take around 10 bytes per nucleotide in the reference. So, with a reference as long as yours, the script quickly fills up the RAM and the process gets killed. My code is impractical for large references: it only works well with references of at most 10 Mbp. When I wrote the program I was working with small organellar genomes.
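
As a back-of-the-envelope illustration of that scaling, the sketch below multiplies total reference length by the rough figure of 10 bytes per nucleotide quoted above. The function name and the reference lengths are hypothetical round numbers, not values taken from the files in this thread, and real memory use may differ.

```python
BYTES_PER_POSITION = 10  # rough figure quoted above; an assumption, not a measurement


def estimate_memory_bytes(reference_lengths, bytes_per_position=BYTES_PER_POSITION):
    """Estimate container memory from a {reference name: length in bp} mapping."""
    return sum(reference_lengths.values()) * bytes_per_position


# Hypothetical round-number reference sizes, for illustration only.
scenarios = {
    "organellar genome (150 kbp)": {"organelle": 150_000},
    "stated practical limit (10 Mbp)": {"ref": 10_000_000},
    "large genome (3 Gbp)": {"genome": 3_000_000_000},
}

for label, lengths in scenarios.items():
    megabytes = estimate_memory_bytes(lengths) / 1e6
    print(f"{label}: about {megabytes:,.1f} MB")
```

The lengths could be read from the LN tags of the @SQ header lines, since containers are created for the reference sequences before any reads are processed.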

Perhaps I will attempt to rewrite the code for large references, but I can't promise when. Sorry, you will have to look for another solution.

@steinbrl thanks for the info, that makes perfect sense for your case. If the consensus calculated by bcftools is reference-agnostic, it could help @DenisTEMPE too.

Edgardo
