Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

individual specific kmer could be find in other individual #53

Open
leon945945 opened this issue Nov 26, 2024 · 2 comments
Open

individual specific kmer could be find in other individual #53

leon945945 opened this issue Nov 26, 2024 · 2 comments

Comments

@leon945945
Copy link

Hi, the purpose of my analysis is to get the paternal specific kmer, and to increase the specificity of kmer I chosed 62mer to perform analysis, because 72mer would lead to an error of meryl.

At first, I generated the 62mer of progeny, maternal and paternal with HiFi data for progeny and maternal, Illumina data for paternal.
meryl count k=62 progeny.fq output progeny.meryl
meryl count k=62 maternal.fq output maternal.meryl
meryl count k=62 paternal.fq output paternal.meryl

Then, the maternal/paternal count of common kmer between progeny and maternal/paternal were outputed.
meryl intersect maternal.meryl/ progeny.meryl/ output mat-progeny_commonk62.meryl
meryl intersect paternal.meryl/ progeny.meryl/ output pat-progeny_commonk62.meryl
and printed the kmer count:
meryl print mat-progeny_commonk62.meryl > mat-progeny.kmerCount
meryl print pat-progeny_commonk62.meryl > pat-progeny.kmerCount

Then, I merged the mat-progeny.kmerCount and pat-progeny.kmerCount, and I thought the maternal kmers whose count equals to 0 were paternal specific kmer. I extract the sequences of paternal specifc kmers and aligned them to the maternal genome (assembled by the same HiFi file with kmer count) with bowtie2, however, a large proportion of them could be aligned to maternal genome without mismatch or indel (CIGAR=62M), which means the paternal specific kmers could be find in maternal.

Here is my confusion, the paternal specific kmer should not be existed in maternal, why they could be aligned in maternal genome.
Hope for your reply, thanks very much.

@arangrhie
Copy link
Contributor

Hello @leon945945 ,

meryl intersect takes the count of the first meryl db.

I think you want the meryl difference instead. It takes the kmer count of the first input db, which does not exist in the second db.

Merqury/trio/hapmers.sh does exactly what you are trying to do - and generates a histogram to visually confirm the hapmer spectrum. I call the parental specific kmers inherited to the child as hapmers.

best,
Arang

@leon945945
Copy link
Author

Thanks for your reply, I will try to take use of merqury.
And where is the wrong place of my analysis, firstly get the shared kmers between progeny and maternal, progeny and paternal, then merge the two kmer database.Thus, maternal kmers whose count equals to 0 is paternal specific kmers, paternal kmers whose count equals to 0 is maternal specific kmers.
It's my pleasure if you can give me some advances.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants