Issues converting VCF file to PHYLIP #50

Open
lophostoma opened this issue Jun 5, 2024 · 9 comments

Comments

@lophostoma

Hi,
I have the following issue: vcf2phylip did not process my VCF file as expected. The output header, 58 0, indicates that it detected 58 samples but 0 sites, which is not what I would expect from a valid VCF file containing genotype information.
I used the following commands to generate the VCF file before running vcf2phylip:

enroot start --mount $HOME --root --rw staphb+bcftools sh -c "
bcftools view -h /home/carlos.carrion/output_filtered.vcf > /home/carlos.carrion/output_reformat.vcf &&
bcftools query -f "%CHROM\t%POS\t%ID\t%REF\t%ALT\t%QUAL\t%FILTER\t%INFO[\t%SAMPLE=%GP]\n" /home/carlos.carrion/output_filtered.vcf >> /home/carlos.carrion/output_reformat.vcf"

Thanks

@edgardomortiz
Owner

Hi @lophostoma

To diagnose the problem I need a few thousand lines from your VCF, and probably the exact error message from vcf2phylip, since I rarely use bcftools and cannot predict what kind of output your command will produce. Maybe the genotypes were not biallelic?

Edgardo

@lophostoma
Author

OK, thanks for your reply. Attached you will find the VCF file and the output file from vcf2phylip.
I did not get an error message, only an empty output file. The genotypes were generated in ANGSD and are biallelic.
subsample.vcf.zip
tmp.min4.phy.zip

Thanks for your time
Carlos

@edgardomortiz
Owner

Hi again Carlos,

Your VCF zip file seems to be corrupted; I re-downloaded it a couple of times and it can't be decompressed...

@lophostoma
Author

I am having issues with the size of the file that I can send you through GitHub. If possible, can I send it to an email account?
I hope this one goes OK:
tmp_carlos.vcf.gz

@edgardomortiz
Owner

The new file is corrupted as well, and my email has size limitations too. I need at most 1000 lines; you can run this on your VCF:

head -1000 my_vcf.vcf > 1000lines.vcf

Then compress the result and upload, it shouldn't be too big.

Edgardo

@lophostoma
Author

Attached is the file:
1000lines.vcf.zip

@edgardomortiz
Owner

edgardomortiz commented Jun 6, 2024

Hi, sorry, I was assuming that the 1000 lines would contain some genotypes, but I only got the headers of your reference contigs. Please add 1000 to the number of contigs in your reference (e.g. if your reference has 6500 contigs, repeat the head command with head -7500). Also, I saw the PHYLIP file you sent. Are you sure your VCF contains valid genotypes? How many should there be?
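
Counting contigs by hand can be avoided. A minimal sketch, assuming a plain-text (uncompressed) VCF whose header precedes the records: the hypothetical helper vcf_subset below keeps every header line (those starting with #) and then the first N variant records, however long the header is. The function name and the file names are illustrative only.

```shell
# vcf_subset FILE [N]: print all header lines of an uncompressed VCF,
# followed by the first N variant records (default 1000).
# Hypothetical helper; not part of vcf2phylip or bcftools.
vcf_subset() {
    awk -v max="${2:-1000}" '
        /^#/      { print; next }      # header lines are always kept
        n >= max  { exit }             # stop once N records were printed
                  { print; n++ }       # otherwise emit the record
    ' "$1"
}

# Example: vcf_subset my_vcf.vcf 1000 > 1000records.vcf
```

The result can then be compressed and attached just like the output of the plain head command.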

Edgardo

@edgardomortiz
Owner

edgardomortiz commented Jun 6, 2024

Also, I checked the bcftools manual and I think you are creating a non-standard VCF format by using this command:

bcftools query -f "%CHROM\t%POS\t%ID\t%REF\t%ALT\t%QUAL\t%FILTER\t%INFO[\t%SAMPLE=%GP]\n"

vcf2phylip can only handle the standard VCF format: https://samtools.github.io/hts-specs/VCFv4.2.pdf so I would recommend leaving the VCF in its default format (why do you need that specific format?)
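
As a rough illustration of what "standard" means here: in the VCF 4.2 layout, each record has eight fixed columns, then a FORMAT column, then one column per sample, and GT (when present) is the first FORMAT key. The bcftools query format string above writes no FORMAT column and emits NAME=GP text for each sample, so its records should come out one column short of the #CHROM header line. The sketch below, assuming an uncompressed, tab-delimited file, flags that kind of mismatch. vcf_check is a hypothetical helper name, the GT test rests on the assumption that the converter reads genotype calls from that field, and this is a sanity check rather than a full validator.

```shell
# vcf_check FILE: basic layout check for an uncompressed VCF.
# Exits 0 if the layout looks standard, 1 at the first problem found.
# Hypothetical helper; a sanity check, not a full VCF validator.
vcf_check() {
    awk -F'\t' '
        NR == 1 && !/^##fileformat=VCF/ { print "line 1: missing ##fileformat"; bad = 1; exit }
        /^##/     { next }                             # meta-information lines
        /^#CHROM/ {
            ncol = NF                                  # 8 fixed columns + FORMAT + samples
            if ($9 != "FORMAT") { print "header: no FORMAT column"; bad = 1; exit }
            next
        }
        !ncol      { print "line " NR ": record before #CHROM header"; bad = 1; exit }
        NF != ncol { print "line " NR ": " NF " columns, header has " ncol; bad = 1; exit }
        # Assumption: the downstream converter reads calls from the GT field
        $9 !~ /^GT(:|$)/ { print "line " NR ": FORMAT does not start with GT"; bad = 1; exit }
        { records++ }
        END { if (!bad) print records + 0, "records look well-formed"; exit bad }
    ' "$1"
}

# Example: vcf_check output_reformat.vcf || echo "not a standard VCF"
```

Unlike bcftools query, which prints free-form text, bcftools view writes records in the standard VCF layout, so filtering with it keeps the file readable by tools that expect that layout.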

Edgardo

@lophostoma
Author

Hi,
Thanks for your reply. I will generate the file again in standard VCF format and try vcf2phylip.
I will let you know the result of this. Thanks for helping me notice that.
