Adding genotype information #117

charlesfoster opened this issue Jul 27, 2021 · 7 comments

Comments

@charlesfoster

Hi,
I'd like to use lofreq for my pipeline, but I require genotype information for downstream commands. I tried using lofreq2_add_sample.py like so:

./lofreq2_add_sample.py -i in.vcf.gz -o out.vcf.gz -b $BAM

However, I get the following error:

Traceback (most recent call last):
  File "./lofreq2_add_sample.py", line 312, in <module>
    main()
  File "./lofreq2_add_sample.py", line 307, in main
    add_plp_to_vcf(args.vcf_in, args.vcf_out, args.bams)
  File "./lofreq2_add_sample.py", line 229, in add_plp_to_vcf
    for row in vcf_reader:
_csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)

If I change the read mode for files in the add_plp_to_vcf function from 'rb' to 'r' to try and get around this error, I get:

Traceback (most recent call last):
  File "./lofreq2_add_sample.py", line 312, in <module>
    main()
  File "./lofreq2_add_sample.py", line 307, in main
    add_plp_to_vcf(args.vcf_in, args.vcf_out, args.bams)
  File "./lofreq2_add_sample.py", line 245, in add_plp_to_vcf
    vcf_writer.writerow(row)
  File "/usr/lib/python3.8/gzip.py", line 276, in write
    data = memoryview(data)
TypeError: memoryview: a bytes-like object is required, not 'str'

Do you have a workaround? My python version is 3.8.6. Thanks.
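Both tracebacks look like the usual Python 3 text/bytes mismatch: the `csv` module wants strings, while `gzip.open` in `'rb'`/`'wb'` mode deals in bytes. A minimal sketch of one possible workaround is to open both gzip handles in text mode (`'rt'`/`'wt'`). The function name `copy_vcf_rows` is hypothetical and this is not the actual code in lofreq2_add_sample.py; it only illustrates the open modes, and it assumes the reader and the writer are both wrapped around gzip handles.

```python
import csv
import gzip


def copy_vcf_rows(vcf_in, vcf_out):
    """Copy tab-separated rows between gzipped VCFs; return the row count.

    Hypothetical helper: it only demonstrates the text-mode open calls.
    """
    n_rows = 0
    # "rt"/"wt" make gzip.open return text-mode handles, which csv needs
    # in Python 3 ("rb"/"wb" would hand it bytes instead).
    with gzip.open(vcf_in, "rt", newline="") as fin, \
            gzip.open(vcf_out, "wt", newline="") as fout:
        # Disable quoting so the double quotes in VCF header lines are
        # passed through unchanged.
        reader = csv.reader(fin, delimiter="\t",
                            quoting=csv.QUOTE_NONE, quotechar=None)
        writer = csv.writer(fout, delimiter="\t",
                            quoting=csv.QUOTE_NONE, quotechar=None,
                            lineterminator="\n")
        for row in reader:
            writer.writerow(row)
            n_rows += 1
    return n_rows
```

Changing only the input mode to `'r'` would leave the output handle in binary mode, which would be consistent with the second traceback above.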

@charlesfoster
Author

I ended up writing a simple bash script to add 'fake' genotype information to a VCF file, e.g. specifying GT=1 because my sample is from a virus. This is good enough for my purposes. I can provide the script if it would be of use to anyone else, in the absence of a more robust alternative.
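For readers who prefer Python, the idea of adding an artificial genotype could be sketched roughly as below. This is not the bash script mentioned above. The function name `add_artificial_genotype` and its parameters are hypothetical, and the sketch assumes the input VCF has only the eight fixed columns (no FORMAT or sample columns yet).

```python
def add_artificial_genotype(lines, sample_name, genotype="1"):
    """Yield VCF lines with a constant GT value added for one sample.

    Assumes `lines` is an iterable of text lines from a VCF that has
    no FORMAT/sample columns yet (hypothetical helper).
    """
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##"):
            # Meta-information lines pass through unchanged.
            yield line
        elif line.startswith("#CHROM"):
            # Declare the GT field, then extend the column header
            # with FORMAT and the sample name.
            yield '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">'
            yield f"{line}\tFORMAT\t{sample_name}"
        elif line:
            # Every variant record gets the same artificial genotype.
            yield f"{line}\tGT\t{genotype}"
```

The lines it yields could then be written out with a trailing newline each and compressed and indexed in whatever way the downstream tools expect.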

@arunvv90

arunvv90 commented Aug 2, 2022

Hi,
I am a little late to the party, but I am having the same problem. My sample is also a virus. Can you please share the script if you still have it? It would save my day. Another question: did you merge multiple samples?

@charlesfoster
Author

Hi @arunvv90,
Here's the script (with a .txt suffix so I can attach it):

add_artificial_genotype.txt

I run lofreq on individual samples, then run the attached script before potentially merging multiple samples.

@arunvv90

arunvv90 commented Aug 3, 2022

Thanks a lot, man. I kept trying different things. Really appreciate your quick response. Does this script add the sample name field to the header, which is required for merging multiple samples?

@charlesfoster
Author

Yep, the sample name is added to the header. Previously the sample name was only guessed from the infile name, but I just added another flag to allow you to explicitly specify the name. New script attached.
add_artificial_genotype.txt

The raw vcf:

image

Modified vcf after running script:

image

@arunvv90

arunvv90 commented Aug 3, 2022

Wow! Lightning speed!! I just tested the script and it works like a charm! I was about to use bcftools reheader to change the file name. Let me test the new script with a custom sample name.

@arunvv90

arunvv90 commented Aug 3, 2022

I just tested the sample name feature also. It worked perfectly. Simple & easy solution!!! Thank you very much
av724@bioram /s/a/v/s/s/l/test> bash add_artificial_genotype.sh -i BCAHV_vibi_indelq_alnq_call.vcf.gz -g 1/1 -n test_samplename -o out3.vcf.gz
VCF with artificial genotype written to out3.vcf.gz
av724@bioram /s/a/v/s/s/l/test> ls (npsm)
add_artificial_genotype.sh* BCAHV_vibi_indelq_alnq_call.vcf.gz.tbi out3.vcf.gz.tbi
BCAHV_vibi_indelq_alnq_call.vcf.gz out3.vcf.gz
av724@bioram /s/a/v/s/s/l/test> bcftools query -l out3.vcf.gz (npsm)
test_samplename
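The same check on the sample name could also be done without bcftools. A rough Python sketch follows; the helper name `vcf_sample_names` is hypothetical, and it assumes a gzip- or bgzip-compressed VCF whose sample columns start after the FORMAT column.

```python
import gzip


def vcf_sample_names(path):
    """Return the sample names from the #CHROM line of a gzipped VCF.

    Hypothetical helper; returns an empty list if no samples are found.
    """
    with gzip.open(path, "rt") as fh:
        for line in fh:
            if line.startswith("#CHROM"):
                # Columns 1-8 are fixed, column 9 is FORMAT, and
                # anything after that is a sample name.
                return line.rstrip("\n").split("\t")[9:]
            if not line.startswith("#"):
                # Reached the records without seeing a #CHROM line.
                break
    return []
```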
