Adding genotype information #117

charlesfoster · 2021-07-27T01:53:15Z

Hi,
I'd like to use lofreq for my pipeline, but I require genotype information for downstream commands. I tried using lofreq2_add_sample.py like so:

./lofreq2_add_sample.py -i in.vcf.gz -o out.vcf.gz -b $BAM

However, I get the following error:

Traceback (most recent call last):
  File "./lofreq2_add_sample.py", line 312, in <module>
    main()
  File "./lofreq2_add_sample.py", line 307, in main
    add_plp_to_vcf(args.vcf_in, args.vcf_out, args.bams)
  File "./lofreq2_add_sample.py", line 229, in add_plp_to_vcf
    for row in vcf_reader:
_csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)

If I change the read mode for files in the add_plp_to_vcf function from 'rb' to 'r' to try and get around this error, I get:

Traceback (most recent call last):
  File "./lofreq2_add_sample.py", line 312, in <module>
    main()
  File "./lofreq2_add_sample.py", line 307, in main
    add_plp_to_vcf(args.vcf_in, args.vcf_out, args.bams)
  File "./lofreq2_add_sample.py", line 245, in add_plp_to_vcf
    vcf_writer.writerow(row)
  File "/usr/lib/python3.8/gzip.py", line 276, in write
    data = memoryview(data)
TypeError: memoryview: a bytes-like object is required, not 'str'

Do you have a workaround? My python version is 3.8.6. Thanks.
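
Both tracebacks are consistent with gzip handles opened in binary mode: on Python 3 the csv module expects `str`, while a binary gzip handle reads and writes `bytes`. A minimal sketch of the usual fix, assuming the script reads and writes gzipped VCFs through `gzip.open` (the function name `copy_vcf_rows` is hypothetical and this is not the script's actual code):

```python
import csv
import gzip


def copy_vcf_rows(vcf_in, vcf_out):
    """Copy a gzipped, tab-separated file row by row via text-mode handles."""
    # 'rt' / 'wt' make gzip.open return str rather than bytes, which is
    # what csv.reader and csv.writer require on Python 3.
    with gzip.open(vcf_in, "rt", newline="") as fh_in, \
            gzip.open(vcf_out, "wt", newline="") as fh_out:
        # QUOTE_NONE keeps the double quotes in VCF header lines untouched.
        reader = csv.reader(fh_in, delimiter="\t", quoting=csv.QUOTE_NONE)
        writer = csv.writer(fh_out, delimiter="\t", lineterminator="\n",
                            quoting=csv.QUOTE_NONE, quotechar=None)
        n_rows = 0
        for row in reader:
            writer.writerow(row)
            n_rows += 1
    return n_rows
```

Changing only the input mode from 'rb' to 'r' would explain the second traceback: the reader then yields strings, but the output handle is still binary.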


charlesfoster · 2021-07-27T03:29:09Z

I ended up writing a simple bash script to add 'fake' genotype information to a VCF file, e.g. specifying that my sample is from a virus --> GT=1. This is good enough for my purposes. I can provide the script if it will be of use for anyone else, in the absence of a more robust alternative.
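
For readers who want the general idea without the attachment, here is a rough Python sketch of the approach described above. It is not the bash script itself (its contents are not shown in this thread). It assumes the input VCF has no FORMAT or sample columns yet, and the function name `add_artificial_genotype` and its parameters are hypothetical:

```python
def add_artificial_genotype(vcf_lines, sample_name, genotype="1"):
    """Yield VCF lines with a GT FORMAT field and one sample column added."""
    gt_header = '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">'
    for line in vcf_lines:
        line = line.rstrip("\n")
        if line.startswith("##"):
            # Existing meta-information lines pass through unchanged.
            yield line
        elif line.startswith("#CHROM"):
            # Declare GT right before the column header, then extend the
            # header with the FORMAT column and the sample name.
            yield gt_header
            yield f"{line}\tFORMAT\t{sample_name}"
        elif line:
            # Every record gets the same artificial genotype.
            yield f"{line}\tGT\t{genotype}"


example = [
    "##fileformat=VCFv4.0\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t10\t.\tA\tT\t50\tPASS\tDP=100\n",
]
for out_line in add_artificial_genotype(example, "sampleA"):
    print(out_line)
```

The genotype is a constant, so this only makes sense when a fixed value such as GT=1 is acceptable downstream.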

arunvv90 · 2022-08-02T16:55:44Z

Hi,
I am a little late to the party, but I am having the same problem. My sample is also a virus. Could you please share the script if you still have it? It would save my day. Another question: did you merge the multiple samples?

charlesfoster · 2022-08-02T23:55:21Z

Hi @arunvv90,
Here's the script (with a .txt suffix so I can attach it):

add_artificial_genotype.txt

I run lofreq on individual samples, then run the attached script before potentially merging multiple samples.

arunvv90 · 2022-08-03T00:01:55Z

Thanks a lot, man. I kept trying different things. I really appreciate your quick response. Does this script add the sample name field to the header, which is required for merging multiple samples?

charlesfoster · 2022-08-03T00:16:03Z

Yep, the sample name is added to the header. Previously the sample name was only guessed from the infile name, but I just added another flag to allow you to explicitly specify the name. New script attached.
add_artificial_genotype.txt
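
The name-resolution logic described above (explicit name if given, otherwise a guess from the input file name) could look roughly like this in Python. How the attached script actually derives the name is not shown here, so the suffix stripping and the name `resolve_sample_name` are assumptions:

```python
import os


def resolve_sample_name(vcf_path, explicit_name=None):
    """Return explicit_name if given, else a name derived from the file name."""
    if explicit_name:
        return explicit_name
    base = os.path.basename(vcf_path)
    # Assumption: the guessed name is the file name minus its VCF suffix.
    for suffix in (".vcf.gz", ".vcf"):
        if base.endswith(suffix):
            return base[: -len(suffix)]
    return base
```

An explicit name is the safer choice before merging, since two inputs with the same file name in different directories would otherwise end up with identical sample names.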

The raw vcf:

Modified vcf after running script:

arunvv90 · 2022-08-03T00:22:45Z

Wow! Lightning speed!! I just tested the script and it works like a charm! I was about to use bcftools reheader to change the sample name. Let me test the new script with a custom sample name.

arunvv90 · 2022-08-03T00:30:40Z

I just tested the sample name feature as well, and it worked perfectly. Simple & easy solution!!! Thank you very much.
av724@bioram /s/a/v/s/s/l/test> bash add_artificial_genotype.sh -i BCAHV_vibi_indelq_alnq_call.vcf.gz -g 1/1 -n test_samplename -o out3.vcf.gz
VCF with artificial genotype written to out3.vcf.gz
av724@bioram /s/a/v/s/s/l/test> ls (npsm)
add_artificial_genotype.sh* BCAHV_vibi_indelq_alnq_call.vcf.gz.tbi out3.vcf.gz.tbi
BCAHV_vibi_indelq_alnq_call.vcf.gz out3.vcf.gz
av724@bioram /s/a/v/s/s/l/test> bcftools query -l out3.vcf.gz (npsm)
test_samplename
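
The same check can be done without bcftools by reading the `#CHROM` header line, where sample names are the columns after the ninth (FORMAT). A small sketch, with `list_vcf_samples` as a hypothetical helper name:

```python
import gzip


def list_vcf_samples(vcf_path):
    """Return the sample names from a VCF's #CHROM header line."""
    # bgzip output is valid multi-member gzip, so gzip.open can read it.
    opener = gzip.open if vcf_path.endswith(".gz") else open
    with opener(vcf_path, "rt") as fh:
        for line in fh:
            if line.startswith("#CHROM"):
                columns = line.rstrip("\n").split("\t")
                # Columns 1-8 are fixed, column 9 is FORMAT, the rest are samples.
                return columns[9:]
            if not line.startswith("#"):
                break  # records started without a column header
    return []
```

An empty list means the file has no sample columns, which is the state the lofreq output was in before the script ran.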
