This repository has been archived by the owner on Mar 2, 2021. It is now read-only.
standardized vcf files from Manta and Lumpy can not be indexed with tabix -p vcf *.vcf.gz #100
Comments
Hi,
This repo is no longer being actively maintained. SVTK is now being maintained as part of the following repo:
https://github.com/broadinstitute/gatk-sv
I’d recommend posting your issue there. Sorry for the inconvenience!
On Sep 23, 2020, at 9:34 AM, liud2 wrote:
I was able to index the standardized VCF files from Delly using tabix -p vcf, but not the VCF files from Manta and Lumpy. The error from tabix points to a coordinate in column 5 of the VCF file, which suggests that tabix -p vcf works with the *.vcf.gz files from Delly but not with those from Manta or Lumpy. Any suggestions on this issue?
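The cause of the tabix error is not identified in this thread. One common reason tabix refuses to index a VCF is that the records are not sorted by chromosome and position, or that a chromosome appears in more than one block. The sketch below reports the first record that breaks that order. The function names `first_unsorted_record` and `check_vcf` are hypothetical, and this check is only one possible diagnostic, not a confirmed fix for the Manta and Lumpy files.

```python
import gzip


def first_unsorted_record(lines):
    """Return (line_number, chrom, pos) of the first record that breaks
    coordinate order, or None if every record is in order.

    Order here means: each chromosome appears as one contiguous block
    and positions never decrease within a block.
    """
    seen_chroms = set()
    prev_chrom, prev_pos = None, -1
    for line_number, line in enumerate(lines, start=1):
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos = fields[0], int(fields[1])
        if chrom != prev_chrom:
            if chrom in seen_chroms:
                return line_number, chrom, pos  # chromosome block is split
            seen_chroms.add(chrom)
            prev_chrom, prev_pos = chrom, pos
        elif pos < prev_pos:
            return line_number, chrom, pos  # position went backwards
        else:
            prev_pos = pos
    return None


def check_vcf(path):
    """Open a plain or gzip/bgzip-compressed VCF and check its order."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return first_unsorted_record(handle)
```

If a record is reported, sorting the file (for example with bcftools sort) and compressing it with bgzip rather than plain gzip before running tabix would be the usual next step to try.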
|
Hi Ryan,
Thanks for letting me know about the new repo. Do you know whether gatk-sv can be run on a computing cluster? I have just completed the vcfcluster step for the VCF files from Manta, Delly, and Lumpy with svtk.
I took a quick look at the gatk-sv data flow chart. It seems that the sr-test, pe-test, rd-test, and baf-test proposed in your Nature paper are excluded from the new gatk-sv chart. Is that correct?
Delong Liu
|
Hi Delong,
Thanks for your interest in running GATK-SV!
GATK-SV was designed as a cloud-based pipeline, so we typically haven’t encouraged deploying GATK-SV on local computing clusters.
That said, it’s not impossible to run on a local cluster, but you will need to have a running instance of Cromwell and the appropriate permissions to use Docker (many institutional clusters don’t allow this).
Regarding sr-test, pe-test, rd-test, and baf-test: no, all four are still included as part of GATK-SV during module 02, so those still need to be run.
Hope this helps,
Ryan
|
Hi Ryan,
I will ask our IT scientists whether they can install gatk-sv on the NIH Biowulf cluster, which we can use at no charge. Including svtk in the GATK package is a great move.
Thank you and your team for publishing the Nature paper in May 2020 and making the gnomAD SV database available to the public!
Delong
|
Hi Ryan,
I have to admit that I know little about shell and Python programming. I use R on a daily basis, and I am picking up Python and shell programming.
In the svtk 0.1 installed on our clusters, I do not see steps for the BAF test, rd-test, or variant filtering. I do see the Python scripts in the directories of https://github.com/talkowski-lab/SV-Adjudicator/tree/master and https://github.com/talkowski-lab/svtk/tree/master/svtk. The SV-Adjudicator directory contains more details than the svtk directory.
After weeks of struggle, I have completed pe-test and sr-test for the VCF files generated from Delly, Lumpy, and Manta, and for the final VCF merged from SVA, LINE1, and ALU. I am stuck on extracting BAF information from VCF files. I have the final calibrated GATK indel-snp.vcf (GATK 3.8.1) from ~500 human genomes for each chromosome. I am able to extract the samples' GTs from the GATK VCF using bcftools query, but I do not know the format of the BAF file required for the BAF test.
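The BAF file format expected by the BAF test is not stated in this thread. As background only, the B-allele frequency at a heterozygous SNP is conventionally the alternate-allele read fraction, alt / (ref + alt), computed from the AD (allelic depth) field. The sketch below assumes tab-separated input lines of chrom, pos, sample, GT, and AD, such as might be produced by bcftools query with a format string like '[%CHROM\t%POS\t%SAMPLE\t%GT\t%AD\n]'. That column layout and the function names are assumptions for illustration, not the layout svtk or GATK-SV requires.

```python
def baf_from_ad(ad_field):
    """Return alt / (ref + alt) from a VCF AD value such as "12,9".

    Returns None when depths are missing or the total depth is zero.
    Only the first alternate allele is used.
    """
    parts = ad_field.split(",")
    if len(parts) < 2 or "." in parts[:2]:
        return None
    ref_depth, alt_depth = int(parts[0]), int(parts[1])
    total = ref_depth + alt_depth
    if total == 0:
        return None
    return alt_depth / total


def is_het(gt_field):
    """True for a diploid genotype with one reference and one alternate allele."""
    alleles = gt_field.replace("|", "/").split("/")
    return len(alleles) == 2 and set(alleles) == {"0", "1"}


def het_bafs(lines):
    """Yield (chrom, pos, sample, baf) for heterozygous calls with usable depth."""
    for line in lines:
        chrom, pos, sample, gt, ad = line.rstrip("\n").split("\t")
        if not is_het(gt):
            continue
        baf = baf_from_ad(ad)
        if baf is not None:
            yield chrom, int(pos), sample, baf
```

The BAF WDLs in the GATK-SV repo remain the authoritative source for the actual file layout; this sketch only illustrates the quantity being computed.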
Could you please point me to the python files in the directory of SV-Adjudicator or svtk to complete the 3 steps:
* BAF calculation and BAF-test
* Rd-test
* Variant filtering using random forest
Given that I may not have access to GATK-SV on our clusters soon, I have to rely on our clusters and svtk 0.1, or the Python files in SV-Adjudicator, to complete the SV analysis of our data by following the svtk data processing chart in your recent Nature paper.
As I have been walking through our SV data processing, I have come to fully appreciate the amount of time and effort your team spent in reliably extracting structural variants from ~15,000 human genomes in your Nature paper. We are going to use the gnomAD SV database as our reference later.
I appreciate your help.
Delong Liu
Translational Vascular Medicine Branch
NHLBI/NIH
|
Hi Delong,
I’d recommend referring to the workflow files (.wdl) in the official GATK-SV repo for examples of these commands:
https://github.com/broadinstitute/gatk-sv/tree/master/wdl
For example, the BAF calculation is documented here:
https://github.com/broadinstitute/gatk-sv/blob/master/wdl/BAFFromGVCFs.wdl
https://github.com/broadinstitute/gatk-sv/blob/master/wdl/BAFFromShardedVCF.wdl
And the RdTest commands are documented here:
https://github.com/broadinstitute/gatk-sv/blob/master/wdl/RDTest.wdl
Or, more generally, all of the variant evidence collection steps (including RdTest, PETest, SRTest, and BAFTest) are wrapped into the Module02 workflow here:
https://github.com/broadinstitute/gatk-sv/blob/master/wdl/Module02.wdl
And the variant filtering is all executed here:
https://github.com/broadinstitute/gatk-sv/blob/master/wdl/Module03.wdl
The command blocks within each of these WDLs will give you example commands to execute each of the steps required for these analyses.
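To read those command blocks without a Cromwell setup, one option is to pull them out of a WDL file as plain text. The sketch below is a minimal illustration that handles only the heredoc form `command <<< ... >>>` and pairs each block with the most recent `task` declaration. The function name `extract_commands` is hypothetical, and the extracted text will still contain WDL placeholders such as `~{...}` that must be filled in by hand.

```python
import re
import textwrap

# Matches the start of a task declaration, e.g. "task RDTest {".
TASK_RE = re.compile(r"^\s*task\s+(\w+)\s*\{", re.MULTILINE)
# Matches heredoc-style command sections only.
COMMAND_RE = re.compile(r"command\s*<<<(.*?)>>>", re.DOTALL)


def extract_commands(wdl_text):
    """Return a list of (task_name, command_text) pairs.

    Only the heredoc form `command <<< ... >>>` is handled; the brace
    form `command { ... }` is ignored by this sketch.
    """
    tasks = [(m.start(), m.group(1)) for m in TASK_RE.finditer(wdl_text)]
    results = []
    for match in COMMAND_RE.finditer(wdl_text):
        # The owning task is the last task declared before this command.
        owner = None
        for start, name in tasks:
            if start < match.start():
                owner = name
        body = textwrap.dedent(match.group(1)).strip()
        results.append((owner, body))
    return results
```

Reading a downloaded WDL file with `open(path).read()` and passing the text to `extract_commands` would list each task alongside the shell commands it runs.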
I’d also recommend posting issues under the official GATK-SV repo (https://github.com/broadinstitute/gatk-sv) instead of here, since this repo is no longer actively maintained.
Thanks,
Ryan
|
Hi Ryan,
Thanks for providing the links. Do you know how much it would cost to allocate 1 GB of RAM per minute on FireCloud?
Thanks,
Delong Liu