Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow len file to be supplied for bedToBigWig converter builtin #18356

Closed
wants to merge 4 commits into from

Conversation

fubar2
Copy link
Member

@fubar2 fubar2 commented Jun 8, 2024

Submitting as a draft for comment: Should this converter be moved out of the lib/galaxy/datatypes/converters built-in tools into an IUC tool?

In the long run, if Galaxy is to become a generic platform for all kinds of open science, all the built in genomic metadata and converters will eventually become pluggable, like everything else?

Currently the tool works if the reference has a built-in or custom genome dbkey.

Use case is VGP workflows where the reference assembly is constantly being improved until it is finished. A custom genome could work but would need to be manually revised for every assembly update. Possible that this would always be done manually, but fragile and likely to lead to failures from a missing contig in a new assembly so this PR allows a chromosome length file to be supplied. Test added and dependencies updated to latest versions

This way can generate the len file and be confident that the contigs and lengths are correct at each workflow run.

How to test the changes?

(Select all options that apply)

  • This is a refactoring of components with existing test coverage.

License

  • I agree to license these and all my past contributions to the core galaxy codebase under the MIT license.

fubar2 added 2 commits June 8, 2024 10:06
Test added and dependencies updated to latest versions
Use case is VGP assemblies in workflows where the assembly is constantly being improved, so a custom genome would need to be revised for every assembly update.
Possible but likely to lead to failures.

This way can extract the lengths and be sure they are correct at each workflow run.
How to get hg17.len for the chromfile?
Trying with the old dependency versions
@fubar2
Copy link
Member Author

fubar2 commented Jun 8, 2024

ah. inbuilt tests are tricky. Cannot find hg17.len
Seeing if the old test passes with the old dependencies - it works with the new ones here but only with hg17.len in test-data.

@fubar2
Copy link
Member Author

fubar2 commented Jun 8, 2024

Darn. Need to provide an hg17 len file somehow.
Will close this until I figure out what to do...

@fubar2 fubar2 closed this Jun 8, 2024
@mschatz
Copy link
Member

mschatz commented Jun 10, 2024 via email

@fubar2
Copy link
Member Author

fubar2 commented Jun 10, 2024

I think this is the file you are looking for: https://hgdownload.soe.ucsc.edu/goldenPath/hg17/bigZips/hg17.chrom.sizes Good luck! Mike

On Fri, Jun 7, 2024 at 10:52 PM Ross Lazarus @.> wrote: Darn. Need to provide an hg17 len file somehow. Will close this until I figure out what to do... — Reply to this email directly, view it on GitHub <#18356 (comment)>, or unsubscribe https://github.com/notifications/unsubscribe-auth/AABP34ZLQFF2T2NWO5HOBBDZGJWWDAVCNFSM6AAAAABI7PQLVSVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCNJVG43TMNRXGA . You are receiving this because you are subscribed to this thread.Message ID: @.>
@mschatz
Do remote URI locations work for built-in tool tests?
They are wierd...
But anyway, this PR is probably a bad idea because built-ins do not usually take parameters - but thanks for the idea!

@fubar2
Copy link
Member Author

fubar2 commented Jun 10, 2024

@mschatz
Even funnier. That won't work well because the srma_out2.bam only has "dummy_chr" so it has to be a bogus hg17.len file. These inbuilt converters are weird. Old, untouched, dusty and way behind in terms of dependency updates. Does that matter? Not if they work I suppose - plenty else to do.

samtools view test-data/srma_out2.bam
GA5:3:2:1710:1301#0     0       dummy_chr   

@mvdbeek
Copy link
Member

mvdbeek commented Jun 10, 2024

An external tool is the way to go, but we will have to leave the old converters in place for reproducibility. We've been aiming to reduce the amount of tools shipped in the codebase, as they're not as easy to provide, are often not relevant to non-biology domains and the whole dbkey/genome/build system is something that doesn't work too well when working with new and evolving assemblies.

@fubar2
Copy link
Member Author

fubar2 commented Jun 10, 2024

@mvdbeek: How about a single generic tool to combine the converters for bed/bam/cram/gff - those are the things the VGP will want - to bigwig? They have bed and gff with 6M rows when what they really want (from Arang's work) is the density with some stats.

@mvdbeek
Copy link
Member

mvdbeek commented Jun 10, 2024

I think the tried and true thing is to orient yourself on the upstream command you're using. If as a hypothetical command you've got interval_tools view {whatever} > bigwig and it uses the same input datasets and similar resource requirements then that's fine, but maintainability and scheduling are much simpler with tools that do a single thing. In particular we have a track-record of dealing with conditionals and if/else in cheetah that could be better. This seems to be the number one source of tool bugs.

@fubar2
Copy link
Member Author

fubar2 commented Jun 10, 2024

@mvdbeek: Ok, but remember there's a lot of boilerplate fixing to do for multiple converters to meet IUC standards - those things are old. Was looking at a slight change to <tool id="CONVERTER_bed_gff_or_vcf_to_bigwig_0" name="Convert BED, GFF, or VCF to BigWig" version="1.0.1" hidden="true"> because bam to bed is a line using bedtools. Just one input with ext based preprocessing for ucsc-bedgraphtobigwig and not as complex as might appear... Let me see what I can do.

Update: It only tests bed and that particular converter is probably broken unless GFF now has the same order of fields (chr,start,end,score) as bed/bedgraph? It was never thus AFAIK - hence the script in that converters directory gff_to_bed_converter.py.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants