structure #103

brentp · 2022-06-10T08:01:04Z

This is a suggestion for structuring the code. Currently, it's very focused on the evaluations. Let's define the user-facing code first and build the evaluations and training around that.

extract-signals

This takes the BAM/CRAM file and extracts all relevant signals. This is working and simple to use (if a bit slow with thousands of contigs).

generate channels

This takes the output from extract-signals and a set of SVs and generates the arrays (channels) to be used by the NN.

This should be updated to accept VCF (currently requires bedpe).
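As a rough illustration of what accepting VCF could involve, here is a minimal sketch that turns one VCF data line into a breakpoint pair using only the standard library. The names `Breakpoints`, `parse_info` and `vcf_line_to_breakpoints` are hypothetical and not part of the current code base. The sketch assumes symbolic SV records that carry `SVTYPE` and `END` in INFO, treats `CHR2` as an optional, non-standard tag for the second chromosome, and keeps the 1-based VCF coordinates as they are.

```python
from typing import Dict, NamedTuple, Optional


class Breakpoints(NamedTuple):
    """Hypothetical container for one SV's two breakpoints (1-based, as in VCF)."""
    chrom1: str
    pos1: int
    chrom2: str
    pos2: int
    svtype: str


def parse_info(info: str) -> Dict[str, str]:
    """Split a VCF INFO column into a dict; flag entries map to an empty string."""
    out: Dict[str, str] = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue
        key, _, value = item.partition("=")
        out[key] = value
    return out


def vcf_line_to_breakpoints(line: str) -> Optional[Breakpoints]:
    """Return the breakpoint pair for one VCF data line.

    Returns None for header lines and for records that lack the INFO
    fields this sketch relies on (SVTYPE and END).
    """
    if line.startswith("#"):
        return None
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        return None
    chrom, pos = fields[0], int(fields[1])
    info = parse_info(fields[7])
    if not info.get("SVTYPE") or not info.get("END"):
        return None
    # Assumption: CHR2, when present, names the second chromosome;
    # otherwise both breakpoints are on the same chromosome.
    chrom2 = info.get("CHR2") or chrom
    return Breakpoints(chrom, pos, chrom2, int(info["END"]), info["SVTYPE"])
```

A real implementation would probably use a VCF library and also handle breakend (BND) notation, which this sketch ignores.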

score (predict)

This should take a trained model along with a VCF or bedpe and output a score for each variant in the sample field, with an option to write the score to QUAL.

This should be updated to accept VCF (currently requires bedpe).
This should NOT accept labels; those are part of train/evaluate.
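To make the proposed output concrete, here is a minimal sketch of writing a score into a VCF data line, either as an extra FORMAT/sample entry or into QUAL. The function `add_score` and the FORMAT key `SCORE` are hypothetical placeholders, not existing names in the project. The sketch assumes a single-sample VCF line, and a real writer would also need to declare the new key in the header.

```python
def add_score(line: str, score: float, key: str = "SCORE", to_qual: bool = False) -> str:
    """Return a copy of a single-sample VCF data line annotated with `score`.

    By default the score is appended to the FORMAT and sample columns under
    the (hypothetical) key `key`; with to_qual=True it replaces QUAL instead.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 10:
        raise ValueError("expected a VCF data line with FORMAT and one sample column")
    value = f"{score:.3f}"
    if to_qual:
        fields[5] = value  # column 6 of a VCF line is QUAL
    else:
        fields[8] = f"{fields[8]}:{key}"    # FORMAT column
        fields[9] = f"{fields[9]}:{value}"  # first (and only) sample column
    return "\t".join(fields)
```

Note that the function takes no labels at all, in line with keeping them in train/evaluate.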

train/evaluate

This will be handled by Luca and includes the optimization and LOCSO. We will keep this more isolated since it is harder to run.
Simplify to only and always use LOCSO.

Find models that tend to work well, to reduce the search space of the optimizer and the variability among runs. Currently, when running LOCSO for different chromosomes, we can get dramatically different results because of the network architecture or hyperparameters.
Use more true negative variants in training. This can help prevent over-fitting.
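For readers less familiar with the evaluation scheme, here is a minimal sketch of a per-chromosome hold-out split. It assumes LOCSO means holding out one chromosome at a time, which this issue does not spell out, and the function name `leave_one_chrom_out` is hypothetical. It only produces index splits and does not touch the optimizer or the model.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Sequence, Tuple


def leave_one_chrom_out(
    chroms: Sequence[str],
) -> Iterator[Tuple[str, List[int], List[int]]]:
    """Yield (held_out_chrom, train_indices, test_indices) for each chromosome.

    `chroms` holds the chromosome of each variant, in the same order as the
    training arrays, so the returned indices can be used to slice them.
    """
    by_chrom: Dict[str, List[int]] = defaultdict(list)
    for i, chrom in enumerate(chroms):
        by_chrom[chrom].append(i)
    for held_out, test_idx in by_chrom.items():
        # Train on every variant that is not on the held-out chromosome.
        train_idx = [
            i for chrom, idx in by_chrom.items() if chrom != held_out for i in idx
        ]
        yield held_out, train_idx, test_idx
```

Comparing a fixed architecture across all folds produced this way is one way to see how much of the run-to-run variability comes from the held-out chromosome rather than from the hyperparameter search.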
