Fix issue 81, "call empty droplets" #301

fmalmeida · 2024-02-02T12:12:06Z

For now, I am just opening a draft PR as an attempt to solve issue #81, so that it is easier to keep track of the modifications.

Once the work is "done", I will add a thorough overview of the changes, explaining the main modifications and listing the TODOs to be addressed before merging.

Then, and only then, I will add some reviewers.

tested only with kallisto aligner (both with and without automated kallisto filtering with bustools --filter parameter)

github-actions · 2024-02-02T12:12:24Z

Python linting (`black`) is failing

To keep the code consistent with lots of contributors, we run automated code consistency checks.
To fix this CI test, please run:

Install black: `pip install black`
Fix formatting errors in your pipeline: `black .`

Once you push these changes the test should pass, and you can hide this comment 👍

We highly recommend setting up Black in your code editor so that this formatting is done automatically on save. Ask about it on Slack for help!

Thanks again for your contribution!

fmalmeida · 2024-02-02T12:13:14Z

modules/nf-core/kallistobustools/count/main.nf

+    tuple val(meta), path ("*.count/counts_unfiltered"), emit: raw_counts                       // TODO: Add to nf-core/modules before merging PR
+    tuple val(meta), path ("*.count/counts_filtered")  , emit: filtered_counts, optional: true  // TODO: Add to nf-core/modules before merging PR


I am aware that this modification belongs in nf-core/modules rather than here, so I added the TODO as a reminder to remove it from this repository once testing is done.

github-actions · 2024-02-02T12:13:43Z

`nf-core lint` overall result: Passed ✅ ⚠️

Posted for pipeline commit 2406600

✅ 171 tests passed
❔ 4 tests were ignored
❗ 3 tests had warnings

❗ Test warnings:

pipeline_todos - TODO string in main.nf: Optionally add in-text citation tools to this list.
pipeline_todos - TODO string in main.nf: Optionally add bibliographic entries to this list.
pipeline_todos - TODO string in main.nf: Only uncomment below if logic in toolCitationText/toolBibliographyText has been filled!

❔ Tests ignored:

files_exist - File is ignored: lib/Utils.groovy
files_unchanged - File ignored due to lint config: .github/ISSUE_TEMPLATE/bug_report.yml
template_strings - template_strings
schema_params - schema_params

✅ Tests passed:

files_exist - File found: .gitattributes
files_exist - File found: .gitignore
files_exist - File found: .nf-core.yml
files_exist - File found: .editorconfig
files_exist - File found: .prettierignore
files_exist - File found: .prettierrc.yml
files_exist - File found: CHANGELOG.md
files_exist - File found: CITATIONS.md
files_exist - File found: CODE_OF_CONDUCT.md
files_exist - File found: LICENSE or LICENSE.md or LICENCE or LICENCE.md
files_exist - File found: nextflow_schema.json
files_exist - File found: nextflow.config
files_exist - File found: README.md
files_exist - File found: .github/.dockstore.yml
files_exist - File found: .github/CONTRIBUTING.md
files_exist - File found: .github/ISSUE_TEMPLATE/bug_report.yml
files_exist - File found: .github/ISSUE_TEMPLATE/config.yml
files_exist - File found: .github/ISSUE_TEMPLATE/feature_request.yml
files_exist - File found: .github/PULL_REQUEST_TEMPLATE.md
files_exist - File found: .github/workflows/branch.yml
files_exist - File found: .github/workflows/ci.yml
files_exist - File found: .github/workflows/linting_comment.yml
files_exist - File found: .github/workflows/linting.yml
files_exist - File found: assets/email_template.html
files_exist - File found: assets/email_template.txt
files_exist - File found: assets/sendmail_template.txt
files_exist - File found: assets/nf-core-scrnaseq_logo_light.png
files_exist - File found: conf/modules.config
files_exist - File found: conf/test.config
files_exist - File found: conf/test_full.config
files_exist - File found: docs/images/nf-core-scrnaseq_logo_light.png
files_exist - File found: docs/images/nf-core-scrnaseq_logo_dark.png
files_exist - File found: docs/output.md
files_exist - File found: docs/README.md
files_exist - File found: docs/README.md
files_exist - File found: docs/usage.md
files_exist - File found: main.nf
files_exist - File found: assets/multiqc_config.yml
files_exist - File found: conf/base.config
files_exist - File found: conf/igenomes.config
files_exist - File found: .github/workflows/awstest.yml
files_exist - File found: .github/workflows/awsfulltest.yml
files_exist - File found: modules.json
files_exist - File found: pyproject.toml
files_exist - File not found check: Singularity
files_exist - File not found check: parameters.settings.json
files_exist - File not found check: pipeline_template.yml
files_exist - File not found check: .nf-core.yaml
files_exist - File not found check: bin/markdown_to_html.r
files_exist - File not found check: conf/aws.config
files_exist - File not found check: .github/workflows/push_dockerhub.yml
files_exist - File not found check: .github/ISSUE_TEMPLATE/bug_report.md
files_exist - File not found check: .github/ISSUE_TEMPLATE/feature_request.md
files_exist - File not found check: docs/images/nf-core-scrnaseq_logo.png
files_exist - File not found check: .markdownlint.yml
files_exist - File not found check: .yamllint.yml
files_exist - File not found check: lib/Checks.groovy
files_exist - File not found check: lib/Completion.groovy
files_exist - File not found check: lib/Workflow.groovy
files_exist - File not found check: lib/WorkflowMain.groovy
files_exist - File not found check: lib/NfcoreTemplate.groovy
files_exist - File not found check: lib/WorkflowScrnaseq.groovy
files_exist - File not found check: lib/nfcore_external_java_deps.jar
files_exist - File not found check: .travis.yml
nextflow_config - Config variable found: manifest.name
nextflow_config - Config variable found: manifest.nextflowVersion
nextflow_config - Config variable found: manifest.description
nextflow_config - Config variable found: manifest.version
nextflow_config - Config variable found: manifest.homePage
nextflow_config - Config variable found: timeline.enabled
nextflow_config - Config variable found: trace.enabled
nextflow_config - Config variable found: report.enabled
nextflow_config - Config variable found: dag.enabled
nextflow_config - Config variable found: process.cpus
nextflow_config - Config variable found: process.memory
nextflow_config - Config variable found: process.time
nextflow_config - Config variable found: params.outdir
nextflow_config - Config variable found: params.input
nextflow_config - Config variable found: params.validationShowHiddenParams
nextflow_config - Config variable found: params.validationSchemaIgnoreParams
nextflow_config - Config variable found: manifest.mainScript
nextflow_config - Config variable found: timeline.file
nextflow_config - Config variable found: trace.file
nextflow_config - Config variable found: report.file
nextflow_config - Config variable found: dag.file
nextflow_config - Config variable (correctly) not found: params.nf_required_version
nextflow_config - Config variable (correctly) not found: params.container
nextflow_config - Config variable (correctly) not found: params.singleEnd
nextflow_config - Config variable (correctly) not found: params.igenomesIgnore
nextflow_config - Config variable (correctly) not found: params.name
nextflow_config - Config variable (correctly) not found: params.enable_conda
nextflow_config - Config timeline.enabled had correct value: true
nextflow_config - Config report.enabled had correct value: true
nextflow_config - Config trace.enabled had correct value: true
nextflow_config - Config dag.enabled had correct value: true
nextflow_config - Config manifest.name began with nf-core/
nextflow_config - Config variable manifest.homePage began with https://github.com/nf-core/
nextflow_config - Config dag.file ended with .html
nextflow_config - Config variable manifest.nextflowVersion started with >= or !>=
nextflow_config - Config manifest.version ends in dev: 2.6.0dev
nextflow_config - Config params.custom_config_version is set to master
nextflow_config - Config params.custom_config_base is set to https://raw.githubusercontent.com/nf-core/configs/master
nextflow_config - Lines for loading custom profiles found
nextflow_config - nextflow.config contains configuration profile test
nextflow_config - Config default value correct: params.aligner= alevin
nextflow_config - Config default value correct: params.protocol= auto
nextflow_config - Config default value correct: params.igenomes_base= s3://ngi-igenomes/igenomes/
nextflow_config - Config default value correct: params.simpleaf_rlen= 91
nextflow_config - Config default value correct: params.star_feature= Gene
nextflow_config - Config default value correct: params.kb_workflow= standard
nextflow_config - Config default value correct: params.custom_config_version= master
nextflow_config - Config default value correct: params.custom_config_base= https://raw.githubusercontent.com/nf-core/configs/master
nextflow_config - Config default value correct: params.max_cpus= 16
nextflow_config - Config default value correct: params.max_memory= 128.GB
nextflow_config - Config default value correct: params.max_time= 240.h
nextflow_config - Config default value correct: params.publish_dir_mode= copy
nextflow_config - Config default value correct: params.max_multiqc_email_size= 25.MB
nextflow_config - Config default value correct: params.validate_params= true
files_unchanged - .gitattributes matches the template
files_unchanged - .prettierrc.yml matches the template
files_unchanged - CODE_OF_CONDUCT.md matches the template
files_unchanged - LICENSE matches the template
files_unchanged - .github/.dockstore.yml matches the template
files_unchanged - .github/CONTRIBUTING.md matches the template
files_unchanged - .github/ISSUE_TEMPLATE/config.yml matches the template
files_unchanged - .github/ISSUE_TEMPLATE/feature_request.yml matches the template
files_unchanged - .github/PULL_REQUEST_TEMPLATE.md matches the template
files_unchanged - .github/workflows/branch.yml matches the template
files_unchanged - .github/workflows/linting_comment.yml matches the template
files_unchanged - .github/workflows/linting.yml matches the template
files_unchanged - assets/email_template.html matches the template
files_unchanged - assets/email_template.txt matches the template
files_unchanged - assets/sendmail_template.txt matches the template
files_unchanged - assets/nf-core-scrnaseq_logo_light.png matches the template
files_unchanged - docs/images/nf-core-scrnaseq_logo_light.png matches the template
files_unchanged - docs/images/nf-core-scrnaseq_logo_dark.png matches the template
files_unchanged - docs/README.md matches the template
files_unchanged - .gitignore matches the template
files_unchanged - .prettierignore matches the template
files_unchanged - pyproject.toml matches the template
actions_ci - '.github/workflows/ci.yml' is triggered on expected events
actions_ci - '.github/workflows/ci.yml' checks minimum NF version
actions_awstest - '.github/workflows/awstest.yml' is triggered correctly
actions_awsfulltest - .github/workflows/awsfulltest.yml is triggered correctly
actions_awsfulltest - .github/workflows/awsfulltest.yml does not use -profile test
readme - README Nextflow minimum version badge matched config. Badge: 23.04.0, Config: 23.04.0
readme - README Zenodo placeholder was replaced with DOI.
pipeline_name_conventions - Name adheres to nf-core convention
schema_lint - Schema lint passed
schema_lint - Schema title + description lint passed
schema_lint - Input mimetype lint passed: 'text/csv'
system_exit - No System.exit calls found
actions_schema_validation - Workflow validation passed: awstest.yml
actions_schema_validation - Workflow validation passed: fix-linting.yml
actions_schema_validation - Workflow validation passed: linting_comment.yml
actions_schema_validation - Workflow validation passed: clean-up.yml
actions_schema_validation - Workflow validation passed: branch.yml
actions_schema_validation - Workflow validation passed: ci.yml
actions_schema_validation - Workflow validation passed: release-announcements.yml
actions_schema_validation - Workflow validation passed: awsfulltest.yml
actions_schema_validation - Workflow validation passed: download_pipeline.yml
actions_schema_validation - Workflow validation passed: linting.yml
merge_markers - No merge markers found in pipeline files
modules_json - Only installed modules found in modules.json
multiqc_config - 'assets/multiqc_config.yml' contains report_section_order
multiqc_config - 'assets/multiqc_config.yml' contains export_plots
multiqc_config - 'assets/multiqc_config.yml' contains report_comment
multiqc_config - 'assets/multiqc_config.yml' follows the ordering scheme of the minimally required plugins.
multiqc_config - 'assets/multiqc_config.yml' contains a matching 'report_comment'.
multiqc_config - 'assets/multiqc_config.yml' contains 'export_plots: true'.
modules_structure - modules directory structure is correct 'modules/nf-core/TOOL/SUBTOOL'

Run details

nf-core/tools version 2.13.1
Run at 2024-03-18 16:06:48


fmalmeida · 2024-02-15T08:33:03Z

Hi @grst and @apeltzer ,
This PR is now ready for review and discussions.
As promised, here is a description of the changes and of the points that are still open.

The empty-drops module

First of all, I have added a script to perform the empty-drops calling and filtering using a library that is available in bioconda, bioconductor-dropletutils.

On top of that, I added a simple module: it takes a matrix file, runs the empty-drops call on it, and generates another matrix file.

Note: The module fails on small datasets, or whenever there are not enough data points for filtering. It must therefore run on "raw/unprocessed" matrices, and it should be skipped (--skip_emptydrops=true) for small datasets.

The inclusion in the workflow

With the module generated, I could then include it in the workflow.

    // Run emptydrops calling module
    if ( !params.skip_emptydrops ) {

        //
        // emptydrops should only run on the raw matrices, so drop the filtered results from the aligners that can produce them
        //
        if ( params.aligner in [ 'cellranger', 'cellrangerarc', 'kallisto', 'star' ] ) {
            ch_mtx_matrices_for_emptydrops =
                ch_mtx_matrices.filter { meta, mtx_files ->
                    mtx_files.toString().contains("raw_feature_bc_matrix") || // cellranger
                    mtx_files.toString().contains("counts_unfiltered")  || // kallisto
                    mtx_files.toString().contains("raw")                   // star
            }
        } else {
            ch_mtx_matrices_for_emptydrops = ch_mtx_matrices
        }
        EMPTYDROPS_CELL_CALLING( ch_mtx_matrices_for_emptydrops )
        ch_mtx_matrices = ch_mtx_matrices.mix( EMPTYDROPS_CELL_CALLING.out.filtered_matrices )
    }

One thing to note above is that, as discussed previously, I had to add a check/filter so that only the raw/unprocessed matrices generated by the aligners are passed on, because I think it does not make sense to run the module on matrices that are already filtered/processed.

Or do you think we should do the opposite and run it only on the processed ones?
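
The selection rule in the snippet above can be sketched outside Nextflow. The following Python is only an illustration of the substring test, not code from the pipeline: the function names `is_raw_matrix` and `select_raw` and the example paths are assumptions made up for this sketch.

```python
# Markers copied from the workflow snippet above; the aligner labels are the
# ones given in its inline comments.
RAW_MARKERS = (
    "raw_feature_bc_matrix",  # cellranger
    "counts_unfiltered",      # kallisto
    "raw",                    # star
)


def is_raw_matrix(path: str) -> bool:
    """Return True if the path contains any of the raw-matrix markers."""
    return any(marker in path for marker in RAW_MARKERS)


def select_raw(paths):
    """Keep only the paths that look like raw/unprocessed matrices."""
    return [p for p in paths if is_raw_matrix(p)]


# Hypothetical example paths (assumed for illustration, not from a real run).
example_paths = [
    "pbmc8k.count/counts_unfiltered",
    "pbmc8k.count/counts_filtered",
    "pbmc8k/outs/raw_feature_bc_matrix",
    "pbmc8k/outs/filtered_feature_bc_matrix",
]

raw_only = select_raw(example_paths)
print(raw_only)
```

Because `"raw"` is a plain substring, any path that happens to contain it (for example a sample whose name includes "raw") would also pass this test, so the rule relies on the aligners' output naming staying as it is.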

Changes in conversion modules

Because we will now have both the data coming directly from the aligners and a custom-made filtering module, I had to change the conversion modules a bit so that they can tell the aligners' own raw/filtered matrices apart from the output of the custom empty-drops filter.

To avoid confusing the user, I added suffixes to the converted files, so the data are now written as:

*_{raw,filtered,custom_emptydrops_filter}_matrix.{h5ad,rds}

In this case, the meanings are:

| suffix | meaning |
| --- | --- |
| raw | Conversion of the raw/unprocessed matrix generated by the tool. It is also used for tools that generate only one matrix, such as alevin. |
| filtered | Conversion of the filtered/processed matrix generated by the tool |
| custom_emptydrops_filter | Conversion of the matrix that was generated by the new custom empty drops filter module |
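
As a rough illustration of the naming pattern above, the sketch below builds the converted file names from a sample name, a suffix and a format. The helper `converted_name` is hypothetical and not part of the pipeline; only the three suffixes, the `_matrix` stem and the `h5ad`/`rds` extensions are taken from the pattern.

```python
# Suffixes and formats taken from the pattern
# *_{raw,filtered,custom_emptydrops_filter}_matrix.{h5ad,rds}
SUFFIXES = ("raw", "filtered", "custom_emptydrops_filter")
FORMATS = ("h5ad", "rds")


def converted_name(sample: str, suffix: str, fmt: str) -> str:
    """Build a converted-matrix file name (hypothetical helper)."""
    if suffix not in SUFFIXES:
        raise ValueError(f"unknown suffix: {suffix!r}")
    if fmt not in FORMATS:
        raise ValueError(f"unknown format: {fmt!r}")
    return f"{sample}_{suffix}_matrix.{fmt}"


# All combinations for one sample.
names = [converted_name("pbmc8k", s, f) for s in SUFFIXES for f in FORMATS]
for name in names:
    print(name)
```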

I also had to update the two conversion scripts (for rds and h5ad) so that they understand this newly generated matrix and can transpose the matrices coming from cellranger. In the normal conversion of cellranger matrices we use the .h5 files that cellranger generates, which works fine, but the emptydrops filtering of cellranger results does not produce such a file, so the conversion scripts need to handle its .mtx output instead.

Integrating with the aligners

Finally, the changes made to the aligner connections in order to integrate this module need to be discussed, one aligner at a time.

Alevin

Because alevin produces only one matrix, rather than a raw/filtered pair like cellranger, star and kallisto, the integration was seamless and required no major change: I just connected it to the module.

When using it, the results generated are the following:

alevin_run/
├── alevin
│   ├── mtx_conversions
│   │   ├── combined_custom_emptydrops_filter_matrix.h5ad
│   │   ├── combined_raw_matrix.h5ad
│   │   ├── pbmc8k
│   │   │   ├── pbmc8k_custom_emptydrops_filter_matrix.h5ad
│   │   │   ├── pbmc8k_custom_emptydrops_filter_matrix.rds
│   │   │   ├── pbmc8k_raw_matrix.h5ad
│   │   │   └── pbmc8k_raw_matrix.rds
│   │   └── versions.yml
│   ├── pbmc8k
│   │   └── emptydrops_filtered
│   │       ├── quants_mat_cols.txt
│   │       ├── quants_mat.mtx
│   │       └── quants_mat_rows.txt
│   ├── pbmc8k_alevin_results
│   │   ├── af_map
│   │   │   ├── alevin
│   │   │   ├── aux_info
│   │   │   ├── cmd_info.json
│   │   │   ├── libParams
│   │   │   ├── logs
│   │   │   ├── map.rad
│   │   │   └── unmapped_bc_count.bin
│   │   ├── af_quant
│   │   │   ├── alevin
│   │   │   ├── all_freq.bin
│   │   │   ├── collate.json
│   │   │   ├── featureDump.txt
│   │   │   ├── generate_permit_list.json
│   │   │   ├── map.collated.rad
│   │   │   ├── permit_freq.bin
│   │   │   ├── permit_map.bin
│   │   │   ├── quant.json
│   │   │   └── unmapped_bc_count_collated.bin
│   │   └── simpleaf_quant_log.json
├── alevinqc
├── fastqc
├── multiqc
└── pipeline_info

Kallisto

For kallisto, I first had to add a new pipeline parameter, --kb_filter, because kallisto can produce either a single matrix or a raw/filtered pair, depending on whether the bustools --filter option is used. The new parameter lets the module integration account for both scenarios.

Finally, to make sure I had a channel that could be used properly, and that we could tell raw from filtered data when choosing what to pass on to the empty-drops filter module, I had to modify the channels emitted by the module (see here).

tuple val(meta), path ("*.count")                  , emit: count
tuple val(meta), path ("*.count/counts_unfiltered"), emit: raw_counts                       // TODO: Add to nf-core/modules before merging PR
tuple val(meta), path ("*.count/counts_filtered")  , emit: filtered_counts, optional: true  // TODO: Add to nf-core/modules before merging PR

Then, of course, I updated the downstream code in the sub-workflows and in the workflow to handle these channels.

I know these changes should go to nf-core/modules first. I added them here for discussion; once everything is settled, I can add whatever is needed to nf-core/modules.

Results look like this, when --filter is on.

kallisto_lamanno_run
├── fastqc
├── kallisto
│   ├── mtx_conversions
│   │   ├── combined_custom_emptydrops_filter_matrix.h5ad
│   │   ├── combined_filtered_matrix.h5ad
│   │   ├── combined_raw_matrix.h5ad
│   │   ├── pbmc8k
│   │   │   ├── pbmc8k_spliced_matrix.h5ad
│   │   │   ├── pbmc8k_spliced_matrix.rds
│   │   │   ├── pbmc8k_unspliced_matrix.h5ad
│   │   │   └── pbmc8k_unspliced_matrix.rds
│   │   └── versions.yml
│   ├── pbmc8k.count
│   │   ├── 10x_version2_whitelist.txt
│   │   ├── counts_filtered
│   │   │   ├── spliced.barcodes.txt
│   │   │   ├── spliced.genes.txt
│   │   │   ├── spliced.mtx
│   │   │   ├── unspliced.barcodes.txt
│   │   │   ├── unspliced.genes.txt
│   │   │   └── unspliced.mtx
│   │   ├── counts_unfiltered
│   │   │   ├── spliced.barcodes.txt
│   │   │   ├── spliced.genes.txt
│   │   │   ├── spliced.mtx
│   │   │   ├── unspliced.barcodes.txt
│   │   │   ├── unspliced.genes.txt
│   │   │   └── unspliced.mtx
│   │   ├── emptydrops_filtered
│   │   │   ├── spliced.barcodes.txt
│   │   │   ├── spliced.genes.txt
│   │   │   ├── spliced.mtx
│   │   │   ├── unspliced.barcodes.txt
│   │   │   ├── unspliced.genes.txt
│   │   │   └── unspliced.mtx
│   │   ├── filter_barcodes.txt
│   │   ├── inspect.json
│   │   ├── inspect.spliced.json
│   │   ├── inspect.unspliced.json
│   │   ├── kb_info.json
│   │   ├── matrix.ec
│   │   ├── output.bus
│   │   ├── output.filtered.bus
│   │   ├── output.unfiltered.bus
│   │   ├── run_info.json
│   │   ├── spliced.filtered.bus
│   │   ├── spliced.unfiltered.bus
│   │   ├── transcripts.txt
│   │   ├── unspliced.filtered.bus
│   │   └── unspliced.unfiltered.bus
│   └── versions.yml
├── multiqc
└── pipeline_info

kallisto_run
├── fastqc
├── kallisto
│   ├── mtx_conversions
│   │   ├── combined_custom_emptydrops_filter_matrix.h5ad
│   │   ├── combined_raw_matrix.h5ad
│   │   ├── pbmc8k
│   │   │   ├── pbmc8k_custom_emptydrops_filter_matrix.h5ad
│   │   │   ├── pbmc8k_custom_emptydrops_filter_matrix.rds
│   │   │   ├── pbmc8k_raw_matrix.h5ad
│   │   │   └── pbmc8k_raw_matrix.rds
│   │   └── versions.yml
│   ├── pbmc8k.count
│   │   ├── 10x_version2_whitelist.txt
│   │   ├── counts_unfiltered
│   │   │   ├── cells_x_genes.barcodes.txt
│   │   │   ├── cells_x_genes.genes.txt
│   │   │   └── cells_x_genes.mtx
│   │   ├── emptydrops_filtered
│   │   │   ├── cells_x_genes.barcodes.txt
│   │   │   ├── cells_x_genes.genes.txt
│   │   │   └── cells_x_genes.mtx
│   │   ├── inspect.json
│   │   ├── kb_info.json
│   │   ├── matrix.ec
│   │   ├── output.bus
│   │   ├── output.unfiltered.bus
│   │   ├── run_info.json
│   │   └── transcripts.txt
│   └── versions.yml
├── multiqc
└── pipeline_info

Cellranger

For cellranger, the situation was basically the same as for kallisto. The difference is that cellranger always produces a raw/filtered pair, so I only had to modify the channels to account for that, which makes the later filtering easier (see here).

I also had to update the conversion scripts: the emptydrops filter module does not produce a .h5 file, and the scripts were not ready to convert cellranger .mtx files.

The results look like this:

cellranger_run/
├── cellranger
│   ├── count
│   │   ├── pbmc8k
│   │   │   ├── emptydrops_filtered
│   │   │   └── outs
│   │   └── versions.yml
│   ├── mkgtf
│   │   └── genome_genes.filtered.gtf
│   ├── mkref
│   │   ├── cellranger_reference
│   │   │   ├── fasta
│   │   │   ├── genes
│   │   │   ├── reference.json
│   │   │   └── star
│   │   └── versions.yml
│   └── mtx_conversions
│       ├── combined_custom_emptydrops_filter_matrix.h5ad
│       ├── combined_filtered_matrix.h5ad
│       ├── combined_raw_matrix.h5ad
│       ├── pbmc8k
│       │   ├── pbmc8k_custom_emptydrops_filter_matrix.h5ad
│       │   ├── pbmc8k_custom_emptydrops_filter_matrix.rds
│       │   ├── pbmc8k_filtered_matrix.h5ad
│       │   ├── pbmc8k_filtered_matrix.rds
│       │   ├── pbmc8k_raw_matrix.h5ad
│       │   └── pbmc8k_raw_matrix.rds
│       └── versions.yml
├── fastqc
├── multiqc
└── pipeline_info

STAR

For star, the situation is basically the same as for cellranger. It always produces a raw/filtered pair, but I had to adjust the output channels to make the filtering/selection easier, and then adjust all the downstream channel selections accordingly.

The results for it look like this:

star_run/
├── fastqc
├── multiqc
├── pipeline_info
└── star
    ├── mtx_conversions
    │   ├── combined_custom_emptydrops_filter_matrix.h5ad
    │   ├── combined_filtered_matrix.h5ad
    │   ├── combined_raw_matrix.h5ad
    │   ├── pbmc8k
    │   │   ├── pbmc8k_custom_emptydrops_filter_matrix.h5ad
    │   │   ├── pbmc8k_custom_emptydrops_filter_matrix.rds
    │   │   ├── pbmc8k_filtered_matrix.h5ad
    │   │   ├── pbmc8k_filtered_matrix.rds
    │   │   ├── pbmc8k_raw_matrix.h5ad
    │   │   └── pbmc8k_raw_matrix.rds
    │   └── versions.yml
    └── pbmc8k
        ├── emptydrops_filtered
        │   ├── barcodes.tsv
        │   ├── features.tsv
        │   └── matrix.mtx
        ├── pbmc8k.Aligned.sortedByCoord.out.bam
        ├── pbmc8k.Log.final.out
        ├── pbmc8k.Log.out
        ├── pbmc8k.Log.progress.out
        ├── pbmc8k.SJ.out.tab
        ├── pbmc8k.Solo.out
        │   ├── Barcodes.stats
        │   └── Gene
        └── versions.yml

UniverSC and cellrangerarc

I could not even get them to run, so they could be neither tested nor integrated.
I believe they first need to be fixed, so that there is a proper test profile with which these modules can be integrated here.

I am just not sure which should come first.

About the out-channels

You will see in the scrnaseq.nf workflow that, even after generating proper raw/filtered channels in the cellranger, star and kallisto modules, I still mix them all into the general ch_mtx_matrices channel.

I do this to guarantee that all results reach the downstream analysis.
The only reason I modified the modules to emit these split channels was to make sure they grab the directory that contains just the raw or just the filtered results, which allows proper filtering.
Before, they used a higher-level ** selection, so all files were dumped into the channel as a flat list; this made the filtering very complicated and resulted in files with duplicated names.

You will see that I had to modify the conversion subworkflow to account for that as well.
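
To illustrate why a flattened `**` selection leads to duplicated names, the sketch below takes a few of the kallisto files from the output tree shown earlier, together with their directories, and counts how often each base name occurs. It is plain Python written for this explanation, not pipeline code, and `duplicated_basenames` is a hypothetical helper.

```python
from collections import Counter
from pathlib import PurePosixPath

# A subset of the kallisto (lamanno) output files listed earlier in this comment.
files = [
    "pbmc8k.count/counts_filtered/spliced.mtx",
    "pbmc8k.count/counts_filtered/unspliced.mtx",
    "pbmc8k.count/counts_unfiltered/spliced.mtx",
    "pbmc8k.count/counts_unfiltered/unspliced.mtx",
    "pbmc8k.count/kb_info.json",
]


def duplicated_basenames(paths):
    """Return the sorted base names that occur more than once once directories are dropped."""
    counts = Counter(PurePosixPath(p).name for p in paths)
    return sorted(name for name, n in counts.items() if n > 1)


dupes = duplicated_basenames(files)
print(dupes)
```

Once the directory level is lost, `spliced.mtx` from `counts_filtered` and from `counts_unfiltered` can no longer be told apart by name, which is the collision that emitting the directories avoids.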

apeltzer

Looks good to me apart from some parts:

Changes in the module code --> need to go upstream, ideally open PRs already for this
Upgrades in the respective upstream modules might be necessary, check out the cellranger update PR, which could be merged prior to updating the modules here I believe --> should be easy to do..
Some more docs on what this feature does would be helpful, so that people can see it both in the changelog and in the main documentation

fmalmeida · 2024-03-01T11:07:30Z

> Looks good to me apart from some parts:
>
> Changes in the module code --> need to go upstream, ideally open PRs already for this
>
> Upgrades in the respective upstream modules might be necessary, check out the cellranger update PR, which could be merged prior to updating the modules here I believe --> should be easy to do..
>
> Some more docs on what this feature does would be helpful - so that people can both see it in the changelog and also in the main documentation

Yes, I marked the module changes with a TODO so that, once we agree on all the required changes, they can be made in nf-core/modules.

About the docs, I will work on it.


grst · 2024-03-11T13:08:07Z

sounds good

grst · 2024-03-13T07:25:57Z

@nf-core-bot fix linting

fmalmeida · 2024-03-13T07:32:24Z

Now that kallisto has been updated and the workflows it provides are different, I will have to test it with them as well.
Once they work, the last things required will be the docs and the PRs to modules.

fmalmeida · 2024-03-13T11:13:40Z

Hi @grst ,

Can you take a look again at the changes?

Updated documentation in docs/usage.md
Opened PR for cellranger/count module: add paths in output directive in cellranger cout module modules#5108
Opened PR for kallistobustools/count module: https://github.com/nf-core/modules/pull/5110/files
Run went okay with all aligners, including different kb_workflows.

The only one missing is the last one, which is currently running.

grst · 2024-03-13T12:15:23Z

I'm wondering if instead of updating all those modules it would be easier to do something like

ch_filtered = ch_out.map{
        meta, files -> [meta, files.findAll{ it -> it.toString().contains("filtered") }]
}

fmalmeida · 2024-03-13T12:19:18Z

> I'm wondering if instead of updating all those modules it would be easier to do something like
>
> ch_filtered = ch_out.map{
>         meta, files -> [meta, files.findAll{ it -> it.toString().contains("filtered") }]
> }

That was my first attempt. However, many of the modules use ** to grab the files, which catches the files rather than the directories. So when filtering, we only select the files that have the pattern in their own name, instead of the directory and everything inside it.

I added some information in the last section of this comment #301 (comment)
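
The difference between matching on file names and matching on directories can be sketched as follows. This is an illustrative Python snippet rather than Nextflow; the paths are a subset of the kallisto tree shown earlier in the thread, and both helper functions are hypothetical.

```python
from pathlib import PurePosixPath

# Flattened file list, roughly what a ** glob would hand over.
files = [
    "pbmc8k.count/counts_filtered/spliced.mtx",
    "pbmc8k.count/counts_unfiltered/spliced.mtx",
    "pbmc8k.count/output.filtered.bus",
    "pbmc8k.count/output.unfiltered.bus",
    "pbmc8k.count/run_info.json",
]


def by_file_name(paths, needle):
    """Select files whose own name contains the needle (directories are ignored)."""
    return [p for p in paths if needle in PurePosixPath(p).name]


def by_directory(paths, dirname):
    """Select files whose parent directory has exactly the given name."""
    return [p for p in paths if PurePosixPath(p).parent.name == dirname]


name_hits = by_file_name(files, "filtered")
dir_hits = by_directory(files, "counts_filtered")
print(name_hits)
print(dir_hits)
```

In this sketch the name-based test misses the matrix inside `counts_filtered` entirely, and because "filtered" is a substring of "unfiltered" it also picks up `output.unfiltered.bus`. Selecting on the exact directory name returns only the filtered matrix.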


grst · 2024-03-13T12:55:48Z

I see, fair enough then


grst

Good to go as soon as the tests pass!

fmalmeida · 2024-03-18T16:00:13Z

Pipeline execution terminated.

As said here, the nf-core modules were updated and the TODOs removed. The documentation was updated as well. The workflow (kallisto) was also updated so that the new modules work with both non-standard kallisto workflows, lamanno and nac.

The structure and naming of the results are as described here.

Finally, all tests passed, so I am merging the PR 😄

fmalmeida added 3 commits January 31, 2024 11:42

minimum inclusion for module

6d81894

tested only with kallisto aligner (both with and without automated kallisto filtering with bustools --filter parameter)

update input_type labels

afe4904

fixed workflow conversions to work with star align results

d03d81f

fmalmeida added the enhancement label (New feature or request) Feb 2, 2024

fmalmeida self-assigned this Feb 2, 2024

fmalmeida commented Feb 2, 2024


fmalmeida and others added 13 commits February 5, 2024 10:02

include modifications for working with cellranger

796afba

fix the path of matrices when running non-standard kallisto workflow (tested with lamanno)

c9c38ea

fix spliced/unspliced empty_drops conversion

a9ee3f1

solve number of list levels when having spliced / unspliced

0cb2a2f

Merge branch 'dev' of https://github.com/nf-core/scrnaseq into 81-call-empty-droplets

8a2e818

update shared nf-test config

6458e2d

update alevin file names

c5cc1ca

update alevin tests to also include the .rds files

c539399

update the number of tasks that are executed, and include raw/filtered in the testings (cellranger)

2d7c90e

fix naming of generated files (kallisto)

c72c8f2

update the amount of tasks and generated file names raw/filtered (star)

5d1d783

update gitignore

4095c3d

add new params to schema

6981c44

fmalmeida marked this pull request as ready for review February 15, 2024 08:33

fmalmeida requested review from grst and apeltzer February 15, 2024 08:33

fmalmeida added the Ready for review label Feb 15, 2024

fmalmeida linked an issue Feb 29, 2024 that may be closed by this pull request

Call empty droplets #81

Closed

apeltzer approved these changes Mar 1, 2024

grst reviewed Mar 5, 2024

bin/emptydrops_cell_calling.R (2 review threads, outdated and resolved)

modules/local/emptydrops.nf (2 review threads, outdated and resolved)

Merge branch 'dev' of https://github.com/nf-core/scrnaseq into 81-call-empty-droplets

618c721

fmalmeida and others added 4 commits March 12, 2024 09:02

fixed problem on loading fasta&gtf from params.genome

492598b

fixed transposition

6207655

fixed file used

8c5702b

Merge branch 'dev' into 81-call-empty-droplets

7c8ed95

updating documentation

3adeb65

remove unused parameter

48dd996

adjust modules to handle kallisto outputs form non-standard (lamanno & nac) workflows

9cd8edd

when running kallisto non-standard workflow store emptydrops in subdirs to avoid file collision

d77eda8

grst approved these changes Mar 14, 2024


fmalmeida and others added 4 commits March 18, 2024 14:07

update modules to get them from nf-core/modules

c460f28

prettier fix

1c1f1ac

small update as file names changed

6351f76

add ending line

9734ace

update changelog

2406600

fmalmeida enabled auto-merge March 18, 2024 16:05

fmalmeida merged commit 1043441 into dev Mar 18, 2024
11 checks passed

fmalmeida deleted the 81-call-empty-droplets branch March 18, 2024 16:21
