From 6c29271183a77f822baa9bc6ae5a6c4b7c6bce1b Mon Sep 17 00:00:00 2001 From: Max Date: Tue, 5 Nov 2024 15:10:34 +0100 Subject: [PATCH 01/10] chore: conventional commit action (#134) * chore: Update development (#128) * docs: enhancing documentation * docs: better quickstart * chore: ubdate github actions to setup-micromamba * docs: remove default channel from environment file * docs: improvements, like QC report (#125) * added .DS_Store to gitignore. * Fixed the overflow of the features section by using the table. * Fixed the broked report link. * fixed typo project * Typo fix controlled * Sample QC report HTML file * Added the link to the QC report in experiment. * Added the assignment QC report. * Add link to QC report in assignment documentation * Update documentation in quickstart.rst. Fixed typos and gramatical mistakes. * Update documentation in index.rst. Fix typos and grammatical mistakes. * Fix typo in installation documentation * Refactor documentation in config.rst --------- Co-authored-by: Max * docs: Fixed the link for the QC report in Experiment and Assignment (#126) * added .DS_Store to gitignore. * Fixed the overflow of the features section by using the table. * Fixed the broked report link. * fixed typo project * Typo fix controlled * Sample QC report HTML file * Added the link to the QC report in experiment. * Added the assignment QC report. * Add link to QC report in assignment documentation * Update documentation in quickstart.rst. Fixed typos and gramatical mistakes. * Update documentation in index.rst. Fix typos and grammatical mistakes. * Fix typo in installation documentation * Refactor documentation in config.rst * Update documentation links in assignment.rst and experiment.rst * Testing the iframe html file. 
* Update documentation links in assignment.rst and experiment.rst --------- Co-authored-by: Max * chore: delete not necessary files * docs: automatic versioning * style: automatic version printing of MPRAsnakeflow * fix: memory resources for bbmap (#123) * fix: add memory resources for bbmap * set lower memm in bbmap workflow profile * increasing memory for bmap --------- Co-authored-by: Max Schubach Co-authored-by: Max Schubach * fix: Detach from anaconda (#122) * fix: detach from anaconda. Remove defaults conda channels * fixing linting errors * update hashes in dockerfile from lining errors --------- Co-authored-by: Max Schubach * chore(master): release MPRAsnakeflow 0.1.1 (#124) * chore(master): release MPRAsnakeflow 0.1.1 * Update .release-please-manifest.json * Update version.txt * Update CHANGELOG.md --------- Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Max * forgot to upgrade two envs * docs: correct link in docs badge --------- Co-authored-by: Max Schubach Co-authored-by: Ali <69039717+bioinformaticsguy@users.noreply.github.com> Co-authored-by: Max Schubach Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * feat!: igvf outputs (#129) * refactor: removed statistics from final barcode to oligo map * refactor outputs * fix scripts due to renaming headers * fix assignment statistic due to new output * refactor!: moving files. not attched counts are not used as well as median for scaling * adding logs --------- Co-authored-by: Max Schubach * chore!: supporting only snakemake >=8.24.1 (#130) Co-authored-by: Max Schubach * refactor!: No min max length for bbmap. default mapq is 30. (#131) Changes for bbmap * no min an max for sequence length and start. 
(like exact matching) * using default of 30 mapq instead of 35 * feat!: outlier removal (#132) * feat!: outlier detection Might break older config files * docs: update documentation for bbmap, apptainer and outlier removal * use abs for zscore * trying to fix outlier via zscore * mad code change * change outlier removal default to zscore --------- Co-authored-by: Max Schubach * edit config * Update conventional-prs.yml --------- Co-authored-by: Max Schubach Co-authored-by: Ali <69039717+bioinformaticsguy@users.noreply.github.com> Co-authored-by: Max Schubach Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> --- .github/workflows/conventional-prs.yml | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/.github/workflows/conventional-prs.yml b/.github/workflows/conventional-prs.yml index ab3a344c..c847324c 100644 --- a/.github/workflows/conventional-prs.yml +++ b/.github/workflows/conventional-prs.yml @@ -1,5 +1,5 @@ --- -name: PR +name: "Lint PR" on: pull_request_target: types: @@ -8,12 +8,14 @@ on: - edited - synchronize +permissions: + pull-requests: read + jobs: - title-format: + main: + name: Validate PR title runs-on: ubuntu-latest steps: - - uses: amannn/action-semantic-pull-request@v4 + - uses: amannn/action-semantic-pull-request@v5 env: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} - with: - validateSingleCommit: true From 6e447d567d141071b47962c813bbfe8ccd8efb12 Mon Sep 17 00:00:00 2001 From: Max Date: Tue, 5 Nov 2024 15:28:38 +0100 Subject: [PATCH 02/10] Update Dockerfile --- Dockerfile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Dockerfile b/Dockerfile index cb2bf4ab..4bfc7e67 100644 --- a/Dockerfile +++ b/Dockerfile @@ -1,4 +1,4 @@ -ARG VERSION=0.1.1 +ARG VERSION=0.2.0 FROM condaforge/miniforge3:latest LABEL io.github.snakemake.containerized="true" From 4b7cc33af056abf16e1f5ebccc62ad709dd22e3c Mon Sep 17 00:00:00 2001 From: "github-actions[bot]" 
<41898282+github-actions[bot]@users.noreply.github.com> Date: Tue, 5 Nov 2024 15:42:38 +0100 Subject: [PATCH 03/10] chore(master): release MPRAsnakeflow 0.2.0 (#135) * chore(master): release MPRAsnakeflow 0.2.0 * Update CHANGELOG.md --------- Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Max --- .release-please-manifest.json | 2 +- CHANGELOG.md | 70 +++++++++++++++++++++++++++++++++++ version.txt | 2 +- 3 files changed, 72 insertions(+), 2 deletions(-) diff --git a/.release-please-manifest.json b/.release-please-manifest.json index a915e8c5..2be9c43c 100644 --- a/.release-please-manifest.json +++ b/.release-please-manifest.json @@ -1,3 +1,3 @@ { - ".": "0.1.1" + ".": "0.2.0" } diff --git a/CHANGELOG.md b/CHANGELOG.md index 9f1a35e0..889dff2f 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,5 +1,23 @@ # Changelog +## [0.2.0](https://github.com/kircherlab/MPRAsnakeflow/compare/MPRAsnakeflow-v0.1.1...MPRAsnakeflow-v0.2.0) (2024-11-05) + +### ⚠ BREAKING CHANGES + +* Support only snakemake >=8.24.1 ([#130](https://github.com/kircherlab/MPRAsnakeflow/pull/130)) +* File output formats and locations changed +* Normalization changed which may result in different outputs + +### Features + + * outlier removal methods ([#132](https://github.com/kircherlab/MPRAsnakeflow/pull/132)) + * No min max length for bbmap. default mapq is 30. ([#131](https://github.com/kircherlab/MPRAsnakeflow/pull/131)) + * IGVF outputs ([#129](https://github.com/kircherlab/MPRAsnakeflow/pull/129)) + * Documentation improvements + + +### Bug Fixes + ## [0.1.1](https://github.com/kircherlab/MPRAsnakeflow/compare/MPRAsnakeflow-v0.1.0...MPRAsnakeflow-v0.1.1) (2024-09-30) ### Bug Fixes @@ -19,3 +37,55 @@ First release of MPRAsnakeflow! 
* Barcode count output * Snakemake 8 support * Extended documentation: https://mprasnakeflow.readthedocs.io + + +## older development + + +### ⚠ BREAKING CHANGES + +* latest development for new release ([#133](https://github.com/kircherlab/MPRAsnakeflow/issues/133)) +* pseudocounts where not used correctly when RNA or DNA set to 0 +* DNA and RNA join correction + +### Features + +* Add assignment_merge thread configuration ([26e68c2](https://github.com/kircherlab/MPRAsnakeflow/commit/26e68c26f315c524cf28692d636127fbf3bdeb2b)) +* better assignment BC statistics ([00187e6](https://github.com/kircherlab/MPRAsnakeflow/commit/00187e689b2fad10fd317aa2efbd0214fad14434)) +* configurable min mapping quality ([28045ae](https://github.com/kircherlab/MPRAsnakeflow/commit/28045aea23d6fa03f3883b3dc44b3cbc3e8f6205)) +* extending figure width ([8bf81c4](https://github.com/kircherlab/MPRAsnakeflow/commit/8bf81c45e45f9b4c23856c0915bd527f9699b6cd)) +* faster design check ([315b402](https://github.com/kircherlab/MPRAsnakeflow/commit/315b402499d92850382d4110e153602020381e8a)) +* fastq-join implementation ([aaf5315](https://github.com/kircherlab/MPRAsnakeflow/commit/aaf5315364ebb3e3117c3996c2fc357aa9c4d595)) +* latest development for new release ([#133](https://github.com/kircherlab/MPRAsnakeflow/issues/133)) ([bdfc557](https://github.com/kircherlab/MPRAsnakeflow/commit/bdfc557a64cecc19d1d86eead8bdb691a1ff2166)) +* make filtering consistent ([5f7a4c5](https://github.com/kircherlab/MPRAsnakeflow/commit/5f7a4c5a2a3389a75b8d6b7e9aaf34485127b3a4)) +* master variant table ([6bda47c](https://github.com/kircherlab/MPRAsnakeflow/commit/6bda47c78021bc1728bb81a716f5e6daaf6ac084)) +* new final output file with merged replicates ([66cf017](https://github.com/kircherlab/MPRAsnakeflow/commit/66cf0172cb6b556e507be4daabf7e859447787f3)) +* only link assignment fasta when possible ([d7d3822](https://github.com/kircherlab/MPRAsnakeflow/commit/d7d3822933c98d790f3c96bcbfdef1a7ea70c7df)), closes 
[#50](https://github.com/kircherlab/MPRAsnakeflow/issues/50) +* remove space, speedup BC extraction ([70e9bd0](https://github.com/kircherlab/MPRAsnakeflow/commit/70e9bd06b91ccb37333e0a69c47917a5eacbf639)) +* replace merging by NGmerge ([0aa8cad](https://github.com/kircherlab/MPRAsnakeflow/commit/0aa8cad6884a953f9c89a2fdd7af397e4e9ccf3e)) +* snakemake 8 compatibility ([cf38ed9](https://github.com/kircherlab/MPRAsnakeflow/commit/cf38ed9de68367d0d1700ccff262e91ad6f1fbc0)) +* snakemake 8 ready with workflow profile ([d637e1f](https://github.com/kircherlab/MPRAsnakeflow/commit/d637e1fdbebfca0616d944101898fbf522df9c82)) +* statistic for assignment workflow ([10c3b26](https://github.com/kircherlab/MPRAsnakeflow/commit/10c3b2677ada59925ddd3de777f7488c9a20e981)) +* using reverese compelment BCs ([d009a6c](https://github.com/kircherlab/MPRAsnakeflow/commit/d009a6c3de7de50a210479b73f5d41969287e234)) + + +### Bug Fixes + +* batch size issue in sort ([487ba8c](https://github.com/kircherlab/MPRAsnakeflow/commit/487ba8ce059517030fcab3708c3cea40ac210f7e)) +* correct use of assignment configs ([58b64f1](https://github.com/kircherlab/MPRAsnakeflow/commit/58b64f1e753477f7410233ac546701ddbd60f9f2)) +* corrected qc_report_assoc ([afb0127](https://github.com/kircherlab/MPRAsnakeflow/commit/afb012750bc1c3c39f2348b283c23ff97695f672)) +* Detach from anaconda ([#122](https://github.com/kircherlab/MPRAsnakeflow/issues/122)) ([16bcea2](https://github.com/kircherlab/MPRAsnakeflow/commit/16bcea2f04190a5965ad1865cf30f6dd44f1b6a0)) +* DNA and RNA join correction ([7214743](https://github.com/kircherlab/MPRAsnakeflow/commit/7214743008dc6796077e45e62646174ffaf52290)) +* filter config ([38ee37e](https://github.com/kircherlab/MPRAsnakeflow/commit/38ee37ecfcf4a71b840575504811512e0d64609a)) +* issue with stats and asisgnment ([d935fa1](https://github.com/kircherlab/MPRAsnakeflow/commit/d935fa1f62825dfdcd2cd77e4c73bc37686519a0)) +* memory resources for bbmap 
([#123](https://github.com/kircherlab/MPRAsnakeflow/issues/123)) ([af93f58](https://github.com/kircherlab/MPRAsnakeflow/commit/af93f588e9387ddf91197f5587d36c3481499b38)) +* plots per insert only used last experiment. not all. ([c2fd82b](https://github.com/kircherlab/MPRAsnakeflow/commit/c2fd82b6d4b545cc3a1acc5ecb145eb3c93af49d)) +* pseudocounts where not used correctly when RNA or DNA set to 0 ([d2483f9](https://github.com/kircherlab/MPRAsnakeflow/commit/d2483f9c7724e0b63cec4f251519d449831ecf04)) +* remove illegal characters from reference ([0ebee81](https://github.com/kircherlab/MPRAsnakeflow/commit/0ebee81d74f3f6170ce4b8083e18c746550154db)) +* rename barcoe output header ([635f043](https://github.com/kircherlab/MPRAsnakeflow/commit/635f0431c78d3d5bf9b77a16f6ce26d9ff6c82c2)) +* rule make_master_tables fix ([df42845](https://github.com/kircherlab/MPRAsnakeflow/commit/df42845b6dfa9a7b64f187b38f1f15518f3e4a31)) +* statistic total counts ([6381b92](https://github.com/kircherlab/MPRAsnakeflow/commit/6381b928fd6c14eb16801a459b8546fa37004c74)) +* typo in report ([ace8cca](https://github.com/kircherlab/MPRAsnakeflow/commit/ace8ccacb3d7ece04af43c9b0b1dc9c9c087a2c4)) +* upgrade code to new pandas version ([aaea236](https://github.com/kircherlab/MPRAsnakeflow/commit/aaea236bc83f459e7a6c2d3fee96d49c79762325)) +* using correct threads ([6dcad7d](https://github.com/kircherlab/MPRAsnakeflow/commit/6dcad7d34173f37d4538644b1ba0d918afd8f149)) +* using multiple fastq inputs in counts ([95935cf](https://github.com/kircherlab/MPRAsnakeflow/commit/95935cfe69956ca50307a9c6a774c4b96dff860f)) diff --git a/version.txt b/version.txt index 17e51c38..0ea3a944 100644 --- a/version.txt +++ b/version.txt @@ -1 +1 @@ -0.1.1 +0.2.0 From 9573b661afb83590b7ac5aedac7b6d3d5e61d8a4 Mon Sep 17 00:00:00 2001 From: Max Date: Wed, 20 Nov 2024 15:05:54 +0100 Subject: [PATCH 04/10] feat!: versioned config (#140) * no mad outlier detection, version controled config, global config removed * getting action 
linter to run * fix: update argument syntax for version retrieval in GitHub Actions workflow * fix: correct argument syntax for version retrieval in GitHub Actions workflow * fix: correct argument syntax for version retrieval in GitHub Actions workflow * fix: update argument syntax for version retrieval in GitHub Actions workflow * fix: add skip version check option in workflow configuration --------- Co-authored-by: Max Schubach --- .github/workflows/main.yml | 2 +- config/example_assignment_bbmap.yaml | 5 +-- config/example_assignment_bwa.yaml | 5 +-- config/example_assignment_exact_lazy.yaml | 5 +-- config/example_assignment_exact_linker.yaml | 5 +-- config/example_config.yaml | 5 +-- config/example_count.yaml | 1 + docs/config.rst | 20 ++++----- docs/quickstart.rst | 2 +- resources/assoc_basic/config.yml | 5 +-- resources/combined_basic/config.yml | 7 ++-- resources/count_basic/config.yml | 5 +-- workflow/rules/assigned_counts.smk | 16 ++----- workflow/rules/common.smk | 37 ++++++++++++++--- workflow/schemas/config.schema.yaml | 46 +++++++-------------- 15 files changed, 80 insertions(+), 86 deletions(-) diff --git a/.github/workflows/main.yml b/.github/workflows/main.yml index bc024c60..6dec5a1c 100644 --- a/.github/workflows/main.yml +++ b/.github/workflows/main.yml @@ -50,7 +50,7 @@ jobs: with: directory: . 
snakefile: workflow/Snakefile - args: "--lint --configfile config/example_config.yaml" + args: "--lint --configfile config/example_config.yaml --config skip_version_check=True" # Testing: # runs-on: ubuntu-latest # needs: diff --git a/config/example_assignment_bbmap.yaml b/config/example_assignment_bbmap.yaml index 04596867..33799f7b 100644 --- a/config/example_assignment_bbmap.yaml +++ b/config/example_assignment_bbmap.yaml @@ -1,11 +1,10 @@ --- -global: # generall configs effecting one or multiple parts - assignments: - split_number: 1 # number of files fastq should be split for parallelization +version: "0.3" assignments: exampleAssignment: # name of an example assignment (can be any string) bc_length: 15 alignment_tool: + split_number: 1 # number of files fastq should be split for parallelization tool: bbmap configs: min_mapping_quality: 30 # 30 is default for bbmap diff --git a/config/example_assignment_bwa.yaml b/config/example_assignment_bwa.yaml index 439f7373..9a51f74c 100644 --- a/config/example_assignment_bwa.yaml +++ b/config/example_assignment_bwa.yaml @@ -1,11 +1,10 @@ --- -global: # generall configs effecting one or multiple parts - assignments: - split_number: 1 # number of files fastq should be split for parallelization +version: "0.3" assignments: exampleAssignment: # name of an example assignment (can be any string) bc_length: 15 alignment_tool: + split_number: 1 # number of files fastq should be split for parallelization tool: bwa configs: min_mapping_quality: 1 # integer >=0 Please use 1 when you have oligos that differ by 1 base in your reference/design_file diff --git a/config/example_assignment_exact_lazy.yaml b/config/example_assignment_exact_lazy.yaml index 224c4428..1f7692cf 100644 --- a/config/example_assignment_exact_lazy.yaml +++ b/config/example_assignment_exact_lazy.yaml @@ -1,11 +1,10 @@ --- -global: # generall configs effecting one or multiple parts - assignments: - split_number: 1 # number of files fastq should be split for 
parallelization +version: "0.3" assignments: exampleAssignment: # name of an example assignment (can be any string) bc_length: 15 alignment_tool: + split_number: 1 # number of files fastq should be split for parallelization tool: exact # bwa or exact configs: sequence_length: 171 # sequence length of design excluding adapters. diff --git a/config/example_assignment_exact_linker.yaml b/config/example_assignment_exact_linker.yaml index 15f585f6..b05a4bdc 100644 --- a/config/example_assignment_exact_linker.yaml +++ b/config/example_assignment_exact_linker.yaml @@ -1,13 +1,12 @@ --- -global: # generall configs effecting one or multiple parts - assignments: - split_number: 1 # number of files fastq should be split for parallelization +version: "0.3" assignments: exampleAssignment: # name of an example assignment (can be any string) bc_length: 20 BC_rev_comp: true linker: TCTAGACCGTCACTAACTAACAGTGGGTACCC alignment_tool: + split_number: 1 # number of files fastq should be split for parallelization tool: exact # bwa or exact configs: sequence_length: 171 # sequence length of design excluding adapters. diff --git a/config/example_config.yaml b/config/example_config.yaml index 85a45d22..d8da194f 100644 --- a/config/example_config.yaml +++ b/config/example_config.yaml @@ -1,11 +1,10 @@ --- -global: # generall configs effecting one or multiple parts - assignments: - split_number: 1 # number of files fastq should be split for parallelization +version: "0.3" assignments: exampleAssignment: # name of an example assignment (can be any string) bc_length: 15 alignment_tool: + split_number: 1 # number of files fastq should be split for parallelization tool: exact # bbmap, bwa or exact configs: sequence_length: 171 # sequence length of design excluding adapters. 
diff --git a/config/example_count.yaml b/config/example_count.yaml index e53ed705..e5b99fee 100644 --- a/config/example_count.yaml +++ b/config/example_count.yaml @@ -1,4 +1,5 @@ --- +version: "0.3" experiments: exampleCount: bc_length: 15 diff --git a/docs/config.rst b/docs/config.rst index fbf5d133..6e19db91 100644 --- a/docs/config.rst +++ b/docs/config.rst @@ -4,7 +4,7 @@ Config File ===================== -The config file is a yaml file that contains the configuration. Different runs can be configured. We recommend using one config file per MPRA experiment or MPRA project. But in theory, many different experiments can be configured in only one file. It is divided into :code:`global` (general settings), :code:`assignments` (assigment workflow), and :code:`experiments` (count workflow including variants). This is a full example file with default configurations. :download:`config/example_config.yaml <../config/example_config.yaml>`. +The config file is a yaml file that contains the configuration. Different runs can be configured. We recommend using one config file per MPRA experiment or MPRA project. But in theory, many different experiments can be configured in only one file. It is divided into :code:`version` (version of MPRAsnakeflow used), :code:`assignments` (assigment workflow), and :code:`experiments` (count workflow). This is a full example file with default configurations. :download:`config/example_config.yaml <../config/example_config.yaml>`. .. literalinclude:: ../config/example_config.yaml :language: yaml @@ -14,21 +14,18 @@ The config file is a yaml file that contains the configuration. Different runs c Note that the config file is controlled by json schema. This means that the config file is validated against the schema. If the config file is not valid, the program will exit with an error message. The schema is located in :download:`workflow/schemas/config.schema.yaml <../workflow/schemas/config.schema.yaml>`. 
---------------- -General settings +Version settings ---------------- -The general settings are located in the :code:`global` section. The following settings are possible: +Set the version of MPRAsnakeflow that this configuration was written for. This is important for future updates. The version is used to check whether the config file is compatible with the current version of the workflow. If the versions are not compatible, the workflow exits with an error message. .. literalinclude:: ../workflow/schemas/config.schema.yaml :language: yaml - :start-after: start_global + :start-after: start_version :end-before: start_assignments -:assignments: - Global parameters that hold for the assignment workflow. - - :split_number: - To parallize mapping for assignment the reads are split into :code:`split_number` files. E.g. setting to 300 means that the reads are split into 300 files and each file is mapped in parallel. This is only useful when using on a cluster. Running the workflow only on one machine the default value should be used. The default is set to 1. +:version: + A string like "0.2.0" or "1.2". When the major version is "0", the minor version must match that of MPRAsnakeflow, e.g. "0.2.0" is compatible with MPRAsnakeflow 0.2.0 as well as 0.2.1 or 0.2.2. When the major version is greater than 0, only the major version has to match, e.g. a config version of "1.2.1" also fits MPRAsnakeflow 1.7 or 1.0. -------------------- Assignment workflow @@ -43,9 +40,12 @@ The assignment workflow is configured in the :code:`assignments` section. The fo For each assignment you want to process you have to give him a name like :code:`example_assignment`. The name is used to name the output files. + :alignment_tool: Alignment tool configuration that is used to map the reads to the oligos. - + + :split_number: + To parallelize mapping for assignment the reads are split into :code:`split_number` files. E.g. 
setting to 300 means that the reads are split into 300 files and each file is mapped in parallel. This is only useful when running on a cluster. When running the workflow on a single machine, the default value should be used. The default is set to :code:`1`. (For technical reasons, when multiple assignments are defined, all of them are set to the maximum value defined in the config.) :tool: Alignment tool that is used. Currently :code:`bbmap` :code:`bwa`, :code:`exact` are supported. Default is :code:`bbmap`. :configs: diff --git a/docs/quickstart.rst b/docs/quickstart.rst index b997e730..0ad6d549 100644 --- a/docs/quickstart.rst +++ b/docs/quickstart.rst @@ -37,7 +37,7 @@ MPRAsnakeflow exoists of two subworkflows, :ref:`Assignment` and :ref:`Experimen 3. Set up the config file -The config file is the heart of MPRAsnakflow. Here different runs can be configured. We recommend using one config file per MPRA experiment or MPRA project. But in theory, many different experiments can be configured in only one file. It is divided into :code:`global` (general settings), :code:`assignments` (assigment workflow), and :code:`experiments` (count workflow including variants). +The config file is the heart of MPRAsnakeflow. Here different runs can be configured. We recommend using one config file per MPRA experiment or MPRA project. But in theory, many different experiments can be configured in only one file. It is divided into :code:`version` (used MPRAsnakeflow version), :code:`assignments` (assignment workflow), and :code:`experiments` (count workflow). See :ref:`Config` for more details about the config file. Here is an example running only the count experiments and using a provided assignment file. 
diff --git a/resources/assoc_basic/config.yml b/resources/assoc_basic/config.yml index 10a1020e..856c87f3 100644 --- a/resources/assoc_basic/config.yml +++ b/resources/assoc_basic/config.yml @@ -1,11 +1,10 @@ --- -global: - assignments: - split_number: 30 +version: "0.3" assignments: assocBasic: bc_length: 15 alignment_tool: + split_number: 30 tool: bbmap configs: sequence_length: 171 diff --git a/resources/combined_basic/config.yml b/resources/combined_basic/config.yml index c18a618a..c6dc1c6d 100644 --- a/resources/combined_basic/config.yml +++ b/resources/combined_basic/config.yml @@ -1,11 +1,10 @@ --- -global: - assignments: - split_number: 30 +version: "0.3" assignments: assocBasic: bc_length: 15 alignment_tool: + split_number: 30 tool: bbmap configs: sequence_length: 171 @@ -30,7 +29,7 @@ experiments: fromWorkflow: type: config assignment_name: assocBasic - assignment_config: configs + assignment_config: default design_file: design.fa configs: default: {} diff --git a/resources/count_basic/config.yml b/resources/count_basic/config.yml index c9c27a8c..1214e995 100644 --- a/resources/count_basic/config.yml +++ b/resources/count_basic/config.yml @@ -1,4 +1,5 @@ --- +version: "0.3" experiments: exampleCount: bc_length: 15 @@ -13,10 +14,6 @@ experiments: design_file: design.fa configs: default: {} - outlierNone: - filter: - outlier_detection: - method: none outlierZscore: filter: outlier_detection: diff --git a/workflow/rules/assigned_counts.smk b/workflow/rules/assigned_counts.smk index c91d1a52..36696a2b 100644 --- a/workflow/rules/assigned_counts.smk +++ b/workflow/rules/assigned_counts.smk @@ -115,20 +115,12 @@ rule assigned_counts_dna_rna_merge: % config["experiments"][wc.project]["configs"][wc.config]["filter"][ "outlier_detection" ]["method"] - if config["experiments"][wc.project]["configs"][wc.config]["filter"][ + if "method" + in config["experiments"][wc.project]["configs"][wc.config]["filter"][ "outlier_detection" - ]["method"] - != "none" + ] else "" ), 
- outlier_mad_bins=lambda wc: "--outlier-ratio-mad-bins %d" - % config["experiments"][wc.project]["configs"][wc.config]["filter"][ - "outlier_detection" - ]["mad_bins"], - outlier_mad_times=lambda wc: "--outlier-ratio-mad-times %f" - % config["experiments"][wc.project]["configs"][wc.config]["filter"][ - "outlier_detection" - ]["times_mad"], outlier_zscore_times=lambda wc: "--outlier-rna-zscore-times %f" % config["experiments"][wc.project]["configs"][wc.config]["filter"][ "outlier_detection" @@ -143,7 +135,7 @@ rule assigned_counts_dna_rna_merge: --minRNACounts {params.minRNACounts} --minDNACounts {params.minDNACounts} \ --assignment {input.association} \ {params.outlier_detection} --outlier-barcodes {output.removed_bcs} \ - {params.outlier_mad_bins} {params.outlier_mad_times} {params.outlier_zscore_times} \ + {params.outlier_zscore_times} \ --output {output.counts} \ --bcOutput {output.bc_counts} \ --statistic {output.statistic} &> {log} diff --git a/workflow/rules/common.smk b/workflow/rules/common.smk index 42e50ed5..6e8965ed 100644 --- a/workflow/rules/common.smk +++ b/workflow/rules/common.smk @@ -23,6 +23,33 @@ if "experiments" in config: validate(experiment, schema="../schemas/experiment_file.schema.yaml") experiments[project] = experiment +# validate version of config with MPRAsnakeflow version + +import re + +# Regular expression to match the first two digits with the dot in the middle +pattern_major_version = r"^(\d+)" +pattern_development_version = r"^(0(\.\d+)?)" + + +def check_version(pattern, version, config_version): + # Search for the pattern in the string + match_version = re.search(pattern, version) + + match_config = re.search(pattern, config_version) + + # Check if a match is found and print the result + if match_version and match_config: + if match_version.group(1) != match_config.group(1): + raise ValueError( + f"\033[38;2;255;165;0mVersion mismatch: MPRAsnakeflow version is {version}, but config version is {config_version}\033[0m" + ) + + +if 
not config["skip_version_check"]: + check_version(pattern_development_version, version, config["version"]) + check_version(pattern_major_version, version, config["version"]) + ################################ #### HELPERS AND EXCEPTIONS #### @@ -509,14 +536,12 @@ def withoutZeros(project, conf): def getSplitNumber(): - split = 1 + splits = [] - if "global" in config: - if "assignments" in config["global"]: - if "split_number" in config["global"]["assignments"]: - split = config["global"]["assignments"]["split_number"] + for assignment in config["assignments"]: + splits += [config["assignments"][assignment]["alignment_tool"]["split_number"]] - return split + return max(splits) # count.smk specific functions diff --git a/workflow/schemas/config.schema.yaml b/workflow/schemas/config.schema.yaml index 1f069180..1e61d3bd 100644 --- a/workflow/schemas/config.schema.yaml +++ b/workflow/schemas/config.schema.yaml @@ -10,21 +10,15 @@ type: object # possible entries of the config file properties: - # start_global - global: - type: object - default: - assignments: - split_number: 1 - properties: - assignments: - type: object - properties: - split_number: - type: integer - default: 1 - additionalProperties: false - additionalProperties: false + # start_version + version: + description: Version of MPRAsnakeflow + type: string + pattern: ^(\d+(\.\d+)?(\.\d+)?)|(0\.\d+(\.\d+)?)$ + skip_version_check: + description: Skip version check + type: boolean + default: false # start_assignments assignments: description: Assignments to run with configurations @@ -37,6 +31,9 @@ properties: alignment_tool: type: object properties: + split_number: + type: integer + default: 1 tool: type: string enum: @@ -336,25 +333,11 @@ properties: type: string enum: - rna_counts_zscore - - ratio_mad - - none - default: rna_counts_zscore - mad_bins: - type: integer - minimum: 1 - default: 20 - times_mad: - type: number - exclusiveMinimum: 0 - default: 5 times_zscore: type: number exclusiveMinimum: 0 
default: 3 required: - - method - - mad_bins - - times_mad - times_zscore additionalProperties: false default: {} @@ -419,4 +402,7 @@ properties: additionalProperties: false # end_experiments additionalProperties: false -minProperties: 1 +required: + - version + - skip_version_check +minProperties: 3 From 978f8b90d691d48a7d4e879f152e98dfbcac7347 Mon Sep 17 00:00:00 2001 From: "github-actions[bot]" <41898282+github-actions[bot]@users.noreply.github.com> Date: Wed, 20 Nov 2024 15:36:07 +0100 Subject: [PATCH 05/10] chore(master): release MPRAsnakeflow 0.3.0 (#141) * chore(master): release MPRAsnakeflow 0.3.0 * Update CHANGELOG.md --------- Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: Max --- .release-please-manifest.json | 2 +- CHANGELOG.md | 14 ++++++++++++++ version.txt | 2 +- 3 files changed, 16 insertions(+), 2 deletions(-) diff --git a/.release-please-manifest.json b/.release-please-manifest.json index 2be9c43c..0ee8c012 100644 --- a/.release-please-manifest.json +++ b/.release-please-manifest.json @@ -1,3 +1,3 @@ { - ".": "0.2.0" + ".": "0.3.0" } diff --git a/CHANGELOG.md b/CHANGELOG.md index 889dff2f..488f30f1 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,5 +1,19 @@ # Changelog +## [0.3.0](https://github.com/kircherlab/MPRAsnakeflow/compare/MPRAsnakeflow-v0.2.0...MPRAsnakeflow-v0.3.0) (2024-11-20) + + +### ⚠ BREAKING CHANGES + +* versioned config ([#140](https://github.com/kircherlab/MPRAsnakeflow/issues/140)) + +### Features + +* versioned config ([#140](https://github.com/kircherlab/MPRAsnakeflow/issues/140)) +* MAD outlier removal is completely removed ([#140](https://github.com/kircherlab/MPRAsnakeflow/issues/140)) +* default is NO outlier detection (none is not present anymore) ([#140](https://github.com/kircherlab/MPRAsnakeflow/issues/140)) +* global config is removed. 
splits moved now withing mapping in assignment ([#140](https://github.com/kircherlab/MPRAsnakeflow/issues/140)) + ## [0.2.0](https://github.com/kircherlab/MPRAsnakeflow/compare/MPRAsnakeflow-v0.1.1...MPRAsnakeflow-v0.2.0) (2024-11-05) ### ⚠ BREAKING CHANGES diff --git a/version.txt b/version.txt index 0ea3a944..0d91a54c 100644 --- a/version.txt +++ b/version.txt @@ -1 +1 @@ -0.2.0 +0.3.0 From 271f16edb325ab82dfc08835bd34d962bc74c27f Mon Sep 17 00:00:00 2001 From: Max Date: Mon, 25 Nov 2024 13:01:08 +0100 Subject: [PATCH 06/10] fixing error when no asisgnemnt is defined in config file --- workflow/rules/common.smk | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/workflow/rules/common.smk b/workflow/rules/common.smk index 6e8965ed..c4efccec 100644 --- a/workflow/rules/common.smk +++ b/workflow/rules/common.smk @@ -536,9 +536,9 @@ def withoutZeros(project, conf): def getSplitNumber(): - splits = [] + splits = [1] - for assignment in config["assignments"]: + for assignment in getAssignments(): splits += [config["assignments"][assignment]["alignment_tool"]["split_number"]] return max(splits) From 0f736e30e83c9c354869ea30c1dc8ae93a7eb54a Mon Sep 17 00:00:00 2001 From: Max Date: Mon, 25 Nov 2024 13:35:39 +0100 Subject: [PATCH 07/10] fix header im variant correlation --- workflow/scripts/variants/correlateVariantTables.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/workflow/scripts/variants/correlateVariantTables.py b/workflow/scripts/variants/correlateVariantTables.py index b79e9f33..fb5b5887 100644 --- a/workflow/scripts/variants/correlateVariantTables.py +++ b/workflow/scripts/variants/correlateVariantTables.py @@ -57,7 +57,7 @@ def filterOnThreshold(variants, threshold): variants_2 = filterOnThreshold(variants_2, bc_threshold) click.echo("Join variants file...") - variants_join = variants_1.join(variants_2, how="inner", lsuffix='_A', rsuffix='_B')[["log2_expression_A", "log2_expression_B"]] + variants_join = 
variants_1.join(variants_2, how="inner", lsuffix='_A', rsuffix='_B')[["log2FoldChange_expression_A", "log2FoldChange_expression_B"]] output = pd.concat([output, pd.DataFrame([[condition, rep_1, rep_2, variants_join.shape[0], bc_threshold, variants_join.corr(method="pearson").iloc[0,1],variants_join.corr(method="spearman").iloc[0,1]]])], ignore_index=True) From a4a79e869e1f42ead0c4f7f6943c1a3f0be03d24 Mon Sep 17 00:00:00 2001 From: Max Date: Mon, 25 Nov 2024 14:38:36 +0100 Subject: [PATCH 08/10] refactor: nicer BC script --- workflow/rules/assigned_counts.smk | 21 ++------- .../count/merge_replicates_barcode_counts.py | 43 +++---------------- 2 files changed, 11 insertions(+), 53 deletions(-) diff --git a/workflow/rules/assigned_counts.smk b/workflow/rules/assigned_counts.smk index 36696a2b..61277fad 100644 --- a/workflow/rules/assigned_counts.smk +++ b/workflow/rules/assigned_counts.smk @@ -211,23 +211,10 @@ rule assigned_counts_combine_replicates_barcode_output: thresh=lambda wc: config["experiments"][wc.project]["configs"][wc.config][ "filter" ]["bc_threshold"], - replicates=lambda wc: " ".join( - [ - "--replicate %s" % r - for r in getReplicatesOfCondition(wc.project, wc.condition) - ] - ), bc_counts=lambda wc: " ".join( [ - "--counts %s" % c - for c in expand( - "results/experiments/{project}/assigned_counts/{assignment}/{config}/{condition}_{replicate}_barcode_assigned_counts.tsv.gz", - replicate=getReplicatesOfCondition(wc.project, wc.condition), - project=wc.project, - condition=wc.condition, - assignment=wc.assignment, - config=wc.config, - ) + "--counts %s results/experiments/%s/assigned_counts/%s/%s/%s_%s_barcode_assigned_counts.tsv.gz" % (rep, wc.project, wc.assignment, wc.config, wc.condition, rep) + for rep in getReplicatesOfCondition(wc.project, wc.condition) ] ), log: @@ -236,9 +223,9 @@ rule assigned_counts_combine_replicates_barcode_output: ), shell: """ - python {input.script} {params.bc_counts} \ + python {input.script} \ + {params.bc_counts} \ 
--threshold {params.thresh} \ - {params.replicates} \ --output-threshold {output.bc_merged_thresh} \ --output {output.bc_merged_all} &> {log} """ diff --git a/workflow/scripts/count/merge_replicates_barcode_counts.py b/workflow/scripts/count/merge_replicates_barcode_counts.py index a5495264..e5a8b7c8 100644 --- a/workflow/scripts/count/merge_replicates_barcode_counts.py +++ b/workflow/scripts/count/merge_replicates_barcode_counts.py @@ -8,8 +8,8 @@ "counts_files", required=True, multiple=True, - type=click.Path(exists=True, readable=True), - help="Assigned barcode count file", + type=(str,click.Path(exists=True, readable=True)), + help="Replicate name and assigned barcode count file", ) @click.option( "--threshold", @@ -19,14 +19,6 @@ type=int, help="Number of required barcodes (default 10)", ) -@click.option( - "--replicate", - "replicates", - multiple=True, - type=str, - help="replicate name", - required=True, -) @click.option( "--output", "output_threshold_file", @@ -41,39 +33,18 @@ type=click.Path(writable=True), help="Output file.", ) -def cli(counts_files, bc_thresh, replicates, output_threshold_file, output_file): +def cli(counts_files, bc_thresh, output_threshold_file, output_file): """ Merge the associated barcode count files of all replicates. 
""" - # ensure there are as many replicates as there are files - if len(replicates) != len(counts_files): - raise ( - click.BadParameter( - "Number of replicates ({}) doesn't equal the number of files ({}).".format( - len(replicates), len(counts_files) - ) - ) - ) - - # check if every file exists - for file in counts_files: - if not os.path.exists(file): - raise (click.BadParameter("{}: file not found".format(file))) - all_reps = [] - for file in counts_files: - curr_rep = -1 - # find the replicate name of the current file - for rep in replicates: - if rep in os.path.basename(file).split("_")[1]: - curr_rep = rep - break - if curr_rep == -1: - raise (click.BadParameter("{}: incorrect file".format(file))) + replicates = [] + for rep, file in counts_files: df = pd.read_csv(file, sep="\t") - df['replicate'] = curr_rep + df['replicate'] = rep all_reps.append(df) + replicates.append(rep) df = pd.concat(all_reps) df = df[df["oligo_name"] != "no_BC"] From e830dd58adbaca038a3a87144bde76618be93bd2 Mon Sep 17 00:00:00 2001 From: Max Date: Tue, 26 Nov 2024 09:13:29 +0100 Subject: [PATCH 09/10] fix: paths in assignment quarto --- workflow/rules/qc_report.smk | 24 ++++++++++++------------ 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/workflow/rules/qc_report.smk b/workflow/rules/qc_report.smk index f5a91c3e..ac8aa356 100644 --- a/workflow/rules/qc_report.smk +++ b/workflow/rules/qc_report.smk @@ -49,18 +49,18 @@ rule qc_report_assoc: cp {input.quarto_script} {output.quarto_file}; cd `dirname {output.quarto_file}`; quarto render `basename {output.quarto_file}` --output `basename {output.assi_file}` \ - -P assignment:{wildcards.assignment} \ - -P bc_length:{params.bc_length} \ - -P fw:{params.fw} \ - -P rev:{params.rev} \ - -P bc:{params.bc} \ - -P workdir:{params.workdir} \ - -P design_file:{input.design_file} \ - -P design_file_checked:{input.design_file_checked} \ - -P configs:{wildcards.assignment_config} \ - -P plot_file:{input.plot} \ - -P 
statistic_filter_file:{input.statistic_filter} \ - -P statistic_all_file:{input.statistic_all} + -P "assignment:{wildcards.assignment}" \ + -P "bc_length:{params.bc_length}" \ + -P "fw:{params.fw}" \ + -P "rev:{params.rev}" \ + -P "bc:{params.bc}" \ + -P "workdir:{params.workdir}" \ + -P "design_file:{input.design_file}" \ + -P "design_file_checked:{input.design_file_checked}" \ + -P "configs:{wildcards.assignment_config}" \ + -P "plot_file:{input.plot}" \ + -P "statistic_filter_file:{input.statistic_filter}" \ + -P "statistic_all_file:{input.statistic_all}" ) &> {log} """ From 061d78af8307d5ae80b2ff38344e537f655950ba Mon Sep 17 00:00:00 2001 From: Max Date: Tue, 26 Nov 2024 09:17:26 +0100 Subject: [PATCH 10/10] make similar for count qc --- workflow/rules/qc_report.smk | 40 ++++++++++++++++++------------------ 1 file changed, 20 insertions(+), 20 deletions(-) diff --git a/workflow/rules/qc_report.smk b/workflow/rules/qc_report.smk index ac8aa356..8221a7b4 100644 --- a/workflow/rules/qc_report.smk +++ b/workflow/rules/qc_report.smk @@ -110,25 +110,25 @@ rule qc_report_count: cp {input.quarto_script} {output.quarto_file}; cd `dirname {output.quarto_file}`; quarto render `basename {output.quarto_file}` --output `basename {output.count_file}` \ - -P assignment:{wildcards.assignment} \ - -P project:{wildcards.project} \ - -P dna_over_rna_plot:{input.dna_over_rna} \ - -P dna_over_rna_thresh_plot:{input.dna_over_rna_thresh} \ - -P dna_oligo_coor_min_thre_plot:{input.dna_oligo_coor_min_thre_plot} \ - -P rna_oligo_coor_min_thre_plot:{input.rna_oligo_coor_min_thre_plot} \ - -P dna_oligo_coor_plot:{input.dna_oligo_coor_plot} \ - -P rna_oligo_coor_plot:{input.rna_oligo_coor_plot} \ - -P ratio_oligo_coor_plot:{input.ratio_oligo_coor_plot} \ - -P ratio_oligo_min_thre_plot:{input.ratio_oligo_min_thre_plot} \ - -P statistics_all_merged:{input.statistics_all_merged} \ - -P counts_per_oligo_dna:{input.counts_per_oligo_dna} \ - -P counts_per_oligo_rna:{input.counts_per_oligo_rna} \ 
- -P statistics_all_single:{input.statistics_all_single} \ - -P activity_all:{input.activity_all} \ - -P activity_thresh:{input.activity_thresh} \ - -P statistics_all_oligo_cor_all:{input.statistics_all_oligo_cor_all} \ - -P statistics_all_oligo_cor_thresh:{input.statistics_all_oligo_cor_thresh} \ - -P thresh:{params.thresh} \ - -P workdir:{params.workdir} + -P "assignment:{wildcards.assignment}" \ + -P "project:{wildcards.project}" \ + -P "dna_over_rna_plot:{input.dna_over_rna}" \ + -P "dna_over_rna_thresh_plot:{input.dna_over_rna_thresh}" \ + -P "dna_oligo_coor_min_thre_plot:{input.dna_oligo_coor_min_thre_plot}" \ + -P "rna_oligo_coor_min_thre_plot:{input.rna_oligo_coor_min_thre_plot}" \ + -P "dna_oligo_coor_plot:{input.dna_oligo_coor_plot}" \ + -P "rna_oligo_coor_plot:{input.rna_oligo_coor_plot}" \ + -P "ratio_oligo_coor_plot:{input.ratio_oligo_coor_plot}" \ + -P "ratio_oligo_min_thre_plot:{input.ratio_oligo_min_thre_plot}" \ + -P "statistics_all_merged:{input.statistics_all_merged}" \ + -P "counts_per_oligo_dna:{input.counts_per_oligo_dna}" \ + -P "counts_per_oligo_rna:{input.counts_per_oligo_rna}" \ + -P "statistics_all_single:{input.statistics_all_single}" \ + -P "activity_all:{input.activity_all}" \ + -P "activity_thresh:{input.activity_thresh}" \ + -P "statistics_all_oligo_cor_all:{input.statistics_all_oligo_cor_all}" \ + -P "statistics_all_oligo_cor_thresh:{input.statistics_all_oligo_cor_thresh}" \ + -P "thresh:{params.thresh}" \ + -P "workdir:{params.workdir}" ) &> {log} """