diff --git a/.editorconfig b/.editorconfig index 95549501..b6b31907 100644 --- a/.editorconfig +++ b/.editorconfig @@ -8,12 +8,9 @@ trim_trailing_whitespace = true indent_size = 4 indent_style = space -[*.{yml,yaml}] +[*.{md,yml,yaml,html,css,scss,js}] indent_size = 2 -[*.json] -insert_final_newline = unset - # These files are edited and tested upstream in nf-core/modules [/modules/nf-core/**] charset = unset diff --git a/.gitattributes b/.gitattributes index 7fe55006..050bb120 100644 --- a/.gitattributes +++ b/.gitattributes @@ -1 +1,3 @@ *.config linguist-language=nextflow +modules/nf-core/** linguist-generated +subworkflows/nf-core/** linguist-generated diff --git a/.github/CONTRIBUTING.md b/.github/CONTRIBUTING.md index ad41f31d..3bd0f323 100644 --- a/.github/CONTRIBUTING.md +++ b/.github/CONTRIBUTING.md @@ -15,8 +15,7 @@ Contributions to the code are even more welcome ;) If you'd like to write some code for nf-core/mag, the standard workflow is as follows: -1. Check that there isn't already an issue about your idea in the [nf-core/mag issues](https://github.com/nf-core/mag/issues) to avoid duplicating work - * If there isn't one already, please create one so that others know you're working on this +1. Check that there isn't already an issue about your idea in the [nf-core/mag issues](https://github.com/nf-core/mag/issues) to avoid duplicating work. If there isn't one already, please create one so that others know you're working on this 2. [Fork](https://help.github.com/en/github/getting-started-with-github/fork-a-repo) the [nf-core/mag repository](https://github.com/nf-core/mag) to your GitHub account 3. Make the necessary changes / additions within your forked repository following [Pipeline conventions](#pipeline-contribution-conventions) 4. Use `nf-core schema build` and add any new parameters to the pipeline JSON schema (requires [nf-core tools](https://github.com/nf-core/tools) >= 1.10). @@ -49,9 +48,9 @@ These tests are run both with the latest available version of `Nextflow` and als :warning: Only in the unlikely and regretful event of a release happening with a bug. -* On your own fork, make a new branch `patch` based on `upstream/master`. -* Fix the bug, and bump version (X.Y.Z+1). -* A PR should be made on `master` from patch to directly this particular bug. +- On your own fork, make a new branch `patch` based on `upstream/master`. +- Fix the bug, and bump version (X.Y.Z+1). +- A PR should be made on `master` from patch to directly fix this particular bug. ## Getting help @@ -68,16 +67,13 @@ If you wish to contribute a new step, please use the following coding standards: 1. Define the corresponding input channel into your new process from the expected previous process channel 2. Write the process block (see below). 3. Define the output channel if needed (see below). -4. Add any new flags/options to `nextflow.config` with a default (see below). -5. Add any new flags/options to `nextflow_schema.json` with help text (with `nf-core schema build`). -6. Add any new flags/options to the help message (for integer/text parameters, print to help the corresponding `nextflow.config` parameter). -7. Add sanity checks for all relevant parameters. -8. Add any new software to the `scrape_software_versions.py` script in `bin/` and the version command to the `scrape_software_versions` process in `main.nf`. -9. Do local tests that the new code works properly and as expected. -10. Add a new test command in `.github/workflow/ci.yml`. -11.
If applicable add a [MultiQC](https://https://multiqc.info/) module. -12. Update MultiQC config `assets/multiqc_config.yaml` so relevant suffixes, name clean up, General Statistics Table column order, and module figures are in the right order. -13. Optional: Add any descriptions of MultiQC report sections and output files to `docs/output.md`. +4. Add any new parameters to `nextflow.config` with a default (see below). +5. Add any new parameters to `nextflow_schema.json` with help text (via the `nf-core schema build` tool). +6. Add sanity checks and validation for all relevant parameters. +7. Perform local tests to validate that the new code works as expected. +8. If applicable, add a new test command in `.github/workflow/ci.yml`. +9. Update MultiQC config `assets/multiqc_config.yml` so relevant suffixes, file name clean up and module plots are in the appropriate order. If applicable, add a [MultiQC](https://https://multiqc.info/) module. +10. Add a description of the output files and if relevant any appropriate images from the MultiQC report to `docs/output.md`. ### Default values @@ -95,34 +91,13 @@ The process resources can be passed on to the tool dynamically within the proces Please use the following naming schemes, to make it easy to understand what is going where. -* initial process channel: `ch_output_from_` -* intermediate and terminal channels: `ch__for_` +- initial process channel: `ch_output_from_` +- intermediate and terminal channels: `ch__for_` ### Nextflow version bumping If you are using a new feature from core Nextflow, you may bump the minimum required version of nextflow in the pipeline with: `nf-core bump-version --nextflow . [min-nf-version]` -### Software version reporting - -If you add a new tool to the pipeline, please ensure you add the information of the tool to the `get_software_version` process. - -Add to the script block of the process, something like the following: - -```bash - --version &> v_.txt 2>&1 || true -``` - -or - -```bash - --help | head -n 1 &> v_.txt 2>&1 || true -``` - -You then need to edit the script `bin/scrape_software_versions.py` to: - -1. Add a Python regex for your tool's `--version` output (as in stored in the `v_.txt` file), to ensure the version is reported as a `v` and the version number e.g. `v2.1.1` -2. Add a HTML entry to the `OrderedDict` for formatting in MultiQC. - ### Images and figures For overview images and other documents we follow the nf-core [style guidelines and examples](https://nf-co.re/developers/design_guidelines). diff --git a/.github/ISSUE_TEMPLATE/bug_report.md b/.github/ISSUE_TEMPLATE/bug_report.md deleted file mode 100644 index 11b427d2..00000000 --- a/.github/ISSUE_TEMPLATE/bug_report.md +++ /dev/null @@ -1,63 +0,0 @@ ---- -name: Bug report -about: Report something that is broken or incorrect -labels: bug ---- - - - -## Check Documentation - -I have checked the following places for your error: - -- [ ] [nf-core website: troubleshooting](https://nf-co.re/usage/troubleshooting) -- [ ] [nf-core/mag pipeline documentation](https://nf-co.re/mag/usage) - -## Description of the bug - - - -## Steps to reproduce - -Steps to reproduce the behaviour: - -1. Command line: -2. 
See error: - -## Expected behaviour - - - -## Log files - -Have you provided the following extra information/files: - -- [ ] The command used to run the pipeline -- [ ] The `.nextflow.log` file - -## System - -- Hardware: -- Executor: -- OS: -- Version - -## Nextflow Installation - -- Version: - -## Container engine - -- Engine: -- version: - -## Additional context - - diff --git a/.github/ISSUE_TEMPLATE/bug_report.yml b/.github/ISSUE_TEMPLATE/bug_report.yml new file mode 100644 index 00000000..d32afe01 --- /dev/null +++ b/.github/ISSUE_TEMPLATE/bug_report.yml @@ -0,0 +1,50 @@ +name: Bug report +description: Report something that is broken or incorrect +labels: bug +body: + - type: markdown + attributes: + value: | + Before you post this issue, please check the documentation: + + - [nf-core website: troubleshooting](https://nf-co.re/usage/troubleshooting) + - [nf-core/mag pipeline documentation](https://nf-co.re/mag/usage) + + - type: textarea + id: description + attributes: + label: Description of the bug + description: A clear and concise description of what the bug is. + validations: + required: true + + - type: textarea + id: command_used + attributes: + label: Command used and terminal output + description: Steps to reproduce the behaviour. Please paste the command you used to launch the pipeline and the output from your terminal. + render: console + placeholder: | + $ nextflow run ... + + Some output where something broke + + - type: textarea + id: files + attributes: + label: Relevant files + description: | + Please drag and drop the relevant files here. Create a `.zip` archive if the extension is not allowed. + Your verbose log file `.nextflow.log` is often useful _(this is a hidden file in the directory where you launched the pipeline)_ as well as custom Nextflow configuration files. + + - type: textarea + id: system + attributes: + label: System information + description: | + * Nextflow version _(eg. 21.10.3)_ + * Hardware _(eg. HPC, Desktop, Cloud)_ + * Executor _(eg. slurm, local, awsbatch)_ + * Container engine: _(e.g. Docker, Singularity, Conda, Podman, Shifter or Charliecloud)_ + * OS _(eg. CentOS Linux, macOS, Linux Mint)_ + * Version of nf-core/mag _(eg. 1.1, 1.5, 1.8.2)_ diff --git a/.github/ISSUE_TEMPLATE/config.yml b/.github/ISSUE_TEMPLATE/config.yml index f11d124c..3286c46b 100644 --- a/.github/ISSUE_TEMPLATE/config.yml +++ b/.github/ISSUE_TEMPLATE/config.yml @@ -1,4 +1,3 @@ -blank_issues_enabled: false contact_links: - name: Join nf-core url: https://nf-co.re/join diff --git a/.github/ISSUE_TEMPLATE/feature_request.md b/.github/ISSUE_TEMPLATE/feature_request.md deleted file mode 100644 index 48b8f593..00000000 --- a/.github/ISSUE_TEMPLATE/feature_request.md +++ /dev/null @@ -1,32 +0,0 @@ ---- -name: Feature request -about: Suggest an idea for the nf-core/mag pipeline -labels: enhancement ---- - - - -## Is your feature request related to a problem? Please describe - - - - - -## Describe the solution you'd like - - - -## Describe alternatives you've considered - - - -## Additional context - - diff --git a/.github/ISSUE_TEMPLATE/feature_request.yml b/.github/ISSUE_TEMPLATE/feature_request.yml new file mode 100644 index 00000000..7fa13dca --- /dev/null +++ b/.github/ISSUE_TEMPLATE/feature_request.yml @@ -0,0 +1,11 @@ +name: Feature request +description: Suggest an idea for the nf-core/mag pipeline +labels: enhancement +body: + - type: textarea + id: description + attributes: + label: Description of feature + description: Please describe your suggestion for a new feature. 
It might help to describe a problem or use case, plus any alternatives that you have considered. + validations: + required: true diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md index d832fa1d..89c37cb4 100644 --- a/.github/PULL_REQUEST_TEMPLATE.md +++ b/.github/PULL_REQUEST_TEMPLATE.md @@ -10,16 +10,15 @@ Remember that PRs should be made against the dev branch, unless you're preparing Learn more about contributing: [CONTRIBUTING.md](https://github.com/nf-core/mag/tree/master/.github/CONTRIBUTING.md) --> - ## PR checklist - [ ] This comment contains a description of changes (with reason). - [ ] If you've fixed a bug or added code that should be tested, add tests! - - [ ] If you've added a new tool - have you followed the pipeline conventions in the [contribution docs](https://github.com/nf-core/mag/tree/master/.github/CONTRIBUTING.md) - - [ ] If necessary, also make a PR on the nf-core/mag _branch_ on the [nf-core/test-datasets](https://github.com/nf-core/test-datasets) repository. + - [ ] If you've added a new tool - have you followed the pipeline conventions in the [contribution docs](https://github.com/nf-core/mag/tree/master/.github/CONTRIBUTING.md) + - [ ] If necessary, also make a PR on the nf-core/mag _branch_ on the [nf-core/test-datasets](https://github.com/nf-core/test-datasets) repository. - [ ] Make sure your code lints (`nf-core lint`). -- [ ] Ensure the test suite passes (`nextflow run . -profile test,docker`). +- [ ] Ensure the test suite passes (`nextflow run . -profile test,docker --outdir `). - [ ] Usage Documentation in `docs/usage.md` is updated. - [ ] Output Documentation in `docs/output.md` is updated. - [ ] `CHANGELOG.md` is updated. diff --git a/.github/workflows/awsfulltest.yml b/.github/workflows/awsfulltest.yml index db2564d1..8aebadc5 100644 --- a/.github/workflows/awsfulltest.yml +++ b/.github/workflows/awsfulltest.yml @@ -14,17 +14,14 @@ jobs: runs-on: ubuntu-latest steps: - name: Launch workflow via tower - uses: nf-core/tower-action@master - + uses: nf-core/tower-action@v3 with: workspace_id: ${{ secrets.TOWER_WORKSPACE_ID }} - bearer_token: ${{ secrets.TOWER_BEARER_TOKEN }} + access_token: ${{ secrets.TOWER_ACCESS_TOKEN }} compute_env: ${{ secrets.TOWER_COMPUTE_ENV }} - pipeline: ${{ github.repository }} - revision: ${{ github.sha }} workdir: s3://${{ secrets.AWS_S3_BUCKET }}/work/mag/work-${{ github.sha }} parameters: | { "outdir": "s3://${{ secrets.AWS_S3_BUCKET }}/mag/results-${{ github.sha }}" } - profiles: '[ "test_full", "aws_tower" ]' + profiles: test_full,aws_tower diff --git a/.github/workflows/awstest.yml b/.github/workflows/awstest.yml index 4e3797f9..e5f69c5d 100644 --- a/.github/workflows/awstest.yml +++ b/.github/workflows/awstest.yml @@ -10,19 +10,16 @@ jobs: if: github.repository == 'nf-core/mag' runs-on: ubuntu-latest steps: + # Launch workflow using Tower CLI tool action - name: Launch workflow via tower - uses: nf-core/tower-action@master - + uses: nf-core/tower-action@v3 with: workspace_id: ${{ secrets.TOWER_WORKSPACE_ID }} - bearer_token: ${{ secrets.TOWER_BEARER_TOKEN }} + access_token: ${{ secrets.TOWER_ACCESS_TOKEN }} compute_env: ${{ secrets.TOWER_COMPUTE_ENV }} - pipeline: ${{ github.repository }} - revision: ${{ github.sha }} workdir: s3://${{ secrets.AWS_S3_BUCKET }}/work/mag/work-${{ github.sha }} parameters: | { - "outdir": "s3://${{ secrets.AWS_S3_BUCKET }}/mag/results-${{ github.sha }}" + "outdir": "s3://${{ secrets.AWS_S3_BUCKET }}/mag/results-test-${{ github.sha }}" } - profiles: '[ "test", 
"aws_tower" ]' - + profiles: test,aws_tower diff --git a/.github/workflows/branch.yml b/.github/workflows/branch.yml index d5ee9bae..e08454a2 100644 --- a/.github/workflows/branch.yml +++ b/.github/workflows/branch.yml @@ -15,7 +15,6 @@ jobs: run: | { [[ ${{github.event.pull_request.head.repo.full_name }} == nf-core/mag ]] && [[ $GITHUB_HEAD_REF = "dev" ]]; } || [[ $GITHUB_HEAD_REF == "patch" ]] - # If the above check failed, post a comment on the PR explaining the failure # NOTE - this doesn't currently work if the PR is coming from a fork, due to limitations in GitHub actions secrets - name: Post PR comment @@ -43,4 +42,3 @@ jobs: Thanks again for your contribution! repo-token: ${{ secrets.GITHUB_TOKEN }} allow-repeats: false - diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index 243b2464..ad9b2ea9 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -8,61 +8,71 @@ on: release: types: [published] -# Uncomment if we need an edge release of Nextflow again -# env: NXF_EDGE: 1 +env: + NXF_ANSI_LOG: false + CAPSULE_LOG: none jobs: test: - name: Run workflow tests + name: Run pipeline with test data # Only run on push if this is the nf-core dev branch (merged PRs) - if: ${{ github.event_name != 'push' || (github.event_name == 'push' && github.repository == 'nf-core/mag') }} + if: "${{ github.event_name != 'push' || (github.event_name == 'push' && github.repository == 'nf-core/mag') }}" runs-on: ubuntu-latest - env: - NXF_VER: ${{ matrix.nxf_ver }} - NXF_ANSI_LOG: false strategy: matrix: - # Nextflow versions: check pipeline minimum and current latest - nxf_ver: ['21.04.0', ''] + # Nextflow versions + include: + # Test pipeline minimum Nextflow version + - NXF_VER: "21.10.3" + NXF_EDGE: "" + # Test latest edge release of Nextflow + - NXF_VER: "" + NXF_EDGE: "1" steps: - name: Check out pipeline code uses: actions/checkout@v2 - name: Install Nextflow env: - CAPSULE_LOG: none + NXF_VER: ${{ matrix.NXF_VER }} + # Uncomment only if the edge release is more recent than the latest stable release + # See https://github.com/nextflow-io/nextflow/issues/2467 + # NXF_EDGE: ${{ matrix.NXF_EDGE }} run: | wget -qO- get.nextflow.io | bash sudo mv nextflow /usr/local/bin/ - name: Run pipeline with test data run: | - nextflow run ${GITHUB_WORKSPACE} -profile test,docker - + nextflow run ${GITHUB_WORKSPACE} -profile test,docker --outdir ./results profiles: name: Run workflow profile # Only run on push if this is the nf-core dev branch (merged PRs) if: ${{ github.event_name != 'push' || (github.event_name == 'push' && github.repository == 'nf-core/mag') }} runs-on: ubuntu-latest - env: - NXF_VER: '21.04.0' - NXF_ANSI_LOG: false strategy: matrix: # Run remaining test profiles with minimum nextflow version - profile: [test_host_rm, test_hybrid, test_hybrid_host_rm, test_busco_auto] + profile: + [ + test_host_rm, + test_hybrid, + test_hybrid_host_rm, + test_busco_auto, + test_ancient_dna, + test_adapterremoval, + test_binrefinement, + ] steps: - name: Check out pipeline code uses: actions/checkout@v2 - name: Install Nextflow - env: - CAPSULE_LOG: none run: | wget -qO- get.nextflow.io | bash sudo mv nextflow /usr/local/bin/ - name: Run pipeline with ${{ matrix.profile }} test profile run: | - nextflow run ${GITHUB_WORKSPACE} -profile ${{ matrix.profile }},docker + nextflow run ${GITHUB_WORKSPACE} -profile ${{ matrix.profile }},docker --outdir ./results diff --git a/.github/workflows/fix-linting.yml b/.github/workflows/fix-linting.yml new file mode 100644 index 00000000..eca5edd4 
--- /dev/null +++ b/.github/workflows/fix-linting.yml @@ -0,0 +1,55 @@ +name: Fix linting from a comment +on: + issue_comment: + types: [created] + +jobs: + deploy: + # Only run if comment is on a PR with the main repo, and if it contains the magic keywords + if: > + contains(github.event.comment.html_url, '/pull/') && + contains(github.event.comment.body, '@nf-core-bot fix linting') && + github.repository == 'nf-core/mag' + runs-on: ubuntu-latest + steps: + # Use the @nf-core-bot token to check out so we can push later + - uses: actions/checkout@v3 + with: + token: ${{ secrets.nf_core_bot_auth_token }} + + # Action runs on the issue comment, so we don't get the PR by default + # Use the gh cli to check out the PR + - name: Checkout Pull Request + run: gh pr checkout ${{ github.event.issue.number }} + env: + GITHUB_TOKEN: ${{ secrets.nf_core_bot_auth_token }} + + - uses: actions/setup-node@v2 + + - name: Install Prettier + run: npm install -g prettier @prettier/plugin-php + + # Check that we actually need to fix something + - name: Run 'prettier --check' + id: prettier_status + run: | + if prettier --check ${GITHUB_WORKSPACE}; then + echo "::set-output name=result::pass" + else + echo "::set-output name=result::fail" + fi + + - name: Run 'prettier --write' + if: steps.prettier_status.outputs.result == 'fail' + run: prettier --write ${GITHUB_WORKSPACE} + + - name: Commit & push changes + if: steps.prettier_status.outputs.result == 'fail' + run: | + git config user.email "core@nf-co.re" + git config user.name "nf-core-bot" + git config push.default upstream + git add . + git status + git commit -m "[automated] Fix linting with Prettier" + git push diff --git a/.github/workflows/linting.yml b/.github/workflows/linting.yml index 3b448773..77358dee 100644 --- a/.github/workflows/linting.yml +++ b/.github/workflows/linting.yml @@ -1,6 +1,7 @@ name: nf-core linting # This workflow is triggered on pushes and PRs to the repository. -# It runs the `nf-core lint` and markdown lint tests to ensure that the code meets the nf-core guidelines +# It runs the `nf-core lint` and markdown lint tests to ensure +# that the code meets the nf-core guidelines. on: push: pull_request: @@ -8,100 +9,35 @@ on: types: [published] jobs: - Markdown: - runs-on: ubuntu-latest - steps: - - uses: actions/checkout@v2 - - uses: actions/setup-node@v1 - with: - node-version: '10' - - name: Install markdownlint - run: npm install -g markdownlint-cli - - name: Run Markdownlint - run: markdownlint . - - # If the above check failed, post a comment on the PR explaining the failure - - name: Post PR comment - if: failure() - uses: mshick/add-pr-comment@v1 - with: - message: | - ## Markdown linting is failing - - To keep the code consistent with lots of contributors, we run automated code consistency checks. - To fix this CI test, please run: - - * Install `markdownlint-cli` - * On Mac: `brew install markdownlint-cli` - * Everything else: [Install `npm`](https://www.npmjs.com/get-npm) then [install `markdownlint-cli`](https://www.npmjs.com/package/markdownlint-cli) (`npm install -g markdownlint-cli`) - * Fix the markdown errors - * Automatically: `markdownlint . --fix` - * Manually resolve anything left from `markdownlint .` - - Once you push these changes the test should pass, and you can hide this comment :+1: - - We highly recommend setting up markdownlint in your code editor so that this formatting is done automatically on save. Ask about it on Slack for help! - - Thanks again for your contribution! 
- repo-token: ${{ secrets.GITHUB_TOKEN }} - allow-repeats: false - EditorConfig: runs-on: ubuntu-latest steps: - uses: actions/checkout@v2 - - uses: actions/setup-node@v1 - with: - node-version: '10' + - uses: actions/setup-node@v2 - name: Install editorconfig-checker run: npm install -g editorconfig-checker - name: Run ECLint check - run: editorconfig-checker -exclude README.md $(git ls-files | grep -v test) + run: editorconfig-checker -exclude README.md $(find .* -type f | grep -v '.git\|.py\|.md\|json\|yml\|yaml\|html\|css\|work\|.nextflow\|build\|nf_core.egg-info\|log.txt\|Makefile') - YAML: + Prettier: runs-on: ubuntu-latest steps: - - uses: actions/checkout@v1 - - uses: actions/setup-node@v1 - with: - node-version: '10' - - name: Install yaml-lint - run: npm install -g yaml-lint - - name: Run yaml-lint - run: yamllint $(find ${GITHUB_WORKSPACE} -type f -name "*.yml" -o -name "*.yaml") - - # If the above check failed, post a comment on the PR explaining the failure - - name: Post PR comment - if: failure() - uses: mshick/add-pr-comment@v1 - with: - message: | - ## YAML linting is failing - - To keep the code consistent with lots of contributors, we run automated code consistency checks. - To fix this CI test, please run: - - * Install `yaml-lint` - * [Install `npm`](https://www.npmjs.com/get-npm) then [install `yaml-lint`](https://www.npmjs.com/package/yaml-lint) (`npm install -g yaml-lint`) - * Fix the markdown errors - * Run the test locally: `yamllint $(find . -type f -name "*.yml" -o -name "*.yaml")` - * Fix any reported errors in your YAML files + - uses: actions/checkout@v2 - Once you push these changes the test should pass, and you can hide this comment :+1: + - uses: actions/setup-node@v2 - We highly recommend setting up yaml-lint in your code editor so that this formatting is done automatically on save. Ask about it on Slack for help! + - name: Install Prettier + run: npm install -g prettier - Thanks again for your contribution! 
- repo-token: ${{ secrets.GITHUB_TOKEN }} - allow-repeats: false + - name: Run Prettier --check + run: prettier --check ${GITHUB_WORKSPACE} nf-core: runs-on: ubuntu-latest steps: - - name: Check out pipeline code uses: actions/checkout@v2 @@ -112,10 +48,10 @@ jobs: wget -qO- get.nextflow.io | bash sudo mv nextflow /usr/local/bin/ - - uses: actions/setup-python@v1 + - uses: actions/setup-python@v3 with: - python-version: '3.6' - architecture: 'x64' + python-version: "3.6" + architecture: "x64" - name: Install dependencies run: | @@ -142,4 +78,3 @@ jobs: lint_log.txt lint_results.md PR_number.txt - diff --git a/.github/workflows/linting_comment.yml b/.github/workflows/linting_comment.yml index 90f03c6f..04758f61 100644 --- a/.github/workflows/linting_comment.yml +++ b/.github/workflows/linting_comment.yml @@ -1,4 +1,3 @@ - name: nf-core linting comment # This workflow is triggered after the linting action is complete # It posts an automated comment to the PR, even if the PR is coming from a fork @@ -15,6 +14,7 @@ jobs: uses: dawidd6/action-download-artifact@v2 with: workflow: linting.yml + workflow_conclusion: completed - name: Get PR number id: pr_number @@ -26,4 +26,3 @@ jobs: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} number: ${{ steps.pr_number.outputs.pr_number }} path: linting-logs/lint_results.md - diff --git a/.gitpod.yml b/.gitpod.yml new file mode 100644 index 00000000..85d95ecc --- /dev/null +++ b/.gitpod.yml @@ -0,0 +1,14 @@ +image: nfcore/gitpod:latest + +vscode: + extensions: # based on nf-core.nf-core-extensionpack + - codezombiech.gitignore # Language support for .gitignore files + # - cssho.vscode-svgviewer # SVG viewer + - esbenp.prettier-vscode # Markdown/CommonMark linting and style checking for Visual Studio Code + - eamodio.gitlens # Quickly glimpse into whom, why, and when a line or code block was changed + - EditorConfig.EditorConfig # override user/workspace settings with settings found in .editorconfig files + - Gruntfuggly.todo-tree # Display TODO and FIXME in a tree view in the activity bar + - mechatroner.rainbow-csv # Highlight columns in csv files in different colors + # - nextflow.nextflow # Nextflow syntax highlighting + - oderwat.indent-rainbow # Highlight indentation level + - streetsidesoftware.code-spell-checker # Spelling checker for source code diff --git a/.markdownlint.yml b/.markdownlint.yml deleted file mode 100644 index 9e605fcf..00000000 --- a/.markdownlint.yml +++ /dev/null @@ -1,14 +0,0 @@ -# Markdownlint configuration file -default: true -line-length: false -ul-indent: - indent: 4 -no-duplicate-header: - siblings_only: true -no-inline-html: - allowed_elements: - - img - - p - - kbd - - details - - summary diff --git a/.nf-core.yml b/.nf-core.yml index f6ffd239..feabf6ad 100644 --- a/.nf-core.yml +++ b/.nf-core.yml @@ -1,4 +1,5 @@ -# Remove when not needed anymore! 
+repository_type: pipeline + lint: files_unchanged: - lib/NfcoreTemplate.groovy diff --git a/.prettierignore b/.prettierignore new file mode 100644 index 00000000..d0e7ae58 --- /dev/null +++ b/.prettierignore @@ -0,0 +1,9 @@ +email_template.html +.nextflow* +work/ +data/ +results/ +.DS_Store +testing/ +testing* +*.pyc diff --git a/.prettierrc.yml b/.prettierrc.yml new file mode 100644 index 00000000..c81f9a76 --- /dev/null +++ b/.prettierrc.yml @@ -0,0 +1 @@ +printWidth: 120 diff --git a/CHANGELOG.md b/CHANGELOG.md index 14c488f1..c25942bd 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -3,6 +3,39 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/) and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). +## v2.2.0 - 2022/06/14 + +### `Added` + +- [#263](https://github.com/nf-core/mag/pull/263) - Restructure binning subworkflow in preparation for aDNA workflow and extended binning +- [#247](https://github.com/nf-core/mag/pull/247) - Add ancient DNA subworkflow +- [#263](https://github.com/nf-core/mag/pull/263) - Add MaxBin2 as second contig binning tool +- [#285](https://github.com/nf-core/mag/pull/285) - Add AdapterRemoval2 as an alternative read trimmer +- [#291](https://github.com/nf-core/mag/pull/291) - Add DAS Tool for bin refinement +- [#319](https://github.com/nf-core/mag/pull/319) - Activate pipeline-specific institutional nf-core/configs + +### `Changed` + +- [#269](https://github.com/nf-core/mag/pull/269),[#283](https://github.com/nf-core/mag/pull/283),[#289](https://github.com/nf-core/mag/pull/289),[#302](https://github.com/nf-core/mag/pull/302) - Update to nf-core 2.4 `TEMPLATE` +- [#286](https://github.com/nf-core/mag/pull/286) - Cite our publication instead of the preprint +- [#291](https://github.com/nf-core/mag/pull/291), [#299](https://github.com/nf-core/mag/pull/299) - Add extra results folder `GenomeBinning/depths/contigs` for `[assembler]-[sample/group]-depth.txt.gz`, and `GenomeBinning/depths/bins` for `bin_depths_summary.tsv` and `[assembler]-[binner]-[sample/group]-binDepths.heatmap.png` +- [#315](https://github.com/nf-core/mag/pull/315) - Replace base container for standard shell tools to fix problems with running on Google Cloud + +### `Fixed` + +- [#290](https://github.com/nf-core/mag/pull/290) - Fix caching of binning input +- [#305](https://github.com/nf-core/mag/pull/305) - Add missing Bowtie2 version for process `BOWTIE2_PHIX_REMOVAL_ALIGN` to `software_versions.yml` +- [#307](https://github.com/nf-core/mag/pull/307) - Fix retrieval of GTDB-Tk version (note about newer version caused error in `CUSTOM_DUMPSOFTWAREVERSIONS`) +- [#309](https://github.com/nf-core/mag/pull/309) - Fix publishing of BUSCO `busco_downloads/` folder, i.e. publish only when `--save_busco_reference` is specified +- [#321](https://github.com/nf-core/mag/pull/321) - Fix parameter processing in `BOWTIE2_REMOVAL_ALIGN` (which was erroneously for `BOWTIE2_PHIX_REMOVAL_ALIGN`) + +### `Dependencies` + +| Tool | Previous version | New version | +| ------- | ---------------- | ----------- | +| fastp | 0.20.1 | 0.23.2 | +| MultiQC | 1.11 | 1.12 | + ## v2.1.1 - 2021/11/25 ### `Added` diff --git a/CITATIONS.md b/CITATIONS.md index 60c06a28..cef607f0 100644 --- a/CITATIONS.md +++ b/CITATIONS.md @@ -10,80 +10,124 @@ ## Pipeline tools -* [Bowtie2](https:/dx.doi.org/10.1038/nmeth.1923) - > Langmead, B. and Salzberg, S. L. 2012 Fast gapped-read alignment with Bowtie 2. Nature methods, 9(4), p. 357–359. doi: 10.1038/nmeth.1923. 
+ +- [AdapterRemoval2](https://doi.org/10.1186/s13104-016-1900-2) -* [Busco](https://doi.org/10.1007/978-1-4939-9173-0_14) - > Seppey, M., Manni, M., & Zdobnov, E. M. (2019). BUSCO: assessing genome assembly and annotation completeness. In Gene prediction (pp. 227-245). Humana, New York, NY. doi: 10.1007/978-1-4939-9173-0_14. + > Schubert, M., Lindgreen, S., and Orlando, L. 2016. "AdapterRemoval v2: Rapid Adapter Trimming, Identification, and Read Merging." BMC Research Notes 9 (February): 88. doi: 10.1186/s13104-016-1900-2 -* [CAT](https://doi.org/10.1186/s13059-019-1817-x) - > von Meijenfeldt, F. B., Arkhipova, K., Cambuy, D. D., Coutinho, F. H., & Dutilh, B. E. (2019). Robust taxonomic classification of uncharted microbial sequences and bins with CAT and BAT. Genome biology, 20(1), 1-14. doi: 10.1186/s13059-019-1817-x. +- [BCFtools](https://doi.org/10.1093/gigascience/giab008) -* [Centrifuge](https://doi.org/10.1101/gr.210641.116) - > Kim, D., Song, L., Breitwieser, F. P., & Salzberg, S. L. (2016). Centrifuge: rapid and sensitive classification of metagenomic sequences. Genome research, 26(12), 1721-1729. doi: 10.1101/gr.210641.116. + > Danecek, Petr, et al. "Twelve years of SAMtools and BCFtools." Gigascience 10.2 (2021): giab008. doi: 10.1093/gigascience/giab008 -* [FastP](https://doi.org/10.1093/bioinformatics/bty560) - > Chen, S., Zhou, Y., Chen, Y., & Gu, J. (2018). fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics , 34(17), i884–i890. doi: 10.1093/bioinformatics/bty560. +- [Bowtie2](https://dx.doi.org/10.1038/nmeth.1923) -* [FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) + > Langmead, B. and Salzberg, S. L. 2012 Fast gapped-read alignment with Bowtie 2. Nature methods, 9(4), p. 357–359. doi: 10.1038/nmeth.1923. -* [Filtlong](https://github.com/rrwick/Filtlong) +- [Busco](https://doi.org/10.1007/978-1-4939-9173-0_14) -* [GTDB-Tk](https://doi.org/10.1093/bioinformatics/btz848) - > Chaumeil, P. A., Mussig, A. J., Hugenholtz, P., & Parks, D. H. (2020). GTDB-Tk: a toolkit to classify genomes with the Genome Taxonomy Database. Bioinformatics , 36(6), 1925–1927. doi: 10.1093/bioinformatics/btz848. + > Seppey, M., Manni, M., & Zdobnov, E. M. (2019). BUSCO: assessing genome assembly and annotation completeness. In Gene prediction (pp. 227-245). Humana, New York, NY. doi: 10.1007/978-1-4939-9173-0_14. -* [Kraken2](https://doi.org/10.1186/s13059-019-1891-0) - > Wood, D et al., 2019. Improved metagenomic analysis with Kraken 2. Genome Biology volume 20, Article number: 257. doi: 10.1186/s13059-019-1891-0. +- [CAT](https://doi.org/10.1186/s13059-019-1817-x) -* [Krona](https://doi.org/10.1186/1471-2105-12-385) - > Ondov, B. D., Bergman, N. H., & Phillippy, A. M. (2011). Interactive metagenomic visualization in a Web browser. BMC bioinformatics, 12(1), 1-10. doi: 10.1186/1471-2105-12-385. + > von Meijenfeldt, F. B., Arkhipova, K., Cambuy, D. D., Coutinho, F. H., & Dutilh, B. E. (2019). Robust taxonomic classification of uncharted microbial sequences and bins with CAT and BAT. Genome biology, 20(1), 1-14. doi: 10.1186/s13059-019-1817-x. -* [MEGAHIT](https://doi.org/10.1016/j.ymeth.2016.02.020) - > Li, D., Luo, R., Liu, C. M., Leung, C. M., Ting, H. F., Sadakane, K., ... & Lam, T. W. (2016). MEGAHIT v1. 0: a fast and scalable metagenome assembler driven by advanced methodologies and community practices. Methods, 102, 3-11. doi: 10.1016/j.ymeth.2016.02.020. +- [Centrifuge](https://doi.org/10.1101/gr.210641.116) -* [MetaBAT2](https://doi.org/10.7717/peerj.7359) - > Kang, D.
D., Li, F., Kirton, E., Thomas, A., Egan, R., An, H., & Wang, Z. (2019). MetaBAT 2: an adaptive binning algorithm for robust and efficient genome reconstruction from metagenome assemblies. PeerJ, 7, e7359. doi: 10.7717/peerj.7359. + > Kim, D., Song, L., Breitwieser, F. P., & Salzberg, S. L. (2016). Centrifuge: rapid and sensitive classification of metagenomic sequences. Genome research, 26(12), 1721-1729. doi: 10.1101/gr.210641.116. -* [MultiQC](https://doi.org/10.1093/bioinformatics/btw354) - > Ewels, P., Magnusson, M., Lundin, S., & Käller, M. (2016). MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics , 32(19), 3047–3048. doi: doi.org/10.1093/bioinformatics/btw354. +- [DAS Tool](https://doi.org/10.1038/s41564-018-0171-1) -* [NanoLyse](https://doi.org/10.1093/bioinformatics/bty149) - > De Coster, W., D’Hert, S., Schultz, D. T., Cruts, M., & Van Broeckhoven, C. (2018). NanoPack: visualizing and processing long-read sequencing data. Bioinformatics, 34(15), 2666-2669. doi: 10.1093/bioinformatics/bty149. + > Sieber, C. M. K., et al. 2018. "Recovery of Genomes from Metagenomes via a Dereplication, Aggregation and Scoring Strategy." Nature Microbiology 3 (7): 836-43. doi: 10.1038/s41564-018-0171-1 -* [NanoPlot](https://doi.org/10.1093/bioinformatics/bty149) - > De Coster, W., D’Hert, S., Schultz, D. T., Cruts, M., & Van Broeckhoven, C. (2018). NanoPack: visualizing and processing long-read sequencing data. Bioinformatics, 34(15), 2666-2669. doi: 10.1093/bioinformatics/bty149. +- [FastP](https://doi.org/10.1093/bioinformatics/bty560) -* [Porechop](https://github.com/rrwick/Porechop) + > Chen, S., Zhou, Y., Chen, Y., & Gu, J. (2018). fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics , 34(17), i884–i890. doi: 10.1093/bioinformatics/bty560. -* [Prodigal](https://pubmed.ncbi.nlm.nih.gov/20211023/) - > Hyatt D, Chen GL, Locascio PF, Land ML, Larimer FW, Hauser LJ. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics. 2010 Mar 8;11:119. doi: 10.1186/1471-2105-11-119. PMID: 20211023; PMCID: PMC2848648. +- [FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) -* [Prokka](https://pubmed.ncbi.nlm.nih.gov/24642063/) - > Seemann T. Prokka: rapid prokaryotic genome annotation. Bioinformatics. 2014 Jul 15;30(14):2068-9. doi: 10.1093/bioinformatics/btu153. Epub 2014 Mar 18. PMID: 24642063. +- [Filtlong](https://github.com/rrwick/Filtlong) -* [SAMtools](https://doi.org/10.1093/bioinformatics/btp352) - > Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., … 1000 Genome Project Data Processing Subgroup. (2009). The Sequence Alignment/Map format and SAMtools. Bioinformatics , 25(16), 2078–2079. doi: 10.1093/bioinformatics/btp352. +- [Freebayes](https://arxiv.org/abs/1207.3907) -* [SPAdes](https://doi.org/10.1101/gr.213959.116) - > Nurk, S., Meleshko, D., Korobeynikov, A., & Pevzner, P. A. (2017). metaSPAdes: a new versatile metagenomic assembler. Genome research, 27(5), 824-834. doi: 10.1101/gr.213959.116. + > Garrison E, Marth G. Haplotype-based variant detection from short-read sequencing. arXiv preprint arXiv:1207.3907 [q-bio.GN] 2012 + +- [GTDB-Tk](https://doi.org/10.1093/bioinformatics/btz848) + + > Chaumeil, P. A., Mussig, A. J., Hugenholtz, P., & Parks, D. H. (2020). GTDB-Tk: a toolkit to classify genomes with the Genome Taxonomy Database. Bioinformatics , 36(6), 1925–1927. doi: 10.1093/bioinformatics/btz848. 
+ +- [Kraken2](https://doi.org/10.1186/s13059-019-1891-0) + + > Wood, D et al., 2019. Improved metagenomic analysis with Kraken 2. Genome Biology volume 20, Article number: 257. doi: 10.1186/s13059-019-1891-0. + +- [Krona](https://doi.org/10.1186/1471-2105-12-385) + + > Ondov, B. D., Bergman, N. H., & Phillippy, A. M. (2011). Interactive metagenomic visualization in a Web browser. BMC bioinformatics, 12(1), 1-10. doi: 10.1186/1471-2105-12-385. + +- [MaxBin2](https://doi.org/10.1093/bioinformatics/btv638) + + > Yu-Wei, W., Simmons, B. A. & Singer, S. W. (2015) MaxBin 2.0: An Automated Binning Algorithm to Recover Genomes from Multiple Metagenomic Datasets. Bioinformatics 32 (4): 605–7. doi: 10.1093/bioinformatics/btv638. + +- [MEGAHIT](https://doi.org/10.1016/j.ymeth.2016.02.020) + + > Li, D., Luo, R., Liu, C. M., Leung, C. M., Ting, H. F., Sadakane, K., ... & Lam, T. W. (2016). MEGAHIT v1. 0: a fast and scalable metagenome assembler driven by advanced methodologies and community practices. Methods, 102, 3-11. doi: 10.1016/j.ymeth.2016.02.020. + +- [MetaBAT2](https://doi.org/10.7717/peerj.7359) + + > Kang, D. D., Li, F., Kirton, E., Thomas, A., Egan, R., An, H., & Wang, Z. (2019). MetaBAT 2: an adaptive binning algorithm for robust and efficient genome reconstruction from metagenome assemblies. PeerJ, 7, e7359. doi: 10.7717/peerj.7359. + +- [MultiQC](https://pubmed.ncbi.nlm.nih.gov/27312411/) + + > Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016 Oct 1;32(19):3047-8. doi: 10.1093/bioinformatics/btw354. Epub 2016 Jun 16. PubMed PMID: 27312411; PubMed Central PMCID: PMC5039924. + +- [NanoLyse](https://doi.org/10.1093/bioinformatics/bty149) + + > De Coster, W., D’Hert, S., Schultz, D. T., Cruts, M., & Van Broeckhoven, C. (2018). NanoPack: visualizing and processing long-read sequencing data. Bioinformatics, 34(15), 2666-2669. doi: 10.1093/bioinformatics/bty149. + +- [NanoPlot](https://doi.org/10.1093/bioinformatics/bty149) + + > De Coster, W., D’Hert, S., Schultz, D. T., Cruts, M., & Van Broeckhoven, C. (2018). NanoPack: visualizing and processing long-read sequencing data. Bioinformatics, 34(15), 2666-2669. doi: 10.1093/bioinformatics/bty149. + +- [Porechop](https://github.com/rrwick/Porechop) + +- [Prodigal](https://pubmed.ncbi.nlm.nih.gov/20211023/) + + > Hyatt D, Chen GL, Locascio PF, Land ML, Larimer FW, Hauser LJ. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics. 2010 Mar 8;11:119. doi: 10.1186/1471-2105-11-119. PMID: 20211023; PMCID: PMC2848648. + +- [Prokka](https://pubmed.ncbi.nlm.nih.gov/24642063/) + + > Seemann T. Prokka: rapid prokaryotic genome annotation. Bioinformatics. 2014 Jul 15;30(14):2068-9. doi: 10.1093/bioinformatics/btu153. Epub 2014 Mar 18. PMID: 24642063. + +- [PyDamage](https://doi.org/10.7717/peerj.11845) + + > Borry M, Hübner A, Rohrlach AB, Warinner C. 2021. PyDamage: automated ancient damage identification and estimation for contigs in ancient DNA de novo assembly. PeerJ 9:e11845 doi: 10.7717/peerj.11845 + +- [SAMtools](https://doi.org/10.1093/bioinformatics/btp352) + + > Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., … 1000 Genome Project Data Processing Subgroup. (2009). The Sequence Alignment/Map format and SAMtools. Bioinformatics , 25(16), 2078–2079. doi: 10.1093/bioinformatics/btp352. 
+ +- [SPAdes](https://doi.org/10.1101/gr.213959.116) + + > Nurk, S., Meleshko, D., Korobeynikov, A., & Pevzner, P. A. (2017). metaSPAdes: a new versatile metagenomic assembler. Genome research, 27(5), 824-834. doi: 10.1101/gr.213959.116. ## Data -* [Full-size test data](https://doi.org/10.1038/s41587-019-0191-2) - > Bertrand, D., Shaw, J., Kalathiyappan, M., Ng, A. H. Q., Kumar, M. S., Li, C., ... & Nagarajan, N. (2019). Hybrid metagenomic assembly enables high-resolution analysis of resistance determinants and mobile elements in human microbiomes. Nature biotechnology, 37(8), 937-944. doi: 10.1038/s41587-019-0191-2. +- [Full-size test data](https://doi.org/10.1038/s41587-019-0191-2) + > Bertrand, D., Shaw, J., Kalathiyappan, M., Ng, A. H. Q., Kumar, M. S., Li, C., ... & Nagarajan, N. (2019). Hybrid metagenomic assembly enables high-resolution analysis of resistance determinants and mobile elements in human microbiomes. Nature biotechnology, 37(8), 937-944. doi: 10.1038/s41587-019-0191-2. ## Software packaging/containerisation tools -* [Anaconda](https://anaconda.com) - > Anaconda Software Distribution. Computer software. Vers. 2-2.4.0. Anaconda, Nov. 2016. Web. +- [Anaconda](https://anaconda.com) + + > Anaconda Software Distribution. Computer software. Vers. 2-2.4.0. Anaconda, Nov. 2016. Web. + +- [Bioconda](https://pubmed.ncbi.nlm.nih.gov/29967506/) + + > Grüning B, Dale R, Sjödin A, Chapman BA, Rowe J, Tomkins-Tinch CH, Valieris R, Köster J; Bioconda Team. Bioconda: sustainable and comprehensive software distribution for the life sciences. Nat Methods. 2018 Jul;15(7):475-476. doi: 10.1038/s41592-018-0046-7. PubMed PMID: 29967506. -* [Bioconda](https://pubmed.ncbi.nlm.nih.gov/29967506/) - > Grüning B, Dale R, Sjödin A, Chapman BA, Rowe J, Tomkins-Tinch CH, Valieris R, Köster J; Bioconda Team. Bioconda: sustainable and comprehensive software distribution for the life sciences. Nat Methods. 2018 Jul;15(7):475-476. doi: 10.1038/s41592-018-0046-7. PubMed PMID: 29967506. +- [BioContainers](https://pubmed.ncbi.nlm.nih.gov/28379341/) -* [BioContainers](https://pubmed.ncbi.nlm.nih.gov/28379341/) - > da Veiga Leprevost F, Grüning B, Aflitos SA, Röst HL, Uszkoreit J, Barsnes H, Vaudel M, Moreno P, Gatto L, Weber J, Bai M, Jimenez RC, Sachsenberg T, Pfeuffer J, Alvarez RV, Griss J, Nesvizhskii AI, Perez-Riverol Y. BioContainers: an open-source and community-driven framework for software standardization. Bioinformatics. 2017 Aug 15;33(16):2580-2582. doi: 10.1093/bioinformatics/btx192. PubMed PMID: 28379341; PubMed Central PMCID: PMC5870671. + > da Veiga Leprevost F, Grüning B, Aflitos SA, Röst HL, Uszkoreit J, Barsnes H, Vaudel M, Moreno P, Gatto L, Weber J, Bai M, Jimenez RC, Sachsenberg T, Pfeuffer J, Alvarez RV, Griss J, Nesvizhskii AI, Perez-Riverol Y. BioContainers: an open-source and community-driven framework for software standardization. Bioinformatics. 2017 Aug 15;33(16):2580-2582. doi: 10.1093/bioinformatics/btx192. PubMed PMID: 28379341; PubMed Central PMCID: PMC5870671. -* [Docker](https://dl.acm.org/doi/10.5555/2600239.2600241) +- [Docker](https://dl.acm.org/doi/10.5555/2600239.2600241) -* [Singularity](https://pubmed.ncbi.nlm.nih.gov/28494014/) - > Kurtzer GM, Sochat V, Bauer MW. Singularity: Scientific containers for mobility of compute. PLoS One. 2017 May 11;12(5):e0177459. doi: 10.1371/journal.pone.0177459. eCollection 2017. PubMed PMID: 28494014; PubMed Central PMCID: PMC5426675. +- [Singularity](https://pubmed.ncbi.nlm.nih.gov/28494014/) + > Kurtzer GM, Sochat V, Bauer MW. 
Singularity: Scientific containers for mobility of compute. PLoS One. 2017 May 11;12(5):e0177459. doi: 10.1371/journal.pone.0177459. eCollection 2017. PubMed PMID: 28494014; PubMed Central PMCID: PMC5426675. diff --git a/README.md b/README.md index 895161ed..e666ce61 100644 --- a/README.md +++ b/README.md @@ -1,19 +1,20 @@ -# ![nf-core/mag](docs/images/nf-core-mag_logo.png) +# ![nf-core/mag](docs/images/nf-core-mag_logo_light.png#gh-light-mode-only) ![nf-core/mag](docs/images/nf-core-mag_logo_dark.png#gh-dark-mode-only) [![GitHub Actions CI Status](https://github.com/nf-core/mag/workflows/nf-core%20CI/badge.svg)](https://github.com/nf-core/mag/actions?query=workflow%3A%22nf-core+CI%22) [![GitHub Actions Linting Status](https://github.com/nf-core/mag/workflows/nf-core%20linting/badge.svg)](https://github.com/nf-core/mag/actions?query=workflow%3A%22nf-core+linting%22) -[![AWS CI](https://img.shields.io/badge/CI%20tests-full%20size-FF9900?labelColor=000000&logo=Amazon%20AWS)](https://nf-co.re/mag/results) -[![Cite with Zenodo](http://img.shields.io/badge/DOI-10.5281/zenodo.3589527-1073c8?labelColor=000000)](https://doi.org/10.5281/zenodo.3589527) -[![Cite Preprint](https://img.shields.io/badge/Cite%20Us!-Cite%20Preprint-orange)](https://doi.org/10.1101/2021.08.29.458094) +[![AWS CI](https://img.shields.io/badge/CI%20tests-full%20size-FF9900?logo=Amazon%20AWS)](https://nf-co.re/mag/results) +[![Cite with Zenodo](http://img.shields.io/badge/DOI-10.5281/zenodo.3589527-1073c8)](https://doi.org/10.5281/zenodo.3589527) +[![Cite Publication](https://img.shields.io/badge/Cite%20Us!-Cite%20Publication-orange)](https://doi.org/10.1093/nargab/lqac007) -[![Nextflow](https://img.shields.io/badge/nextflow%20DSL2-%E2%89%A521.04.0-23aa62.svg?labelColor=000000)](https://www.nextflow.io/) -[![run with conda](http://img.shields.io/badge/run%20with-conda-3EB049?labelColor=000000&logo=anaconda)](https://docs.conda.io/en/latest/) -[![run with docker](https://img.shields.io/badge/run%20with-docker-0db7ed?labelColor=000000&logo=docker)](https://www.docker.com/) -[![run with singularity](https://img.shields.io/badge/run%20with-singularity-1d355c.svg?labelColor=000000)](https://sylabs.io/docs/) +[![Nextflow](https://img.shields.io/badge/nextflow%20DSL2-%E2%89%A521.10.3-23aa62.svg)](https://www.nextflow.io/) +[![run with conda](http://img.shields.io/badge/run%20with-conda-3EB049?logo=anaconda)](https://docs.conda.io/en/latest/) +[![run with docker](https://img.shields.io/badge/run%20with-docker-0db7ed?logo=docker)](https://www.docker.com/) +[![run with singularity](https://img.shields.io/badge/run%20with-singularity-1d355c.svg)](https://sylabs.io/docs/) +[![Launch on Nextflow Tower](https://img.shields.io/badge/Launch%20%F0%9F%9A%80-Nextflow%20Tower-%234256e7)](https://tower.nf/launch?pipeline=https://github.com/nf-core/mag) -[![Get help on Slack](http://img.shields.io/badge/slack-nf--core%20%23mag-4A154B?labelColor=000000&logo=slack)](https://nfcore.slack.com/channels/mag) -[![Follow on Twitter](http://img.shields.io/badge/twitter-%40nf__core-1DA1F2?labelColor=000000&logo=twitter)](https://twitter.com/nf_core) -[![Watch on YouTube](http://img.shields.io/badge/youtube-nf--core-FF0000?labelColor=000000&logo=youtube)](https://www.youtube.com/c/nf-core) +[![Get help on Slack](http://img.shields.io/badge/slack-nf--core%20%23mag-4A154B?logo=slack)](https://nfcore.slack.com/channels/mag) +[![Follow on Twitter](http://img.shields.io/badge/twitter-%40nf__core-1DA1F2?logo=twitter)](https://twitter.com/nf_core) +[![Watch on 
YouTube](http://img.shields.io/badge/youtube-nf--core-FF0000?logo=youtube)](https://www.youtube.com/c/nf-core) ## Introduction @@ -32,41 +33,46 @@ On release, automated continuous integration tests run the pipeline on a full-si By default, the pipeline currently performs the following: it supports both short and long reads, quality trims the reads and adapters with [fastp](https://github.com/OpenGene/fastp) and [Porechop](https://github.com/rrwick/Porechop), and performs basic QC with [FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/). The pipeline then: -* assigns taxonomy to reads using [Centrifuge](https://ccb.jhu.edu/software/centrifuge/) and/or [Kraken2](https://github.com/DerrickWood/kraken2/wiki) -* performs assembly using [MEGAHIT](https://github.com/voutcn/megahit) and [SPAdes](http://cab.spbu.ru/software/spades/), and checks their quality using [Quast](http://quast.sourceforge.net/quast) -* predicts protein-coding genes for the assemblies using [Prodigal](https://github.com/hyattpd/Prodigal) -* performs metagenome binning using [MetaBAT2](https://bitbucket.org/berkeleylab/metabat/src/master/), and checks the quality of the genome bins using [Busco](https://busco.ezlab.org/) -* assigns taxonomy to bins using [GTDB-Tk](https://github.com/Ecogenomics/GTDBTk) and/or [CAT](https://github.com/dutilh/CAT) +- assigns taxonomy to reads using [Centrifuge](https://ccb.jhu.edu/software/centrifuge/) and/or [Kraken2](https://github.com/DerrickWood/kraken2/wiki) +- performs assembly using [MEGAHIT](https://github.com/voutcn/megahit) and [SPAdes](http://cab.spbu.ru/software/spades/), and checks their quality using [Quast](http://quast.sourceforge.net/quast) +- (optionally) performs ancient DNA assembly validation using [PyDamage](https://github.com/maxibor/pydamage) and contig consensus sequence recalling with [Freebayes](https://github.com/freebayes/freebayes) and [BCFtools](http://samtools.github.io/bcftools/bcftools.html) +- predicts protein-coding genes for the assemblies using [Prodigal](https://github.com/hyattpd/Prodigal) +- performs metagenome binning using [MetaBAT2](https://bitbucket.org/berkeleylab/metabat/src/master/) and/or with [MaxBin2](https://sourceforge.net/projects/maxbin2/), and checks the quality of the genome bins using [Busco](https://busco.ezlab.org/) +- optionally refines bins with [DAS Tool](https://github.com/cmks/DAS_Tool) +- assigns taxonomy to bins using [GTDB-Tk](https://github.com/Ecogenomics/GTDBTk) and/or [CAT](https://github.com/dutilh/CAT) Furthermore, the pipeline creates various reports in the results directory specified, including a [MultiQC](https://multiqc.info/) report summarizing some of the findings and software versions. ## Quick Start -1. Install [`Nextflow`](https://www.nextflow.io/docs/latest/getstarted.html#installation) (`>=21.04.0`) +1. Install [`Nextflow`](https://www.nextflow.io/docs/latest/getstarted.html#installation) (`>=21.10.3`) -2. Install any of [`Docker`](https://docs.docker.com/engine/installation/), [`Singularity`](https://www.sylabs.io/guides/3.0/user-guide/), [`Podman`](https://podman.io/), [`Shifter`](https://nersc.gitlab.io/development/shifter/how-to-use/) or [`Charliecloud`](https://hpc.github.io/charliecloud/) for full pipeline reproducibility _(please only use [`Conda`](https://conda.io/miniconda.html) as a last resort; see [docs](https://nf-co.re/usage/configuration#basic-configuration-profiles))_ +2. 
Install any of [`Docker`](https://docs.docker.com/engine/installation/), [`Singularity`](https://www.sylabs.io/guides/3.0/user-guide/) (you can follow [this tutorial](https://singularity-tutorial.github.io/01-installation/)), [`Podman`](https://podman.io/), [`Shifter`](https://nersc.gitlab.io/development/shifter/how-to-use/) or [`Charliecloud`](https://hpc.github.io/charliecloud/) for full pipeline reproducibility _(you can use [`Conda`](https://conda.io/miniconda.html) both to install Nextflow itself and also to manage software within pipelines. Please only use it within pipelines as a last resort; see [docs](https://nf-co.re/usage/configuration#basic-configuration-profiles))_. 3. Download the pipeline and test it on a minimal dataset with a single command: - ```console - nextflow run nf-core/mag -profile test, - ``` + ```console + nextflow run nf-core/mag -profile test,YOURPROFILE --outdir + ``` - > * Please check [nf-core/configs](https://github.com/nf-core/configs#documentation) to see if a custom config file to run nf-core pipelines already exists for your Institute. If so, you can simply use `-profile ` in your command. This will enable either `docker` or `singularity` and set the appropriate execution settings for your local compute environment. - > * If you are using `singularity` then the pipeline will auto-detect this and attempt to download the Singularity images directly as opposed to performing a conversion from Docker images. If you are persistently observing issues downloading Singularity images directly due to timeout or network issues then please use the `--singularity_pull_docker_container` parameter to pull and convert the Docker image instead. Alternatively, it is highly recommended to use the [`nf-core download`](https://nf-co.re/tools/#downloading-pipelines-for-offline-use) command to pre-download all of the required containers before running the pipeline and to set the [`NXF_SINGULARITY_CACHEDIR` or `singularity.cacheDir`](https://www.nextflow.io/docs/latest/singularity.html?#singularity-docker-hub) Nextflow options to be able to store and re-use the images from a central location for future pipeline runs. - > * If you are using `conda`, it is highly recommended to use the [`NXF_CONDA_CACHEDIR` or `conda.cacheDir`](https://www.nextflow.io/docs/latest/conda.html) settings to store the environments in a central location for future pipeline runs. + Note that some form of configuration will be needed so that Nextflow knows how to fetch the required software. This is usually done in the form of a config profile (`YOURPROFILE` in the example command above). You can chain multiple config profiles in a comma-separated string. + + > - The pipeline comes with config profiles called `docker`, `singularity`, `podman`, `shifter`, `charliecloud` and `conda` which instruct the pipeline to use the named tool for software management. For example, `-profile test,docker`. + > - Please check [nf-core/configs](https://github.com/nf-core/configs#documentation) to see if a custom config file to run nf-core pipelines already exists for your Institute. If so, you can simply use `-profile ` in your command. This will enable either `docker` or `singularity` and set the appropriate execution settings for your local compute environment. + > - If you are using `singularity`, please use the [`nf-core download`](https://nf-co.re/tools/#downloading-pipelines-for-offline-use) command to download images first, before running the pipeline. 
Setting the [`NXF_SINGULARITY_CACHEDIR` or `singularity.cacheDir`](https://www.nextflow.io/docs/latest/singularity.html?#singularity-docker-hub) Nextflow options enables you to store and re-use the images from a central location for future pipeline runs. + > - If you are using `conda`, it is highly recommended to use the [`NXF_CONDA_CACHEDIR` or `conda.cacheDir`](https://www.nextflow.io/docs/latest/conda.html) settings to store the environments in a central location for future pipeline runs. 4. Start running your own analysis! - ```console - nextflow run nf-core/mag -profile --input '*_R{1,2}.fastq.gz' - ``` + ```console + nextflow run nf-core/mag -profile --input '*_R{1,2}.fastq.gz' --outdir + ``` - or + or - ```console - nextflow run nf-core/mag -profile --input samplesheet.csv - ``` + ```console + nextflow run nf-core/mag -profile --input samplesheet.csv --outdir + ``` See [usage docs](https://nf-co.re/mag/usage) and [parameter docs](https://nf-co.re/mag/parameters) for all of the available options when running the pipeline. @@ -82,21 +88,21 @@ When group-wise co-assembly is enabled, `SPAdes` is run on accordingly pooled re ## Credits -nf-core/mag was written by [Hadrien Gourlé](https://hadriengourle.com) at [SLU](https://slu.se), [Daniel Straub](https://github.com/d4straub) and [Sabrina Krakau](https://github.com/skrakau) at the [Quantitative Biology Center (QBiC)](http://qbic.life). +nf-core/mag was written by [Hadrien Gourlé](https://hadriengourle.com) at [SLU](https://slu.se), [Daniel Straub](https://github.com/d4straub) and [Sabrina Krakau](https://github.com/skrakau) at the [Quantitative Biology Center (QBiC)](http://qbic.life). [James A. Fellows Yates](https://github.com/jfy133) and [Maxime Borry](https://github.com/maxibor) at the [Max Planck Institute for Evolutionary Anthropology](https://www.eva.mpg.de) joined in version 2.2.0. Long read processing was inspired by [caspargross/HybridAssembly](https://github.com/caspargross/HybridAssembly) written by Caspar Gross [@caspargross](https://github.com/caspargross) We thank the following people for their extensive assistance in the development of this pipeline: -* [Alexander Peltzer](https://github.com/apeltzer) -* [Antonia Schuster](https://github.com/antoniaschuster) -* [Phil Ewels](https://github.com/ewels) -* [Gisela Gabernet](https://github.com/ggabernet) -* [Harshil Patel](https://github.com/drpatelh) -* [Johannes Alneberg](https://github.com/alneberg) -* [Maxime Borry](https://github.com/maxibor) -* [Maxime Garcia](https://github.com/MaxUlysse) -* [Michael L Heuer](https://github.com/heuermh) +- [Alexander Peltzer](https://github.com/apeltzer) +- [Antonia Schuster](https://github.com/antoniaschuster) +- [Phil Ewels](https://github.com/ewels) +- [Gisela Gabernet](https://github.com/ggabernet) +- [Harshil Patel](https://github.com/drpatelh) +- [Johannes Alneberg](https://github.com/alneberg) +- [Maxime Borry](https://github.com/maxibor) +- [Maxime Garcia](https://github.com/MaxUlysse) +- [Michael L Heuer](https://github.com/heuermh) ## Contributions and Support @@ -112,7 +118,7 @@ If you use nf-core/mag for your analysis, please cite the preprint as follows: > > Sabrina Krakau, Daniel Straub, Hadrien Gourlé, Gisela Gabernet, Sven Nahnsen. > -> bioRxiv 2021.08.29.458094. doi: [10.1101/2021.08.29.458094](https://doi.org/10.1101/2021.08.29.458094). +> NAR Genom Bioinform. 2022 Feb 2;4(1):lqac007. doi: [10.1093/nargab/lqac007](https://doi.org/10.1093/nargab/lqac007). 
additionally you can cite the pipeline directly with the following doi: [10.5281/zenodo.3589527](https://doi.org/10.5281/zenodo.3589527) diff --git a/assets/multiqc_config.yaml b/assets/multiqc_config.yaml deleted file mode 100644 index 3d7f0d41..00000000 --- a/assets/multiqc_config.yaml +++ /dev/null @@ -1,47 +0,0 @@ -report_comment: > - This report has been generated by the nf-core/mag - analysis pipeline. For information about how to interpret these results, please see the - documentation. -report_section_order: - software_versions: - order: -1000 - nf-core-mag-summary: - order: -1001 - -export_plots: true - -data_format: 'yaml' - -top_modules: -- 'fastqc': - name: 'FastQC: raw reads' - path_filters_exclude: - - '*trimmed*' -- custom_content -- 'fastqc': - name: 'FastQC: after preprocessing' - info: 'After trimming and, if requested, contamination removal.' - path_filters: - - '*trimmed*' -- 'busco': - info: 'assesses genome assembly and annotation completeness with Benchmarking Universal Single-Copy Orthologs. In case BUSCO''s automated lineage selection was used, only generic results for the selected domain are shown and only for genome bins and kept, unbinned contigs for which the BUSCO analysis was successfull, i.e. not for contigs for which no BUSCO genes could be found. Bins for which a specific virus lineage was selected are also not shown.' -- 'quast' - - -custom_data: - host_removal: - file_format: 'tsv' - section_name: 'Bowtie 2: host read removal' - description: 'Reads are mapped against the host reference sequence. Only reads that do not align (concordantly) are kept for further analysis.' - plot_type: 'bargraph' - pconfig: - id: 'host_removal_bargraph' - title: 'Bowtie 2: reads mapped against host reference' - ylab: '# Reads' - -sp: - host_removal: - fn: 'host_removal_metrics.tsv' - -extra_fn_clean_exts: - - '.bowtie2' diff --git a/assets/multiqc_config.yml b/assets/multiqc_config.yml new file mode 100644 index 00000000..3b5da468 --- /dev/null +++ b/assets/multiqc_config.yml @@ -0,0 +1,51 @@ +report_comment: > + This report has been generated by the nf-core/mag + analysis pipeline. For information about how to interpret these results, please see the + documentation. +report_section_order: + software_versions: + order: -1000 + nf-core-mag-summary: + order: -1001 + +export_plots: true + +data_format: "yaml" + +top_modules: + - "fastqc": + name: "FastQC: raw reads" + path_filters_exclude: + - "*trimmed*" + - "fastp" + - "adapterRemoval": + - custom_content + - "fastqc": + name: "FastQC: after preprocessing" + info: "After trimming and, if requested, contamination removal." + path_filters: + - "*trimmed*" + - "busco": + info: "assesses genome assembly and annotation completeness with Benchmarking Universal Single-Copy Orthologs. In case BUSCO's automated lineage selection was used, only generic results for the selected domain are shown and only for genome bins and kept, unbinned contigs for which the BUSCO analysis was successfull, i.e. not for contigs for which no BUSCO genes could be found. Bins for which a specific virus lineage was selected are also not shown." + - "quast" + +custom_data: + host_removal: + file_format: "tsv" + section_name: "Bowtie 2: host read removal" + description: "Reads are mapped against the host reference sequence. Only reads that do not align (concordantly) are kept for further analysis." 
+ plot_type: "bargraph" + pconfig: + id: "host_removal_bargraph" + title: "Bowtie 2: reads mapped against host reference" + ylab: "# Reads" + +sp: + host_removal: + fn: "host_removal_metrics.tsv" + adapterRemoval: + fn: "*_ar2_*.log" + +extra_fn_clean_exts: + - ".bowtie2" + - "_ar2" diff --git a/assets/nf-core-mag_logo.png b/assets/nf-core-mag_logo.png deleted file mode 100644 index 26f12176..00000000 Binary files a/assets/nf-core-mag_logo.png and /dev/null differ diff --git a/assets/nf-core-mag_logo_light.png b/assets/nf-core-mag_logo_light.png new file mode 100644 index 00000000..26d7ed5d Binary files /dev/null and b/assets/nf-core-mag_logo_light.png differ diff --git a/assets/sendmail_template.txt b/assets/sendmail_template.txt index faf9a292..34191548 100644 --- a/assets/sendmail_template.txt +++ b/assets/sendmail_template.txt @@ -12,9 +12,9 @@ $email_html Content-Type: image/png;name="nf-core-mag_logo.png" Content-Transfer-Encoding: base64 Content-ID: -Content-Disposition: inline; filename="nf-core-mag_logo.png" +Content-Disposition: inline; filename="nf-core-mag_logo_light.png" -<% out << new File("$projectDir/assets/nf-core-mag_logo.png"). +<% out << new File("$projectDir/assets/nf-core-mag_logo_light.png"). bytes. encodeBase64(). toString(). diff --git a/bin/get_mag_depths.py b/bin/get_mag_depths.py index 2ade6ec6..64418068 100755 --- a/bin/get_mag_depths.py +++ b/bin/get_mag_depths.py @@ -15,9 +15,11 @@ def parse_args(args=None): parser = argparse.ArgumentParser() parser.add_argument('-b', '--bins' , required=True, nargs="+", metavar='FILE' , help="Bins: FASTA containing all contigs.") parser.add_argument('-d', '--depths' , required=True , metavar='FILE' , help="(Compressed) TSV file containing contig depths for each sample: contigName, contigLen, totalAvgDepth, sample1_avgDepth, sample1_var [, sample2_avgDepth, sample2_var, ...].") - parser.add_argument('-a', '--assembly_name', required=True , type=str , help="Assembly name.") - parser.add_argument('-o', "--out" , required=True , metavar='FILE', type=argparse.FileType('w'), help="Output file containing depth for each bin.") + parser.add_argument('-a', '--assembler' , required=True , type=str , help="Assembler name.") + parser.add_argument('-i', '--id' , required=True , type=str , help="Sample or group id.") + parser.add_argument('-m', '--binner' , required=True , type=str , help="Binning method.") return parser.parse_args(args) +# Processing contig depths for each binner again, i.e. 
not the most efficient way, but ok def main(args=None): args = parse_args(args) @@ -31,8 +33,8 @@ def main(args=None): header = next(reader) for sample in range(int((len(header)-3)/2)): col_name = header[3+2*sample] - # retrieve sample name: "-.bam" - sample_name = col_name[len(args.assembly_name)+1:-4] + # retrieve sample name: "--.bam" + sample_name = col_name[len(args.assembler)+1+len(args.id)+1:-4] sample_names.append(sample_name) # process contig depths for row in reader: @@ -41,17 +43,31 @@ def main(args=None): contig_depths.append(float(row[3+2*sample])) dict_contig_depths[str(row[0])] = contig_depths + # Initialize output files n_samples = len(sample_names) + with open(args.assembler + "-" + args.binner + "-" + args.id + "-binDepths.tsv", 'w') as outfile: + print("bin", '\t'.join(sample_names), sep='\t', file=outfile) + # for each bin, access contig depths and compute mean bin depth (for all samples) - print("bin", '\t'.join(sample_names), sep='\t', file=args.out) for file in args.bins: all_depths = [[] for i in range(n_samples)] - with open(file, "rt") as infile: - for rec in SeqIO.parse(infile,'fasta'): - contig_depths = dict_contig_depths[rec.id] - for sample in range(n_samples): - all_depths[sample].append(contig_depths[sample]) - print(os.path.basename(file), '\t'.join(str(statistics.median(sample_depths)) for sample_depths in all_depths), sep='\t', file=args.out) + + if file.endswith('.gz'): + with gzip.open(file, 'rt') as infile: + for rec in SeqIO.parse(infile,'fasta'): + contig_depths = dict_contig_depths[rec.id] + for sample in range(n_samples): + all_depths[sample].append(contig_depths[sample]) + else: + with open(file, "rt") as infile: + for rec in SeqIO.parse(infile,'fasta'): + contig_depths = dict_contig_depths[rec.id] + for sample in range(n_samples): + all_depths[sample].append(contig_depths[sample]) + + binname = os.path.basename(file) + with open(args.assembler + "-" + args.binner + "-" + args.id + "-binDepths.tsv", 'a') as outfile: + print(binname, '\t'.join(str(statistics.median(sample_depths)) for sample_depths in all_depths), sep='\t', file=outfile) if __name__ == "__main__": diff --git a/bin/get_mag_depths_summary.py b/bin/get_mag_depths_summary.py index da5207d4..e70e640e 100755 --- a/bin/get_mag_depths_summary.py +++ b/bin/get_mag_depths_summary.py @@ -7,8 +7,8 @@ def parse_args(args=None): parser = argparse.ArgumentParser() - parser.add_argument('-d', '--depths' , required=True, nargs="+", metavar='FILE' , help="TSV file for each assembly containing bin depths for samples: bin, sample1, ....") - parser.add_argument('-o', "--out" , required=True , metavar='FILE', type=argparse.FileType('w'), help="Output file containing depths for all assemblies and all samples.") + parser.add_argument('-d', '--depths' , required=True, nargs="+", metavar='FILE' , help="TSV file for each assembly and binning method containing bin depths for samples: bin, sample1, ....") + parser.add_argument('-o', "--out" , required=True , metavar='FILE', type=argparse.FileType('w'), help="Output file containing depths for all assemblies, binning methods and all samples.") return parser.parse_args(args) def main(args=None): diff --git a/bin/plot_mag_depths.py b/bin/plot_mag_depths.py index 20de079c..5e7bff24 100755 --- a/bin/plot_mag_depths.py +++ b/bin/plot_mag_depths.py @@ -11,7 +11,7 @@ def parse_args(args=None): parser = argparse.ArgumentParser() - parser.add_argument('-d', '--bin_depths' , required=True, metavar='FILE' , help="Bin depths file in TSV format (for one assembly): bin, 
sample1_depth, sample2_depth, ....") + parser.add_argument('-d', '--bin_depths' , required=True, metavar='FILE' , help="Bin depths file in TSV format (for one assembly and binning method): bin, sample1_depth, sample2_depth, ....") parser.add_argument('-g', '--groups' , required=True, metavar='FILE' , help="File in TSV format containing group information for samples: sample, group") parser.add_argument('-o', "--out" , required=True, metavar='FILE', type=str, help="Output file.") return parser.parse_args(args) diff --git a/bin/scrape_software_versions.py b/bin/scrape_software_versions.py deleted file mode 100755 index 2da103e8..00000000 --- a/bin/scrape_software_versions.py +++ /dev/null @@ -1,36 +0,0 @@ -#!/usr/bin/env python -from __future__ import print_function -import os - -results = {} -version_files = [x for x in os.listdir(".") if x.endswith(".version.txt")] -for version_file in version_files: - - software = version_file.replace(".version.txt", "") - if software == "pipeline": - software = "nf-core/mag" - - with open(version_file) as fin: - version = fin.read().strip() - results[software] = version - -# Dump to YAML -print( - """ -id: 'software_versions' -section_name: 'nf-core/mag Software Versions' -section_href: 'https://github.com/nf-core/mag' -plot_type: 'html' -description: 'are collected at run time from the software output.' -data: | -
    <dl class="dl-horizontal">
-"""
-)
-for k, v in sorted(results.items()):
-    print("        <dt>{}</dt><dd><samp>{}</samp></dd>".format(k, v))
-print("    </dl>
") - -# Write out as tsv file: -with open("software_versions.tsv", "w") as f: - for k, v in sorted(results.items()): - f.write("{}\t{}\n".format(k, v)) diff --git a/bin/split_fasta.py b/bin/split_fasta.py index e009e1eb..07d369a6 100755 --- a/bin/split_fasta.py +++ b/bin/split_fasta.py @@ -1,14 +1,16 @@ #!/usr/bin/env python -#USAGE: ./combine_tables.py <*.unbinned.fa> +#USAGE: ./split_fasta.py <*.unbinned.fa(.gz)> import pandas as pd +import gzip from sys import argv from Bio import SeqIO from Bio.Seq import Seq from Bio.SeqRecord import SeqRecord from Bio.Alphabet import generic_dna import os +import re # Input input_file = argv[1] @@ -17,30 +19,51 @@ min_length_to_retain_contig = int(argv[4]) # Base name for file output -out_base = (os.path.splitext(input_file)[0]) +if input_file.endswith('.gz'): + rm_ext = input_file.replace(".gz", "") + out_base = out_base = re.sub(r'\.fasta$|\.fa$|\.fna$', '', rm_ext) +else: + out_base = re.sub(r'\.fasta$|\.fa$|\.fna$', '', input_file) # Data structures to separate and store sequences df_above_threshold = pd.DataFrame(columns=['id','seq','length']) pooled = [] remaining = [] -# Read file -with open(input_file) as f: - fasta_sequences = SeqIO.parse(f,'fasta') - - for fasta in fasta_sequences: - name, sequence = fasta.id, str(fasta.seq) - length = len(sequence) - - # store each sequence above threshold together with its length into df - if length >= length_threshold: - df_above_threshold = df_above_threshold.append({"id":name, "seq":sequence, "length":length}, ignore_index = True) - # contigs to retain and pool - elif length >= min_length_to_retain_contig: - pooled.append(SeqRecord(Seq(sequence, generic_dna), id = name)) - # remaining sequences - else: - remaining.append(SeqRecord(Seq(sequence, generic_dna), id = name)) +if input_file.endswith('.gz'): + with gzip.open(input_file, 'rt') as f: + fasta_sequences = SeqIO.parse(f,'fasta') + + for fasta in fasta_sequences: + name, sequence = fasta.id, str(fasta.seq) + length = len(sequence) + + # store each sequence above threshold together with its length into df + if length >= length_threshold: + df_above_threshold = df_above_threshold.append({"id":name, "seq":sequence, "length":length}, ignore_index = True) + # contigs to retain and pool + elif length >= min_length_to_retain_contig: + pooled.append(SeqRecord(Seq(sequence, generic_dna), id = name)) + # remaining sequences + else: + remaining.append(SeqRecord(Seq(sequence, generic_dna), id = name)) +else: + with open(input_file) as f: + fasta_sequences = SeqIO.parse(f,'fasta') + + for fasta in fasta_sequences: + name, sequence = fasta.id, str(fasta.seq) + length = len(sequence) + + # store each sequence above threshold together with its length into df + if length >= length_threshold: + df_above_threshold = df_above_threshold.append({"id":name, "seq":sequence, "length":length}, ignore_index = True) + # contigs to retain and pool + elif length >= min_length_to_retain_contig: + pooled.append(SeqRecord(Seq(sequence, generic_dna), id = name)) + # remaining sequences + else: + remaining.append(SeqRecord(Seq(sequence, generic_dna), id = name)) # Sort sequences above threshold by length df_above_threshold.sort_values(by=['length'], ascending=False, inplace=True) diff --git a/conf/base.config b/conf/base.config index cd177b76..d05a8912 100644 --- a/conf/base.config +++ b/conf/base.config @@ -1,7 +1,7 @@ /* -======================================================================================== 
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ nf-core/mag Nextflow base config file -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ A 'blank slate' config file, appropriate for general use on most high performance compute environments. Assumes that all software is installed and available on the PATH. Runs in `local` mode - all jobs will be run on the logged in environment. @@ -53,6 +53,10 @@ process { maxRetries = 2 } + withName:CUSTOM_DUMPSOFTWAREVERSIONS { + cache = false + } + withName: BOWTIE2_HOST_REMOVAL_BUILD { cpus = { check_max (10 * task.attempt, 'cpus' ) } memory = { check_max (20.GB * task.attempt, 'memory' ) } @@ -143,7 +147,7 @@ process { time = { check_max (8.h * task.attempt, 'time' ) } errorStrategy = { task.exitStatus in [143,137,104,134,139,247] ? 'retry' : 'finish' } } - withName: METABAT2 { + withName: METABAT2_METABAT2 { cpus = { check_max (8 * task.attempt, 'cpus' ) } memory = { check_max (20.GB * task.attempt, 'memory' ) } time = { check_max (8.h * task.attempt, 'time' ) } @@ -155,4 +159,14 @@ process { cpus = { check_max (8 * task.attempt, 'cpus' ) } memory = { check_max (20.GB * task.attempt, 'memory' ) } } + + withName: MAXBIN2 { + // often fails when insufficient information, so we allow it to gracefully fail without failing the pipeline + errorStrategy = { task.exitStatus in [ 1, 255 ] ? 'ignore' : 'retry' } + } + + withName: DASTOOL_DASTOOL { + // if SCGs not found, bins cannot be assigned and DAS_tool will die with exit status 1 + errorStrategy = { task.exitStatus in [143,137,104,134,139] ? 'retry' : task.exitStatus == 1 ? 'ignore' : 'finish' } + } } diff --git a/conf/igenomes.config b/conf/igenomes.config index 855948de..7a1b3ac6 100644 --- a/conf/igenomes.config +++ b/conf/igenomes.config @@ -1,7 +1,7 @@ /* -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Nextflow config file for iGenomes paths -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Defines reference genomes using iGenome paths. 
Can be used by any config that customises the base path using: $params.igenomes_base / --igenomes_base @@ -13,7 +13,7 @@ params { genomes { 'GRCh37' { fasta = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Sequence/BWAIndex/genome.fa" + bwa = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Sequence/BWAIndex/version0.6.0/" bowtie2 = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Sequence/Bowtie2Index/" star = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Sequence/STARIndex/" bismark = "${params.igenomes_base}/Homo_sapiens/Ensembl/GRCh37/Sequence/BismarkIndex/" @@ -26,7 +26,7 @@ params { } 'GRCh38' { fasta = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Sequence/BWAIndex/genome.fa" + bwa = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Sequence/BWAIndex/version0.6.0/" bowtie2 = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Sequence/Bowtie2Index/" star = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Sequence/STARIndex/" bismark = "${params.igenomes_base}/Homo_sapiens/NCBI/GRCh38/Sequence/BismarkIndex/" @@ -38,7 +38,7 @@ params { } 'GRCm38' { fasta = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Sequence/BWAIndex/genome.fa" + bwa = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Sequence/BWAIndex/version0.6.0/" bowtie2 = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Sequence/Bowtie2Index/" star = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Sequence/STARIndex/" bismark = "${params.igenomes_base}/Mus_musculus/Ensembl/GRCm38/Sequence/BismarkIndex/" @@ -51,7 +51,7 @@ params { } 'TAIR10' { fasta = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Sequence/BWAIndex/genome.fa" + bwa = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Sequence/BWAIndex/version0.6.0/" bowtie2 = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Sequence/Bowtie2Index/" star = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Sequence/STARIndex/" bismark = "${params.igenomes_base}/Arabidopsis_thaliana/Ensembl/TAIR10/Sequence/BismarkIndex/" @@ -62,7 +62,7 @@ params { } 'EB2' { fasta = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Sequence/BWAIndex/genome.fa" + bwa = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Sequence/BWAIndex/version0.6.0/" bowtie2 = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Sequence/Bowtie2Index/" star = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Sequence/STARIndex/" bismark = "${params.igenomes_base}/Bacillus_subtilis_168/Ensembl/EB2/Sequence/BismarkIndex/" @@ -72,7 +72,7 @@ params { } 'UMD3.1' { fasta = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Sequence/BWAIndex/genome.fa" + bwa = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Sequence/BWAIndex/version0.6.0/" bowtie2 = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Sequence/Bowtie2Index/" star = "${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Sequence/STARIndex/" bismark = 
"${params.igenomes_base}/Bos_taurus/Ensembl/UMD3.1/Sequence/BismarkIndex/" @@ -83,7 +83,7 @@ params { } 'WBcel235' { fasta = "${params.igenomes_base}/Caenorhabditis_elegans/Ensembl/WBcel235/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Caenorhabditis_elegans/Ensembl/WBcel235/Sequence/BWAIndex/genome.fa" + bwa = "${params.igenomes_base}/Caenorhabditis_elegans/Ensembl/WBcel235/Sequence/BWAIndex/version0.6.0/" bowtie2 = "${params.igenomes_base}/Caenorhabditis_elegans/Ensembl/WBcel235/Sequence/Bowtie2Index/" star = "${params.igenomes_base}/Caenorhabditis_elegans/Ensembl/WBcel235/Sequence/STARIndex/" bismark = "${params.igenomes_base}/Caenorhabditis_elegans/Ensembl/WBcel235/Sequence/BismarkIndex/" @@ -94,7 +94,7 @@ params { } 'CanFam3.1' { fasta = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Sequence/BWAIndex/genome.fa" + bwa = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Sequence/BWAIndex/version0.6.0/" bowtie2 = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Sequence/Bowtie2Index/" star = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Sequence/STARIndex/" bismark = "${params.igenomes_base}/Canis_familiaris/Ensembl/CanFam3.1/Sequence/BismarkIndex/" @@ -105,7 +105,7 @@ params { } 'GRCz10' { fasta = "${params.igenomes_base}/Danio_rerio/Ensembl/GRCz10/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Danio_rerio/Ensembl/GRCz10/Sequence/BWAIndex/genome.fa" + bwa = "${params.igenomes_base}/Danio_rerio/Ensembl/GRCz10/Sequence/BWAIndex/version0.6.0/" bowtie2 = "${params.igenomes_base}/Danio_rerio/Ensembl/GRCz10/Sequence/Bowtie2Index/" star = "${params.igenomes_base}/Danio_rerio/Ensembl/GRCz10/Sequence/STARIndex/" bismark = "${params.igenomes_base}/Danio_rerio/Ensembl/GRCz10/Sequence/BismarkIndex/" @@ -115,7 +115,7 @@ params { } 'BDGP6' { fasta = "${params.igenomes_base}/Drosophila_melanogaster/Ensembl/BDGP6/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Drosophila_melanogaster/Ensembl/BDGP6/Sequence/BWAIndex/genome.fa" + bwa = "${params.igenomes_base}/Drosophila_melanogaster/Ensembl/BDGP6/Sequence/BWAIndex/version0.6.0/" bowtie2 = "${params.igenomes_base}/Drosophila_melanogaster/Ensembl/BDGP6/Sequence/Bowtie2Index/" star = "${params.igenomes_base}/Drosophila_melanogaster/Ensembl/BDGP6/Sequence/STARIndex/" bismark = "${params.igenomes_base}/Drosophila_melanogaster/Ensembl/BDGP6/Sequence/BismarkIndex/" @@ -126,7 +126,7 @@ params { } 'EquCab2' { fasta = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Sequence/BWAIndex/genome.fa" + bwa = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Sequence/BWAIndex/version0.6.0/" bowtie2 = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Sequence/Bowtie2Index/" star = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Sequence/STARIndex/" bismark = "${params.igenomes_base}/Equus_caballus/Ensembl/EquCab2/Sequence/BismarkIndex/" @@ -137,7 +137,7 @@ params { } 'EB1' { fasta = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Sequence/BWAIndex/genome.fa" + bwa = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Sequence/BWAIndex/version0.6.0/" bowtie2 = 
"${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Sequence/Bowtie2Index/" star = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Sequence/STARIndex/" bismark = "${params.igenomes_base}/Escherichia_coli_K_12_DH10B/Ensembl/EB1/Sequence/BismarkIndex/" @@ -147,7 +147,7 @@ params { } 'Galgal4' { fasta = "${params.igenomes_base}/Gallus_gallus/Ensembl/Galgal4/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Gallus_gallus/Ensembl/Galgal4/Sequence/BWAIndex/genome.fa" + bwa = "${params.igenomes_base}/Gallus_gallus/Ensembl/Galgal4/Sequence/BWAIndex/version0.6.0/" bowtie2 = "${params.igenomes_base}/Gallus_gallus/Ensembl/Galgal4/Sequence/Bowtie2Index/" star = "${params.igenomes_base}/Gallus_gallus/Ensembl/Galgal4/Sequence/STARIndex/" bismark = "${params.igenomes_base}/Gallus_gallus/Ensembl/Galgal4/Sequence/BismarkIndex/" @@ -157,7 +157,7 @@ params { } 'Gm01' { fasta = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Sequence/BWAIndex/genome.fa" + bwa = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Sequence/BWAIndex/version0.6.0/" bowtie2 = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Sequence/Bowtie2Index/" star = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Sequence/STARIndex/" bismark = "${params.igenomes_base}/Glycine_max/Ensembl/Gm01/Sequence/BismarkIndex/" @@ -167,7 +167,7 @@ params { } 'Mmul_1' { fasta = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Sequence/BWAIndex/genome.fa" + bwa = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Sequence/BWAIndex/version0.6.0/" bowtie2 = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Sequence/Bowtie2Index/" star = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Sequence/STARIndex/" bismark = "${params.igenomes_base}/Macaca_mulatta/Ensembl/Mmul_1/Sequence/BismarkIndex/" @@ -178,7 +178,7 @@ params { } 'IRGSP-1.0' { fasta = "${params.igenomes_base}/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Sequence/BWAIndex/genome.fa" + bwa = "${params.igenomes_base}/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Sequence/BWAIndex/version0.6.0/" bowtie2 = "${params.igenomes_base}/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Sequence/Bowtie2Index/" star = "${params.igenomes_base}/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Sequence/STARIndex/" bismark = "${params.igenomes_base}/Oryza_sativa_japonica/Ensembl/IRGSP-1.0/Sequence/BismarkIndex/" @@ -188,7 +188,7 @@ params { } 'CHIMP2.1.4' { fasta = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Sequence/BWAIndex/genome.fa" + bwa = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Sequence/BWAIndex/version0.6.0/" bowtie2 = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Sequence/Bowtie2Index/" star = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Sequence/STARIndex/" bismark = "${params.igenomes_base}/Pan_troglodytes/Ensembl/CHIMP2.1.4/Sequence/BismarkIndex/" @@ -199,7 +199,7 @@ params { } 'Rnor_5.0' { fasta = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_5.0/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_5.0/Sequence/BWAIndex/genome.fa" + bwa 
= "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_5.0/Sequence/BWAIndex/version0.6.0/" bowtie2 = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_5.0/Sequence/Bowtie2Index/" star = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_5.0/Sequence/STARIndex/" bismark = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_5.0/Sequence/BismarkIndex/" @@ -209,7 +209,7 @@ params { } 'Rnor_6.0' { fasta = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_6.0/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_6.0/Sequence/BWAIndex/genome.fa" + bwa = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_6.0/Sequence/BWAIndex/version0.6.0/" bowtie2 = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_6.0/Sequence/Bowtie2Index/" star = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_6.0/Sequence/STARIndex/" bismark = "${params.igenomes_base}/Rattus_norvegicus/Ensembl/Rnor_6.0/Sequence/BismarkIndex/" @@ -219,7 +219,7 @@ params { } 'R64-1-1' { fasta = "${params.igenomes_base}/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Sequence/BWAIndex/genome.fa" + bwa = "${params.igenomes_base}/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Sequence/BWAIndex/version0.6.0/" bowtie2 = "${params.igenomes_base}/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Sequence/Bowtie2Index/" star = "${params.igenomes_base}/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Sequence/STARIndex/" bismark = "${params.igenomes_base}/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Sequence/BismarkIndex/" @@ -230,7 +230,7 @@ params { } 'EF2' { fasta = "${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Sequence/BWAIndex/genome.fa" + bwa = "${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Sequence/BWAIndex/version0.6.0/" bowtie2 = "${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Sequence/Bowtie2Index/" star = "${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Sequence/STARIndex/" bismark = "${params.igenomes_base}/Schizosaccharomyces_pombe/Ensembl/EF2/Sequence/BismarkIndex/" @@ -242,7 +242,7 @@ params { } 'Sbi1' { fasta = "${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Sequence/BWAIndex/genome.fa" + bwa = "${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Sequence/BWAIndex/version0.6.0/" bowtie2 = "${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Sequence/Bowtie2Index/" star = "${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Sequence/STARIndex/" bismark = "${params.igenomes_base}/Sorghum_bicolor/Ensembl/Sbi1/Sequence/BismarkIndex/" @@ -252,7 +252,7 @@ params { } 'Sscrofa10.2' { fasta = "${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Sequence/BWAIndex/genome.fa" + bwa = "${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Sequence/BWAIndex/version0.6.0/" bowtie2 = "${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Sequence/Bowtie2Index/" star = "${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Sequence/STARIndex/" bismark = "${params.igenomes_base}/Sus_scrofa/Ensembl/Sscrofa10.2/Sequence/BismarkIndex/" @@ -263,7 +263,7 @@ params { } 'AGPv3' { fasta = 
"${params.igenomes_base}/Zea_mays/Ensembl/AGPv3/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Zea_mays/Ensembl/AGPv3/Sequence/BWAIndex/genome.fa" + bwa = "${params.igenomes_base}/Zea_mays/Ensembl/AGPv3/Sequence/BWAIndex/version0.6.0/" bowtie2 = "${params.igenomes_base}/Zea_mays/Ensembl/AGPv3/Sequence/Bowtie2Index/" star = "${params.igenomes_base}/Zea_mays/Ensembl/AGPv3/Sequence/STARIndex/" bismark = "${params.igenomes_base}/Zea_mays/Ensembl/AGPv3/Sequence/BismarkIndex/" @@ -273,7 +273,7 @@ params { } 'hg38' { fasta = "${params.igenomes_base}/Homo_sapiens/UCSC/hg38/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Homo_sapiens/UCSC/hg38/Sequence/BWAIndex/genome.fa" + bwa = "${params.igenomes_base}/Homo_sapiens/UCSC/hg38/Sequence/BWAIndex/version0.6.0/" bowtie2 = "${params.igenomes_base}/Homo_sapiens/UCSC/hg38/Sequence/Bowtie2Index/" star = "${params.igenomes_base}/Homo_sapiens/UCSC/hg38/Sequence/STARIndex/" bismark = "${params.igenomes_base}/Homo_sapiens/UCSC/hg38/Sequence/BismarkIndex/" @@ -285,7 +285,7 @@ params { } 'hg19' { fasta = "${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Sequence/BWAIndex/genome.fa" + bwa = "${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Sequence/BWAIndex/version0.6.0/" bowtie2 = "${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Sequence/Bowtie2Index/" star = "${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Sequence/STARIndex/" bismark = "${params.igenomes_base}/Homo_sapiens/UCSC/hg19/Sequence/BismarkIndex/" @@ -298,7 +298,7 @@ params { } 'mm10' { fasta = "${params.igenomes_base}/Mus_musculus/UCSC/mm10/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Mus_musculus/UCSC/mm10/Sequence/BWAIndex/genome.fa" + bwa = "${params.igenomes_base}/Mus_musculus/UCSC/mm10/Sequence/BWAIndex/version0.6.0/" bowtie2 = "${params.igenomes_base}/Mus_musculus/UCSC/mm10/Sequence/Bowtie2Index/" star = "${params.igenomes_base}/Mus_musculus/UCSC/mm10/Sequence/STARIndex/" bismark = "${params.igenomes_base}/Mus_musculus/UCSC/mm10/Sequence/BismarkIndex/" @@ -311,7 +311,7 @@ params { } 'bosTau8' { fasta = "${params.igenomes_base}/Bos_taurus/UCSC/bosTau8/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Bos_taurus/UCSC/bosTau8/Sequence/BWAIndex/genome.fa" + bwa = "${params.igenomes_base}/Bos_taurus/UCSC/bosTau8/Sequence/BWAIndex/version0.6.0/" bowtie2 = "${params.igenomes_base}/Bos_taurus/UCSC/bosTau8/Sequence/Bowtie2Index/" star = "${params.igenomes_base}/Bos_taurus/UCSC/bosTau8/Sequence/STARIndex/" bismark = "${params.igenomes_base}/Bos_taurus/UCSC/bosTau8/Sequence/BismarkIndex/" @@ -321,7 +321,7 @@ params { } 'ce10' { fasta = "${params.igenomes_base}/Caenorhabditis_elegans/UCSC/ce10/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Caenorhabditis_elegans/UCSC/ce10/Sequence/BWAIndex/genome.fa" + bwa = "${params.igenomes_base}/Caenorhabditis_elegans/UCSC/ce10/Sequence/BWAIndex/version0.6.0/" bowtie2 = "${params.igenomes_base}/Caenorhabditis_elegans/UCSC/ce10/Sequence/Bowtie2Index/" star = "${params.igenomes_base}/Caenorhabditis_elegans/UCSC/ce10/Sequence/STARIndex/" bismark = "${params.igenomes_base}/Caenorhabditis_elegans/UCSC/ce10/Sequence/BismarkIndex/" @@ -333,7 +333,7 @@ params { } 'canFam3' { fasta = "${params.igenomes_base}/Canis_familiaris/UCSC/canFam3/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Canis_familiaris/UCSC/canFam3/Sequence/BWAIndex/genome.fa" + bwa = 
"${params.igenomes_base}/Canis_familiaris/UCSC/canFam3/Sequence/BWAIndex/version0.6.0/" bowtie2 = "${params.igenomes_base}/Canis_familiaris/UCSC/canFam3/Sequence/Bowtie2Index/" star = "${params.igenomes_base}/Canis_familiaris/UCSC/canFam3/Sequence/STARIndex/" bismark = "${params.igenomes_base}/Canis_familiaris/UCSC/canFam3/Sequence/BismarkIndex/" @@ -344,7 +344,7 @@ params { } 'danRer10' { fasta = "${params.igenomes_base}/Danio_rerio/UCSC/danRer10/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Danio_rerio/UCSC/danRer10/Sequence/BWAIndex/genome.fa" + bwa = "${params.igenomes_base}/Danio_rerio/UCSC/danRer10/Sequence/BWAIndex/version0.6.0/" bowtie2 = "${params.igenomes_base}/Danio_rerio/UCSC/danRer10/Sequence/Bowtie2Index/" star = "${params.igenomes_base}/Danio_rerio/UCSC/danRer10/Sequence/STARIndex/" bismark = "${params.igenomes_base}/Danio_rerio/UCSC/danRer10/Sequence/BismarkIndex/" @@ -355,7 +355,7 @@ params { } 'dm6' { fasta = "${params.igenomes_base}/Drosophila_melanogaster/UCSC/dm6/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Drosophila_melanogaster/UCSC/dm6/Sequence/BWAIndex/genome.fa" + bwa = "${params.igenomes_base}/Drosophila_melanogaster/UCSC/dm6/Sequence/BWAIndex/version0.6.0/" bowtie2 = "${params.igenomes_base}/Drosophila_melanogaster/UCSC/dm6/Sequence/Bowtie2Index/" star = "${params.igenomes_base}/Drosophila_melanogaster/UCSC/dm6/Sequence/STARIndex/" bismark = "${params.igenomes_base}/Drosophila_melanogaster/UCSC/dm6/Sequence/BismarkIndex/" @@ -366,7 +366,7 @@ params { } 'equCab2' { fasta = "${params.igenomes_base}/Equus_caballus/UCSC/equCab2/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Equus_caballus/UCSC/equCab2/Sequence/BWAIndex/genome.fa" + bwa = "${params.igenomes_base}/Equus_caballus/UCSC/equCab2/Sequence/BWAIndex/version0.6.0/" bowtie2 = "${params.igenomes_base}/Equus_caballus/UCSC/equCab2/Sequence/Bowtie2Index/" star = "${params.igenomes_base}/Equus_caballus/UCSC/equCab2/Sequence/STARIndex/" bismark = "${params.igenomes_base}/Equus_caballus/UCSC/equCab2/Sequence/BismarkIndex/" @@ -377,7 +377,7 @@ params { } 'galGal4' { fasta = "${params.igenomes_base}/Gallus_gallus/UCSC/galGal4/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Gallus_gallus/UCSC/galGal4/Sequence/BWAIndex/genome.fa" + bwa = "${params.igenomes_base}/Gallus_gallus/UCSC/galGal4/Sequence/BWAIndex/version0.6.0/" bowtie2 = "${params.igenomes_base}/Gallus_gallus/UCSC/galGal4/Sequence/Bowtie2Index/" star = "${params.igenomes_base}/Gallus_gallus/UCSC/galGal4/Sequence/STARIndex/" bismark = "${params.igenomes_base}/Gallus_gallus/UCSC/galGal4/Sequence/BismarkIndex/" @@ -388,7 +388,7 @@ params { } 'panTro4' { fasta = "${params.igenomes_base}/Pan_troglodytes/UCSC/panTro4/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Pan_troglodytes/UCSC/panTro4/Sequence/BWAIndex/genome.fa" + bwa = "${params.igenomes_base}/Pan_troglodytes/UCSC/panTro4/Sequence/BWAIndex/version0.6.0/" bowtie2 = "${params.igenomes_base}/Pan_troglodytes/UCSC/panTro4/Sequence/Bowtie2Index/" star = "${params.igenomes_base}/Pan_troglodytes/UCSC/panTro4/Sequence/STARIndex/" bismark = "${params.igenomes_base}/Pan_troglodytes/UCSC/panTro4/Sequence/BismarkIndex/" @@ -399,7 +399,7 @@ params { } 'rn6' { fasta = "${params.igenomes_base}/Rattus_norvegicus/UCSC/rn6/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Rattus_norvegicus/UCSC/rn6/Sequence/BWAIndex/genome.fa" + bwa = 
"${params.igenomes_base}/Rattus_norvegicus/UCSC/rn6/Sequence/BWAIndex/version0.6.0/" bowtie2 = "${params.igenomes_base}/Rattus_norvegicus/UCSC/rn6/Sequence/Bowtie2Index/" star = "${params.igenomes_base}/Rattus_norvegicus/UCSC/rn6/Sequence/STARIndex/" bismark = "${params.igenomes_base}/Rattus_norvegicus/UCSC/rn6/Sequence/BismarkIndex/" @@ -409,7 +409,7 @@ params { } 'sacCer3' { fasta = "${params.igenomes_base}/Saccharomyces_cerevisiae/UCSC/sacCer3/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Saccharomyces_cerevisiae/UCSC/sacCer3/Sequence/BWAIndex/genome.fa" + bwa = "${params.igenomes_base}/Saccharomyces_cerevisiae/UCSC/sacCer3/Sequence/BWAIndex/version0.6.0/" bowtie2 = "${params.igenomes_base}/Saccharomyces_cerevisiae/UCSC/sacCer3/Sequence/Bowtie2Index/" star = "${params.igenomes_base}/Saccharomyces_cerevisiae/UCSC/sacCer3/Sequence/STARIndex/" bismark = "${params.igenomes_base}/Saccharomyces_cerevisiae/UCSC/sacCer3/Sequence/BismarkIndex/" @@ -419,7 +419,7 @@ params { } 'susScr3' { fasta = "${params.igenomes_base}/Sus_scrofa/UCSC/susScr3/Sequence/WholeGenomeFasta/genome.fa" - bwa = "${params.igenomes_base}/Sus_scrofa/UCSC/susScr3/Sequence/BWAIndex/genome.fa" + bwa = "${params.igenomes_base}/Sus_scrofa/UCSC/susScr3/Sequence/BWAIndex/version0.6.0/" bowtie2 = "${params.igenomes_base}/Sus_scrofa/UCSC/susScr3/Sequence/Bowtie2Index/" star = "${params.igenomes_base}/Sus_scrofa/UCSC/susScr3/Sequence/STARIndex/" bismark = "${params.igenomes_base}/Sus_scrofa/UCSC/susScr3/Sequence/BismarkIndex/" diff --git a/conf/modules.config b/conf/modules.config index c02f1460..4051ec08 100644 --- a/conf/modules.config +++ b/conf/modules.config @@ -1,176 +1,513 @@ /* -======================================================================================== - Config file for defining DSL2 per module options -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + Config file for defining DSL2 per module options and publishing paths +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Available keys to override module options: - args = Additional arguments appended to command in module. - args2 = Second set of arguments appended to command in module (multi-tool modules). - args3 = Third set of arguments appended to command in module (multi-tool modules). - publish_dir = Directory to publish results. - publish_by_meta = Groovy list of keys available in meta map to append as directories to "publish_dir" path - If publish_by_meta = true - Value of ${meta['id']} is appended as a directory to "publish_dir" path - If publish_by_meta = ['id', 'custompath'] - If "id" is in meta map and "custompath" isn't then "${meta['id']}/custompath/" - is appended as a directory to "publish_dir" path - If publish_by_meta = false / null - No directories are appended to "publish_dir" path - publish_files = Groovy map where key = "file_ext" and value = "directory" to publish results for that file extension - The value of "directory" is appended to the standard "publish_dir" path as defined above. - If publish_files = null (unspecified) - All files are published. - If publish_files = false - No files are published. - suffix = File name suffix for output files. + ext.args = Additional arguments appended to command in module. + ext.args2 = Second set of arguments appended to command in module (multi-tool modules). 
+ ext.args3 = Third set of arguments appended to command in module (multi-tool modules). + ext.prefix = File name prefix for output files. ---------------------------------------------------------------------------------------- */ -params { - modules { - 'fastqc_raw' { - args = "--quiet" - publish_files = ['html':''] - publish_dir = "QC_shortreads/fastqc" - } - 'fastp' { - args = "-q ${params.fastp_qualified_quality} --cut_front --cut_tail --cut_mean_quality ${params.fastp_cut_mean_quality}" - publish_files = ['html':'', 'json':''] - publish_by_meta = true - publish_dir = "QC_shortreads/fastp" - } - 'bowtie2_host_removal_align' { - publish_files = ['log':'', 'read_ids.txt':''] - publish_dir = "QC_shortreads/remove_host" - suffix = "host_removed" - } - 'bowtie2_phix_removal_align' { - publish_files = ['log':''] - publish_dir = "QC_shortreads/remove_phix" - suffix = "phix_removed" - } - 'fastqc_trimmed' { - args = "--quiet" - publish_files = ['html':''] - publish_dir = "QC_shortreads/fastqc" - suffix = ".trimmed" - } - 'nanolyse' { - publish_files = ['log':''] - publish_dir = "QC_longreads/NanoLyse" - } - 'nanoplot_raw' { - publish_files = ['png':'', 'html':'', 'txt':''] - publish_by_meta = true - publish_dir = "QC_longreads/NanoPlot" - suffix = "raw" - } - 'nanoplot_filtered' { - publish_files = ['png':'', 'html':'', 'txt':''] - publish_by_meta = true - publish_dir = "QC_longreads/NanoPlot" - suffix = "filtered" - } - 'centrifuge' { - publish_files = ['txt':''] - publish_by_meta = true - publish_dir = "Taxonomy/centrifuge" - } - 'kraken2' { - publish_files = ['txt':''] - publish_by_meta = true - publish_dir = "Taxonomy/kraken2" - } - 'krona' { - publish_files = ['html':''] - publish_by_meta = true - publish_dir = "Taxonomy" - } - 'megahit' { - publish_files = ['fa.gz':'', 'log':''] - publish_dir = "Assembly" - } - 'spades' { - publish_files = ['fasta.gz':'', 'gfa.gz':'', 'log':''] - publish_dir = "Assembly/SPAdes" - } - 'spadeshybrid' { - publish_files = ['fasta.gz':'', 'gfa.gz':'', 'log':''] - publish_dir = "Assembly/SPAdesHybrid" - } - 'quast' { - publish_by_meta = ['assembler', 'QC', 'id'] - publish_dir = "Assembly" - } - 'bowtie2_assembly_align' { - publish_files = ['log':''] - publish_by_meta = ['assembler', 'QC', 'id'] - publish_dir = "Assembly" - } - 'metabat2' { - publish_files = ['txt.gz':'', 'fa':'', 'fa.gz':''] - publish_dir = "GenomeBinning" - } - 'mag_depths' { - publish_files = false - } - 'mag_depths_plot' { - publish_dir = "GenomeBinning" - } - 'mag_depths_summary' { - publish_dir = "GenomeBinning" - } - 'busco_db_preparation' { - publish_files = ['tar.gz':''] - publish_dir = "GenomeBinning/QC/BUSCO" - } - 'busco' { - publish_dir = "GenomeBinning/QC/BUSCO" - } - 'busco_save_download' { - publish_dir = "GenomeBinning/QC/BUSCO" - } - 'busco_plot' { - publish_dir = "GenomeBinning/QC/BUSCO" - } - 'busco_summary' { - publish_dir = "GenomeBinning/QC" - } - 'quast_bins' { - publish_dir = "GenomeBinning/QC" - } - 'quast_bins_summary' { - publish_dir = "GenomeBinning/QC" - } - 'cat' { - publish_by_meta = true - publish_dir = "Taxonomy/CAT" - publish_files = ['log':'', 'gz':''] - } - 'cat_db_generate' { - publish_files = ['tar.gz':''] - publish_dir = "Taxonomy/CAT" - } - 'gtdbtk_classify' { - args = "--extension fa" - publish_files = ['log':'', 'tsv':'', 'tree.gz':'', 'fasta':'', 'fasta.gz':''] - publish_by_meta = true - publish_dir = "Taxonomy/GTDB-Tk" - } - 'gtdbtk_summary' { - args = "--extension fa" - publish_dir = "Taxonomy/GTDB-Tk" - } - 'bin_summary' { - publish_dir = 
"GenomeBinning" - } - prokka { - args = "--metagenome" - publish_dir = "Prokka" - publish_by_meta = ['assembler'] - } - 'multiqc' { - args = "" - } - prodigal { - args = "-p meta" - publish_dir = "Prodigal" - output_format = "gff" - publish_by_meta = ['assembler', 'id'] - } +process { + + //default: do not publish into the results folder + publishDir = [ + path: { "${params.outdir}/${task.process.tokenize(':')[-1].tokenize('_')[0].toLowerCase()}" }, + mode: params.publish_dir_mode, + saveAs: { filename -> filename.equals('versions.yml') ? null : filename }, + enabled: false + ] + + withName: FASTQC_RAW { + ext.args = '--quiet' + publishDir = [ + path: { "${params.outdir}/QC_shortreads/fastqc" }, + mode: params.publish_dir_mode, + pattern: "*.html" + ] + } + + withName: FASTP { + ext.args = [ + "-q ${params.fastp_qualified_quality}", + "--cut_front", + "--cut_tail", + "--cut_mean_quality ${params.fastp_cut_mean_quality}", + "--length_required ${params.reads_minlength}" + ].join(' ').trim() + publishDir = [ + path: { "${params.outdir}/QC_shortreads/fastp/${meta.id}" }, + mode: params.publish_dir_mode, + pattern: "*.{html,json}" + ] + } + + withName: ADAPTERREMOVAL_PE { + ext.args = [ + "--minlength ${params.reads_minlength}", + "--adapter1 ${params.adapterremoval_adapter1} --adapter2 ${params.adapterremoval_adapter2}", + "--minquality ${params.adapterremoval_minquality} --trimns", + params.adapterremoval_trim_quality_stretch ? "--trim_qualities" : "--trimwindows 4" + ].join(' ').trim() + publishDir = [ + path: { "${params.outdir}/QC_shortreads/adapterremoval/${meta.id}" }, + mode: params.publish_dir_mode, + pattern: "*.{log}" + ] + ext.prefix = { "${meta.id}_ar2" } + } + + withName: ADAPTERREMOVAL_SE { + ext.args = [ + "--minlength ${params.reads_minlength}", + "--adapter1 ${params.adapterremoval_adapter1}", + "--minquality ${params.adapterremoval_minquality} --trimns", + params.adapterremoval_trim_quality_stretch ? "--trim_qualities" : "--trimwindows 4" + ].join(' ').trim() + publishDir = [ + path: { "${params.outdir}/QC_shortreads/adapterremoval/${meta.id}" }, + mode: params.publish_dir_mode, + pattern: "*.{log}" + ] + ext.prefix = { "${meta.id}_ar2" } + } + + withName: BOWTIE2_PHIX_REMOVAL_ALIGN { + ext.prefix = { "${meta.id}.phix_removed" } + publishDir = [ + path: { "${params.outdir}/QC_shortreads/remove_phix" }, + mode: params.publish_dir_mode, + pattern: "*.log" + ] + } + + withName: BOWTIE2_HOST_REMOVAL_ALIGN { + ext.args = params.host_removal_verysensitive ? "--very-sensitive" : "--sensitive" + ext.args2 = params.host_removal_save_ids ? 
"--host_removal_save_ids" : '' + ext.prefix = { "${meta.id}.host_removed" } + publishDir = [ + path: { "${params.outdir}/QC_shortreads/remove_host" }, + mode: params.publish_dir_mode, + pattern: "*{.log,read_ids.txt}" + ] + } + + withName: FASTQC_TRIMMED { + ext.args = '--quiet' + ext.prefix = { "${meta.id}.trimmed" } + publishDir = [ + path: { "${params.outdir}/QC_shortreads/fastqc" }, + mode: params.publish_dir_mode, + pattern: "*.html" + ] + } + + withName: NANOLYSE { + publishDir = [ + path: { "${params.outdir}/QC_longreads/NanoLyse" }, + mode: params.publish_dir_mode, + pattern: "*.log" + ] + } + + withName: NANOPLOT_RAW { + ext.prefix = 'raw' + publishDir = [ + path: { "${params.outdir}/QC_longreads/NanoPlot/${meta.id}" }, + mode: params.publish_dir_mode, + pattern: "*.{png,html,txt}" + ] + } + + withName: NANOPLOT_FILTERED { + ext.prefix = 'filtered' + publishDir = [ + path: { "${params.outdir}/QC_longreads/NanoPlot/${meta.id}" }, + mode: params.publish_dir_mode, + pattern: "*.{png,html,txt}" + ] + } + + withName: CENTRIFUGE { + publishDir = [ + path: { "${params.outdir}/Taxonomy/centrifuge/${meta.id}" }, + mode: params.publish_dir_mode, + pattern: "*.txt" + ] + } + + withName: KRAKEN2 { + ext.args = '--quiet' + publishDir = [ + path: { "${params.outdir}/Taxonomy/kraken2/${meta.id}" }, + mode: params.publish_dir_mode, + pattern: "*.txt" + ] + } + + withName: KRONA { + publishDir = [ + path: { "${params.outdir}/Taxonomy/${meta.classifier}/${meta.id}" }, + mode: params.publish_dir_mode, + pattern: "*.html" + ] + } + + //pattern: "*.{fa.gz,log}" //'pattern' didnt work, probably because the output is in a folder, solved with 'saveAs' + withName: MEGAHIT { + ext.args = params.megahit_options ?: '' + publishDir = [ + path: { "${params.outdir}/Assembly" }, + mode: params.publish_dir_mode, + saveAs: { + filename -> filename.equals('versions.yml') ? null : + filename.indexOf('.contigs.fa.gz') > 0 ? filename : + filename.indexOf('.log') > 0 ? filename : null } + ] + } + + withName: SPADES { + ext.args = params.spades_options ?: '' + publishDir = [ + path: { "${params.outdir}/Assembly/SPAdes" }, + mode: params.publish_dir_mode, + pattern: "*.{fasta.gz,gfa.gz,log}" + ] + } + + withName: SPADESHYBRID { + ext.args = params.spades_options ?: '' + publishDir = [ + path: { "${params.outdir}/Assembly/SPAdesHybrid" }, + mode: params.publish_dir_mode, + pattern: "*.{fasta.gz,gfa.gz,log}" + ] + } + + withName: QUAST { + publishDir = [ + path: { "${params.outdir}/Assembly/${meta.assembler}/QC/${meta.id}" }, + mode: params.publish_dir_mode, + saveAs: { filename -> filename.equals('versions.yml') ? null : filename } + ] + } + + withName: BOWTIE2_ASSEMBLY_ALIGN { + ext.args = params.bowtie2_mode ? params.bowtie2_mode : params.ancient_dna ? 
'--very-sensitive-local -N 1' : '' + publishDir = [ + path: { "${params.outdir}/Assembly/${assembly_meta.assembler}/QC/${assembly_meta.id}" }, + mode: params.publish_dir_mode, + pattern: "*.log" + ] + } + + withName: 'MAG_DEPTHS_PLOT|MAG_DEPTHS_SUMMARY|MAG_DEPTHS_PLOT_REFINED' { + publishDir = [ + path: { "${params.outdir}/GenomeBinning/depths/bins" }, + mode: params.publish_dir_mode, + pattern: "*.{png,tsv}" + ] + } + + withName: 'MAG_DEPTHS_SUMMARY_REFINED' { + ext.prefix = "bin_refined_depths_summary" + publishDir = [ + path: { "${params.outdir}/GenomeBinning/depths/bins" }, + mode: params.publish_dir_mode, + pattern: "*.{tsv}" + ] + } + + withName: 'BIN_SUMMARY' { + publishDir = [ + path: { "${params.outdir}/GenomeBinning" }, + mode: params.publish_dir_mode, + pattern: "*.{png,tsv}" + ] + } + + withName: BUSCO_DB_PREPARATION { + publishDir = [ + path: { "${params.outdir}/GenomeBinning/QC/BUSCO" }, + mode: params.publish_dir_mode, + pattern: "*.tar.gz" + ] + } + + withName: 'BUSCO' { + publishDir = [ + path: { "${params.outdir}/GenomeBinning/QC/BUSCO" }, + mode: params.publish_dir_mode, + pattern: "*.{log,err,faa.gz,fna.gz,gff,txt}" + ] + } + + withName: BUSCO_SAVE_DOWNLOAD { + publishDir = [ + path: { "${params.outdir}/GenomeBinning/QC/BUSCO" }, + mode: params.publish_dir_mode, + overwrite: false, + saveAs: { filename -> filename.equals('versions.yml') ? null : filename } + ] + } + + withName: 'BUSCO_PLOT' { + publishDir = [ + path: { "${params.outdir}/GenomeBinning/QC/BUSCO" }, + mode: params.publish_dir_mode, + saveAs: { filename -> filename.equals('versions.yml') ? null : filename } + ] } + + withName: 'BUSCO_SUMMARY|QUAST_BINS|QUAST_BINS_SUMMARY' { + publishDir = [ + path: { "${params.outdir}/GenomeBinning/QC" }, + mode: params.publish_dir_mode, + saveAs: { filename -> filename.equals('versions.yml') ? null : filename } + ] + } + + withName: CAT_DB_GENERATE { + publishDir = [ + path: { "${params.outdir}/Taxonomy/CAT" }, + mode: params.publish_dir_mode, + pattern: "*.tar.gz" + ] + } + + withName: CAT { + publishDir = [ + path: { "${params.outdir}/Taxonomy/CAT/${meta.assembler}/${meta.binner}" }, + mode: params.publish_dir_mode, + saveAs: { filename -> filename.equals('versions.yml') ? null : filename } + ] + } + + withName: GTDBTK_CLASSIFY { + ext.args = "--extension fa" + publishDir = [ + path: { "${params.outdir}/Taxonomy/GTDB-Tk/${meta.assembler}/${meta.binner}/${meta.id}" }, + mode: params.publish_dir_mode, + pattern: "*.{log,tasv,tree.gz,fasta,fasta.gz}" + ] + } + + withName: GTDBTK_SUMMARY { + ext.args = "--extension fa" + publishDir = [ + path: { "${params.outdir}/Taxonomy/GTDB-Tk" }, + mode: params.publish_dir_mode, + saveAs: { filename -> filename.equals('versions.yml') ? null : filename } + ] + } + + withName: PROKKA { + ext.args = "--metagenome" + publishDir = [ + path: { "${params.outdir}/Prokka/${meta.assembler}" }, + mode: params.publish_dir_mode, + saveAs: { filename -> filename.equals('versions.yml') ? null : filename } + ] + } + + withName: MULTIQC { + ext.args = "" + publishDir = [ + path: { "${params.outdir}/multiqc" }, + mode: params.publish_dir_mode, + saveAs: { filename -> filename.equals('versions.yml') ? null : filename } + ] + } + + withName: PRODIGAL { + ext.args = "-p meta" + publishDir = [ + path: { "${params.outdir}/Prodigal/${meta.assembler}/${meta.id}" }, + mode: params.publish_dir_mode, + saveAs: { filename -> filename.equals('versions.yml') ? 
null : filename } + ] + } + + withName: FREEBAYES { + ext.args = "-p ${params.freebayes_ploidy} -q ${params.freebayes_min_basequality} -F ${params.freebayes_minallelefreq}" + publishDir = [ + path: { "${params.outdir}/Ancient_DNA/variant_calling/freebayes" }, + mode: params.publish_dir_mode, + pattern: "*.vcf.gz" + ] + } + + withName: BCFTOOLS_VIEW { + ext.args = "-v snps,mnps -i 'QUAL>=${params.bcftools_view_high_variant_quality} || (QUAL>=${params.bcftools_view_medium_variant_quality} && FORMAT/AO>=${params.bcftools_view_minimal_allelesupport})'" + ext.prefix = { "${meta.id}.filtered" } + publishDir = [ + path: { "${params.outdir}/Ancient_DNA/variant_calling/filtered" }, + mode: params.publish_dir_mode, + pattern: "*.vcf.gz" + ] + } + + withName: BCFTOOLS_CONSENSUS { + publishDir = [ + path: {"${params.outdir}/Ancient_DNA/variant_calling/consensus" }, + mode: params.publish_dir_mode, + pattern: "*.fa" + ] + } + + withName: BCFTOOLS_INDEX { + ext.args = "-t" + publishDir = [ + path: {"${params.outdir}/Ancient_DNA/variant_calling/index" }, + mode: params.publish_dir_mode, + enabled: false + ] + } + + withName: PYDAMAGE_ANALYZE { + publishDir = [ + path: {"${params.outdir}/Ancient_DNA/pydamage/analyze" }, + mode: params.publish_dir_mode + ] + } + + withName: PYDAMAGE_FILTER { + ext.args = "-t ${params.pydamage_accuracy}" + publishDir = [ + path: {"${params.outdir}/Ancient_DNA/pydamage/filter" }, + mode: params.publish_dir_mode + ] + } + + withName: SAMTOOLS_FAIDX { + publishDir = [ + path: {"${params.outdir}/Ancient_DNA/samtools/faidx" }, + mode: params.publish_dir_mode, + enabled: false + ] + } + withName: METABAT2_JGISUMMARIZEBAMCONTIGDEPTHS { + publishDir = [ + path: { "${params.outdir}/GenomeBinning/depths/contigs" }, + mode: params.publish_dir_mode, + pattern: '*-depth.txt.gz' + ] + ext.prefix = { "${meta.assembler}-${meta.id}-depth" } + } + + withName: METABAT2_METABAT2 { + publishDir = [ + [ + path: { "${params.outdir}/GenomeBinning/MetaBAT2/" }, + mode: params.publish_dir_mode, + pattern: 'bins/*.fa.gz' + ], + [ + path: { "${params.outdir}/GenomeBinning/MetaBAT2/discarded" }, + mode: params.publish_dir_mode, + pattern: '*tooShort.fa.gz' + ], + [ + path: { "${params.outdir}/GenomeBinning/MetaBAT2/discarded" }, + mode: params.publish_dir_mode, + pattern: '*lowDepth.fa.gz' + ] + ] + ext.prefix = { "${meta.assembler}-MetaBAT2-${meta.id}" } + ext.args = [ + "-m ${params.min_contig_size}", + "--unbinned", + "--seed ${params.metabat_rng_seed}" + ].join(' ').trim() + } + + withName: MAXBIN2 { + publishDir = [ + [ + path: { "${params.outdir}/GenomeBinning/MaxBin2/discarded" }, + mode: params.publish_dir_mode, + pattern: '*.tooshort.gz' + ], + ] + ext.prefix = { "${meta.assembler}-MaxBin2-${meta.id}" } + // if no gene found, will crash so allow ignore so rest of pipeline + // completes but without MaxBin2 results + } + + withName: ADJUST_MAXBIN2_EXT { + publishDir = [ + [ + path: { "${params.outdir}/GenomeBinning/MaxBin2/bins/" }, + mode: params.publish_dir_mode, + pattern: '*.fa.gz' + ], + ] + } + + withName: SPLIT_FASTA { + publishDir = [ + [ + path: { "${params.outdir}/GenomeBinning/${meta.binner}/unbinned" }, + mode: params.publish_dir_mode, + pattern: '*.*[0-9].fa.gz' + ], + [ + path: { "${params.outdir}/GenomeBinning/${meta.binner}/unbinned/discarded" }, + mode: params.publish_dir_mode, + pattern: '*.pooled.fa.gz' + ], + [ + path: { "${params.outdir}/GenomeBinning/${meta.binner}/unbinned/discarded" }, + mode: params.publish_dir_mode, + pattern: '*.remaining.fa.gz' + ] + ] + } + + 
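With the old `params.modules` map removed, per-module options now flow through `ext` properties: `conf/modules.config` attaches `ext.args` and `ext.prefix` to a process name with a `withName` selector, and the module script reads them back at runtime via `task.ext`. A minimal sketch of the consuming side is shown below, assuming a hypothetical `EXAMPLE_TOOL` process and `example_tool` command that are not part of this pipeline:

```nextflow
process EXAMPLE_TOOL {
    input:
    tuple val(meta), path(reads)

    output:
    tuple val(meta), path("*.log"), emit: log

    script:
    // Injected from conf/modules.config via a matching withName selector
    def args   = task.ext.args   ?: ''
    def prefix = task.ext.prefix ?: "${meta.id}"
    """
    example_tool $args --output ${prefix}.log $reads
    """
}
```

The per-process `publishDir` entries above then decide which of the resulting files are copied into `--outdir`, replacing the old `publish_dir`/`publish_files` keys.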
withName: DASTOOL_FASTATOCONTIG2BIN_METABAT2 { + ext.prefix = { "${meta.assembler}-MetaBAT2-${meta.id}" } + } + + withName: DASTOOL_FASTATOCONTIG2BIN_MAXBIN2 { + ext.prefix = { "${meta.assembler}-MaxBin2-${meta.id}" } + } + + withName: DASTOOL_DASTOOL { + publishDir = [ + [ + path: { "${params.outdir}/GenomeBinning/DASTool" }, + mode: params.publish_dir_mode, + pattern: '*.{tsv,log,eval,seqlength}' + ], + ] + ext.prefix = { "${meta.assembler}-DASTool-${meta.id}" } + ext.args = "--write_bins --write_unbinned --write_bin_evals --score_threshold ${params.refine_bins_dastool_threshold}" + } + + withName: RENAME_POSTDASTOOL { + publishDir = [ + [ + path: { "${params.outdir}/GenomeBinning/DASTool/unbinned" }, + mode: params.publish_dir_mode, + pattern: '*-DASToolUnbinned-*.fa' + ], + [ + path: { "${params.outdir}/GenomeBinning/DASTool/bins" }, + mode: params.publish_dir_mode, + // pattern needs to be updated in case of new binning methods + pattern: '*-{MetaBAT2,MaxBin2}Refined-*.fa' + ] + ] + } + + withName: CUSTOM_DUMPSOFTWAREVERSIONS { + publishDir = [ + path: { "${params.outdir}/pipeline_info" }, + mode: params.publish_dir_mode, + pattern: '*_versions.yml' + ] + } + } diff --git a/conf/test.config b/conf/test.config index 2de20845..5df32bdb 100644 --- a/conf/test.config +++ b/conf/test.config @@ -1,11 +1,11 @@ /* -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Nextflow config file for running minimal tests -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Defines input files and everything required to run a fast and simple pipeline test. 
Use as follows: - nextflow run nf-core/mag -profile test, + nextflow run nf-core/mag -profile test, --outdir ---------------------------------------------------------------------------------------- */ @@ -16,16 +16,16 @@ params { // Limit resources so that this can run on GitHub Actions max_cpus = 2 - max_memory = 6.GB - max_time = 6.h + max_memory = '6.GB' + max_time = '6.h' // Input data - input = 'https://raw.githubusercontent.com/nf-core/test-datasets/mag/samplesheets/samplesheet.csv' - centrifuge_db = "https://raw.githubusercontent.com/nf-core/test-datasets/mag/test_data/minigut_cf.tar.gz" - kraken2_db = "https://raw.githubusercontent.com/nf-core/test-datasets/mag/test_data/minigut_kraken.tgz" - skip_krona = true - min_length_unbinned_contigs = 1 - max_unbinned_contigs = 2 - busco_reference = "https://busco-data.ezlab.org/v5/data/lineages/bacteria_odb10.2020-03-06.tar.gz" - gtdb = false + input = 'https://raw.githubusercontent.com/nf-core/test-datasets/mag/samplesheets/samplesheet.csv' + centrifuge_db = "https://raw.githubusercontent.com/nf-core/test-datasets/mag/test_data/minigut_cf.tar.gz" + kraken2_db = "https://raw.githubusercontent.com/nf-core/test-datasets/mag/test_data/minigut_kraken.tgz" + skip_krona = true + min_length_unbinned_contigs = 1 + max_unbinned_contigs = 2 + busco_reference = "https://busco-data.ezlab.org/v5/data/lineages/bacteria_odb10.2020-03-06.tar.gz" + gtdb = false } diff --git a/conf/test_adapterremoval.config b/conf/test_adapterremoval.config new file mode 100644 index 00000000..45836fbc --- /dev/null +++ b/conf/test_adapterremoval.config @@ -0,0 +1,32 @@ +/* +======================================================================================== + Nextflow config file for running minimal tests +======================================================================================== + Defines input files and everything required to run a fast and simple pipeline test. + + Use as follows: + nextflow run nf-core/mag -profile test, --outdir + +---------------------------------------------------------------------------------------- +*/ + +params { + config_profile_name = 'Test profile for running with AdapterRemoval' + config_profile_description = 'Minimal test dataset to check pipeline function with AdapterRemoval data' + + // Limit resources so that this can run on GitHub Actions + max_cpus = 2 + max_memory = '6.GB' + max_time = '6.h' + + // Input data + input = 'https://raw.githubusercontent.com/nf-core/test-datasets/mag/samplesheets/samplesheet.csv' + centrifuge_db = "https://raw.githubusercontent.com/nf-core/test-datasets/mag/test_data/minigut_cf.tar.gz" + kraken2_db = "https://raw.githubusercontent.com/nf-core/test-datasets/mag/test_data/minigut_kraken.tgz" + skip_krona = true + min_length_unbinned_contigs = 1 + max_unbinned_contigs = 2 + busco_reference = "https://busco-data.ezlab.org/v5/data/lineages/bacteria_odb10.2020-03-06.tar.gz" + gtdb = false + clip_tool = 'adapterremoval' +} diff --git a/conf/test_ancient_dna.config b/conf/test_ancient_dna.config new file mode 100644 index 00000000..33b6f4f8 --- /dev/null +++ b/conf/test_ancient_dna.config @@ -0,0 +1,38 @@ +/* +======================================================================================== + Nextflow config file for running minimal tests +======================================================================================== + Defines input files and everything required to run a fast and simple pipeline test. 
+ + Use as follows: + nextflow run nf-core/mag -profile test_ancient_dna, --outdir + +---------------------------------------------------------------------------------------- +*/ + +params { + config_profile_name = 'Ancient DNA test profile ' + config_profile_description = 'Minimal test dataset to check pipeline function for ancient DNA step' + + // Limit resources so that this can run on GitHub Actions + max_cpus = 2 + max_memory = '6.GB' + max_time = '6.h' + + // Input data + input = 'https://raw.githubusercontent.com/nf-core/test-datasets/mag/samplesheets/samplesheet.csv' + centrifuge_db = "https://raw.githubusercontent.com/nf-core/test-datasets/mag/test_data/minigut_cf.tar.gz" + kraken2_db = "https://raw.githubusercontent.com/nf-core/test-datasets/mag/test_data/minigut_kraken.tgz" + skip_krona = true + min_length_unbinned_contigs = 1 + max_unbinned_contigs = 2 + busco_reference = "https://busco-data.ezlab.org/v5/data/lineages/bacteria_odb10.2020-03-06.tar.gz" + gtdb = false + ancient_dna = true + binning_map_mode = 'own' + skip_spades = false + skip_spadeshybrid = true + bcftools_view_variant_quality = 0 + refine_bins_dastool = true + refine_bins_dastool_threshold = 0 +} diff --git a/conf/test_binrefinement.config b/conf/test_binrefinement.config new file mode 100644 index 00000000..ddf44ceb --- /dev/null +++ b/conf/test_binrefinement.config @@ -0,0 +1,34 @@ +/* +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + Nextflow config file for running minimal tests +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + Defines input files and everything required to run a fast and simple pipeline test. + + Use as follows: + nextflow run nf-core/mag -profile test_binrefinement, --outdir + +---------------------------------------------------------------------------------------- +*/ + +params { + config_profile_name = 'Test profile' + config_profile_description = 'Minimal test dataset to check pipeline function' + + // Limit resources so that this can run on GitHub Actions + max_cpus = 2 + max_memory = '6.GB' + max_time = '6.h' + + // Input data + input = 'https://raw.githubusercontent.com/nf-core/test-datasets/mag/samplesheets/samplesheet.csv' + centrifuge_db = "https://raw.githubusercontent.com/nf-core/test-datasets/mag/test_data/minigut_cf.tar.gz" + kraken2_db = "https://raw.githubusercontent.com/nf-core/test-datasets/mag/test_data/minigut_kraken.tgz" + skip_krona = true + min_length_unbinned_contigs = 1 + max_unbinned_contigs = 2 + busco_reference = "https://busco-data.ezlab.org/v5/data/lineages/bacteria_odb10.2020-03-06.tar.gz" + gtdb = false + refine_bins_dastool = true + refine_bins_dastool_threshold = 0 + postbinning_input = 'both' +} diff --git a/conf/test_busco_auto.config b/conf/test_busco_auto.config index 0431d6a2..adf3d277 100644 --- a/conf/test_busco_auto.config +++ b/conf/test_busco_auto.config @@ -5,7 +5,7 @@ Defines input files and everything required to run a fast and simple pipeline test. 
Use as follows: - nextflow run nf-core/mag -profile test_busco_auto, + nextflow run nf-core/mag -profile test_busco_auto, --outdir ---------------------------------------------------------------------------------------- */ @@ -16,8 +16,8 @@ params { // Limit resources so that this can run on GitHub Actions max_cpus = 2 - max_memory = 6.GB - max_time = 48.h + max_memory = '6.GB' + max_time = '6.h' // Input data input = 'https://raw.githubusercontent.com/nf-core/test-datasets/mag/samplesheets/samplesheet.csv' diff --git a/conf/test_full.config b/conf/test_full.config index 629c9c0f..34e81f1a 100644 --- a/conf/test_full.config +++ b/conf/test_full.config @@ -1,11 +1,11 @@ /* -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Nextflow config file for running full-size tests -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Defines input files and everything required to run a full size pipeline test. Use as follows: - nextflow run nf-core/mag -profile test_full, + nextflow run nf-core/mag -profile test_full, --outdir ---------------------------------------------------------------------------------------- */ diff --git a/conf/test_host_rm.config b/conf/test_host_rm.config index 324ed4de..c93317db 100644 --- a/conf/test_host_rm.config +++ b/conf/test_host_rm.config @@ -5,7 +5,7 @@ Defines input files and everything required to run a fast and simple pipeline test. Use as follows: - nextflow run nf-core/mag -profile test_host_rm, + nextflow run nf-core/mag -profile test_host_rm, --outdir ---------------------------------------------------------------------------------------- */ @@ -16,8 +16,8 @@ params { // Limit resources so that this can run on GitHub Actions max_cpus = 2 - max_memory = 6.GB - max_time = 48.h + max_memory = '6.GB' + max_time = '6.h' // Input data host_fasta = "https://raw.githubusercontent.com/nf-core/test-datasets/mag/host_reference/genome.hg38.chr21_10000bp_region.fa" diff --git a/conf/test_hybrid.config b/conf/test_hybrid.config index 4572b605..873d2c5c 100644 --- a/conf/test_hybrid.config +++ b/conf/test_hybrid.config @@ -5,7 +5,7 @@ Defines input files and everything required to run a fast and simple pipeline test. Use as follows: - nextflow run nf-core/mag -profile test_hybrid, + nextflow run nf-core/mag -profile test_hybrid, --outdir ---------------------------------------------------------------------------------------- */ @@ -16,8 +16,8 @@ params { // Limit resources so that this can run on GitHub Actions max_cpus = 2 - max_memory = 6.GB - max_time = 48.h + max_memory = '6.GB' + max_time = '6.h' // Input data input = 'https://raw.githubusercontent.com/nf-core/test-datasets/mag/samplesheets/samplesheet.hybrid.csv' diff --git a/conf/test_hybrid_host_rm.config b/conf/test_hybrid_host_rm.config index 16e01860..29db0f7a 100644 --- a/conf/test_hybrid_host_rm.config +++ b/conf/test_hybrid_host_rm.config @@ -5,7 +5,7 @@ Defines input files and everything required to run a fast and simple pipeline test. 
Use as follows: - nextflow run nf-core/mag -profile test_hybrid_host_rm, + nextflow run nf-core/mag -profile test_hybrid_host_rm, --outdir ---------------------------------------------------------------------------------------- */ @@ -16,8 +16,8 @@ params { // Limit resources so that this can run on GitHub Actions max_cpus = 2 - max_memory = 6.GB - max_time = 48.h + max_memory = '6.GB' + max_time = '6.h' // Input data host_fasta = "https://raw.githubusercontent.com/nf-core/test-datasets/mag/host_reference/genome.hg38.chr21_10000bp_region.fa" diff --git a/docs/README.md b/docs/README.md index 7b0a1c80..b1cbb97f 100644 --- a/docs/README.md +++ b/docs/README.md @@ -2,9 +2,9 @@ The nf-core/mag documentation is split into the following pages: -* [Usage](usage.md) - * An overview of how the pipeline works, how to run it and a description of all of the different command-line flags. -* [Output](output.md) - * An overview of the different results produced by the pipeline and how to interpret them. +- [Usage](usage.md) + - An overview of how the pipeline works, how to run it and a description of all of the different command-line flags. +- [Output](output.md) + - An overview of the different results produced by the pipeline and how to interpret them. You can find a lot more documentation about installing, configuring and running nf-core pipelines on the website: [https://nf-co.re](https://nf-co.re) diff --git a/docs/images/mag_workflow.png b/docs/images/mag_workflow.png index 69c93297..6d38b99e 100644 Binary files a/docs/images/mag_workflow.png and b/docs/images/mag_workflow.png differ diff --git a/docs/images/mag_workflow.svg b/docs/images/mag_workflow.svg index 7f78e0fc..22cd66cc 100644 --- a/docs/images/mag_workflow.svg +++ b/docs/images/mag_workflow.svg @@ -1,13 +1,13 @@ + inkscape:snap-page="true" + inkscape:showpageshadow="2" + inkscape:deskcolor="#d1d1d1"> + originx="26.458333" + originy="145.52082" + dotted="true" /> @@ -460,19 +463,157 @@ inkscape:label="Layer 1" inkscape:groupmode="layer" id="layer1" - transform="translate(21.371482,-30.735583)"> + transform="translate(26.458364,-26.452039)"> + width="320.14584" + height="150.8125" + x="-33.414696" + y="36.337353" /> + + + + + Taxonomicclassification + + + + + Centrifuge + + + + Kraken2 + + + + Visualization + + + Krona + + + + + + inkscape:export-ydpi="289.40701" + sodipodi:nodetypes="ccccc" /> @@ -532,7 +674,7 @@ x="258.7608" sodipodi:role="line">Reporting MAG summary + sodipodi:role="line">(MAG summary) - + d="m 72.418637,113.06652 -0.0554,-18.523695 0.319985,1.325776 C 75.219173,106.37566 76.96753,108.9136 80.83652,112.1513 l 2.165452,0.91522 -2.120862,0.8558 c -3.507127,1.86657 -7.123192,10.89328 -7.824223,13.72939 -0.22865,0.96955 -0.47831,2.02243 -0.55479,2.33974 -0.11954,0.496 -0.03714,-1.43797 -0.08346,-16.92493 z" + style="fill:url(#linearGradient5670-9-6-3-2-02);fill-opacity:1;stroke:#000000;stroke-width:0.0517527;stroke-linecap:square;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1;paint-order:normal" + sodipodi:nodetypes="scsscccss" /> + d="m 37.016355,133.94985 -32.159458,4e-5 2.117303,0.20537 c 18.904981,1.83348 23.539836,3.32083 29.320361,6.59094 l 1.481939,0.83835 1.679944,-0.80064 c 3.15648,-1.50434 4.94046,-2.72016 7.704475,-3.49467 3.583223,-1.00405 8.808608,-1.85994 16.80759,-2.75304 1.73097,-0.19326 3.61073,-0.40426 4.17727,-0.46891 0.88554,-0.10097 -3.47938,-0.11749 -31.129424,-0.11749 z" + 
style="fill:url(#linearGradient5670-9-173);fill-opacity:1;stroke:#000000;stroke-width:0.0482295;stroke-linecap:square;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1;paint-order:normal" + sodipodi:nodetypes="cccscsssccc" /> @@ -719,42 +856,45 @@ inkscape:export-ydpi="289.40701" inkscape:export-xdpi="289.40701" id="text4732-2-51-2-5" - y="93.097748" - x="-18.763838" + y="95.489738" + x="-0.21160807" style="font-style:normal;font-variant:normal;font-weight:bold;font-stretch:normal;font-size:3.52778px;line-height:0.25;font-family:'Maven Pro';-inkscape-font-specification:'Maven Pro Bold';letter-spacing:0px;word-spacing:0px;fill:#000000;fill-opacity:1;stroke:none;stroke-width:0.264583" xml:space="preserve">Short reads(required) Adapter/qualitytrimming + fastp + AdapterRemoval Host read removal + style="fill:#24af63;fill-opacity:1;stroke:#000000;stroke-width:0.3;stroke-linecap:square;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1;paint-order:normal" /> + style="fill:#24af63;fill-opacity:1;stroke:#000000;stroke-width:0.3;stroke-linecap:square;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1;paint-order:normal" /> - - - - - Taxonomicclassification - - - - - Centrifuge - - - - - Kraken2 - - - - - Visualization - - - Krona - - - - @@ -1175,55 +1197,56 @@ inkscape:export-ydpi="289.40701" inkscape:export-xdpi="289.40701" ry="4.5584702" - y="42.587685" - x="-34.27684" - height="73.163437" - width="39.596622" + y="43.085247" + x="-35.226036" + height="74.083328" + width="39.6875" id="rect4728-66" style="fill:#ffffff;fill-opacity:1;stroke:#000000;stroke-width:0.489677;stroke-linecap:square;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1;paint-order:normal" transform="rotate(-90)" /> Long readsLong reads(optional) + x="39.050476" + y="32.835724" + style="font-style:normal;font-variant:normal;font-weight:bold;font-stretch:normal;font-family:'Maven Pro';-inkscape-font-specification:'Maven Pro Bold';stroke-width:0.254709" /> + id="g932" + transform="translate(0,-2.6458334)"> + style="fill:#24af63;fill-opacity:1;stroke:#000000;stroke-width:0.3;stroke-linecap:square;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1;paint-order:normal" /> Evaluation + transform="translate(-35.256451,22.522336)"> + style="fill:#24af63;fill-opacity:1;stroke:#000000;stroke-width:0.3;stroke-linecap:square;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1;paint-order:normal" /> Remove Lambda + transform="translate(0,-5.7595465)"> Quality filtering + transform="translate(-139.92402,148.73448)"> Taxonomic classification Genome annotation + transform="translate(51.346208,2.0158954)"> Protein-codinggene prediction Assembly(sample- or group-wise) + transform="translate(-126.89976,-0.60967038)"> Evaluation QUAST + aDNA Validation + + pyDamage + + Freebayes + + BCFTools + transform="translate(-100.18223,-3.7041668)"> @@ -1872,7 +1965,7 @@ ry="2.4730365" /> + transform="translate(55.609418,-40.107415)"> + inkscape:export-ydpi="289.40701" + sodipodi:nodetypes="ccsscccssc" /> - + d="m 210.34691,117.49101 6e-5,-8.46414 0.19864,0.55722 c 1.77374,4.97566 3.75896,6.19556 6.92234,7.71694 l 0.81099,0.39004 -0.77453,0.44216 c -3.16806,2.12084 -5.64746,4.27599 -6.59029,6.45142 -0.18696,0.45558 -0.39108,0.95033 -0.45362,1.09944 -0.0976,0.23307 
-0.11364,-0.91576 -0.11359,-8.19308 z" + style="fill:url(#linearGradient1315);fill-opacity:1;stroke:#000000;stroke-width:0.052239;stroke-linecap:square;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1;paint-order:normal" + sodipodi:nodetypes="scssccccs" /> @@ -2144,12 +2232,12 @@ inkscape:export-ydpi="289.40701" inkscape:export-xdpi="289.40701" ry="2.9414928" - y="-247.66061" - x="-28.27034" - height="47.210953" - width="63.506157" + y="-246.46477" + x="-28.209223" + height="44.979145" + width="74.083328" id="rect4728-6-3" - style="fill:#ffffff;fill-opacity:1;stroke:#000000;stroke-width:0.48265;stroke-linecap:square;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1;paint-order:normal" + style="fill:#ffffff;fill-opacity:1;stroke:#000000;stroke-width:0.483;stroke-linecap:square;stroke-linejoin:bevel;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:0;stroke-opacity:1;paint-order:normal" transform="rotate(90)" /> + MetaBAT2 + MaxBin2 + transform="translate(110.93518,-163.87148)"> + d="m -98.802892,216.93853 a 3.6926136,3.5400603 0 0 1 -0.103524,0.48184" /> + transform="translate(16.529063,-26.683331)"> Evaluation BUSCO Abundance estimation(Abundance estimationand visualization + id="tspan1890">and visualization) + v2.2.0 + + + Binning refinement + + DAS Tool CC-BY 4.0 Design originally by Zandra Fagernäs + x="220.64191" + y="46.223618">CC-BY 4.0 Design originally by Zandra Fagernäs + diff --git a/docs/images/mqc_fastqc_adapter.png b/docs/images/mqc_fastqc_adapter.png deleted file mode 100755 index 361d0e47..00000000 Binary files a/docs/images/mqc_fastqc_adapter.png and /dev/null differ diff --git a/docs/images/mqc_fastqc_counts.png b/docs/images/mqc_fastqc_counts.png deleted file mode 100755 index cb39ebb8..00000000 Binary files a/docs/images/mqc_fastqc_counts.png and /dev/null differ diff --git a/docs/images/mqc_fastqc_quality.png b/docs/images/mqc_fastqc_quality.png deleted file mode 100755 index a4b89bf5..00000000 Binary files a/docs/images/mqc_fastqc_quality.png and /dev/null differ diff --git a/docs/images/nf-core-mag_logo.png b/docs/images/nf-core-mag_logo.png deleted file mode 100644 index 628fde96..00000000 Binary files a/docs/images/nf-core-mag_logo.png and /dev/null differ diff --git a/docs/images/nf-core-mag_logo_dark.png b/docs/images/nf-core-mag_logo_dark.png new file mode 100644 index 00000000..f20e69e5 Binary files /dev/null and b/docs/images/nf-core-mag_logo_dark.png differ diff --git a/docs/images/nf-core-mag_logo_light.png b/docs/images/nf-core-mag_logo_light.png new file mode 100644 index 00000000..64276cbe Binary files /dev/null and b/docs/images/nf-core-mag_logo_light.png differ diff --git a/docs/output.md b/docs/output.md index 644efb1d..dec6c0d6 100644 --- a/docs/output.md +++ b/docs/output.md @@ -10,16 +10,17 @@ The directories listed below will be created in the results directory after the The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes data using the following steps: -* [Quality control](#quality-control) of input reads - trimming and contaminant removal -* [Taxonomic classification of trimmed reads](#taxonomic-classification-of-trimmed-reads) -* [Assembly](#assembly) of trimmed reads -* [Protein-coding gene prediction](#gene-prediction) of assemblies -* [Binning](#binning) of assembled contigs -* [Taxonomic classification of binned genomes](#taxonomic-classification-of-binned-genomes) -* [Genome annotation of binned 
genomes](#genome-annotation-of-binned-genomes) -* [Additional summary for binned genomes](#additional-summary-for-binned-genomes) -* [MultiQC](#multiqc) - aggregate report, describing results of the whole pipeline -* [Pipeline information](#pipeline-information) - Report metrics generated during the workflow execution +- [Quality control](#quality-control) of input reads - trimming and contaminant removal +- [Taxonomic classification of trimmed reads](#taxonomic-classification-of-trimmed-reads) +- [Assembly](#assembly) of trimmed reads +- [Protein-coding gene prediction](#gene-prediction) of assemblies +- [Binning and binning refinement](#binning-and-binning-refinement) of assembled contigs +- [Taxonomic classification of binned genomes](#taxonomic-classification-of-binned-genomes) +- [Genome annotation of binned genomes](#genome-annotation-of-binned-genomes) +- [Additional summary for binned genomes](#additional-summary-for-binned-genomes) +- [Ancient DNA](#ancient-dna) +- [MultiQC](#multiqc) - aggregate report, describing results of the whole pipeline +- [Pipeline information](#pipeline-information) - Report metrics generated during the workflow execution Note that when specifying the parameter `--coassemble_group`, for the corresponding output filenames/directories of the assembly or downsteam processes the group ID, or more precisely the term `group-[group_id]`, will be used instead of the sample ID. @@ -36,9 +37,9 @@ FastQC is run for visualising the general quality metrics of the sequencing runs
Output files -* `QC_shortreads/fastqc/` - * `[sample]_[1/2]_fastqc.html`: FastQC report, containing quality metrics for your untrimmed raw fastq files - * `[sample].trimmed_[1/2]_fastqc.html`: FastQC report, containing quality metrics for trimmed and, if specified, filtered read files +- `QC_shortreads/fastqc/` + - `[sample]_[1/2]_fastqc.html`: FastQC report, containing quality metrics for your untrimmed raw fastq files + - `[sample].trimmed_[1/2]_fastqc.html`: FastQC report, containing quality metrics for trimmed and, if specified, filtered read files
@@ -51,9 +52,21 @@ FastQC is run for visualising the general quality metrics of the sequencing runs
Output files -* `QC_shortreads/fastp/[sample]/` - * `fastp.html`: Interactive report - * `fastp.json`: Report in json format +- `QC_shortreads/fastp/[sample]/` + - `fastp.html`: Interactive report + - `fastp.json`: Report in json format + +
+ +### AdapterRemoval2 + +[AdapterRemoval](https://adapterremoval.readthedocs.io/en/stable/) searches for and removes remnant adapter sequences from High-Throughput Sequencing (HTS) data and (optionally) trims low quality bases from the 3' end of reads following adapter removal. It is popular in the field of palaeogenomics. The output logs are stored in the results folder, and as a part of the MultiQC report. + +
+Output files + +- `QC_shortreads/adapterremoval/[sample]/` + - `[sample]_ar2.log`: AdapterRemoval log file (normally called `.settings` by AdapterRemoval.)
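AdapterRemoval is selected instead of the default fastp trimmer via the `--clip_tool` parameter. A minimal custom config sketch (the value below mirrors the `test_adapterremoval` profile added in this PR):

```nextflow
params {
    // Use AdapterRemoval for adapter/quality trimming instead of the default fastp
    clip_tool = 'adapterremoval'
}
```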
@@ -64,8 +77,8 @@ The pipeline uses bowtie2 to map the reads against PhiX and removes mapped reads
Output files -* `QC_shortreads/remove_phix/` - * `[sample].phix_removed.bowtie2.log`: Contains a brief log file indicating how many reads have been retained. +- `QC_shortreads/remove_phix/` + - `[sample].phix_removed.bowtie2.log`: Contains a brief log file indicating how many reads have been retained.
@@ -76,8 +89,9 @@ The pipeline uses bowtie2 to map short reads against the host reference genome s
Output files -* `QC_shortreads/remove_host/` - * `[sample].host_removed.bowtie2.log`: Contains the bowtie2 log file indicating how many reads have been mapped as well as a file listing the read ids of discarded reads. +- `QC_shortreads/remove_host/` + - `[sample].host_removed.bowtie2.log`: Contains the bowtie2 log file indicating how many reads have been mapped. + - `[sample].host_removed.mapped*.read_ids.txt`: Contains a file listing the read ids of discarded reads.
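Host read removal is driven by a reference genome passed with `--host_fasta`. A minimal sketch, using the small chr21 test reference from this PR's `test_host_rm` profile (swap in your real host genome):

```nextflow
params {
    // Reference genome whose reads will be removed from the short-read data
    host_fasta = "https://raw.githubusercontent.com/nf-core/test-datasets/mag/host_reference/genome.hg38.chr21_10000bp_region.fa"
}
```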
@@ -88,8 +102,8 @@ The pipeline uses Nanolyse to map the reads against the Lambda phage and removes
Output files -* `QC_longreads/NanoLyse/` - * `[sample]_nanolyse.log`: Contains a brief log file indicating how many reads have been retained. +- `QC_longreads/NanoLyse/` + - `[sample]_nanolyse.log`: Contains a brief log file indicating how many reads have been retained.
@@ -109,9 +123,9 @@ NanoPlot is used to calculate various metrics and plots about the quality and le
Output files -* `QC_longreads/NanoPlot/[sample]/` - * `raw_*.[png/html/txt]`: Plots and reports for raw data - * `filtered_*.[png/html/txt]`: Plots and reports for filtered data +- `QC_longreads/NanoPlot/[sample]/` + - `raw_*.[png/html/txt]`: Plots and reports for raw data + - `filtered_*.[png/html/txt]`: Plots and reports for filtered data
@@ -124,9 +138,9 @@ Kraken2 classifies reads using a k-mer based approach as well as assigns taxonom
Output files -* `Taxonomy/kraken2/[sample]/` - * `kraken2.report`: Classification in the Kraken report format. See the [kraken2 manual](https://github.com/DerrickWood/kraken2/wiki/Manual#output-formats) for more details - * `taxonomy.krona.html`: Interactive pie chart produced by [KronaTools](https://github.com/marbl/Krona/wiki) +- `Taxonomy/kraken2/[sample]/` + - `kraken2.report`: Classification in the Kraken report format. See the [kraken2 manual](https://github.com/DerrickWood/kraken2/wiki/Manual#output-formats) for more details + - `taxonomy.krona.html`: Interactive pie chart produced by [KronaTools](https://github.com/marbl/Krona/wiki)
@@ -139,10 +153,10 @@ More information on the [Centrifuge](https://ccb.jhu.edu/software/centrifuge/) w
Output files -* `Taxonomy/centrifuge/[sample]/` - * `report.txt`: Tab-delimited result file. See the [centrifuge manual](https://ccb.jhu.edu/software/centrifuge/manual.shtml#centrifuge-classification-output) for information about the fields - * `kreport.txt`: Classification in the Kraken report format. See the [kraken2 manual](https://github.com/DerrickWood/kraken2/wiki/Manual#output-formats) for more details - * `taxonomy.krona.html`: Interactive pie chart produced by [KronaTools](https://github.com/marbl/Krona/wiki) +- `Taxonomy/centrifuge/[sample]/` + - `report.txt`: Tab-delimited result file. See the [centrifuge manual](https://ccb.jhu.edu/software/centrifuge/manual.shtml#centrifuge-classification-output) for information about the fields + - `kreport.txt`: Classification in the Kraken report format. See the [kraken2 manual](https://github.com/DerrickWood/kraken2/wiki/Manual#output-formats) for more details + - `taxonomy.krona.html`: Interactive pie chart produced by [KronaTools](https://github.com/marbl/Krona/wiki)
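Both read classifiers are enabled by pointing the pipeline at pre-built database archives; the sketch below reuses the small test databases from this PR's test profiles (replace them with full-size databases for real analyses) and shows the related `--skip_krona` switch:

```nextflow
params {
    // Small test databases used by the test profiles in this PR
    centrifuge_db = "https://raw.githubusercontent.com/nf-core/test-datasets/mag/test_data/minigut_cf.tar.gz"
    kraken2_db    = "https://raw.githubusercontent.com/nf-core/test-datasets/mag/test_data/minigut_kraken.tgz"
    // Set to true to skip the Krona pie charts
    skip_krona    = false
}
```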
@@ -157,12 +171,12 @@ Trimmed (short) reads are assembled with both megahit and SPAdes. Hybrid assembl
Output files -* `Assembly/MEGAHIT/` - * `[sample/group].contigs.fa.gz`: Compressed metagenome assembly in fasta format - * `[sample/group].log`: Log file - * `QC/[sample/group]/`: Directory containing QUAST files and Bowtie2 mapping logs - * `MEGAHIT-[sample].bowtie2.log`: Bowtie2 log file indicating how many reads have been mapped from the sample that the metagenome was assembled from, only present if `--coassemble_group` is not set. - * `MEGAHIT-[sample/group]-[sampleToMap].bowtie2.log`: Bowtie2 log file indicating how many reads have been mapped from the respective sample ("sampleToMap"). +- `Assembly/MEGAHIT/` + - `[sample/group].contigs.fa.gz`: Compressed metagenome assembly in fasta format + - `[sample/group].log`: Log file + - `QC/[sample/group]/`: Directory containing QUAST files and Bowtie2 mapping logs + - `MEGAHIT-[sample].bowtie2.log`: Bowtie2 log file indicating how many reads have been mapped from the sample that the metagenome was assembled from, only present if `--coassemble_group` is not set. + - `MEGAHIT-[sample/group]-[sampleToMap].bowtie2.log`: Bowtie2 log file indicating how many reads have been mapped from the respective sample ("sampleToMap").
@@ -173,14 +187,14 @@ Trimmed (short) reads are assembled with both megahit and SPAdes. Hybrid assembl
Output files -* `Assembly/SPAdes/` - * `[sample/group]_scaffolds.fasta.gz`: Compressed assembled scaffolds in fasta format - * `[sample/group]_graph.gfa.gz`: Compressed assembly graph in gfa format - * `[sample/group]_contigs.fasta.gz`: Compressed assembled contigs in fasta format - * `[sample/group].log`: Log file - * `QC/[sample/group]/`: Directory containing QUAST files and Bowtie2 mapping logs - * `SPAdes-[sample].bowtie2.log`: Bowtie2 log file indicating how many reads have been mapped from the sample that the metagenome was assembled from, only present if `--coassemble_group` is not set. - * `SPAdes-[sample/group]-[sampleToMap].bowtie2.log`: Bowtie2 log file indicating how many reads have been mapped from the respective sample ("sampleToMap"). +- `Assembly/SPAdes/` + - `[sample/group]_scaffolds.fasta.gz`: Compressed assembled scaffolds in fasta format + - `[sample/group]_graph.gfa.gz`: Compressed assembly graph in gfa format + - `[sample/group]_contigs.fasta.gz`: Compressed assembled contigs in fasta format + - `[sample/group].log`: Log file + - `QC/[sample/group]/`: Directory containing QUAST files and Bowtie2 mapping logs + - `SPAdes-[sample].bowtie2.log`: Bowtie2 log file indicating how many reads have been mapped from the sample that the metagenome was assembled from, only present if `--coassemble_group` is not set. + - `SPAdes-[sample/group]-[sampleToMap].bowtie2.log`: Bowtie2 log file indicating how many reads have been mapped from the respective sample ("sampleToMap").
@@ -191,14 +205,14 @@ SPAdesHybrid is a part of the [SPAdes](http://cab.spbu.ru/software/spades/) soft
Output files -* `Assembly/SPAdesHybrid/` - * `[sample/group]_scaffolds.fasta.gz`: Compressed assembled scaffolds in fasta format - * `[sample/group]_graph.gfa.gz`: Compressed assembly graph in gfa format - * `[sample/group]_contigs.fasta.gz`: Compressed assembled contigs in fasta format - * `[sample/group].log`: Log file - * `QC/[sample/group]/`: Directory containing QUAST files and Bowtie2 mapping logs - * `SPAdesHybrid-[sample].bowtie2.log`: Bowtie2 log file indicating how many reads have been mapped from the sample that the metagenome was assembled from, only present if `--coassemble_group` is not set. - * `SPAdesHybrid-[sample/group]-[sampleToMap].bowtie2.log`: Bowtie2 log file indicating how many reads have been mapped from the respective sample ("sampleToMap"). +- `Assembly/SPAdesHybrid/` + - `[sample/group]_scaffolds.fasta.gz`: Compressed assembled scaffolds in fasta format + - `[sample/group]_graph.gfa.gz`: Compressed assembly graph in gfa format + - `[sample/group]_contigs.fasta.gz`: Compressed assembled contigs in fasta format + - `[sample/group].log`: Log file + - `QC/[sample/group]/`: Directory containing QUAST files and Bowtie2 mapping logs + - `SPAdesHybrid-[sample].bowtie2.log`: Bowtie2 log file indicating how many reads have been mapped from the sample that the metagenome was assembled from, only present if `--coassemble_group` is not set. + - `SPAdesHybrid-[sample/group]-[sampleToMap].bowtie2.log`: Bowtie2 log file indicating how many reads have been mapped from the respective sample ("sampleToMap").
@@ -209,10 +223,19 @@ SPAdesHybrid is a part of the [SPAdes](http://cab.spbu.ru/software/spades/) soft
Output files -* `Assembly/[assembler]/QC/[sample/group]/` - * `report.*`: QUAST report in various formats, such as html, txt, tsv or tex - * `quast.log`: QUAST log file - * `predicted_genes/[assembler]-[sample/group].rna.gff`: Contig positions for rRNA genes in gff version 3 format +- `Assembly/[assembler]/QC/[sample/group]/QUAST/` + - `report.*`: QUAST report in various formats, such as html, pdf, tex, tsv, or txt + - `transposed_report.*`: QUAST report that has been transposed into wide format (tex, tsv, or txt) + - `quast.log`: QUAST log file + - `metaquast.log`: MetaQUAST log file + - `icarus.html`: Icarus main menu with links to interactive viewers + - `icarus_viewers/contig_size_viewer.html`: Diagram of contigs that are ordered from longest to shortest + - `basic_stats/cumulative_plot.pdf`: Shows the growth of contig lengths (contigs are ordered from largest to shortest) + - `basic_stats/GC_content_plot.pdf`: Shows the distribution of GC content in the contigs + - `basic_stats/[assembler]-[sample/group]_GC_content_plot.pdf`: Histogram of the GC percentage for the contigs + - `basic_stats/Nx_plot.pdf`: Plot of Nx values as x varies from 0 to 100%. + - `predicted_genes/[assembler]-[sample/group].rna.gff`: Contig positions for rRNA genes in gff version 3 format + - `predicted_genes/barrnap.log`: Barrnap log file (ribosomal RNA predictor)
@@ -223,25 +246,27 @@ Protein-coding genes are predicted for each assembly.
Output files -* `Prodigal/` - * `[sample/group].gff`: Gene Coordinates in GFF format - * `[sample/group].faa`: The protein translation file consists of all the proteins from all the sequences in multiple FASTA format. - * `[sample/group].fna`: Nucleotide sequences of the predicted proteins using the DNA alphabet, not mRNA (so you will see 'T' in the output and not 'U'). - * `[sample/group]_all.txt`: Information about start positions of genes. +- `Prodigal/` + - `[sample/group].gff`: Gene Coordinates in GFF format + - `[sample/group].faa`: The protein translation file consists of all the proteins from all the sequences in multiple FASTA format. + - `[sample/group].fna`: Nucleotide sequences of the predicted proteins using the DNA alphabet, not mRNA (so you will see 'T' in the output and not 'U'). + - `[sample/group]_all.txt`: Information about start positions of genes.
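Tool arguments for this step are defined through `ext.args` in `conf/modules.config` (see the `PRODIGAL` entry with `-p meta` earlier in this diff). As a sketch of how a user could extend them in a custom config passed to Nextflow with `-c my.config` (the extra Prodigal `-c` flag is illustrative only):

```nextflow
process {
    withName: PRODIGAL {
        // Keep metagenome mode and additionally disallow genes running off contig edges (illustrative)
        ext.args = "-p meta -c"
    }
}
```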
-## Binning +## Binning and binning refinement ### Contig sequencing depth -Sequencing depth per contig and sample is generated by `jgi_summarize_bam_contig_depths --outputDepth`. The values correspond to `(sum of exactely aligned bases) / ((contig length)-2*75)`. For example, for two reads aligned exactly with `10` and `9` bases on a 1000 bp long contig the depth is calculated by `(10+9)/(1000-2*75)` (1000bp length of contig minus 75bp from each end, which is excluded). +Sequencing depth per contig and sample is generated by MetaBAT2's `jgi_summarize_bam_contig_depths --outputDepth`. The values correspond to `(sum of exactly aligned bases) / ((contig length)-2*75)`. For example, for two reads aligned exactly with `10` and `9` bases on a 1000 bp long contig the depth is calculated by `(10+9)/(1000-2*75)` (1000bp length of contig minus 75bp from each end, which is excluded). + +These depth files are used for downstream binning steps.
Output files -* `GenomeBinning/` - * `[assembler]-[sample/group]-depth.txt.gz`: Sequencing depth for each contig and sample or group, only for short reads. +- `GenomeBinning/depths/contigs/` + - `[assembler]-[sample/group]-depth.txt.gz`: Sequencing depth for each contig and sample or group, only for short reads.
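As a quick sanity check of the formula above, the worked example from the text evaluates as follows (a standalone Groovy sketch, not pipeline code):

```nextflow
// Two reads aligning exactly with 10 and 9 bases on a 1000 bp contig
def contigLength = 1000
def alignedBases = [10, 9]
def depth        = alignedBases.sum() / (contigLength - 2 * 75)
println depth    // 19 / 850 ≈ 0.022
```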
@@ -252,22 +277,23 @@ Sequencing depth per contig and sample is generated by `jgi_summarize_bam_contig
Output files -* `GenomeBinning/MetaBAT2/` - * `[assembler]-[sample/group].*.fa`: Genome bins retrieved from input assembly - * `[assembler]-[sample/group].unbinned.*.fa`: Contigs that were not binned with other contigs but considered interesting. By default, these are at least 1 Mbp (`--min_length_unbinned_contigs`) in length and at most the 100 longest contigs (`--max_unbinned_contigs`) are reported +- `GenomeBinning/MetaBAT2/` + - `bins/[assembler]-[binner]-[sample/group].*.fa.gz`: Genome bins retrieved from input assembly + - `unbinned/[assembler]-[binner]-[sample/group].unbinned.[1-9]*.fa.gz`: Contigs that were not binned with other contigs but considered interesting. By default, these are at least 1 Mbp (`--min_length_unbinned_contigs`) in length and at most the 100 longest contigs (`--max_unbinned_contigs`) are reported
-All the files and contigs in this folder will be assessed by QUAST and BUSCO. +All the files and contigs in these folders will be assessed by QUAST and BUSCO.
Output files -* `GenomeBinning/MetaBAT2/discarded/` - * `*.lowDepth.fa.gz`: Low depth contigs that are filtered by MetaBat2 - * `*.tooShort.fa.gz`: Too short contigs that are filtered by MetaBat2 - * `*.unbinned.pooled.fa.gz`: Pooled unbinned contigs equal or above `--min_contig_size`, by default 1500 bp. - * `*.unbinned.remaining.fa.gz`: Remaining unbinned contigs below `--min_contig_size`, by default 1500 bp, but not in any other file. +- `GenomeBinning/MetaBAT2/discarded/` + - `*.lowDepth.fa.gz`: Low depth contigs that are filtered by MetaBAT2 + - `*.tooShort.fa.gz`: Too short contigs that are filtered by MetaBAT2 +- `GenomeBinning/MetaBAT2/unbinned/discarded/` + - `*.unbinned.pooled.fa.gz`: Pooled unbinned contigs equal or above `--min_contig_size`, by default 1500 bp. + - `*.unbinned.remaining.fa.gz`: Remaining unbinned contigs below `--min_contig_size`, by default 1500 bp, but not in any other file.
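The size cut-offs referenced above map onto pipeline parameters that also appear in this PR's configs; a hedged sketch using the documented defaults (the RNG seed value is illustrative only):

```nextflow
params {
    // Contigs below this length are discarded by MetaBAT2 (`-m`), by default 1500 bp
    min_contig_size             = 1500
    // Keep unbinned contigs of at least this length (1 Mbp) as separate FASTA files ...
    min_length_unbinned_contigs = 1000000
    // ... but report at most this many of the longest unbinned contigs
    max_unbinned_contigs        = 100
    // Fix the MetaBAT2 random seed for reproducible binning (illustrative value)
    metabat_rng_seed            = 1
}
```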
@@ -275,50 +301,114 @@ All the files in this folder contain small and/or unbinned contigs that are not Files in these two folders contain all contigs of an assembly. +### MaxBin2 + +[MaxBin2](https://sourceforge.net/projects/maxbin2/) recovers genome bins (that is, contigs/scaffolds that all belong to the same organism) from metagenome assemblies. + +
+Output files + +- `GenomeBinning/MaxBin2/` + - `bins/[assembler]-[binner]-[sample/group].*.fa.gz`: Genome bins retrieved from input assembly + - `unbinned/[assembler]-[binner]-[sample/group].noclass.[1-9]*.fa.gz`: Contigs that were not binned with other contigs but considered interesting. By default, these are at least 1 Mbp (`--min_length_unbinned_contigs`) in length and at most the 100 longest contigs (`--max_unbinned_contigs`) are reported. + +
+ +All the files and contigs in these folders will be assessed by QUAST and BUSCO. + +
+Output files + +- `GenomeBinning/MaxBin2/discarded/` + - `*.tooshort.gz`: Too short contigs that are filtered by MaxBin2 +- `GenomeBinning/MaxBin2/unbinned/discarded/` + - `*.noclass.pooled.fa.gz`: Pooled unbinned contigs equal or above `--min_contig_size`, by default 1500 bp. + - `*.noclass.remaining.fa.gz`: Remaining unbinned contigs below `--min_contig_size`, by default 1500 bp, but not in any other file. + +
+ +All the files in this folder contain small and/or unbinned contigs that are not further processed. + +Files in these two folders contain all contigs of an assembly. + +### DAS Tool + +[DAS Tool](https://github.com/cmks/DAS_Tool) is an automated binning refinement method that integrates the results of a flexible number of binning algorithms to calculate an optimized, non-redundant set of bins from a single assembly. nf-core/mag uses this tool to attempt to further improve bins based on combining the MetaBAT2 and MaxBin2 binning output, assuming sufficient quality is met for those bins. + +DAS Tool will remove contigs from bins that do not pass additional filtering criteria, and will discard redundant lower-quality output from binners that represent the same estimated 'organism', until the single highest quality bin is represented. + +
+Output files + +- `GenomeBinning/DASTool/` + - `[assembler]-[sample/group]_allBins.eval`: Tab-delimited description with quality and completeness metrics for the input bin sets. Quality and completeness are estimated by DAS Tool using a scoring function based on the frequency of bacterial or archaeal reference single-copy genes (SCG). Please see the note at the bottom of this section on file names. + - `[assembler]-[sample/group]_DASTool_summary.tsv`: Tab-delimited description with quality and completeness metrics for the refined output bin sets. + - `[assembler]-[sample/group]_DASTool_contig2bin.tsv`: File describing which contig is associated with which bin from the input binners. + - `[assembler]-[sample/group]_DASTool.log`: Log file from the DAS Tool run describing the command executed and additional runtime information. + - `[assembler]-[sample/group].seqlength`: Tab-delimited file describing the length of each contig. + - `bins/[assembler]-[binner]Refined-[sample/group].*.fa`: Refined bins in fasta format. + - `unbinned/[assembler]-DASToolUnbinned-[sample/group].*.fa`: Unbinned contigs from bin refinement in fasta format. + +
+ +By default, only the raw bins (and unbinned contigs) from the actual binning methods, but not from the binning refinement with DAS Tool, will be used for downstream bin quality control, annotation and taxonomic classification. The parameter `--postbinning_input` can be used to change this behaviour. + +⚠️ Due to the ability to perform downstream QC of both raw and refined bins in parallel (via `--postbinning_input`), bin names in DAS Tool's `*_allBins.eval` file will include `Refined`. However, for this particular file, they _actually_ refer to the 'raw' input bins. The pipeline renames the input files prior to running DAS Tool to ensure they can be disambiguated from the original bin files in the downstream QC steps. + ### Bin sequencing depth -For each genome bin the median sequencing depth is computed based on the corresponding contig depths given in `GenomeBinning/[assembler]-[sample/group]-depth.txt.gz`. +For each bin or refined bin the median sequencing depth is computed based on the corresponding contig depths.
Output files -* `GenomeBinning/` - * `bin_depths_summary.tsv`: Summary of bin sequencing depths for all samples. Depths are available for samples mapped against the corresponding assembly, i.e. according to the mapping strategy specified with `--binning_map_mode`. Only for short reads. - * `[assembler]-[sample/group]-binDepths.heatmap.png`: Clustered heatmap showing bin abundances of the assembly across samples. Bin depths are transformed to centered log-ratios and bins as well as samples are clustered by Euclidean distance. Again, sample depths are available according to the mapping strategy specified with `--binning_map_mode`. +- `GenomeBinning/depths/bins/` + - `bin_depths_summary.tsv`: Summary of bin sequencing depths for all samples. Depths are available for samples mapped against the corresponding assembly, i.e. according to the mapping strategy specified with `--binning_map_mode`. Only for short reads. + - `bin_refined_depths_summary.tsv`: Summary of sequencing depths for refined bins for all samples, if refinement was performed. Depths are available for samples mapped against the corresponding assembly, i.e. according to the mapping strategy specified with `--binning_map_mode`. Only for short reads. + - `[assembler]-[binner]-[sample/group]-binDepths.heatmap.png`: Clustered heatmap showing bin abundances of the assembly across samples. Bin depths are transformed to centered log-ratios and bins as well as samples are clustered by Euclidean distance. Again, sample depths are available according to the mapping strategy specified with `--binning_map_mode`.
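The binning and refinement behaviour described in this section is controlled by a handful of parameters; the sketch below combines them with the values used by the test profiles added in this PR (adjust for production use):

```nextflow
params {
    // Map reads only back to their own assembly (as in the test_ancient_dna profile)
    binning_map_mode              = 'own'
    // Enable DAS Tool binning refinement (as in the test_binrefinement profile)
    refine_bins_dastool           = true
    refine_bins_dastool_threshold = 0
    // Send both raw and refined bins to downstream QC, annotation and classification
    postbinning_input             = 'both'
}
```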
### QC for metagenome assembled genomes with QUAST -[QUAST](http://cab.spbu.ru/software/quast/) is a tool that evaluates genome assemblies by computing various metrics. The QUAST output is also included in the MultiQC report, as well as in the assembly directories themselves. +[QUAST](http://cab.spbu.ru/software/quast/) is a tool that evaluates genome assemblies by computing various metrics. The QUAST output is in the bin directories shown below. This QUAST output is not shown in the MultiQC report.
Output files -* `GenomeBinning/QC/QUAST/[assembler]-[bin]/` - * `report.*`: QUAST report in various formats, such as html, txt, tsv or tex - * `quast.log`: QUAST log file - * `predicted_genes/[assembler]-[sample/group].rna.gff`: Contig positions for rRNA genes in gff version 3 format -* `GenomeBinning/QC/` - * `quast_summary.tsv`: QUAST output for all bins summarized +- `GenomeBinning/QC/QUAST/[assembler]-[bin]/` + - `report.*`: QUAST report in various formats, such as html, pdf, tex, tsv, or txt + - `transposed_report.*`: QUAST report that has been transposed into wide format (tex, tsv, or txt) + - `quast.log`: QUAST log file + - `metaquast.log`: MetaQUAST log file + - `icarus.html`: Icarus main menu with links to interactive viewers + - `icarus_viewers/contig_size_viewer.html`: Diagram of contigs that are ordered from longest to shortest + - `basic_stats/cumulative_plot.pdf`: Shows the growth of contig lengths (contigs are ordered from largest to shortest) + - `basic_stats/GC_content_plot.pdf`: Shows the distribution of GC content in the contigs + - `basic_stats/[assembler]-[bin]_GC_content_plot.pdf`: Histogram of the GC percentage for the contigs + - `basic_stats/Nx_plot.pdf`: Plot of Nx values as x varies from 0 to 100%. + - `predicted_genes/[assembler]-[bin].rna.gff`: Contig positions for rRNA genes in gff version 3 format + - `predicted_genes/barrnap.log`: Barrnap log file (ribosomal RNA predictor) +- `GenomeBinning/QC/` + - `quast_summary.tsv`: QUAST output for all bins summarized
### QC for metagenome assembled genomes with BUSCO -[BUSCO](https://busco.ezlab.org/) is a tool used to assess the completeness of a genome assembly. It is run on all the genome bins and high quality contigs obtained by MetaBAT2. By default, BUSCO is run in automated lineage selection mode in which it first tries to select the domain and then a more specific lineage based on phylogenetic placement. If available, result files for both the selected domain lineage and the selected more specific lineage are placed in the output directory. If a lineage dataset is specified already with `--busco_reference`, only results for this specific lineage will be generated. +[BUSCO](https://busco.ezlab.org/) is a tool used to assess the completeness of a genome assembly. It is run on all the genome bins and high quality contigs obtained by the applied binning and/or binning refinement methods (depending on the `--postbinning_input` parameter). By default, BUSCO is run in automated lineage selection mode in which it first tries to select the domain and then a more specific lineage based on phylogenetic placement. If available, result files for both the selected domain lineage and the selected more specific lineage are placed in the output directory. If a lineage dataset is specified already with `--busco_reference`, only results for this specific lineage will be generated.
Output files -* `GenomeBinning/QC/BUSCO/` - * `[assembler]-[bin]_busco.log`: Log file containing the standard output of BUSCO. - * `[assembler]-[bin]_busco.err`: File containing potential error messages returned from BUSCO. - * `short_summary.domain.[lineage].[assembler]-[bin].txt`: BUSCO summary of the results for the selected domain when run in automated lineage selection mode. Not available for bins for which a viral lineage was selected. - * `short_summary.specific_lineage.[lineage].[assembler]-[bin].txt`: BUSCO summary of the results in case a more specific lineage than the domain could be selected or for the lineage provided via `--busco_reference`. - * `[assembler]-[bin]_buscos.[lineage].fna.gz`: Nucleotide sequence of all identified BUSCOs for used lineages (domain or specific). - * `[assembler]-[bin]_buscos.[lineage].faa.gz`: Aminoacid sequence of all identified BUSCOs for used lineages (domain or specific). - * `[assembler]-[bin]_prodigal.gff`: Genes predicted with Prodigal. +- `GenomeBinning/QC/BUSCO/` + - `[assembler]-[bin]_busco.log`: Log file containing the standard output of BUSCO. + - `[assembler]-[bin]_busco.err`: File containing potential error messages returned from BUSCO. + - `short_summary.domain.[lineage].[assembler]-[bin].txt`: BUSCO summary of the results for the selected domain when run in automated lineage selection mode. Not available for bins for which a viral lineage was selected. + - `short_summary.specific_lineage.[lineage].[assembler]-[bin].txt`: BUSCO summary of the results in case a more specific lineage than the domain could be selected or for the lineage provided via `--busco_reference`. + - `[assembler]-[bin]_buscos.[lineage].fna.gz`: Nucleotide sequence of all identified BUSCOs for used lineages (domain or specific). + - `[assembler]-[bin]_buscos.[lineage].faa.gz`: Aminoacid sequence of all identified BUSCOs for used lineages (domain or specific). + - `[assembler]-[bin]_prodigal.gff`: Genes predicted with Prodigal.
@@ -327,9 +417,9 @@ If the parameter `--save_busco_reference` is set, additionally the used BUSCO li
Output files -* `GenomeBinning/QC/BUSCO/` - * `busco_downloads/`: All files and lineage datasets downloaded by BUSCO when run in automated lineage selection mode. (Can currently not be used to reproduce analysis, see the [nf-core/mag website documentation](https://nf-co.re/mag/usage#reproducibility) how to achieve reproducible BUSCO results). - * `reference/*.tar.gz`: BUSCO reference lineage dataset that was provided via `--busco_reference`. +- `GenomeBinning/QC/BUSCO/` + - `busco_downloads/`: All files and lineage datasets downloaded by BUSCO when run in automated lineage selection mode. (Can currently not be used to reproduce analysis, see the [nf-core/mag website documentation](https://nf-co.re/mag/usage#reproducibility) how to achieve reproducible BUSCO results). + - `reference/*.tar.gz`: BUSCO reference lineage dataset that was provided via `--busco_reference`.
@@ -338,29 +428,28 @@ Besides the reference files or output files created by BUSCO, the following summ
Output files -* `GenomeBinning/QC/` - * `busco_summary.tsv`: A summary table of the BUSCO results, with % of marker genes found. If run in automated lineage selection mode, both the results for the selected domain and for the selected more specific lineage will be given, if available. +- `GenomeBinning/QC/` + - `busco_summary.tsv`: A summary table of the BUSCO results, with % of marker genes found. If run in automated lineage selection mode, both the results for the selected domain and for the selected more specific lineage will be given, if available.
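A minimal sketch of the BUSCO-related parameters mentioned above (the lineage URL is the one used by this PR's test profiles; `save_busco_reference` is assumed here to be a simple boolean switch):

```nextflow
params {
    // Pin a specific BUSCO lineage dataset instead of automated lineage selection
    busco_reference      = "https://busco-data.ezlab.org/v5/data/lineages/bacteria_odb10.2020-03-06.tar.gz"
    // Also publish the used lineage dataset under GenomeBinning/QC/BUSCO/
    save_busco_reference = true
}
```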
- ## Taxonomic classification of binned genomes ### CAT -[CAT](https://github.com/dutilh/CAT) is a toolkit for annotating contigs and bins from metagenome-assembled-genomes. The MAG pipeline uses CAT to assign taxonomy to genome bins based on the taxnomy of the contigs. +[CAT](https://github.com/dutilh/CAT) is a toolkit for annotating contigs and bins from metagenome-assembled-genomes. The nf-core/mag pipeline uses CAT to assign taxonomy to genome bins based on the taxonomy of the contigs.
Output files -* `Taxonomy/CAT/[assembler]/` - * `[assembler]-[sample/group].ORF2LCA.names.txt.gz`: Tab-delimited files containing the lineage of each contig, with full lineage names - * `[assembler]-[sample/group].bin2classification.names.txt.gz`: Taxonomy classification of the genome bins, with full lineage names -* `Taxonomy/CAT/[assembler]/raw/` - * `[assembler]-[sample/group].concatenated.predicted_proteins.faa.gz`: Predicted protein sequences for each genome bin, in fasta format - * `[assembler]-[sample/group].concatenated.predicted_proteins.gff.gz`: Predicted protein features for each genome bin, in gff format - * `[assembler]-[sample/group].ORF2LCA.txt.gz`: Tab-delimited files containing the lineage of each contig - * `[assembler]-[sample/group].bin2classification.txt.gz`: Taxonomy classification of the genome bins - * `[assembler]-[sample/group].log`: Log files +- `Taxonomy/CAT/[assembler]/[binner]/` + - `[assembler]-[binner]-[sample/group].ORF2LCA.names.txt.gz`: Tab-delimited files containing the lineage of each contig, with full lineage names + - `[assembler]-[binner]-[sample/group].bin2classification.names.txt.gz`: Taxonomy classification of the genome bins, with full lineage names +- `Taxonomy/CAT/[assembler]/[binner]/raw/` + - `[assembler]-[binner]-[sample/group].concatenated.predicted_proteins.faa.gz`: Predicted protein sequences for each genome bin, in fasta format + - `[assembler]-[binner]-[sample/group].concatenated.predicted_proteins.gff.gz`: Predicted protein features for each genome bin, in gff format + - `[assembler]-[binner]-[sample/group].ORF2LCA.txt.gz`: Tab-delimited files containing the lineage of each contig + - `[assembler]-[binner]-[sample/group].bin2classification.txt.gz`: Taxonomy classification of the genome bins + - `[assembler]-[binner]-[sample/group].log`: Log files
@@ -369,7 +458,7 @@ If the parameters `--cat_db_generate` and `--save_cat_db` are set, additionally
Output files -* `Taxonomy/CAT/CAT_prepare_*.tar.gz`: Generated and used CAT database. +- `Taxonomy/CAT/CAT_prepare_*.tar.gz`: Generated and used CAT database.
@@ -380,15 +469,15 @@ If the parameters `--cat_db_generate` and `--save_cat_db` are set, additionally
Output files -* `Taxonomy/GTDB-Tk/[assembler]/[sample/group]/` - * `gtdbtk.[assembler]-[sample/group].{bac120/ar122}.summary.tsv`: Classifications for bacterial and archaeal genomes (see the [GTDB-Tk documentation for details](https://ecogenomics.github.io/GTDBTk/files/summary.tsv.html). - * `gtdbtk.[assembler]-[sample/group].{bac120/ar122}.classify.tree.gz`: Reference tree in Newick format containing query genomes placed with pplacer. - * `gtdbtk.[assembler]-[sample/group].{bac120/ar122}.markers_summary.tsv`: A summary of unique, duplicated, and missing markers within the 120 bacterial marker set, or the 122 archaeal marker set for each submitted genome. - * `gtdbtk.[assembler]-[sample/group].{bac120/ar122}.msa.fasta.gz`: FASTA file containing MSA of submitted and reference genomes. - * `gtdbtk.[assembler]-[sample/group].{bac120/ar122}.filtered.tsv`: A list of genomes with an insufficient number of amino acids in MSA. - * `gtdbtk.[assembler]-[sample/group].*.log`: Log files. - * `gtdbtk.[assembler]-[sample/group].failed_genomes.tsv`: A list of genomes for which the GTDB-Tk analysis failed, e.g. because Prodigal could not detect any genes. -* `Taxonomy/GTDB-Tk/gtdbtk_summary.tsv`: A summary table of the GTDB-Tk classification results for all bins, also containing bins which were discarded based on the BUSCO QC, which were filtered out by GTDB-Tk ((listed in `*.filtered.tsv`) or for which the analysis failed (listed in `*.failed_genomes.tsv`). +- `Taxonomy/GTDB-Tk/[assembler]/[binner]/[sample/group]/` + - `gtdbtk.[assembler]-[binner]-[sample/group].{bac120/ar122}.summary.tsv`: Classifications for bacterial and archaeal genomes (see the [GTDB-Tk documentation for details](https://ecogenomics.github.io/GTDBTk/files/summary.tsv.html). + - `gtdbtk.[assembler]-[binner]-[sample/group].{bac120/ar122}.classify.tree.gz`: Reference tree in Newick format containing query genomes placed with pplacer. + - `gtdbtk.[assembler]-[binner]-[sample/group].{bac120/ar122}.markers_summary.tsv`: A summary of unique, duplicated, and missing markers within the 120 bacterial marker set, or the 122 archaeal marker set for each submitted genome. + - `gtdbtk.[assembler]-[binner]-[sample/group].{bac120/ar122}.msa.fasta.gz`: FASTA file containing MSA of submitted and reference genomes. + - `gtdbtk.[assembler]-[binner]-[sample/group].{bac120/ar122}.filtered.tsv`: A list of genomes with an insufficient number of amino acids in MSA. + - `gtdbtk.[assembler]-[binner]-[sample/group].*.log`: Log files. + - `gtdbtk.[assembler]-[binner]-[sample/group].failed_genomes.tsv`: A list of genomes for which the GTDB-Tk analysis failed, e.g. because Prodigal could not detect any genes. +- `Taxonomy/GTDB-Tk/gtdbtk_summary.tsv`: A summary table of the GTDB-Tk classification results for all bins, also containing bins which were discarded based on the BUSCO QC, which were filtered out by GTDB-Tk ((listed in `*.filtered.tsv`) or for which the analysis failed (listed in `*.failed_genomes.tsv`).
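The taxonomy-related switches referred to in the two sections above can be sketched in a custom config as follows (illustrative; `gtdb = false`, as in the test profiles, skips GTDB-Tk classification):

```nextflow
params {
    // Generate a fresh CAT database and keep the archive under Taxonomy/CAT/
    cat_db_generate = true
    save_cat_db     = true
    // Skip GTDB-Tk bin classification, as done in the test profiles
    gtdb            = false
}
```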
@@ -401,19 +490,19 @@ Whole genome annotation is the process of identifying features of interest in a
Output files -* `Prokka/[assembler]/[bin]/` - * `[bin].gff`: annotation in GFF3 format, containing both sequences and annotations - * `[bin].gbk`: annotation in GenBank format, containing both sequences and annotations - * `[bin].fna`: nucleotide FASTA file of the input contig sequences - * `[bin].faa`: protein FASTA file of the translated CDS sequences - * `[bin].ffn`: nucleotide FASTA file of all the prediction transcripts (CDS, rRNA, tRNA, tmRNA, misc_RNA) - * `[bin].sqn`: an ASN1 format "Sequin" file for submission to Genbank - * `[bin].fsa`: nucleotide FASTA file of the input contig sequences, used by "tbl2asn" to create the .sqn file - * `[bin].tbl`: feature Table file, used by "tbl2asn" to create the .sqn file - * `[bin].err`: unacceptable annotations - the NCBI discrepancy report. - * `[bin].log`: contains all the output that Prokka produced during its run - * `[bin].txt`: statistics relating to the annotated features found - * `[bin].tsv`: tab-separated file of all features (locus_tag, ftype, len_bp, gene, EC_number, COG, product) +- `Prokka/[assembler]/[bin]/` + - `[bin].gff`: annotation in GFF3 format, containing both sequences and annotations + - `[bin].gbk`: annotation in GenBank format, containing both sequences and annotations + - `[bin].fna`: nucleotide FASTA file of the input contig sequences + - `[bin].faa`: protein FASTA file of the translated CDS sequences + - `[bin].ffn`: nucleotide FASTA file of all the prediction transcripts (CDS, rRNA, tRNA, tmRNA, misc_RNA) + - `[bin].sqn`: an ASN1 format "Sequin" file for submission to Genbank + - `[bin].fsa`: nucleotide FASTA file of the input contig sequences, used by "tbl2asn" to create the .sqn file + - `[bin].tbl`: feature Table file, used by "tbl2asn" to create the .sqn file + - `[bin].err`: unacceptable annotations - the NCBI discrepancy report. + - `[bin].log`: contains all the output that Prokka produced during its run + - `[bin].txt`: statistics relating to the annotated features found + - `[bin].tsv`: tab-separated file of all features (locus_tag, ftype, len_bp, gene, EC_number, COG, product)
@@ -422,7 +511,41 @@ Whole genome annotation is the process of identifying features of interest in a
Output files -* `GenomeBinning/bin_summary.tsv`: Summary of bin sequencing depths together with BUSCO, QUAST and GTDB-Tk results, if at least one of the later was generated. +- `GenomeBinning/bin_summary.tsv`: Summary of bin sequencing depths together with BUSCO, QUAST and GTDB-Tk results, if at least one of the latter was generated. This will also include refined bins if bin refinement with `--refine_bins_dastool` is performed. + +
+ +## Ancient DNA + +Optional; only run when `-profile ancient_dna` is specified. + +### `PyDamage` + +[PyDamage](https://github.com/maxibor/pydamage) is a tool that automates the identification and estimation of ancient DNA damage from contigs. After modelling the ancient DNA damage using C-to-T transitions, PyDamage uses a likelihood ratio test to discriminate between truly ancient contigs and modern contigs originating from sample contamination. + +
+Output files + +- `Ancient_DNA/pydamage/analyze` + - `[sample/group]/pydamage_results/pydamage_results.csv`: PyDamage raw result tabular file in `.csv` format. Format described here: [pydamage.readthedocs.io/en/0.62/output.html](https://pydamage.readthedocs.io/en/0.62/output.html) +- `Ancient_DNA/pydamage/filter` + - `[sample/group]/pydamage_results/pydamage_results.csv`: PyDamage filtered result tabular file in `.csv` format. Format described here: [pydamage.readthedocs.io/en/0.62/output.html](https://pydamage.readthedocs.io/en/0.62/output.html) + +
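For orientation, the `analyze` and `filter` outputs listed above correspond to the two PyDamage subcommands run on the reads mapped back to the contigs. The following is a minimal standalone sketch, not the pipeline's exact invocation; the BAM and CSV file names are illustrative:

```console
# Estimate aDNA damage per contig from reads aligned back to the contigs
# (assumes a sorted, indexed BAM; file name is made up)
pydamage analyze reads_to_contigs.sorted.bam

# Keep only contigs whose damage estimates pass the significance filter
pydamage filter pydamage_results/pydamage_results.csv
```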
+ +### `variant_calling` + +Because of aDNA damage, _de novo_ assemblers sometimes struggle to call a correct consensus on the contig sequence. To avoid this, the consensus is re-called with variant-calling software, using the reads aligned back to the contigs. + +
+Output files + +- `variant_calling/consensus` + - `[sample/group].fa`: contigs sequence with re-called consensus from read-to-contig alignment +- `variant_calling/unfiltered` + - `[sample/group].vcf.gz`: raw variant calls of the reads aligned back to the contigs. +- `variant_calling/filtered` + - `[sample/group].filtered.vcf.gz`: quality filtered variant calls of the reads aligned back to the contigs.
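The re-calling idea behind these outputs can be sketched with freebayes and BCFtools, the tools named for this subworkflow in the usage notes. This is an illustrative outline under assumed file names and an assumed quality threshold, not the pipeline's exact command lines:

```console
# Call variants from the reads aligned back to the contigs
freebayes -f contigs.fa reads_to_contigs.sorted.bam | bgzip > unfiltered.vcf.gz
bcftools index unfiltered.vcf.gz

# Drop low-quality calls (the QUAL cut-off here is illustrative)
bcftools view -e 'QUAL < 20' -Oz -o filtered.vcf.gz unfiltered.vcf.gz
bcftools index filtered.vcf.gz

# Re-generate the contig consensus sequences from the filtered calls
bcftools consensus -f contigs.fa filtered.vcf.gz > consensus.fa
```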
@@ -431,10 +554,10 @@ Whole genome annotation is the process of identifying features of interest in a
Output files -* `multiqc/` - * `multiqc_report.html`: a standalone HTML file that can be viewed in your web browser. - * `multiqc_data/`: directory containing parsed statistics from the different tools used in the pipeline. - * `multiqc_plots/`: directory containing static images from the report in various formats. +- `multiqc/` + - `multiqc_report.html`: a standalone HTML file that can be viewed in your web browser. + - `multiqc_data/`: directory containing parsed statistics from the different tools used in the pipeline. + - `multiqc_plots/`: directory containing static images from the report in various formats.
@@ -447,9 +570,9 @@ Results generated by MultiQC collate pipeline QC from supported tools e.g. FastQ
Output files -* `pipeline_info/` - * Reports generated by Nextflow: `execution_report.html`, `execution_timeline.html`, `execution_trace.txt` and `pipeline_dag.dot`/`pipeline_dag.svg`. - * Reports generated by the pipeline: `pipeline_report.html`, `pipeline_report.txt` and `software_versions.tsv`. +- `pipeline_info/` + - Reports generated by Nextflow: `execution_report.html`, `execution_timeline.html`, `execution_trace.txt` and `pipeline_dag.dot`/`pipeline_dag.svg`. + - Reports generated by the pipeline: `pipeline_report.html`, `pipeline_report.txt` and `software_versions.yml`. The `pipeline_report*` files will only be present if the `--email` / `--email_on_fail` parameters are used when running the pipeline.
diff --git a/docs/usage.md b/docs/usage.md index a89959c8..ef7fccb3 100644 --- a/docs/usage.md +++ b/docs/usage.md @@ -20,13 +20,13 @@ This input method only works with short read data and will assign all files to t Please note the following additional requirements: -* Files names must be unique -* Valid file extensions: `.fastq.gz`, `.fq.gz` (files must be compressed) -* The path must be enclosed in quotes -* The path must have at least one `*` wildcard character -* When using the pipeline with paired end data, the path must use `{1,2}` notation to specify read pairs -* To run single-end data you must additionally specify `--single_end` -* If left unspecified, a default pattern is used: `data/*{1,2}.fastq.gz` +- Files names must be unique +- Valid file extensions: `.fastq.gz`, `.fq.gz` (files must be compressed) +- The path must be enclosed in quotes +- The path must have at least one `*` wildcard character +- When using the pipeline with paired end data, the path must use `{1,2}` notation to specify read pairs +- To run single-end data you must additionally specify `--single_end` +- If left unspecified, a default pattern is used: `data/*{1,2}.fastq.gz` ### Samplesheet input file @@ -55,14 +55,14 @@ sample2,0,data/sample2.fastq.gz,, Please note the following requirements: -* 5 comma-seperated columns -* Valid file extension: `.csv` -* Must contain the header `sample,group,short_reads_1,short_reads_2,long_reads` -* Sample IDs must be unique -* FastQ files must be compressed (`.fastq.gz`, `.fq.gz`) -* `long_reads` can only be provided in combination with paired-end short read data -* Within one samplesheet either only single-end or only paired-end reads can be specified -* If single-end reads are specified, the command line parameter `--single_end` must be specified as well +- 5 comma-seperated columns +- Valid file extension: `.csv` +- Must contain the header `sample,group,short_reads_1,short_reads_2,long_reads` +- Sample IDs must be unique +- FastQ files must be compressed (`.fastq.gz`, `.fq.gz`) +- `long_reads` can only be provided in combination with paired-end short read data +- Within one samplesheet either only single-end or only paired-end reads can be specified +- If single-end reads are specified, the command line parameter `--single_end` must be specified as well Again, by default, the group information is only used to compute co-abundances for the binning step, but not for group-wise co-assembly (see the parameter docs for [`--coassemble_group`](https://nf-co.re/mag/parameters#coassemble_group) and [`--binning_map_mode`](https://nf-co.re/mag/parameters#binning_map_mode) for more information about how this group information can be used). @@ -71,7 +71,7 @@ Again, by default, the group information is only used to compute co-abundances f The typical command for running the pipeline is as follows: ```console -nextflow run nf-core/mag --input samplesheet.csv --genome GRCh37 -profile docker +nextflow run nf-core/mag --input samplesheet.csv --outdir --genome GRCh37 -profile docker ``` This will launch the pipeline with the `docker` configuration profile. See below for more information about profiles. @@ -79,9 +79,9 @@ This will launch the pipeline with the `docker` configuration profile. 
See below Note that the pipeline will create the following files in your working directory: ```console -work # Directory containing the nextflow working files -results # Finished results (configurable, see below) -.nextflow_log # Log file from Nextflow +work # Directory containing the nextflow working files + # Finished results in specified location (defined with --outdir) +.nextflow_log # Log file from Nextflow # Other nextflow hidden files, eg. history of pipeline runs and old logs. ``` @@ -133,25 +133,25 @@ They are loaded in sequence, so later profiles can overwrite earlier profiles. If `-profile` is not specified, the pipeline will run locally and expect all software to be installed and available on the `PATH`. This is _not_ recommended. -* `docker` - * A generic configuration profile to be used with [Docker](https://docker.com/) -* `singularity` - * A generic configuration profile to be used with [Singularity](https://sylabs.io/docs/) -* `podman` - * A generic configuration profile to be used with [Podman](https://podman.io/) -* `shifter` - * A generic configuration profile to be used with [Shifter](https://nersc.gitlab.io/development/shifter/how-to-use/) -* `charliecloud` - * A generic configuration profile to be used with [Charliecloud](https://hpc.github.io/charliecloud/) -* `conda` - * A generic configuration profile to be used with [Conda](https://conda.io/docs/). Please only use Conda as a last resort i.e. when it's not possible to run the pipeline with Docker, Singularity, Podman, Shifter or Charliecloud. -* `test`, `test_hybrid`, `test_host_rm`, `test_hybrid_host_rm`, `test_busco_auto` - * Profiles with a complete configuration for automated testing - * Includes links to test data so needs no other parameters +- `docker` + - A generic configuration profile to be used with [Docker](https://docker.com/) +- `singularity` + - A generic configuration profile to be used with [Singularity](https://sylabs.io/docs/) +- `podman` + - A generic configuration profile to be used with [Podman](https://podman.io/) +- `shifter` + - A generic configuration profile to be used with [Shifter](https://nersc.gitlab.io/development/shifter/how-to-use/) +- `charliecloud` + - A generic configuration profile to be used with [Charliecloud](https://hpc.github.io/charliecloud/) +- `conda` + - A generic configuration profile to be used with [Conda](https://conda.io/docs/). Please only use Conda as a last resort i.e. when it's not possible to run the pipeline with Docker, Singularity, Podman, Shifter or Charliecloud. +- `test`, `test_hybrid`, `test_host_rm`, `test_hybrid_host_rm`, `test_busco_auto` + - Profiles with a complete configuration for automated testing + - Includes links to test data so needs no other parameters ### `-resume` -Specify this when restarting a pipeline. Nextflow will used cached results from any pipeline steps where the inputs are the same, continuing from where it got to previously. +Specify this when restarting a pipeline. Nextflow will use cached results from any pipeline steps where the inputs are the same, continuing from where it got to previously. For input to be considered the same, not only the names must be identical but the files' contents as well. For more info about this parameter, see [this blog post](https://www.nextflow.io/blog/2019/demystifying-nextflow-resume.html). You can also supply a run name to resume a specific run: `-resume [run-name]`. Use the `nextflow log` command to show previous run names. 
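To illustrate the two commands mentioned above, a previous run name can be looked up with `nextflow log` and then passed to `-resume`; the run name and output directory below are made-up examples:

```console
# List previous runs and their auto-generated names
nextflow log

# Resume a specific earlier run by name
nextflow run nf-core/mag -profile docker --input samplesheet.csv --outdir results -resume golden_poincare
```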
@@ -159,8 +159,6 @@ You can also supply a run name to resume a specific run: `-resume [run-name]`. U Specify the path to a specific config file (this is a core Nextflow command). See the [nf-core website documentation](https://nf-co.re/usage/configuration) for more information. -## Custom configuration - ### Resource requests Whilst the default requirements set within the pipeline will hopefully work for most people and with most input data, you may find that you want to customise the compute resources that the pipeline requests. Each step in the pipeline has a default set of requirements for number of CPUs, memory and time. For most of the steps in the pipeline, if the job exits with any of the error codes specified [here](https://github.com/nf-core/rnaseq/blob/4c27ef5610c87db00c3c5a3eed10b1d161abf575/conf/base.config#L18) it will automatically be resubmitted with higher requests (2 x original, then 3 x original). If it still fails after the third attempt then the pipeline execution is stopped. @@ -168,11 +166,11 @@ Whilst the default requirements set within the pipeline will hopefully work for For example, if the nf-core/rnaseq pipeline is failing after multiple re-submissions of the `STAR_ALIGN` process due to an exit code of `137` this would indicate that there is an out of memory issue: ```console -[62/149eb0] NOTE: Process `RNASEQ:ALIGN_STAR:STAR_ALIGN (WT_REP1)` terminated with an error exit status (137) -- Execution is retried (1) -Error executing process > 'RNASEQ:ALIGN_STAR:STAR_ALIGN (WT_REP1)' +[62/149eb0] NOTE: Process `NFCORE_RNASEQ:RNASEQ:ALIGN_STAR:STAR_ALIGN (WT_REP1)` terminated with an error exit status (137) -- Execution is retried (1) +Error executing process > 'NFCORE_RNASEQ:RNASEQ:ALIGN_STAR:STAR_ALIGN (WT_REP1)' Caused by: - Process `RNASEQ:ALIGN_STAR:STAR_ALIGN (WT_REP1)` terminated with an error exit status (137) + Process `NFCORE_RNASEQ:RNASEQ:ALIGN_STAR:STAR_ALIGN (WT_REP1)` terminated with an error exit status (137) Command executed: STAR \ @@ -196,11 +194,17 @@ Work dir: Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run` ``` -To bypass this error you would need to find exactly which resources are set by the `STAR_ALIGN` process. The quickest way is to search for `process STAR_ALIGN` in the [nf-core/rnaseq Github repo](https://github.com/nf-core/rnaseq/search?q=process+STAR_ALIGN). We have standardised the structure of Nextflow DSL2 pipelines such that all module files will be present in the `modules/` directory and so based on the search results the file we want is `modules/nf-core/software/star/align/main.nf`. If you click on the link to that file you will notice that there is a `label` directive at the top of the module that is set to [`label process_high`](https://github.com/nf-core/rnaseq/blob/4c27ef5610c87db00c3c5a3eed10b1d161abf575/modules/nf-core/software/star/align/main.nf#L9). The [Nextflow `label`](https://www.nextflow.io/docs/latest/process.html#label) directive allows us to organise workflow processes in separate groups which can be referenced in a configuration file to select and configure subset of processes having similar computing requirements. The default values for the `process_high` label are set in the pipeline's [`base.config`](https://github.com/nf-core/rnaseq/blob/4c27ef5610c87db00c3c5a3eed10b1d161abf575/conf/base.config#L33-L37) which in this case is defined as 72GB. 
Providing you haven't set any other standard nf-core parameters to __cap__ the [maximum resources](https://nf-co.re/usage/configuration#max-resources) used by the pipeline then we can try and bypass the `STAR_ALIGN` process failure by creating a custom config file that sets at least 72GB of memory, in this case increased to 100GB. The custom config below can then be provided to the pipeline via the [`-c`](#-c) parameter as highlighted in previous sections. +To bypass this error you would need to find exactly which resources are set by the `STAR_ALIGN` process. The quickest way is to search for `process STAR_ALIGN` in the [nf-core/rnaseq Github repo](https://github.com/nf-core/rnaseq/search?q=process+STAR_ALIGN). +We have standardised the structure of Nextflow DSL2 pipelines such that all module files will be present in the `modules/` directory and so, based on the search results, the file we want is `modules/nf-core/software/star/align/main.nf`. +If you click on the link to that file you will notice that there is a `label` directive at the top of the module that is set to [`label process_high`](https://github.com/nf-core/rnaseq/blob/4c27ef5610c87db00c3c5a3eed10b1d161abf575/modules/nf-core/software/star/align/main.nf#L9). +The [Nextflow `label`](https://www.nextflow.io/docs/latest/process.html#label) directive allows us to organise workflow processes in separate groups which can be referenced in a configuration file to select and configure subset of processes having similar computing requirements. +The default values for the `process_high` label are set in the pipeline's [`base.config`](https://github.com/nf-core/rnaseq/blob/4c27ef5610c87db00c3c5a3eed10b1d161abf575/conf/base.config#L33-L37) which in this case is defined as 72GB. +Providing you haven't set any other standard nf-core parameters to **cap** the [maximum resources](https://nf-co.re/usage/configuration#max-resources) used by the pipeline then we can try and bypass the `STAR_ALIGN` process failure by creating a custom config file that sets at least 72GB of memory, in this case increased to 100GB. +The custom config below can then be provided to the pipeline via the [`-c`](#-c) parameter as highlighted in previous sections. ```nextflow process { - withName: STAR_ALIGN { + withName: 'NFCORE_RNASEQ:RNASEQ:ALIGN_STAR:STAR_ALIGN' { memory = 100.GB } } @@ -208,43 +212,9 @@ process { Note, do not change number of CPUs with custom config files for the processes `spades`, `spadeshybrid` or `megahit` when specifying the parameters `--spades_fix_cpus`, `--spadeshybrid_fix_cpus` and `--megahit_fix_cpu_1` respectively. -> **NB:** We specify just the process name i.e. `STAR_ALIGN` in the config file and not the full task name string that is printed to screen in the error message or on the terminal whilst the pipeline is running i.e. `RNASEQ:ALIGN_STAR:STAR_ALIGN`. You may get a warning suggesting that the process selector isn't recognised but you can ignore that if the process name has been specified correctly. This is something that needs to be fixed upstream in core Nextflow. - -### Tool-specific options - -For the ultimate flexibility, we have implemented and are using Nextflow DSL2 modules in a way where it is possible for both developers and users to change tool-specific command-line arguments (e.g. providing an additional command-line argument to the `STAR_ALIGN` process) as well as publishing options (e.g. saving files produced by the `STAR_ALIGN` process that aren't saved by default by the pipeline). 
In the majority of instances, as a user you won't have to change the default options set by the pipeline developer(s), however, there may be edge cases where creating a simple custom config file can improve the behaviour of the pipeline if for example it is failing due to a weird error that requires setting a tool-specific parameter to deal with smaller / larger genomes. - -The command-line arguments passed to STAR in the `STAR_ALIGN` module are a combination of: - -* Mandatory arguments or those that need to be evaluated within the scope of the module, as supplied in the [`script`](https://github.com/nf-core/rnaseq/blob/4c27ef5610c87db00c3c5a3eed10b1d161abf575/modules/nf-core/software/star/align/main.nf#L49-L55) section of the module file. - -* An [`options.args`](https://github.com/nf-core/rnaseq/blob/4c27ef5610c87db00c3c5a3eed10b1d161abf575/modules/nf-core/software/star/align/main.nf#L56) string of non-mandatory parameters that is set to be empty by default in the module but can be overwritten when including the module in the sub-workflow / workflow context via the `addParams` Nextflow option. - -The nf-core/rnaseq pipeline has a sub-workflow (see [terminology](https://github.com/nf-core/modules#terminology)) specifically to align reads with STAR and to sort, index and generate some basic stats on the resulting BAM files using SAMtools. At the top of this file we import the `STAR_ALIGN` module via the Nextflow [`include`](https://github.com/nf-core/rnaseq/blob/4c27ef5610c87db00c3c5a3eed10b1d161abf575/subworkflows/nf-core/align_star.nf#L10) keyword and by default the options passed to the module via the `addParams` option are set as an empty Groovy map [here](https://github.com/nf-core/rnaseq/blob/4c27ef5610c87db00c3c5a3eed10b1d161abf575/subworkflows/nf-core/align_star.nf#L5); this in turn means `options.args` will be set to empty by default in the module file too. This is an intentional design choice and allows us to implement well-written sub-workflows composed of a chain of tools that by default run with the bare minimum parameter set for any given tool in order to make it much easier to share across pipelines and to provide the flexibility for users and developers to customise any non-mandatory arguments. - -When including the sub-workflow above in the main pipeline workflow we use the same `include` statement, however, we now have the ability to overwrite options for each of the tools in the sub-workflow including the [`align_options`](https://github.com/nf-core/rnaseq/blob/4c27ef5610c87db00c3c5a3eed10b1d161abf575/workflows/rnaseq.nf#L225) variable that will be used specifically to overwrite the optional arguments passed to the `STAR_ALIGN` module. In this case, the options to be provided to `STAR_ALIGN` have been assigned sensible defaults by the developer(s) in the pipeline's [`modules.config`](https://github.com/nf-core/rnaseq/blob/4c27ef5610c87db00c3c5a3eed10b1d161abf575/conf/modules.config#L70-L74) and can be accessed and customised in the [workflow context](https://github.com/nf-core/rnaseq/blob/4c27ef5610c87db00c3c5a3eed10b1d161abf575/workflows/rnaseq.nf#L201-L204) too before eventually passing them to the sub-workflow as a Groovy map called `star_align_options`. These options will then be propagated from `workflow -> sub-workflow -> module`. - -As mentioned at the beginning of this section it may also be necessary for users to overwrite the options passed to modules to be able to customise specific aspects of the way in which a particular tool is executed by the pipeline. 
Given that all of the default module options are stored in the pipeline's `modules.config` as a [`params` variable](https://github.com/nf-core/rnaseq/blob/4c27ef5610c87db00c3c5a3eed10b1d161abf575/conf/modules.config#L24-L25) it is also possible to overwrite any of these options via a custom config file. - -Say for example we want to append an additional, non-mandatory parameter (i.e. `--outFilterMismatchNmax 16`) to the arguments passed to the `STAR_ALIGN` module. Firstly, we need to copy across the default `args` specified in the [`modules.config`](https://github.com/nf-core/rnaseq/blob/4c27ef5610c87db00c3c5a3eed10b1d161abf575/conf/modules.config#L71) and create a custom config file that is a composite of the default `args` as well as the additional options you would like to provide. This is very important because Nextflow will overwrite the default value of `args` that you provide via the custom config. - -As you will see in the example below, we have: - -* appended `--outFilterMismatchNmax 16` to the default `args` used by the module. -* changed the default `publish_dir` value to where the files will eventually be published in the main results directory. -* appended `'bam':''` to the default value of `publish_files` so that the BAM files generated by the process will also be saved in the top-level results directory for the module. Note: `'out':'log'` means any file/directory ending in `out` will now be saved in a separate directory called `my_star_directory/log/`. - -```nextflow -params { - modules { - 'star_align' { - args = "--quantMode TranscriptomeSAM --twopassMode Basic --outSAMtype BAM Unsorted --readFilesCommand zcat --runRNGseed 0 --outFilterMultimapNmax 20 --alignSJDBoverhangMin 1 --outSAMattributes NH HI AS NM MD --quantTranscriptomeBan Singleend --outFilterMismatchNmax 16" - publish_dir = "my_star_directory" - publish_files = ['out':'log', 'tab':'log', 'bam':''] - } - } -} -``` +> **NB:** We specify the full process name i.e. `NFCORE_RNASEQ:RNASEQ:ALIGN_STAR:STAR_ALIGN` in the config file because this takes priority over the short name (`STAR_ALIGN`) and allows existing configuration using the full process name to be correctly overridden. +> +> If you get a warning suggesting that the process selector isn't recognised check that the process name has been specified correctly. ### Updating containers @@ -254,35 +224,35 @@ The [Nextflow DSL2](https://www.nextflow.io/docs/latest/dsl2.html) implementatio 2. Find the latest version of the Biocontainer available on [Quay.io](https://quay.io/repository/biocontainers/pangolin?tag=latest&tab=tags) 3. 
Create the custom config accordingly: - * For Docker: + - For Docker: - ```nextflow - process { - withName: PANGOLIN { - container = 'quay.io/biocontainers/pangolin:3.0.5--pyhdfd78af_0' - } - } - ``` + ```nextflow + process { + withName: PANGOLIN { + container = 'quay.io/biocontainers/pangolin:3.0.5--pyhdfd78af_0' + } + } + ``` - * For Singularity: + - For Singularity: - ```nextflow - process { - withName: PANGOLIN { - container = 'https://depot.galaxyproject.org/singularity/pangolin:3.0.5--pyhdfd78af_0' - } - } - ``` + ```nextflow + process { + withName: PANGOLIN { + container = 'https://depot.galaxyproject.org/singularity/pangolin:3.0.5--pyhdfd78af_0' + } + } + ``` - * For Conda: + - For Conda: - ```nextflow - process { - withName: PANGOLIN { - conda = 'bioconda::pangolin=3.0.5' - } - } - ``` + ```nextflow + process { + withName: PANGOLIN { + conda = 'bioconda::pangolin=3.0.5' + } + } + ``` > **NB:** If you wish to periodically update individual tool-specific results (e.g. Pangolin) generated by the pipeline then you must ensure to keep the `work/` directory otherwise the `-resume` ability of the pipeline will be compromised and it will restart from scratch. @@ -311,3 +281,23 @@ We recommend adding the following line to your environment to limit this (typica ```console NXF_OPTS='-Xms1g -Xmx4g' ``` + +## A note on the ancient DNA subworkflow + +nf-core/mag integrates an additional subworkflow to validate ancient DNA _de novo_ assembly: + +[Characteristic patterns of ancient DNA (aDNA) damage](<(https://doi.org/10.1073/pnas.0704665104)>), namely DNA fragmentation and cytosine deamination (observed as C-to-T transitions) are typically used to authenticate aDNA sequences. By identifying assembled contigs carrying typical aDNA damages using [PyDamage](https://github.com/maxibor/pydamage), nf-core/mag can report and distinguish ancient contigs from contigs carrying no aDNA damage. Furthermore, to mitigate the effect of aDNA damage on contig sequence assembly, [freebayes](https://github.com/freebayes/freebayes) in combination with [BCFtools](https://github.com/samtools/bcftools) are used to (re)call the variants from the reads aligned to the contigs, and (re)generate contig consensus sequences. + +## A note on bin refinement + +### Error Reporting + +DAS Tool may not always be able to refine bins due to insufficient recovery of enough single-copy genes. In these cases you will get a NOTE such as + +```console +[16/d330a6] NOTE: Process `NFCORE_MAG:MAG:BINNING_REFINEMENT:DASTOOL_DASTOOL (test_minigut_sample2)` terminated with an error exit status (1) -- Error is ignored +``` + +In this case, DAS Tool has not necessarily failed but was unable to complete the refinement. You will therefore not expect to find any output files in the `GenomeBinning/DASTool/` results directory for that particular sample. + +If you are regularly getting such errors, you can try reducing the `--refine_bins_dastool_threshold` value, which will modify the scoring threshold defined in the [DAS Tool publication](https://www.nature.com/articles/s41564-018-0171-1). 
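As a concrete illustration of the last point, the threshold can be lowered on the command line like any other pipeline parameter; the value `0.3` and the output directory below are hypothetical examples, not recommendations:

```console
nextflow run nf-core/mag -profile docker \
    --input samplesheet.csv \
    --outdir results \
    --refine_bins_dastool \
    --refine_bins_dastool_threshold 0.3
```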
diff --git a/lib/NfcoreSchema.groovy b/lib/NfcoreSchema.groovy index 8d6920dd..b3d092f8 100755 --- a/lib/NfcoreSchema.groovy +++ b/lib/NfcoreSchema.groovy @@ -27,7 +27,7 @@ class NfcoreSchema { /* groovylint-disable-next-line UnusedPrivateMethodParameter */ public static void validateParameters(workflow, params, log, schema_filename='nextflow_schema.json') { def has_error = false - //=====================================================================// + //~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~// // Check for nextflow core params and unexpected params def json = new File(getSchemaPath(workflow, schema_filename=schema_filename)).text def Map schemaParams = (Map) new JsonSlurper().parseText(json).get('definitions') @@ -105,9 +105,13 @@ class NfcoreSchema { // Collect expected parameters from the schema def expectedParams = [] + def enums = [:] for (group in schemaParams) { for (p in group.value['properties']) { expectedParams.push(p.key) + if (group.value['properties'][p.key].containsKey('enum')) { + enums[p.key] = group.value['properties'][p.key]['enum'] + } } } @@ -131,7 +135,7 @@ class NfcoreSchema { } } - //=====================================================================// + //~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~// // Validate parameters against the schema InputStream input_stream = new File(getSchemaPath(workflow, schema_filename=schema_filename)).newInputStream() JSONObject raw_schema = new JSONObject(new JSONTokener(input_stream)) @@ -155,7 +159,7 @@ class NfcoreSchema { println '' log.error 'ERROR: Validation of pipeline parameters failed!' JSONObject exceptionJSON = e.toJSON() - printExceptions(exceptionJSON, params_json, log) + printExceptions(exceptionJSON, params_json, log, enums) println '' has_error = true } @@ -202,7 +206,7 @@ class NfcoreSchema { } def type = '[' + group_params.get(param).type + ']' def description = group_params.get(param).description - def defaultValue = group_params.get(param).default ? " [default: " + group_params.get(param).default.toString() + "]" : '' + def defaultValue = group_params.get(param).default != null ? 
" [default: " + group_params.get(param).default.toString() + "]" : '' def description_default = description + colors.dim + defaultValue + colors.reset // Wrap long description texts // Loosely based on https://dzone.com/articles/groovy-plain-text-word-wrap @@ -260,13 +264,12 @@ class NfcoreSchema { // Get pipeline parameters defined in JSON Schema def Map params_summary = [:] - def blacklist = ['hostnames'] def params_map = paramsLoad(getSchemaPath(workflow, schema_filename=schema_filename)) for (group in params_map.keySet()) { def sub_params = new LinkedHashMap() def group_params = params_map.get(group) // This gets the parameters of that particular group for (param in group_params.keySet()) { - if (params.containsKey(param) && !blacklist.contains(param)) { + if (params.containsKey(param)) { def params_value = params.get(param) def schema_value = group_params.get(param).default def param_type = group_params.get(param).type @@ -330,7 +333,7 @@ class NfcoreSchema { // // Loop over nested exceptions and print the causingException // - private static void printExceptions(ex_json, params_json, log) { + private static void printExceptions(ex_json, params_json, log, enums, limit=5) { def causingExceptions = ex_json['causingExceptions'] if (causingExceptions.length() == 0) { def m = ex_json['message'] =~ /required key \[([^\]]+)\] not found/ @@ -346,11 +349,20 @@ class NfcoreSchema { else { def param = ex_json['pointerToViolation'] - ~/^#\// def param_val = params_json[param].toString() - log.error "* --${param}: ${ex_json['message']} (${param_val})" + if (enums.containsKey(param)) { + def error_msg = "* --${param}: '${param_val}' is not a valid choice (Available choices" + if (enums[param].size() > limit) { + log.error "${error_msg} (${limit} of ${enums[param].size()}): ${enums[param][0..limit-1].join(', ')}, ... )" + } else { + log.error "${error_msg}: ${enums[param].join(', ')})" + } + } else { + log.error "* --${param}: ${ex_json['message']} (${param_val})" + } } } for (ex in causingExceptions) { - printExceptions(ex, params_json, log) + printExceptions(ex, params_json, log, enums) } } diff --git a/lib/NfcoreTemplate.groovy b/lib/NfcoreTemplate.groovy index 169d3f23..9798cb91 100755 --- a/lib/NfcoreTemplate.groovy +++ b/lib/NfcoreTemplate.groovy @@ -19,27 +19,16 @@ class NfcoreTemplate { } // - // Check params.hostnames + // Warn if a -profile or Nextflow config has not been provided to run the pipeline // - public static void hostName(workflow, params, log) { - Map colors = logColours(params.monochrome_logs) - if (params.hostnames) { - try { - def hostname = "hostname".execute().text.trim() - params.hostnames.each { prof, hnames -> - hnames.each { hname -> - if (hostname.contains(hname) && !workflow.profile.contains(prof)) { - log.info "=${colors.yellow}====================================================${colors.reset}=\n" + - "${colors.yellow}WARN: You are running with `-profile $workflow.profile`\n" + - " but your machine hostname is ${colors.white}'$hostname'${colors.reset}.\n" + - " ${colors.yellow_bold}Please use `-profile $prof${colors.reset}`\n" + - "=${colors.yellow}====================================================${colors.reset}=" - } - } - } - } catch (Exception e) { - log.warn "[$workflow.manifest.name] Could not determine 'hostname' - skipping check. Reason: ${e.message}." 
- } + public static void checkConfigProvided(workflow, log) { + if (workflow.profile == 'standard' && workflow.configFiles.size() <= 1) { + log.warn "[$workflow.manifest.name] You are attempting to run the pipeline without any custom configuration!\n\n" + + "This will be dependent on your local compute environment but can be achieved via one or more of the following:\n" + + " (1) Using an existing pipeline profile e.g. `-profile docker` or `-profile singularity`\n" + + " (2) Using an existing nf-core/configs for your Institution e.g. `-profile crick` or `-profile uppmax`\n" + + " (3) Using your own local custom config e.g. `-c /path/to/your/custom.config`\n\n" + + "Please refer to the quick start section and usage docs for the pipeline.\n " } } @@ -198,7 +187,6 @@ class NfcoreTemplate { log.info "-${colors.purple}[$workflow.manifest.name]${colors.red} Pipeline completed successfully, but with errored process(es) ${colors.reset}-" } } else { - hostName(workflow, params, log) log.info "-${colors.purple}[$workflow.manifest.name]${colors.red} Pipeline completed with errors${colors.reset}-" } } diff --git a/lib/Utils.groovy b/lib/Utils.groovy index 18173e98..28567bd7 100644 --- a/lib/Utils.groovy +++ b/lib/Utils.groovy @@ -29,19 +29,12 @@ class Utils { conda_check_failed |= !(channels.indexOf('bioconda') < channels.indexOf('defaults')) if (conda_check_failed) { - log.warn "=============================================================================\n" + + log.warn "~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n" + " There is a problem with your Conda configuration!\n\n" + " You will need to set-up the conda-forge and bioconda channels correctly.\n" + " Please refer to https://bioconda.github.io/user/install.html#set-up-channels\n" + " NB: The order of the channels matters!\n" + - "===================================================================================" + "~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~" } } - - // - // Join module args with appropriate spacing - // - public static String joinModuleArgs(args_list) { - return ' ' + args_list.join(' ') - } } diff --git a/lib/WorkflowMag.groovy b/lib/WorkflowMag.groovy index ce2cf932..082027d6 100755 --- a/lib/WorkflowMag.groovy +++ b/lib/WorkflowMag.groovy @@ -9,7 +9,7 @@ class WorkflowMag { // public static void initialise(params, log, hybrid) { // Check if binning mapping mode is valid - if (!['all','group','own'].contains(params.binning_map_mode)) { + if (!['all', 'group', 'own'].contains(params.binning_map_mode)) { log.error "Invalid parameter '--binning_map_mode ${params.binning_map_mode}'. Valid values are 'all', 'group' or 'own'." System.exit(1) } @@ -28,45 +28,49 @@ class WorkflowMag { System.exit(1) } // Check if settings concerning reproducibility of used tools are consistent and print warning if not - if (params.megahit_fix_cpu_1 || params.spades_fix_cpus != -1 || params.spadeshybrid_fix_cpus != -1){ - if (!params.skip_spades && params.spades_fix_cpus == -1) + if (params.megahit_fix_cpu_1 || params.spades_fix_cpus != -1 || params.spadeshybrid_fix_cpus != -1) { + if (!params.skip_spades && params.spades_fix_cpus == -1) { log.warn "At least one assembly process is run with a parameter to ensure reproducible results, but SPAdes not. Consider using the parameter '--spades_fix_cpus'." 
- if (hybrid && params.skip_spadeshybrid && params.spadeshybrid_fix_cpus == -1) + } + if (hybrid && params.skip_spadeshybrid && params.spadeshybrid_fix_cpus == -1) { log.warn "At least one assembly process is run with a parameter to ensure reproducible results, but SPAdes hybrid not. Consider using the parameter '--spadeshybrid_fix_cpus'." - if (!params.skip_megahit && !params.megahit_fix_cpu_1) + } + if (!params.skip_megahit && !params.megahit_fix_cpu_1) { log.warn "At least one assembly process is run with a parameter to ensure reproducible results, but MEGAHIT not. Consider using the parameter '--megahit_fix_cpu_1'." - if (!params.skip_binning && params.metabat_rng_seed == 0) + } + if (!params.skip_binning && params.metabat_rng_seed == 0) { log.warn "At least one assembly process is run with a parameter to ensure reproducible results, but for MetaBAT2 a random seed is specified ('--metabat_rng_seed 0'). Consider specifying a positive seed instead." + } } // Check if SPAdes and single_end if ( (!params.skip_spades || !params.skip_spadeshybrid) && params.single_end) { - log.warn "metaSPAdes does not support single-end data. SPAdes will be skipped." + log.warn 'metaSPAdes does not support single-end data. SPAdes will be skipped.' } // Check if parameters for host contamination removal are valid if ( params.host_fasta && params.host_genome) { - log.error "Both host fasta reference and iGenomes genome are specified to remove host contamination! Invalid combination, please specify either --host_fasta or --host_genome." + log.error 'Both host fasta reference and iGenomes genome are specified to remove host contamination! Invalid combination, please specify either --host_fasta or --host_genome.' System.exit(1) } if ( hybrid && (params.host_fasta || params.host_genome) ) { - log.warn "Host read removal is only applied to short reads. Long reads might be filtered indirectly by Filtlong, which is set to use read qualities estimated based on k-mer matches to the short, already filtered reads." + log.warn 'Host read removal is only applied to short reads. Long reads might be filtered indirectly by Filtlong, which is set to use read qualities estimated based on k-mer matches to the short, already filtered reads.' if ( params.longreads_length_weight > 1 ) { log.warn "The parameter --longreads_length_weight is ${params.longreads_length_weight}, causing the read length being more important for long read filtering than the read quality. Set --longreads_length_weight to 1 in order to assign equal weights." } } if ( params.host_genome ) { if (!params.genomes) { - log.error "No config file containing genomes provided!" + log.error 'No config file containing genomes provided!' 
System.exit(1) } // Check if host genome exists in the config file if (!params.genomes.containsKey(params.host_genome)) { - log.error "=============================================================================\n" + + log.error '=============================================================================\n' + " Host genome '${params.host_genome}' not found in any config files provided to the pipeline.\n" + - " Currently, the available genome keys are:\n" + - " ${params.genomes.keySet().join(", ")}\n" + - "===================================================================================" + ' Currently, the available genome keys are:\n' + + " ${params.genomes.keySet().join(', ')}\n" + + '===================================================================================' System.exit(1) } if ( !params.genomes[params.host_genome].fasta ) { @@ -79,41 +83,58 @@ class WorkflowMag { } } + // Check if at least two binners were applied in order to run DAS Tool for bin refinment + // (needs to be adjusted in case additional binners are added) + if (params.refine_bins_dastool && params.skip_metabat2 ) { + log.error 'Both --refine_bins_dastool and --skip_metabat2 are specified! Invalid combination, bin refinement requires MetaBAT2 and MaxBin2 binning results.' + System.exit(1) + } + if (params.refine_bins_dastool && params.skip_maxbin2 ) { + log.error 'Both --refine_bins_dastool and --skip_maxbin2 are specified! Invalid combination, bin refinement requires MetaBAT2 and MaxBin2 binning results.' + System.exit(1) + } + + // Check that bin refinement is actually turned on if any of the refined bins are requested for downstream + if (!params.refine_bins_dastool && params.postbinning_input != 'raw_bins_only') { + log.error 'The parameter '--postbinning_input ${params.postbinning_input}' for downstream steps can only be specified if bin refinement is activated with --refine_bins_dastool! Check input.' + System.exit(1) + } + // Check if BUSCO parameters combinations are valid - if (params.skip_busco){ + if (params.skip_busco) { if (params.busco_reference) { - log.error "Both --skip_busco and --busco_reference are specified! Invalid combination, please specify either --skip_busco or --busco_reference." + log.error 'Both --skip_busco and --busco_reference are specified! Invalid combination, please specify either --skip_busco or --busco_reference.' System.exit(1) } if (params.busco_download_path) { - log.error "Both --skip_busco and --busco_download_path are specified! Invalid combination, please specify either --skip_busco or --busco_download_path." + log.error 'Both --skip_busco and --busco_download_path are specified! Invalid combination, please specify either --skip_busco or --busco_download_path.' System.exit(1) } if (params.busco_auto_lineage_prok) { - log.error "Both --skip_busco and --busco_auto_lineage_prok are specified! Invalid combination, please specify either --skip_busco or --busco_auto_lineage_prok." + log.error 'Both --skip_busco and --busco_auto_lineage_prok are specified! Invalid combination, please specify either --skip_busco or --busco_auto_lineage_prok.' System.exit(1) } } if (params.busco_reference && params.busco_download_path) { - log.error "Both --busco_reference and --busco_download_path are specified! Invalid combination, please specify either --busco_reference or --busco_download_path." + log.error 'Both --busco_reference and --busco_download_path are specified! Invalid combination, please specify either --busco_reference or --busco_download_path.' 
System.exit(1) } if (params.busco_auto_lineage_prok && params.busco_reference) { - log.error "Both --busco_auto_lineage_prok and --busco_reference are specified! Invalid combination, please specify either --busco_auto_lineage_prok or --busco_reference." + log.error 'Both --busco_auto_lineage_prok and --busco_reference are specified! Invalid combination, please specify either --busco_auto_lineage_prok or --busco_reference.' System.exit(1) } if (params.skip_busco && params.gtdb) { - log.warn "--skip_busco and --gtdb are specified! GTDB-tk will be omitted because GTDB-tk bin classification requires bin filtering based on BUSCO QC results to avoid GTDB-tk errors." + log.warn '--skip_busco and --gtdb are specified! GTDB-tk will be omitted because GTDB-tk bin classification requires bin filtering based on BUSCO QC results to avoid GTDB-tk errors.' } // Check if CAT parameters are valid if (params.cat_db && params.cat_db_generate) { - log.error "Invalid combination of parameters --cat_db and --cat_db_generate is specified! Please specify either --cat_db or --cat_db_generate." + log.error 'Invalid combination of parameters --cat_db and --cat_db_generate is specified! Please specify either --cat_db or --cat_db_generate.' System.exit(1) } if (params.save_cat_db && !params.cat_db_generate) { - log.error "Invalid parameter combination: parameter --save_cat_db specified, but not --cat_db_generate! Note also that the parameter --save_cat_db does not work in combination with --cat_db." + log.error 'Invalid parameter combination: parameter --save_cat_db specified, but not --cat_db_generate! Note also that the parameter --save_cat_db does not work in combination with --cat_db.' System.exit(1) } } @@ -131,17 +152,18 @@ class WorkflowMag { for (param in group_params.keySet()) { summary_section += "
$param
${group_params.get(param) ?: 'N/A'}
\n" } - summary_section += " \n" + summary_section += ' \n' } } - String yaml_file_text = "id: '${workflow.manifest.name.replace('/','-')}-summary'\n" + String yaml_file_text = "id: '${workflow.manifest.name.replace('/', '-')}-summary'\n" yaml_file_text += "description: ' - this information is collected when the pipeline is started.'\n" yaml_file_text += "section_name: '${workflow.manifest.name} Workflow Summary'\n" yaml_file_text += "section_href: 'https://github.com/${workflow.manifest.name}'\n" yaml_file_text += "plot_type: 'html'\n" - yaml_file_text += "data: |\n" + yaml_file_text += 'data: |\n' yaml_file_text += "${summary_section}" return yaml_file_text } + } diff --git a/lib/WorkflowMain.groovy b/lib/WorkflowMain.groovy index 553994ca..e85e510c 100755 --- a/lib/WorkflowMain.groovy +++ b/lib/WorkflowMain.groovy @@ -9,8 +9,8 @@ class WorkflowMain { // public static String citation(workflow) { return "If you use ${workflow.manifest.name} for your analysis please cite:\n\n" + - "* The preprint\n" + - " https://doi.org/10.1101/2021.08.29.458094\n\n" + + "* The pipeline publication\n" + + " https://doi.org/10.1093/nargab/lqac007\n\n" + "* The pipeline\n" + " https://doi.org/10.5281/zenodo.3589527\n\n" + "* The nf-core framework\n" + @@ -62,6 +62,9 @@ class WorkflowMain { // Print parameter summary log to screen log.info paramsSummaryLog(workflow, params, log) + // Check that a -profile or Nextflow config has been provided to run the pipeline + NfcoreTemplate.checkConfigProvided(workflow, log) + // Check that conda channels are set-up correctly if (params.enable_conda) { Utils.checkCondaChannels(log) @@ -70,9 +73,6 @@ class WorkflowMain { // Check AWS batch settings NfcoreTemplate.awsBatch(workflow, params) - // Check the hostnames against configured profiles - NfcoreTemplate.hostName(workflow, params, log) - // Check input has been provided if (!params.input) { log.error "Please provide an input samplesheet to the pipeline e.g. 
'--input samplesheet.csv'" diff --git a/main.nf b/main.nf index 0c224ee4..7e185c79 100644 --- a/main.nf +++ b/main.nf @@ -1,8 +1,8 @@ #!/usr/bin/env nextflow /* -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ nf-core/mag -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Github : https://github.com/nf-core/mag Website: https://nf-co.re/mag Slack : https://nfcore.slack.com/channels/mag @@ -14,15 +14,15 @@ nextflow.enable.dsl = 2 /* ======================================================================================== VALIDATE & PRINT PARAMETER SUMMARY -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ */ WorkflowMain.initialise(workflow, params, log) /* -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ NAMED WORKFLOW FOR PIPELINE -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ */ include { MAG } from './workflows/mag' @@ -35,9 +35,9 @@ workflow NFCORE_MAG { } /* -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ RUN ALL WORKFLOWS -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ */ // @@ -49,7 +49,7 @@ workflow { } /* -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ THE END -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ */ diff --git a/modules.json b/modules.json index 16c860ba..c1f58a4d 100644 --- a/modules.json +++ b/modules.json @@ -3,18 +3,63 @@ "homePage": "https://github.com/nf-core/mag", "repos": { "nf-core/modules": { + "adapterremoval": { + "git_sha": "f0800157544a82ae222931764483331a81812012" + }, + "bcftools/consensus": { + "git_sha": "e745e167c1020928ef20ea1397b6b4d230681b4d" + }, + "bcftools/index": { + "git_sha": "e745e167c1020928ef20ea1397b6b4d230681b4d" + }, + "bcftools/view": { + "git_sha": "e745e167c1020928ef20ea1397b6b4d230681b4d" + }, + "custom/dumpsoftwareversions": { + "git_sha": "e745e167c1020928ef20ea1397b6b4d230681b4d" + }, + "dastool/dastool": { + "git_sha": "ae48653bd2d169510580220bb62d96f830c31293" + }, + "dastool/fastatocontig2bin": { + "git_sha": "8ce68107871c96519b3eb0095d97896e34ef4489" + }, "fastp": { - "git_sha": "e937c7950af70930d1f34bb961403d9d2aa81c7d" + "git_sha": "d0a1cbb703a130c19f6796c3fce24fbe7dfce789" }, "fastqc": { - "git_sha": "e937c7950af70930d1f34bb961403d9d2aa81c7d" + "git_sha": "e745e167c1020928ef20ea1397b6b4d230681b4d" + }, + "freebayes": { + "git_sha": "e745e167c1020928ef20ea1397b6b4d230681b4d" + }, + "gunzip": { + "git_sha": "9aadd9a6d3f5964476582319b3a1c54a3e3fe7c9" + }, + "maxbin2": 
{ + "git_sha": "b78a4a456762a4c59fd5023e70f36a27f76d4a97" + }, + "metabat2/jgisummarizebamcontigdepths": { + "git_sha": "e745e167c1020928ef20ea1397b6b4d230681b4d" + }, + "metabat2/metabat2": { + "git_sha": "e745e167c1020928ef20ea1397b6b4d230681b4d" }, "prodigal": { - "git_sha": "49da8642876ae4d91128168cd0db4f1c858d7792" + "git_sha": "fcb1dce7b6f1563bdcdb70fdbe235cc2f9fed62e" }, "prokka": { - "git_sha": "49da8642876ae4d91128168cd0db4f1c858d7792" + "git_sha": "e745e167c1020928ef20ea1397b6b4d230681b4d" + }, + "pydamage/analyze": { + "git_sha": "64b06baa06bc41269282bc7d286af37e859ad244" + }, + "pydamage/filter": { + "git_sha": "64b06baa06bc41269282bc7d286af37e859ad244" + }, + "samtools/faidx": { + "git_sha": "1ad73f1b2abdea9398680d6d20014838135c9a35" } } } -} \ No newline at end of file +} diff --git a/modules/local/adjust_maxbin2_ext.nf b/modules/local/adjust_maxbin2_ext.nf new file mode 100644 index 00000000..63f9a211 --- /dev/null +++ b/modules/local/adjust_maxbin2_ext.nf @@ -0,0 +1,28 @@ +process ADJUST_MAXBIN2_EXT { + tag "${meta.assembler}-${meta.id}" + label 'process_low' + + // Using container from multiqc since it'll be included anyway + conda (params.enable_conda ? "bioconda::multiqc=1.12" : null) + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/multiqc:1.12--pyhdfd78af_0' : + 'quay.io/biocontainers/multiqc:1.12--pyhdfd78af_0' }" + + input: + tuple val(meta), path(bins) + + output: + tuple val(meta), path("*.fa.gz"), emit: renamed_bins + + script: + """ + if [ -n "${bins}" ] + then + for file in ${bins}; do + [[ \${file} =~ (.*).fasta.gz ]]; + bin="\${BASH_REMATCH[1]}" + mv \${file} \${bin}.fa.gz + done + fi + """ +} diff --git a/modules/local/bin_summary.nf b/modules/local/bin_summary.nf index d8e8de7d..f8ca4378 100644 --- a/modules/local/bin_summary.nf +++ b/modules/local/bin_summary.nf @@ -1,21 +1,9 @@ -// Import generic module functions -include { initOptions; saveFiles; getSoftwareName } from './functions' - -params.options = [:] -options = initOptions(params.options) - process BIN_SUMMARY { - publishDir "${params.outdir}", - mode: params.publish_dir_mode, - saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), meta:[:], publish_by_meta:[]) } - conda (params.enable_conda ? "conda-forge::pandas=1.1.5" : null) - if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) { - container "https://depot.galaxyproject.org/singularity/pandas:1.1.5" - } else { - container "quay.io/biocontainers/pandas:1.1.5" - } + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/pandas:1.1.5' : + 'quay.io/biocontainers/pandas:1.1.5' }" input: path(bin_depths) @@ -25,6 +13,7 @@ process BIN_SUMMARY { output: path("bin_summary.tsv"), emit: summary + path "versions.yml" , emit: versions script: def busco_summary = busco_sum.sort().size() > 0 ? 
"--busco_summary ${busco_sum}" : "" @@ -36,5 +25,11 @@ process BIN_SUMMARY { $quast_summary \ $gtdbtk_summary \ --out bin_summary.tsv + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + python: \$(python --version 2>&1 | sed 's/Python //g') + pandas: \$(python -c "import pkg_resources; print(pkg_resources.get_distribution('pandas').version)") + END_VERSIONS """ } diff --git a/modules/local/bowtie2_assembly_align.nf b/modules/local/bowtie2_assembly_align.nf index 329472e7..9cac97d0 100644 --- a/modules/local/bowtie2_assembly_align.nf +++ b/modules/local/bowtie2_assembly_align.nf @@ -1,22 +1,10 @@ -// Import generic module functions -include { initOptions; saveFiles; getSoftwareName } from './functions' - -params.options = [:] -options = initOptions(params.options) - process BOWTIE2_ASSEMBLY_ALIGN { tag "${assembly_meta.assembler}-${assembly_meta.id}-${reads_meta.id}" - publishDir "${params.outdir}", - mode: params.publish_dir_mode, - saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), meta:assembly_meta, publish_by_meta:['assembler', 'id']) } - conda (params.enable_conda ? "bioconda::bowtie2=2.4.2 bioconda::samtools=1.11 conda-forge::pigz=2.3.4" : null) - if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) { - container "https://depot.galaxyproject.org/singularity/mulled-v2-ac74a7f02cebcfcc07d8e8d1d750af9c83b4d45a:577a697be67b5ae9b16f637fd723b8263a3898b3-0" - } else { - container "quay.io/biocontainers/mulled-v2-ac74a7f02cebcfcc07d8e8d1d750af9c83b4d45a:577a697be67b5ae9b16f637fd723b8263a3898b3-0" - } + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/mulled-v2-ac74a7f02cebcfcc07d8e8d1d750af9c83b4d45a:577a697be67b5ae9b16f637fd723b8263a3898b3-0' : + 'quay.io/biocontainers/mulled-v2-ac74a7f02cebcfcc07d8e8d1d750af9c83b4d45a:577a697be67b5ae9b16f637fd723b8263a3898b3-0' }" input: tuple val(assembly_meta), path(assembly), path(index), val(reads_meta), path(reads) @@ -24,15 +12,20 @@ process BOWTIE2_ASSEMBLY_ALIGN { output: tuple val(assembly_meta), path(assembly), path("${assembly_meta.assembler}-${assembly_meta.id}-${reads_meta.id}.bam"), path("${assembly_meta.assembler}-${assembly_meta.id}-${reads_meta.id}.bam.bai"), emit: mappings tuple val(assembly_meta), val(reads_meta), path("*.bowtie2.log") , emit: log - path '*.version.txt' , emit: version + path "versions.yml" , emit: versions script: - def software = getSoftwareName(task.process) + def args = task.ext.args ?: '' def name = "${assembly_meta.assembler}-${assembly_meta.id}-${reads_meta.id}" def input = params.single_end ? 
"-U \"${reads}\"" : "-1 \"${reads[0]}\" -2 \"${reads[1]}\"" """ INDEX=`find -L ./ -name "*.rev.1.bt2l" -o -name "*.rev.1.bt2" | sed 's/.rev.1.bt2l//' | sed 's/.rev.1.bt2//'` - bowtie2 -p "${task.cpus}" -x \$INDEX $input 2> "${name}.bowtie2.log" | \ + bowtie2 \\ + -p "${task.cpus}" \\ + -x \$INDEX \\ + $args \\ + $input \\ + 2> "${name}.bowtie2.log" | \ samtools view -@ "${task.cpus}" -bS | \ samtools sort -@ "${task.cpus}" -o "${name}.bam" samtools index "${name}.bam" @@ -41,6 +34,11 @@ process BOWTIE2_ASSEMBLY_ALIGN { mv "${name}.bowtie2.log" "${assembly_meta.assembler}-${assembly_meta.id}.bowtie2.log" fi - echo \$(bowtie2 --version 2>&1) | sed 's/^.*bowtie2-align-s version //; s/ .*\$//' > ${software}_assembly.version.txt + cat <<-END_VERSIONS > versions.yml + "${task.process}": + bowtie2: \$(echo \$(bowtie2 --version 2>&1) | sed 's/^.*bowtie2-align-s version //; s/ .*\$//') + samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//') + pigz: \$( pigz --version 2>&1 | sed 's/pigz //g' ) + END_VERSIONS """ } diff --git a/modules/local/bowtie2_assembly_build.nf b/modules/local/bowtie2_assembly_build.nf index 28cadbe3..892477c6 100644 --- a/modules/local/bowtie2_assembly_build.nf +++ b/modules/local/bowtie2_assembly_build.nf @@ -1,31 +1,27 @@ -// Import generic module functions -include { initOptions; saveFiles; getSoftwareName } from './functions' - -params.options = [:] -options = initOptions(params.options) - process BOWTIE2_ASSEMBLY_BUILD { tag "${meta.assembler}-${meta.id}" conda (params.enable_conda ? 'bioconda::bowtie2=2.4.2' : null) - if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) { - container 'https://depot.galaxyproject.org/singularity/bowtie2:2.4.2--py38h1c8e9b9_1' - } else { - container 'quay.io/biocontainers/bowtie2:2.4.2--py38h1c8e9b9_1' - } + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/bowtie2:2.4.2--py38h1c8e9b9_1' : + 'quay.io/biocontainers/bowtie2:2.4.2--py38h1c8e9b9_1' }" input: tuple val(meta), path(assembly) output: tuple val(meta), path(assembly), path('bt2_index_base*'), emit: assembly_index - path '*.version.txt' , emit: version + path "versions.yml" , emit: versions script: - def software = getSoftwareName(task.process) + def args = task.ext.args ?: '' """ mkdir bowtie bowtie2-build --threads $task.cpus $assembly "bt2_index_base" - bowtie2 --version > ${software}.version.txt + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + bowtie2: \$(echo \$(bowtie2 --version 2>&1) | sed 's/^.*bowtie2-align-s version //; s/ .*\$//') + END_VERSIONS """ } diff --git a/modules/local/bowtie2_removal_align.nf b/modules/local/bowtie2_removal_align.nf index 59af9e76..919111c0 100644 --- a/modules/local/bowtie2_removal_align.nf +++ b/modules/local/bowtie2_removal_align.nf @@ -1,24 +1,13 @@ -// Import generic module functions -include { initOptions; saveFiles; getSoftwareName } from './functions' - -params.options = [:] -options = initOptions(params.options) - /* * Bowtie2 for read removal */ process BOWTIE2_REMOVAL_ALIGN { - tag "${meta.id}-${options.suffix}" - publishDir "${params.outdir}/", - mode: params.publish_dir_mode, - saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), meta:meta, publish_by_meta:['id']) } + tag "$meta.id" conda (params.enable_conda ? 
"bioconda::bowtie2=2.4.2" : null) - if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) { - container "https://depot.galaxyproject.org/singularity/bowtie2:2.4.2--py38h1c8e9b9_1" - } else { - container "quay.io/biocontainers/bowtie2:2.4.2--py38h1c8e9b9_1" - } + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/bowtie2:2.4.2--py38h1c8e9b9_1' : + 'quay.io/biocontainers/bowtie2:2.4.2--py38h1c8e9b9_1' }" input: tuple val(meta), path(reads) @@ -28,19 +17,19 @@ process BOWTIE2_REMOVAL_ALIGN { tuple val(meta), path("*.unmapped*.fastq.gz") , emit: reads path "*.mapped*.read_ids.txt", optional:true , emit: read_ids tuple val(meta), path("*.bowtie2.log") , emit: log - path '*.version.txt' , emit: version + path "versions.yml" , emit: versions script: - def software = getSoftwareName(task.process) - def prefix = options.suffix ? "${meta.id}.${options.suffix}" : "${meta.id}" - def sensitivity = params.host_removal_verysensitive ? "--very-sensitive" : "--sensitive" - def save_ids = params.host_removal_save_ids ? "Y" : "N" + def args = task.ext.args ?: '' + def args2 = task.ext.args2 ?: '' + def prefix = task.ext.prefix ?: "${meta.id}" + def save_ids = (args2.contains('--host_removal_save_ids')) ? "Y" : "N" if (!meta.single_end){ """ bowtie2 -p ${task.cpus} \ -x ${index[0].getSimpleName()} \ -1 "${reads[0]}" -2 "${reads[1]}" \ - $sensitivity \ + $args \ --un-conc-gz ${prefix}.unmapped_%.fastq.gz \ --al-conc-gz ${prefix}.mapped_%.fastq.gz \ 1> /dev/null \ @@ -51,14 +40,17 @@ process BOWTIE2_REMOVAL_ALIGN { fi rm -f ${prefix}.mapped_*.fastq.gz - echo \$(bowtie2 --version 2>&1) | sed 's/^.*bowtie2-align-s version //; s/ .*\$//' > ${software}.version.txt + cat <<-END_VERSIONS > versions.yml + "${task.process}": + bowtie2: \$(echo \$(bowtie2 --version 2>&1) | sed 's/^.*bowtie2-align-s version //; s/ .*\$//') + END_VERSIONS """ } else { """ bowtie2 -p ${task.cpus} \ -x ${index[0].getSimpleName()} \ -U ${reads} \ - $sensitivity \ + $args \ --un-gz ${prefix}.unmapped.fastq.gz \ --al-gz ${prefix}.mapped.fastq.gz \ 1> /dev/null \ @@ -68,7 +60,10 @@ process BOWTIE2_REMOVAL_ALIGN { fi rm -f ${prefix}.mapped.fastq.gz - echo \$(bowtie2 --version 2>&1) | sed 's/^.*bowtie2-align-s version //; s/ .*\$//' > ${software}.version.txt + cat <<-END_VERSIONS > versions.yml + "${task.process}": + bowtie2: \$(echo \$(bowtie2 --version 2>&1) | sed 's/^.*bowtie2-align-s version //; s/ .*\$//') + END_VERSIONS """ } } diff --git a/modules/local/bowtie2_removal_build.nf b/modules/local/bowtie2_removal_build.nf index e3600ac8..9f8531dd 100644 --- a/modules/local/bowtie2_removal_build.nf +++ b/modules/local/bowtie2_removal_build.nf @@ -1,31 +1,27 @@ -// Import generic module functions -include { initOptions; saveFiles; getSoftwareName } from './functions' - -params.options = [:] -options = initOptions(params.options) - process BOWTIE2_REMOVAL_BUILD { tag "$fasta" conda (params.enable_conda ? 'bioconda::bowtie2=2.4.2' : null) - if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) { - container 'https://depot.galaxyproject.org/singularity/bowtie2:2.4.2--py38h1c8e9b9_1' - } else { - container 'quay.io/biocontainers/bowtie2:2.4.2--py38h1c8e9b9_1' - } + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? 
+ 'https://depot.galaxyproject.org/singularity/bowtie2:2.4.2--py38h1c8e9b9_1' : + 'quay.io/biocontainers/bowtie2:2.4.2--py38h1c8e9b9_1' }" input: path fasta output: path 'bt2_index_base*', emit: index - path '*.version.txt' , emit: version + path "versions.yml" , emit: versions script: - def software = getSoftwareName(task.process) + def args = task.ext.args ?: '' """ mkdir bowtie bowtie2-build --threads $task.cpus $fasta "bt2_index_base" - bowtie2 --version > ${software}_removal.version.txt + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + bowtie2: \$(echo \$(bowtie2 --version 2>&1) | sed 's/^.*bowtie2-align-s version //; s/ .*\$//') + END_VERSIONS """ } diff --git a/modules/local/busco.nf b/modules/local/busco.nf index 0c74d194..31a4889c 100644 --- a/modules/local/busco.nf +++ b/modules/local/busco.nf @@ -1,22 +1,10 @@ -// Import generic module functions -include { initOptions; saveFiles; getSoftwareName } from './functions' - -params.options = [:] -options = initOptions(params.options) - process BUSCO { tag "${bin}" - publishDir "${params.outdir}", - mode: params.publish_dir_mode, - saveAs: { filename -> filename.indexOf("busco_downloads") == -1 ? saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), meta:meta, publish_by_meta:[]) : null } - conda (params.enable_conda ? "bioconda::busco=5.1.0" : null) - if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) { - container "https://depot.galaxyproject.org/singularity/busco:5.1.0--py_1" - } else { - container "quay.io/biocontainers/busco:5.1.0--py_1" - } + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/busco:5.1.0--py_1' : + 'quay.io/biocontainers/busco:5.1.0--py_1' }" input: tuple val(meta), path(bin) @@ -33,10 +21,9 @@ process BUSCO { path("${bin}_buscos.*.fna.gz") , optional:true path("${bin}_prodigal.gff") , optional:true , emit: prodigal_genes tuple val(meta), path("${bin}_busco.failed_bin.txt") , optional:true , emit: failed_bin - path '*.version.txt' , emit: version + path "versions.yml" , emit: versions script: - def software = getSoftwareName(task.process) def cp_augustus_config = "Y" if( workflow.profile.toString().indexOf("conda") != -1) cp_augustus_config = "N" @@ -190,6 +177,11 @@ process BUSCO { mv BUSCO/logs/prodigal_out.log "${bin}_prodigal.gff" fi - busco --version | sed "s/BUSCO //" > ${software}.version.txt + cat <<-END_VERSIONS > versions.yml + "${task.process}": + python: \$(python --version 2>&1 | sed 's/Python //g') + R: \$(R --version 2>&1 | sed -n 1p | sed 's/R version //' | sed 's/ (.*//') + busco: \$(busco --version 2>&1 | sed 's/BUSCO //g') + END_VERSIONS """ } diff --git a/modules/local/busco_db_preparation.nf b/modules/local/busco_db_preparation.nf index 21a3d4f1..9d655a02 100644 --- a/modules/local/busco_db_preparation.nf +++ b/modules/local/busco_db_preparation.nf @@ -1,33 +1,27 @@ -// Import generic module functions -include { initOptions; saveFiles; getSoftwareName } from './functions' - -params.options = [:] -options = initOptions(params.options) - process BUSCO_DB_PREPARATION { tag "${database.baseName}" - publishDir "${params.outdir}", - mode: params.publish_dir_mode, - saveAs: { filename -> params.save_busco_reference ? saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), meta:[:], publish_by_meta:[]) : null } - conda (params.enable_conda ? 
"conda-forge::sed=4.7" : null) - if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) { - container "https://containers.biocontainers.pro/s3/SingImgsRepo/biocontainers/v1.2.0_cv1/biocontainers_v1.2.0_cv1.img" - } else { - container "biocontainers/biocontainers:v1.2.0_cv1" - } + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/ubuntu:20.04' : + 'ubuntu:20.04' }" input: path database output: - path "buscodb/*", emit: db - path database + path "buscodb/*" , emit: db + path database , emit: database + path "versions.yml" , emit: versions script: """ mkdir buscodb tar -xf ${database} -C buscodb + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + tar: \$(tar --version 2>&1 | sed -n 1p | sed 's/tar (GNU tar) //') + END_VERSIONS """ } diff --git a/modules/local/busco_plot.nf b/modules/local/busco_plot.nf index 973795da..42392f27 100644 --- a/modules/local/busco_plot.nf +++ b/modules/local/busco_plot.nf @@ -1,52 +1,48 @@ -// Import generic module functions -include { initOptions; saveFiles; getSoftwareName } from './functions' - -params.options = [:] -options = initOptions(params.options) - process BUSCO_PLOT { - tag "${meta.assembler}-${meta.id}" - - publishDir "${params.outdir}", - mode: params.publish_dir_mode, - saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), meta:meta, publish_by_meta:[]) } + tag "${meta.assembler}-${meta.binner}-${meta.id}" conda (params.enable_conda ? "bioconda::busco=5.1.0" : null) - if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) { - container "https://depot.galaxyproject.org/singularity/busco:5.1.0--py_1" - } else { - container "quay.io/biocontainers/busco:5.1.0--py_1" - } + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? 
+ 'https://depot.galaxyproject.org/singularity/busco:5.1.0--py_1' : + 'quay.io/biocontainers/busco:5.1.0--py_1' }" input: tuple val(meta), path(summaries) output: - path("${meta.assembler}-${meta.id}.*.busco_figure.png") , optional:true, emit: png - path("${meta.assembler}-${meta.id}.*.busco_figure.R") , optional:true, emit: rscript - path '*.version.txt' , emit: version + path("${meta.assembler}-${meta.binner}-${meta.id}.*.busco_figure.png") , optional:true, emit: png + path("${meta.assembler}-${meta.binner}-${meta.id}.*.busco_figure.R") , optional:true, emit: rscript + path "versions.yml" , emit: versions script: - def software = getSoftwareName(task.process) """ if [ -n "${summaries}" ] then # replace dots in bin names within summary file names by underscores # currently (BUSCO v5.1.0) generate_plot.py does not allow further dots for sum in ${summaries}; do - [[ \${sum} =~ short_summary.([_[:alnum:]]+).([_[:alnum:]]+).${meta.assembler}-${meta.id}.(.+).txt ]]; - mode=\${BASH_REMATCH[1]} - db_name=\${BASH_REMATCH[2]} - bin="${meta.assembler}-${meta.id}.\${BASH_REMATCH[3]}" - bin_new="\${bin//./_}" - mv \${sum} short_summary.\${mode}.\${db_name}.\${bin_new}.txt + if [[ \${sum} =~ short_summary.([_[:alnum:]]+).([_[:alnum:]]+).${meta.assembler}-([_[:alnum:]]+)-${meta.id}.(.+).txt ]]; then + mode=\${BASH_REMATCH[1]} + db_name=\${BASH_REMATCH[2]} + bin="${meta.assembler}-\${BASH_REMATCH[3]}-${meta.id}.\${BASH_REMATCH[4]}" + bin_new="\${bin//./_}" + mv \${sum} short_summary.\${mode}.\${db_name}.\${bin_new}.txt + else + echo "ERROR: the summary filename \${sum} does not match the expected format 'short_summary.([_[:alnum:]]+).([_[:alnum:]]+).${meta.assembler}-([_[:alnum:]]+)-${meta.id}.(.+).txt'!" + exit 1 + fi done generate_plot.py --working_directory . - mv busco_figure.png "${meta.assembler}-${meta.id}.\${mode}.\${db_name}.busco_figure.png" - mv busco_figure.R "${meta.assembler}-${meta.id}.\${mode}.\${db_name}.busco_figure.R" + mv busco_figure.png "${meta.assembler}-${meta.binner}-${meta.id}.\${mode}.\${db_name}.busco_figure.png" + mv busco_figure.R "${meta.assembler}-${meta.binner}-${meta.id}.\${mode}.\${db_name}.busco_figure.R" fi - busco --version | sed "s/BUSCO //" > ${software}.version.txt + cat <<-END_VERSIONS > versions.yml + "${task.process}": + python: \$(python --version 2>&1 | sed 's/Python //g') + R: \$(R --version 2>&1 | sed -n 1p | sed 's/R version //' | sed 's/ (.*//') + busco: \$(busco --version 2>&1 | sed 's/BUSCO //g') + END_VERSIONS """ } diff --git a/modules/local/busco_save_download.nf b/modules/local/busco_save_download.nf index a7fc55a5..63928f85 100644 --- a/modules/local/busco_save_download.nf +++ b/modules/local/busco_save_download.nf @@ -1,25 +1,11 @@ -// Import generic module functions -include { initOptions; saveFiles; getSoftwareName } from './functions' - -params.options = [:] -options = initOptions(params.options) - process BUSCO_SAVE_DOWNLOAD { // execute sequentially to avoid artefacts when saving files for multiple busco instances maxForks 1 - // do not overwrite existing files which were already saved for other busco runs - publishDir "${params.outdir}", - mode: params.publish_dir_mode, - overwrite: false, - saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), meta:[:], publish_by_meta:[]) } - conda (params.enable_conda ? 
"conda-forge::sed=4.7" : null) - if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) { - container "https://containers.biocontainers.pro/s3/SingImgsRepo/biocontainers/v1.2.0_cv1/biocontainers_v1.2.0_cv1.img" - } else { - container "biocontainers/biocontainers:v1.2.0_cv1" - } + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/ubuntu:20.04' : + 'ubuntu:20.04' }" input: path(busco_downloads) diff --git a/modules/local/busco_summary.nf b/modules/local/busco_summary.nf index 36186a8d..b10f527e 100644 --- a/modules/local/busco_summary.nf +++ b/modules/local/busco_summary.nf @@ -1,21 +1,9 @@ -// Import generic module functions -include { initOptions; saveFiles; getSoftwareName } from './functions' - -params.options = [:] -options = initOptions(params.options) - process BUSCO_SUMMARY { - publishDir "${params.outdir}", - mode: params.publish_dir_mode, - saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), meta:[:], publish_by_meta:[]) } - conda (params.enable_conda ? "conda-forge::pandas=1.1.5" : null) - if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) { - container "https://depot.galaxyproject.org/singularity/pandas:1.1.5" - } else { - container "quay.io/biocontainers/pandas:1.1.5" - } + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/pandas:1.1.5' : + 'quay.io/biocontainers/pandas:1.1.5' }" input: path(summaries_domain) @@ -24,6 +12,7 @@ process BUSCO_SUMMARY { output: path "busco_summary.tsv", emit: summary + path "versions.yml" , emit: versions script: def auto = params.busco_reference ? "" : "-a" @@ -34,6 +23,12 @@ process BUSCO_SUMMARY { f = "-f ${failed_bins}" """ summary_busco.py $auto $ss $sd $f -o busco_summary.tsv + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + python: \$(python --version 2>&1 | sed 's/Python //g') + pandas: \$(python -c "import pkg_resources; print(pkg_resources.get_distribution('pandas').version)") + END_VERSIONS """ } diff --git a/modules/local/cat.nf b/modules/local/cat.nf index 3e98bff5..a4943c28 100644 --- a/modules/local/cat.nf +++ b/modules/local/cat.nf @@ -1,22 +1,10 @@ -// Import generic module functions -include { initOptions; saveFiles; getSoftwareName } from './functions' - -params.options = [:] -options = initOptions(params.options) - process CAT { - tag "${meta.assembler}-${meta.id}-${db_name}" - - publishDir "${params.outdir}", - mode: params.publish_dir_mode, - saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), meta:meta, publish_by_meta:['assembler']) } + tag "${meta.assembler}-${meta.binner}-${meta.id}-${db_name}" conda (params.enable_conda ? "bioconda::cat=4.6 bioconda::diamond=2.0.6" : null) - if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) { - container "https://depot.galaxyproject.org/singularity/mulled-v2-75e2a26f10cbf3629edf2d1600db3fed5ebe6e04:eae321284604f7dabbdf121e3070bda907b91266-0" - } else { - container "quay.io/biocontainers/mulled-v2-75e2a26f10cbf3629edf2d1600db3fed5ebe6e04:eae321284604f7dabbdf121e3070bda907b91266-0" - } + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? 
+ 'https://depot.galaxyproject.org/singularity/mulled-v2-75e2a26f10cbf3629edf2d1600db3fed5ebe6e04:eae321284604f7dabbdf121e3070bda907b91266-0' : + 'quay.io/biocontainers/mulled-v2-75e2a26f10cbf3629edf2d1600db3fed5ebe6e04:eae321284604f7dabbdf121e3070bda907b91266-0' }" input: tuple val(meta), path("bins/*") @@ -29,24 +17,27 @@ process CAT { path("raw/*.predicted_proteins.gff.gz"), emit: gff path("raw/*.log") , emit: log path("raw/*.bin2classification.txt.gz"), emit: tax_classification_taxids - path '*.version.txt' , emit: version + path "versions.yml" , emit: versions script: - def software = getSoftwareName(task.process) """ - CAT bins -b "bins/" -d database/ -t taxonomy/ -n "${task.cpus}" -s .fa --top 6 -o "${meta.assembler}-${meta.id}" --I_know_what_Im_doing - CAT add_names -i "${meta.assembler}-${meta.id}.ORF2LCA.txt" -o "${meta.assembler}-${meta.id}.ORF2LCA.names.txt" -t taxonomy/ - CAT add_names -i "${meta.assembler}-${meta.id}.bin2classification.txt" -o "${meta.assembler}-${meta.id}.bin2classification.names.txt" -t taxonomy/ + CAT bins -b "bins/" -d database/ -t taxonomy/ -n "${task.cpus}" -s .fa --top 6 -o "${meta.assembler}-${meta.binner}-${meta.id}" --I_know_what_Im_doing + CAT add_names -i "${meta.assembler}-${meta.binner}-${meta.id}.ORF2LCA.txt" -o "${meta.assembler}-${meta.binner}-${meta.id}.ORF2LCA.names.txt" -t taxonomy/ + CAT add_names -i "${meta.assembler}-${meta.binner}-${meta.id}.bin2classification.txt" -o "${meta.assembler}-${meta.binner}-${meta.id}.bin2classification.names.txt" -t taxonomy/ mkdir raw mv *.ORF2LCA.txt *.predicted_proteins.faa *.predicted_proteins.gff *.log *.bin2classification.txt raw/ - gzip "raw/${meta.assembler}-${meta.id}.ORF2LCA.txt" \ - "raw/${meta.assembler}-${meta.id}.concatenated.predicted_proteins.faa" \ - "raw/${meta.assembler}-${meta.id}.concatenated.predicted_proteins.gff" \ - "raw/${meta.assembler}-${meta.id}.bin2classification.txt" \ - "${meta.assembler}-${meta.id}.ORF2LCA.names.txt" \ - "${meta.assembler}-${meta.id}.bin2classification.names.txt" - - CAT --version | sed "s/CAT v//; s/(.*//" > ${software}.version.txt + gzip "raw/${meta.assembler}-${meta.binner}-${meta.id}.ORF2LCA.txt" \ + "raw/${meta.assembler}-${meta.binner}-${meta.id}.concatenated.predicted_proteins.faa" \ + "raw/${meta.assembler}-${meta.binner}-${meta.id}.concatenated.predicted_proteins.gff" \ + "raw/${meta.assembler}-${meta.binner}-${meta.id}.bin2classification.txt" \ + "${meta.assembler}-${meta.binner}-${meta.id}.ORF2LCA.names.txt" \ + "${meta.assembler}-${meta.binner}-${meta.id}.bin2classification.names.txt" + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + CAT: \$(CAT --version | sed "s/CAT v//; s/(.*//") + diamond: \$(diamond --version 2>&1 | tail -n 1 | sed 's/^diamond version //') + END_VERSIONS """ } diff --git a/modules/local/cat_db.nf b/modules/local/cat_db.nf index 10d809bd..03f463c9 100644 --- a/modules/local/cat_db.nf +++ b/modules/local/cat_db.nf @@ -1,24 +1,17 @@ -// Import generic module functions -include { initOptions; saveFiles; getSoftwareName } from './functions' - -params.options = [:] -options = initOptions(params.options) - process CAT_DB { tag "${database.baseName}" conda (params.enable_conda ? 
"conda-forge::sed=4.7" : null) - if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) { - container "https://containers.biocontainers.pro/s3/SingImgsRepo/biocontainers/v1.2.0_cv1/biocontainers_v1.2.0_cv1.img" - } else { - container "biocontainers/biocontainers:v1.2.0_cv1" - } + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/ubuntu:20.04' : + 'ubuntu:20.04' }" input: path(database) output: tuple val("${database.toString().replace(".tar.gz", "")}"), path("database/*"), path("taxonomy/*"), emit: db + path "versions.yml" , emit: versions script: """ @@ -26,5 +19,10 @@ process CAT_DB { tar -xf ${database} -C catDB mv `find catDB/ -type d -name "*taxonomy*"` taxonomy/ mv `find catDB/ -type d -name "*database*"` database/ + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + tar: \$(tar --version 2>&1 | sed -n 1p | sed 's/tar (GNU tar) //') + END_VERSIONS """ } diff --git a/modules/local/cat_db_generate.nf b/modules/local/cat_db_generate.nf index cf0aa7b0..dc2daaf4 100644 --- a/modules/local/cat_db_generate.nf +++ b/modules/local/cat_db_generate.nf @@ -1,29 +1,16 @@ -// Import generic module functions -include { initOptions; saveFiles; getSoftwareName } from './functions' - -params.options = [:] -options = initOptions(params.options) - process CAT_DB_GENERATE { - publishDir "${params.outdir}", - mode: 'move', - saveAs: { filename -> params.save_cat_db ? saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), meta:[:], publish_by_meta:[]) : null } - conda (params.enable_conda ? "bioconda::cat=4.6 bioconda::diamond=2.0.6" : null) - if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) { - container "https://depot.galaxyproject.org/singularity/mulled-v2-75e2a26f10cbf3629edf2d1600db3fed5ebe6e04:eae321284604f7dabbdf121e3070bda907b91266-0" - } else { - container "quay.io/biocontainers/mulled-v2-75e2a26f10cbf3629edf2d1600db3fed5ebe6e04:eae321284604f7dabbdf121e3070bda907b91266-0" - } + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/mulled-v2-75e2a26f10cbf3629edf2d1600db3fed5ebe6e04:eae321284604f7dabbdf121e3070bda907b91266-0' : + 'quay.io/biocontainers/mulled-v2-75e2a26f10cbf3629edf2d1600db3fed5ebe6e04:eae321284604f7dabbdf121e3070bda907b91266-0' }" output: tuple env(DB_NAME), path("database/*"), path("taxonomy/*"), emit: db path("CAT_prepare_*.tar.gz"), optional:true , emit: db_tar_gz - path '*.version.txt' , emit: version + path "versions.yml" , emit: versions script: - def software = getSoftwareName(task.process) def save_db = params.save_cat_db ? 
"Y" : "N" """ CAT prepare --fresh @@ -40,6 +27,10 @@ process CAT_DB_GENERATE { tar -cf - taxonomy database | gzip > "\${DB_NAME}".tar.gz fi - CAT --version | sed "s/CAT v//; s/(.*//" > ${software}.version.txt + cat <<-END_VERSIONS > versions.yml + "${task.process}": + CAT: \$(CAT --version | sed "s/CAT v//; s/(.*//") + diamond: \$(diamond --version 2>&1 | tail -n 1 | sed 's/^diamond version //') + END_VERSIONS """ } diff --git a/modules/local/centrifuge.nf b/modules/local/centrifuge.nf index e3ba6663..cacefe48 100644 --- a/modules/local/centrifuge.nf +++ b/modules/local/centrifuge.nf @@ -1,22 +1,10 @@ -// Import generic module functions -include { initOptions; saveFiles; getSoftwareName } from './functions' - -params.options = [:] -options = initOptions(params.options) - process CENTRIFUGE { tag "${meta.id}-${db_name}" - publishDir "${params.outdir}", - mode: params.publish_dir_mode, - saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), meta:meta, publish_by_meta:['id']) } - conda (params.enable_conda ? "bioconda::centrifuge=1.0.4_beta" : null) - if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) { - container "https://depot.galaxyproject.org/singularity/centrifuge:1.0.4_beta--he513fc3_5" - } else { - container "quay.io/biocontainers/centrifuge:1.0.4_beta--he513fc3_5" - } + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/centrifuge:1.0.4_beta--he513fc3_5' : + 'quay.io/biocontainers/centrifuge:1.0.4_beta--he513fc3_5' }" input: tuple val(meta), path(reads) @@ -26,10 +14,9 @@ process CENTRIFUGE { tuple val("centrifuge"), val(meta), path("results.krona"), emit: results_for_krona path "report.txt" , emit: report path "kreport.txt" , emit: kreport - path '*.version.txt' , emit: version + path "versions.yml" , emit: versions script: - def software = getSoftwareName(task.process) def input = meta.single_end ? "-U \"${reads}\"" : "-1 \"${reads[0]}\" -2 \"${reads[1]}\"" """ centrifuge -x "${db_name}" \ @@ -40,6 +27,9 @@ process CENTRIFUGE { centrifuge-kreport -x "${db_name}" results.txt > kreport.txt cat results.txt | cut -f 1,3 > results.krona - centrifuge --version | sed -n 1p | sed 's/^.*centrifuge-class version //' > ${software}.version.txt + cat <<-END_VERSIONS > versions.yml + "${task.process}": + centrifuge: \$(centrifuge --version | sed -n 1p | sed 's/^.*centrifuge-class version //') + END_VERSIONS """ } diff --git a/modules/local/centrifuge_db_preparation.nf b/modules/local/centrifuge_db_preparation.nf index 3a4bebde..f1a99afe 100644 --- a/modules/local/centrifuge_db_preparation.nf +++ b/modules/local/centrifuge_db_preparation.nf @@ -1,25 +1,24 @@ -// Import generic module functions -include { initOptions; saveFiles; getSoftwareName } from './functions' - -params.options = [:] -options = initOptions(params.options) - process CENTRIFUGE_DB_PREPARATION { + conda (params.enable_conda ? "conda-forge::sed=4.7" : null) - if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) { - container "https://containers.biocontainers.pro/s3/SingImgsRepo/biocontainers/v1.2.0_cv1/biocontainers_v1.2.0_cv1.img" - } else { - container "biocontainers/biocontainers:v1.2.0_cv1" - } + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? 
+        'https://depot.galaxyproject.org/singularity/ubuntu:20.04' :
+        'ubuntu:20.04' }"

     input:
     path db

     output:
     tuple val("${db.toString().replace(".tar.gz", "")}"), path("*.cf"), emit: db
+    path "versions.yml"                                 , emit: versions

     script:
     """
     tar -xf "${db}"
+
+    cat <<-END_VERSIONS > versions.yml
+    "${task.process}":
+        tar: \$(tar --version 2>&1 | sed -n 1p | sed 's/tar (GNU tar) //')
+    END_VERSIONS
     """
 }
diff --git a/modules/local/combine_tsv.nf b/modules/local/combine_tsv.nf
new file mode 100644
index 00000000..4564cd29
--- /dev/null
+++ b/modules/local/combine_tsv.nf
@@ -0,0 +1,26 @@
+process COMBINE_TSV {
+
+    // Using bioawk as it is already used for CONVERT_DEPTHS and does the same thing
+    conda (params.enable_conda ? "bioconda::bioawk=1.0" : null)
+    container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
+        'https://depot.galaxyproject.org/singularity/bioawk:1.0--hed695b0_5' :
+        'quay.io/biocontainers/bioawk:1.0--hed695b0_5' }"
+
+    input:
+    path(bin_summaries)
+
+    output:
+    path("*.tsv")      , emit: combined
+    path "versions.yml", emit: versions
+
+    script:
+    def prefix = task.ext.prefix ?: "bin_depths_summary_combined"
+    """
+    bioawk '(NR == 1) || (FNR > 1)' ${bin_summaries} > ${prefix}.tsv
+
+    cat <<-END_VERSIONS > versions.yml
+    "${task.process}":
+        bioawk: \$(bioawk --version | cut -f 3 -d ' ' )
+    END_VERSIONS
+    """
+}
diff --git a/modules/local/convert_depths.nf b/modules/local/convert_depths.nf
new file mode 100644
index 00000000..a307f42f
--- /dev/null
+++ b/modules/local/convert_depths.nf
@@ -0,0 +1,28 @@
+process CONVERT_DEPTHS {
+    tag "${meta.id}"
+
+    conda (params.enable_conda ? "bioconda::bioawk=1.0" : null)
+    container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
+        'https://depot.galaxyproject.org/singularity/bioawk:1.0--hed695b0_5' :
+        'quay.io/biocontainers/bioawk:1.0--hed695b0_5' }"
+
+    input:
+    tuple val(meta), path(fasta), path(depth)
+
+    output:
+    // need to add an empty val representing reads, as we don't want MaxBin2 to calculate abundances itself
+    tuple val(meta), path(fasta), val([]), path("*_mb2_depth.txt"), emit: output
+    path "versions.yml"                                           , emit: versions
+
+    script:
+    def prefix = task.ext.prefix ?: "${meta.id}"
+    """
+    gunzip $depth
+    bioawk -t '{ { if (NR > 1) { { print \$1, \$3 } } } }' ${depth.toString() - '.gz'} > ${prefix}_mb2_depth.txt
+
+    cat <<-END_VERSIONS > versions.yml
+    "${task.process}":
+        bioawk: \$(bioawk --version | cut -f 3 -d ' ' )
+    END_VERSIONS
+    """
+}
diff --git a/modules/local/filtlong.nf b/modules/local/filtlong.nf
index d49abb24..bdebcff6 100644
--- a/modules/local/filtlong.nf
+++ b/modules/local/filtlong.nf
@@ -1,28 +1,19 @@
-// Import generic module functions
-include { initOptions; saveFiles; getSoftwareName } from './functions'
-
-params.options = [:]
-options = initOptions(params.options)
-
 process FILTLONG {
     tag "$meta.id"

     conda (params.enable_conda ? "bioconda::filtlong=0.2.0" : null)
-    if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) {
-        container "https://depot.galaxyproject.org/singularity/filtlong:0.2.0--he513fc3_3"
-    } else {
-        container "quay.io/biocontainers/filtlong:0.2.0--he513fc3_3"
-    }
+    container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
+ 'https://depot.galaxyproject.org/singularity/filtlong:0.2.0--he513fc3_3' : + 'quay.io/biocontainers/filtlong:0.2.0--he513fc3_3' }" input: tuple val(meta), path(long_reads), path(short_reads_1), path(short_reads_2) output: tuple val(meta), path("${meta.id}_lr_filtlong.fastq.gz"), emit: reads - path '*.version.txt' , emit: version + path "versions.yml" , emit: versions script: - def software = getSoftwareName(task.process) """ filtlong \ -1 ${short_reads_1} \ @@ -33,7 +24,10 @@ process FILTLONG { --length_weight ${params.longreads_length_weight} \ ${long_reads} | gzip > ${meta.id}_lr_filtlong.fastq.gz - filtlong --version | sed -e "s/Filtlong v//g" > ${software}.version.txt + cat <<-END_VERSIONS > versions.yml + "${task.process}": + filtlong: \$(filtlong --version | sed -e "s/Filtlong v//g") + END_VERSIONS """ } diff --git a/modules/local/functions.nf b/modules/local/functions.nf deleted file mode 100644 index 9701dc4b..00000000 --- a/modules/local/functions.nf +++ /dev/null @@ -1,75 +0,0 @@ -// -// Utility functions used in nf-core DSL2 module files -// - -// -// Extract name of software tool from process name using $task.process -// -def getSoftwareName(task_process) { - return task_process.tokenize(':')[-1].tokenize('_')[0].toLowerCase() -} - -// -// Function to initialise default values and to generate a Groovy Map of available options for nf-core modules -// -def initOptions(Map args) { - def Map options = [:] - options.args = args.args ?: '' - options.args2 = args.args2 ?: '' - options.args3 = args.args3 ?: '' - options.publish_by_meta = args.publish_by_meta ?: [] - options.publish_dir = args.publish_dir ?: '' - options.publish_files = args.publish_files - options.suffix = args.suffix ?: '' - return options -} - -// -// Tidy up and join elements of a list to return a path string -// -def getPathFromList(path_list) { - def paths = path_list.findAll { item -> !item?.trim().isEmpty() } // Remove empty entries - paths = paths.collect { it.trim().replaceAll("^[/]+|[/]+\$", "") } // Trim whitespace and trailing slashes - return paths.join('/') -} - -// -// Function to save/publish module results -// -def saveFiles(Map args) { - if (!args.filename.endsWith('.version.txt')) { - def ioptions = initOptions(args.options) - def path_list = [ ioptions.publish_dir ?: args.publish_dir ] - if (ioptions.publish_by_meta) { - def key_list = ioptions.publish_by_meta instanceof List ? ioptions.publish_by_meta : args.publish_by_meta - for (key in key_list) { - if (args.meta && key instanceof String) { - def path = key - if (args.meta.containsKey(key)) { - path = args.meta[key] instanceof Boolean ? "${key}_${args.meta[key]}".toString() : args.meta[key] - } - path = path instanceof String ? 
path : '' - path_list.add(path) - } - } - } - if (ioptions.publish_files instanceof Map) { - for (ext in ioptions.publish_files) { - if (args.filename.endsWith(ext.key)) { - def ext_list = path_list.collect() - ext_list.add(ext.value) - return "${getPathFromList(ext_list)}/$args.filename" - } - } - } else if (ioptions.publish_files == null) { - return "${getPathFromList(path_list)}/$args.filename" - } - } -} - -/* - * Check file extension - */ -def hasExtension(it, extension) { - it.toString().toLowerCase().endsWith(extension.toLowerCase()) -} diff --git a/modules/local/get_software_versions.nf b/modules/local/get_software_versions.nf deleted file mode 100644 index 15886433..00000000 --- a/modules/local/get_software_versions.nf +++ /dev/null @@ -1,33 +0,0 @@ -// Import generic module functions -include { saveFiles } from './functions' - -params.options = [:] - -process GET_SOFTWARE_VERSIONS { - publishDir "${params.outdir}", - mode: params.publish_dir_mode, - saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:'pipeline_info', meta:[:], publish_by_meta:[]) } - - conda (params.enable_conda ? "conda-forge::python=3.8.3" : null) - if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) { - container "https://depot.galaxyproject.org/singularity/python:3.8.3" - } else { - container "quay.io/biocontainers/python:3.8.3" - } - - cache false - - input: - path versions - - output: - path "software_versions.tsv" , emit: tsv - path 'software_versions_mqc.yaml', emit: yaml - - script: // This script is bundled with the pipeline, in nf-core/mag/bin/ - """ - echo $workflow.manifest.version > pipeline.version.txt - echo $workflow.nextflow.version > nextflow.version.txt - scrape_software_versions.py &> software_versions_mqc.yaml - """ -} diff --git a/modules/local/gtdbtk_classify.nf b/modules/local/gtdbtk_classify.nf index 2230887b..077849b8 100644 --- a/modules/local/gtdbtk_classify.nf +++ b/modules/local/gtdbtk_classify.nf @@ -1,41 +1,29 @@ -// Import generic module functions -include { initOptions; saveFiles; getSoftwareName } from './functions' - -params.options = [:] -options = initOptions(params.options) - process GTDBTK_CLASSIFY { - tag "${meta.assembler}-${meta.id}" - - publishDir "${params.outdir}", - mode: params.publish_dir_mode, - saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), meta:meta, publish_by_meta:['assembler', 'id']) } + tag "${meta.assembler}-${meta.binner}-${meta.id}" conda (params.enable_conda ? "bioconda::gtdbtk=1.5.0" : null) - if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) { - container "https://depot.galaxyproject.org/singularity/gtdbtk:1.5.0--pyhdfd78af_0" - } else { - container "quay.io/biocontainers/gtdbtk:1.5.0--pyhdfd78af_0" - } + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? 
+ 'https://depot.galaxyproject.org/singularity/gtdbtk:1.5.0--pyhdfd78af_0' : + 'quay.io/biocontainers/gtdbtk:1.5.0--pyhdfd78af_0' }" input: tuple val(meta), path("bins/*") tuple val(db_name), path("database/*") output: - path "gtdbtk.${meta.assembler}-${meta.id}.*.summary.tsv" , emit: summary - path "gtdbtk.${meta.assembler}-${meta.id}.*.classify.tree.gz" , emit: tree - path "gtdbtk.${meta.assembler}-${meta.id}.*.markers_summary.tsv", emit: markers - path "gtdbtk.${meta.assembler}-${meta.id}.*.msa.fasta.gz" , emit: msa - path "gtdbtk.${meta.assembler}-${meta.id}.*.user_msa.fasta" , emit: user_msa - path "gtdbtk.${meta.assembler}-${meta.id}.*.filtered.tsv" , emit: filtered - path "gtdbtk.${meta.assembler}-${meta.id}.log" , emit: log - path "gtdbtk.${meta.assembler}-${meta.id}.warnings.log" , emit: warnings - path "gtdbtk.${meta.assembler}-${meta.id}.failed_genomes.tsv" , emit: failed - path '*.version.txt' , emit: version + path "gtdbtk.${meta.assembler}-${meta.binner}-${meta.id}.*.summary.tsv" , emit: summary + path "gtdbtk.${meta.assembler}-${meta.binner}-${meta.id}.*.classify.tree.gz" , emit: tree + path "gtdbtk.${meta.assembler}-${meta.binner}-${meta.id}.*.markers_summary.tsv", emit: markers + path "gtdbtk.${meta.assembler}-${meta.binner}-${meta.id}.*.msa.fasta.gz" , emit: msa + path "gtdbtk.${meta.assembler}-${meta.binner}-${meta.id}.*.user_msa.fasta" , emit: user_msa + path "gtdbtk.${meta.assembler}-${meta.binner}-${meta.id}.*.filtered.tsv" , emit: filtered + path "gtdbtk.${meta.assembler}-${meta.binner}-${meta.id}.log" , emit: log + path "gtdbtk.${meta.assembler}-${meta.binner}-${meta.id}.warnings.log" , emit: warnings + path "gtdbtk.${meta.assembler}-${meta.binner}-${meta.id}.failed_genomes.tsv" , emit: failed + path "versions.yml" , emit: versions script: - def software = getSoftwareName(task.process) + def args = task.ext.args ?: '' def pplacer_scratch = params.gtdbtk_pplacer_scratch ? 
"--scratch_dir pplacer_tmp" : "" """ export GTDBTK_DATA_PATH="\${PWD}/database" @@ -43,9 +31,9 @@ process GTDBTK_CLASSIFY { mkdir pplacer_tmp fi - gtdbtk classify_wf $options.args \ + gtdbtk classify_wf $args \ --genome_dir bins \ - --prefix "gtdbtk.${meta.assembler}-${meta.id}" \ + --prefix "gtdbtk.${meta.assembler}-${meta.binner}-${meta.id}" \ --out_dir "\${PWD}" \ --cpus ${task.cpus} \ --pplacer_cpus ${params.gtdbtk_pplacer_cpus} \ @@ -53,9 +41,13 @@ process GTDBTK_CLASSIFY { --min_perc_aa ${params.gtdbtk_min_perc_aa} \ --min_af ${params.gtdbtk_min_af} - gzip "gtdbtk.${meta.assembler}-${meta.id}".*.classify.tree "gtdbtk.${meta.assembler}-${meta.id}".*.msa.fasta - mv gtdbtk.log "gtdbtk.${meta.assembler}-${meta.id}.log" - mv gtdbtk.warnings.log "gtdbtk.${meta.assembler}-${meta.id}.warnings.log" - gtdbtk --version | sed "s/gtdbtk: version //; s/ Copyright.*//" > ${software}.version.txt + gzip "gtdbtk.${meta.assembler}-${meta.binner}-${meta.id}".*.classify.tree "gtdbtk.${meta.assembler}-${meta.binner}-${meta.id}".*.msa.fasta + mv gtdbtk.log "gtdbtk.${meta.assembler}-${meta.binner}-${meta.id}.log" + mv gtdbtk.warnings.log "gtdbtk.${meta.assembler}-${meta.binner}-${meta.id}.warnings.log" + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + gtdbtk: \$(gtdbtk --version | sed -n 1p | sed "s/gtdbtk: version //; s/ Copyright.*//") + END_VERSIONS """ } diff --git a/modules/local/gtdbtk_db_preparation.nf b/modules/local/gtdbtk_db_preparation.nf index 9815d6db..aabd1e0a 100644 --- a/modules/local/gtdbtk_db_preparation.nf +++ b/modules/local/gtdbtk_db_preparation.nf @@ -1,18 +1,10 @@ -// Import generic module functions -include { initOptions; saveFiles; getSoftwareName } from './functions' - -params.options = [:] -options = initOptions(params.options) - process GTDBTK_DB_PREPARATION { tag "${database}" conda (params.enable_conda ? "conda-forge::sed=4.7" : null) - if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) { - container "https://containers.biocontainers.pro/s3/SingImgsRepo/biocontainers/v1.2.0_cv1/biocontainers_v1.2.0_cv1.img" - } else { - container "biocontainers/biocontainers:v1.2.0_cv1" - } + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/ubuntu:20.04' : + 'ubuntu:20.04' }" input: path(database) @@ -24,5 +16,10 @@ process GTDBTK_DB_PREPARATION { """ mkdir database tar -xzf ${database} -C database --strip 1 + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + tar: \$(tar --version 2>&1 | sed -n 1p | sed 's/tar (GNU tar) //') + END_VERSIONS """ } diff --git a/modules/local/gtdbtk_summary.nf b/modules/local/gtdbtk_summary.nf index c7b9d5d8..331c3164 100644 --- a/modules/local/gtdbtk_summary.nf +++ b/modules/local/gtdbtk_summary.nf @@ -1,21 +1,10 @@ -// Import generic module functions -include { initOptions; saveFiles; getSoftwareName } from './functions' - -params.options = [:] -options = initOptions(params.options) - process GTDBTK_SUMMARY { - publishDir "${params.outdir}", - mode: params.publish_dir_mode, - saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), meta:[:], publish_by_meta:[]) } conda (params.enable_conda ? 
"conda-forge::pandas=1.1.5" : null) - if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) { - container "https://depot.galaxyproject.org/singularity/pandas:1.1.5" - } else { - container "quay.io/biocontainers/pandas:1.1.5" - } + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/pandas:1.1.5' : + 'quay.io/biocontainers/pandas:1.1.5' }" input: path(qc_discarded_bins) @@ -25,13 +14,21 @@ process GTDBTK_SUMMARY { output: path "gtdbtk_summary.tsv", emit: summary + path "versions.yml" , emit: versions script: + def args = task.ext.args ?: '' def discarded = qc_discarded_bins.sort().size() > 0 ? "--qc_discarded_bins ${qc_discarded_bins}" : "" def summaries = gtdbtk_summaries.sort().size() > 0 ? "--summaries ${gtdbtk_summaries}" : "" def filtered = filtered_bins.sort().size() > 0 ? "--filtered_bins ${filtered_bins}" : "" def failed = failed_bins.sort().size() > 0 ? "--failed_bins ${failed_bins}" : "" """ - summary_gtdbtk.py $options.args $discarded $summaries $filtered $failed --out gtdbtk_summary.tsv + summary_gtdbtk.py $args $discarded $summaries $filtered $failed --out gtdbtk_summary.tsv + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + python: \$(python --version 2>&1 | sed 's/Python //g') + pandas: \$(python -c "import pkg_resources; print(pkg_resources.get_distribution('pandas').version)") + END_VERSIONS """ } diff --git a/modules/local/kraken2.nf b/modules/local/kraken2.nf index e35605d5..65d8f82c 100644 --- a/modules/local/kraken2.nf +++ b/modules/local/kraken2.nf @@ -1,22 +1,10 @@ -// Import generic module functions -include { initOptions; saveFiles; getSoftwareName } from './functions' - -params.options = [:] -options = initOptions(params.options) - process KRAKEN2 { tag "${meta.id}-${db_name}" - publishDir "${params.outdir}", - mode: params.publish_dir_mode, - saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), meta:meta, publish_by_meta:['id']) } - conda (params.enable_conda ? "bioconda::kraken2=2.0.8_beta" : null) - if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) { - container "https://depot.galaxyproject.org/singularity/kraken2:2.0.8_beta--pl526hc9558a2_2" - } else { - container "quay.io/biocontainers/kraken2:2.0.8_beta--pl526hc9558a2_2" - } + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/kraken2:2.0.8_beta--pl526hc9558a2_2' : + 'quay.io/biocontainers/kraken2:2.0.8_beta--pl526hc9558a2_2' }" input: tuple val(meta), path(reads) @@ -25,10 +13,9 @@ process KRAKEN2 { output: tuple val("kraken2"), val(meta), path("results.krona"), emit: results_for_krona path "kraken2_report.txt" , emit: report - path '*.version.txt' , emit: version + path "versions.yml" , emit: versions script: - def software = getSoftwareName(task.process) def input = meta.single_end ? 
"\"${reads}\"" : "--paired \"${reads[0]}\" \"${reads[1]}\"" """ kraken2 \ @@ -40,6 +27,9 @@ process KRAKEN2 { > kraken2.kraken cat kraken2.kraken | cut -f 2,3 > results.krona - echo \$(kraken2 --version 2>&1) | sed 's/Kraken version //; s/ Copyright.*//' > ${software}.version.txt + cat <<-END_VERSIONS > versions.yml + "${task.process}": + kraken2: \$(echo \$(kraken2 --version 2>&1) | sed 's/^.*Kraken version //' | sed 's/ Copyright.*//') + END_VERSIONS """ } diff --git a/modules/local/kraken2_db_preparation.nf b/modules/local/kraken2_db_preparation.nf index b042696f..38202110 100644 --- a/modules/local/kraken2_db_preparation.nf +++ b/modules/local/kraken2_db_preparation.nf @@ -1,22 +1,16 @@ -// Import generic module functions -include { initOptions; saveFiles; getSoftwareName } from './functions' - -params.options = [:] -options = initOptions(params.options) - process KRAKEN2_DB_PREPARATION { + conda (params.enable_conda ? "conda-forge::sed=4.7" : null) - if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) { - container "https://containers.biocontainers.pro/s3/SingImgsRepo/biocontainers/v1.2.0_cv1/biocontainers_v1.2.0_cv1.img" - } else { - container "biocontainers/biocontainers:v1.2.0_cv1" - } + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/ubuntu:20.04' : + 'ubuntu:20.04' }" input: path db output: tuple val("${db.simpleName}"), path("database/*.k2d"), emit: db + path "versions.yml" , emit: versions script: """ @@ -24,5 +18,10 @@ process KRAKEN2_DB_PREPARATION { tar -xf "${db}" -C db_tmp mkdir database mv `find db_tmp/ -name "*.k2d"` database/ + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + tar: \$(tar --version 2>&1 | sed -n 1p | sed 's/tar (GNU tar) //') + END_VERSIONS """ } diff --git a/modules/local/krona.nf b/modules/local/krona.nf index 304d687c..8eafad18 100644 --- a/modules/local/krona.nf +++ b/modules/local/krona.nf @@ -1,22 +1,10 @@ -// Import generic module functions -include { initOptions; saveFiles; getSoftwareName } from './functions' - -params.options = [:] -options = initOptions(params.options) - process KRONA { tag "${meta.classifier}-${meta.id}" - publishDir "${params.outdir}", - mode: params.publish_dir_mode, - saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), meta:meta, publish_by_meta:['classifier', 'id']) } - conda (params.enable_conda ? "bioconda::krona=2.7.1" : null) - if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) { - container "https://depot.galaxyproject.org/singularity/krona:2.7.1--pl526_5" - } else { - container "quay.io/biocontainers/krona:2.7.1--pl526_5" - } + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? 
+ 'https://depot.galaxyproject.org/singularity/krona:2.7.1--pl526_5' : + 'quay.io/biocontainers/krona:2.7.1--pl526_5' }" input: tuple val(meta), path(report) @@ -24,12 +12,15 @@ process KRONA { output: path "*.html" , emit: html - path '*.version.txt', emit: version + path "versions.yml" , emit: versions script: - def software = getSoftwareName(task.process) """ ktImportTaxonomy "$report" -tax taxonomy - echo \$(ktImportTaxonomy 2>&1) | sed 's/^.*KronaTools //; s/ - ktImportTaxonomy.*//' > ${software}.version.txt + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + ktImportTaxonomy: \$(ktImportTaxonomy 2>&1 | sed -n '/KronaTools /p' | sed 's/^.*KronaTools //; s/ - ktImportTaxonomy.*//') + END_VERSIONS """ } diff --git a/modules/local/krona_db.nf b/modules/local/krona_db.nf index d4c1a3cb..0c9bf3c8 100644 --- a/modules/local/krona_db.nf +++ b/modules/local/krona_db.nf @@ -1,26 +1,21 @@ -// Import generic module functions -include { initOptions; saveFiles; getSoftwareName } from './functions' - -params.options = [:] -options = initOptions(params.options) - process KRONA_DB { conda (params.enable_conda ? "bioconda::krona=2.7.1" : null) - if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) { - container "https://depot.galaxyproject.org/singularity/krona:2.7.1--pl526_5" - } else { - container "quay.io/biocontainers/krona:2.7.1--pl526_5" - } + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/krona:2.7.1--pl526_5' : + 'quay.io/biocontainers/krona:2.7.1--pl526_5' }" output: path("taxonomy/taxonomy.tab"), emit: db - path '*.version.txt' , emit: version + path "versions.yml" , emit: versions script: - def software = getSoftwareName(task.process) """ ktUpdateTaxonomy.sh taxonomy - echo \$(ktImportTaxonomy 2>&1) | sed 's/^.*KronaTools //; s/ - ktImportTaxonomy.*//' > ${software}.version.txt + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + ktImportTaxonomy: \$(ktImportTaxonomy 2>&1 | sed -n '/KronaTools /p' | sed 's/^.*KronaTools //; s/ - ktImportTaxonomy.*//') + END_VERSIONS """ } diff --git a/modules/local/mag_depths.nf b/modules/local/mag_depths.nf index 4b333665..5a54385a 100644 --- a/modules/local/mag_depths.nf +++ b/modules/local/mag_depths.nf @@ -1,37 +1,31 @@ -// Import generic module functions -include { initOptions; saveFiles; getSoftwareName } from './functions' - -params.options = [:] -options = initOptions(params.options) - process MAG_DEPTHS { - tag "${meta.assembler}-${meta.id}" - - publishDir "${params.outdir}", - mode: params.publish_dir_mode, - saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), meta:meta, publish_by_meta:['assembler', 'id']) } + tag "${meta.assembler}-${meta.binner}-${meta.id}" // Using container from metabat2 process, since this will be anyway already downloaded and contains biopython and pandas conda (params.enable_conda ? 
"bioconda::metabat2=2.15 conda-forge::python=3.6.7 conda-forge::biopython=1.74 conda-forge::pandas=1.1.5" : null) - if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) { - container "https://depot.galaxyproject.org/singularity/mulled-v2-e25d1fa2bb6cbacd47a4f8b2308bd01ba38c5dd7:75310f02364a762e6ba5206fcd11d7529534ed6e-0" - } else { - container "quay.io/biocontainers/mulled-v2-e25d1fa2bb6cbacd47a4f8b2308bd01ba38c5dd7:75310f02364a762e6ba5206fcd11d7529534ed6e-0" - } + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/mulled-v2-e25d1fa2bb6cbacd47a4f8b2308bd01ba38c5dd7:75310f02364a762e6ba5206fcd11d7529534ed6e-0' : + 'quay.io/biocontainers/mulled-v2-e25d1fa2bb6cbacd47a4f8b2308bd01ba38c5dd7:75310f02364a762e6ba5206fcd11d7529534ed6e-0' }" input: - tuple val(meta), path(bins) - path(contig_depths) + tuple val(meta), path(bins), path(contig_depths) output: - tuple val(meta), path("${meta.assembler}-${meta.id}-binDepths.tsv"), emit: depths + tuple val(meta), path("${meta.assembler}-${meta.binner}-${meta.id}-binDepths.tsv"), emit: depths + path "versions.yml" , emit: versions script: - def software = getSoftwareName(task.process) """ - get_mag_depths.py --bins ${bins} \ - --depths ${contig_depths} \ - --assembly_name "${meta.assembler}-${meta.id}" \ - --out "${meta.assembler}-${meta.id}-binDepths.tsv" + get_mag_depths.py --bins ${bins} \\ + --depths ${contig_depths} \\ + --assembler ${meta.assembler} \\ + --id ${meta.id} \\ + --binner ${meta.binner} + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + python: \$(python --version 2>&1 | sed 's/Python //g') + pandas: \$(python -c "import pkg_resources; print(pkg_resources.get_distribution('pandas').version)") + END_VERSIONS """ } diff --git a/modules/local/mag_depths_plot.nf b/modules/local/mag_depths_plot.nf index 46e645c4..9a6b61e7 100644 --- a/modules/local/mag_depths_plot.nf +++ b/modules/local/mag_depths_plot.nf @@ -1,35 +1,30 @@ -// Import generic module functions -include { initOptions; saveFiles; getSoftwareName } from './functions' - -params.options = [:] -options = initOptions(params.options) - process MAG_DEPTHS_PLOT { - tag "${meta.assembler}-${meta.id}" - - publishDir "${params.outdir}", - mode: params.publish_dir_mode, - saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), meta:meta, publish_by_meta:['assembler', 'id']) } + tag "${meta.assembler}-${meta.binner}-${meta.id}" conda (params.enable_conda ? "conda-forge::python=3.9 conda-forge::pandas=1.3.0 anaconda::seaborn=0.11.0" : null) - if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) { - container "https://depot.galaxyproject.org/singularity/mulled-v2-d14219255233ee6cacc427e28a7caf8ee42e8c91:0a22c7568e4a509925048454dad9ab37fa8fe776-0" - } else { - container "quay.io/biocontainers/mulled-v2-d14219255233ee6cacc427e28a7caf8ee42e8c91:0a22c7568e4a509925048454dad9ab37fa8fe776-0" - } + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? 
+ 'https://depot.galaxyproject.org/singularity/mulled-v2-d14219255233ee6cacc427e28a7caf8ee42e8c91:0a22c7568e4a509925048454dad9ab37fa8fe776-0' : + 'quay.io/biocontainers/mulled-v2-d14219255233ee6cacc427e28a7caf8ee42e8c91:0a22c7568e4a509925048454dad9ab37fa8fe776-0' }" input: tuple val(meta), path(depths) path(sample_groups) output: - tuple val(meta), path("${meta.assembler}-${meta.id}-binDepths.heatmap.png"), emit: heatmap + tuple val(meta), path("${meta.assembler}-${meta.binner}-${meta.id}-binDepths.heatmap.png"), emit: heatmap + path "versions.yml" , emit: versions script: - def software = getSoftwareName(task.process) """ plot_mag_depths.py --bin_depths ${depths} \ --groups ${sample_groups} \ - --out "${meta.assembler}-${meta.id}-binDepths.heatmap.png" + --out "${meta.assembler}-${meta.binner}-${meta.id}-binDepths.heatmap.png" + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + python: \$(python --version 2>&1 | sed 's/Python //g') + pandas: \$(python -c "import pkg_resources; print(pkg_resources.get_distribution('pandas').version)") + seaborn: \$(python -c "import pkg_resources; print(pkg_resources.get_distribution('seaborn').version)") + END_VERSIONS """ } diff --git a/modules/local/mag_depths_summary.nf b/modules/local/mag_depths_summary.nf index 3c451bd5..75b7d963 100644 --- a/modules/local/mag_depths_summary.nf +++ b/modules/local/mag_depths_summary.nf @@ -1,32 +1,27 @@ -// Import generic module functions -include { initOptions; saveFiles; getSoftwareName } from './functions' - -params.options = [:] -options = initOptions(params.options) - process MAG_DEPTHS_SUMMARY { - publishDir "${params.outdir}", - mode: params.publish_dir_mode, - saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), meta:[:], publish_by_meta:[]) } - conda (params.enable_conda ? "conda-forge::pandas=1.1.5" : null) - if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) { - container "https://depot.galaxyproject.org/singularity/pandas:1.1.5" - } else { - container "quay.io/biocontainers/pandas:1.1.5" - } + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? 
+ 'https://depot.galaxyproject.org/singularity/pandas:1.1.5' : + 'quay.io/biocontainers/pandas:1.1.5' }" input: path(mag_depths) output: - path("bin_depths_summary.tsv"), emit: summary + path("${prefix}.tsv"), emit: summary + path "versions.yml" , emit: versions script: - def software = getSoftwareName(task.process) + prefix = task.ext.prefix ?: "bin_depths_summary" """ get_mag_depths_summary.py --depths ${mag_depths} \ - --out "bin_depths_summary.tsv" + --out "${prefix}.tsv" + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + python: \$(python --version 2>&1 | sed 's/Python //g') + pandas: \$(python -c "import pkg_resources; print(pkg_resources.get_distribution('pandas').version)") + END_VERSIONS """ } diff --git a/modules/local/megahit.nf b/modules/local/megahit.nf index 9163eaf3..ee4e1a6c 100644 --- a/modules/local/megahit.nf +++ b/modules/local/megahit.nf @@ -1,42 +1,33 @@ -// Import generic module functions -include { initOptions; saveFiles; getSoftwareName } from './functions' - -params.options = [:] -options = initOptions(params.options) - process MEGAHIT { tag "$meta.id" - publishDir "${params.outdir}", - mode: params.publish_dir_mode, - saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), meta:meta, publish_by_meta:['id']) } - conda (params.enable_conda ? "bioconda::megahit=1.2.9" : null) - if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) { - container "https://depot.galaxyproject.org/singularity/megahit:1.2.9--h2e03b76_1" - } else { - container "quay.io/biocontainers/megahit:1.2.9--h2e03b76_1" - } + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/megahit:1.2.9--h2e03b76_1' : + 'quay.io/biocontainers/megahit:1.2.9--h2e03b76_1' }" input: tuple val(meta), path(reads1), path(reads2) output: - tuple val(meta), path("MEGAHIT/${meta.id}.contigs.fa"), emit: assembly + tuple val(meta), path("MEGAHIT/MEGAHIT-${meta.id}.contigs.fa"), emit: assembly path "MEGAHIT/*.log" , emit: log - path "MEGAHIT/${meta.id}.contigs.fa.gz" , emit: assembly_gz - path '*.version.txt' , emit: version + path "MEGAHIT/MEGAHIT-${meta.id}.contigs.fa.gz" , emit: assembly_gz + path "versions.yml" , emit: versions script: - def software = getSoftwareName(task.process) + def args = task.ext.args ?: '' def input = params.single_end ? "-r \"" + reads1.join(",") + "\"" : "-1 \"" + reads1.join(",") + "\" -2 \"" + reads2.join(",") + "\"" mem = task.memory.toBytes() if ( !params.megahit_fix_cpu_1 || task.cpus == 1 ) """ - megahit ${params.megahit_options} -t "${task.cpus}" -m $mem $input -o MEGAHIT --out-prefix "${meta.id}" - gzip -c "MEGAHIT/${meta.id}.contigs.fa" > "MEGAHIT/${meta.id}.contigs.fa.gz" + megahit $args -t "${task.cpus}" -m $mem $input -o MEGAHIT --out-prefix "MEGAHIT-${meta.id}" + gzip -c "MEGAHIT/MEGAHIT-${meta.id}.contigs.fa" > "MEGAHIT/MEGAHIT-${meta.id}.contigs.fa.gz" - megahit --version | sed "s/MEGAHIT v//" > ${software}.version.txt + cat <<-END_VERSIONS > versions.yml + "${task.process}": + megahit: \$(echo \$(megahit -v 2>&1) | sed 's/MEGAHIT v//') + END_VERSIONS """ else error "ERROR: '--megahit_fix_cpu_1' was specified, but not succesfully applied. Likely this is caused by changed process properties in a custom config file." 
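The modules in this patch all follow the same refactor: the per-module `publishDir`/`initOptions`/`getSoftwareName` plumbing is removed and processes instead read `task.ext.args` and `task.ext.prefix`, with publishing handled centrally. That central configuration file is not part of this excerpt, so the snippet below is only an illustrative sketch of how such wiring typically looks in a `conf/modules.config`; the selector names, publish paths and the reuse of `params.megahit_options` are assumptions for illustration, not taken from this diff.

```nextflow
// Illustrative sketch only: central per-module configuration replacing the
// removed in-module publishDir/options blocks. Selectors and paths are assumed.
process {
    withName: 'MEGAHIT' {
        // surfaced inside the module as task.ext.args (previously ${params.megahit_options})
        ext.args   = params.megahit_options ?: ''
        publishDir = [
            path: { "${params.outdir}/Assembly/MEGAHIT" },
            mode: params.publish_dir_mode,
            saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
        ]
    }
    withName: 'MAG_DEPTHS_SUMMARY' {
        // surfaced as task.ext.prefix; the module falls back to 'bin_depths_summary' if unset
        ext.prefix = 'bin_depths_summary'
    }
}
```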
diff --git a/modules/local/metabat2.nf b/modules/local/metabat2.nf deleted file mode 100644 index bbec99fc..00000000 --- a/modules/local/metabat2.nf +++ /dev/null @@ -1,54 +0,0 @@ -// Import generic module functions -include { initOptions; saveFiles; getSoftwareName } from './functions' - -params.options = [:] -options = initOptions(params.options) - -process METABAT2 { - tag "${meta.assembler}-${meta.id}" - - publishDir "${params.outdir}", - mode: params.publish_dir_mode, - saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), meta:meta, publish_by_meta:['assembler']) } - - conda (params.enable_conda ? "bioconda::metabat2=2.15 conda-forge::python=3.6.7 conda-forge::biopython=1.74 conda-forge::pandas=1.1.5" : null) - if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) { - container "https://depot.galaxyproject.org/singularity/mulled-v2-e25d1fa2bb6cbacd47a4f8b2308bd01ba38c5dd7:75310f02364a762e6ba5206fcd11d7529534ed6e-0" - } else { - container "quay.io/biocontainers/mulled-v2-e25d1fa2bb6cbacd47a4f8b2308bd01ba38c5dd7:75310f02364a762e6ba5206fcd11d7529534ed6e-0" - } - - input: - tuple val(meta), path(assembly), path(bam), path(bai) - - output: - tuple val(meta), path("MetaBAT2/*.fa") , emit: bins - path "${meta.assembler}-${meta.id}-depth.txt.gz" , emit: depths - path "MetaBAT2/discarded/*" , emit: discarded - path '*.version.txt' , emit: version - - script: - def software = getSoftwareName(task.process) - """ - OMP_NUM_THREADS=${task.cpus} jgi_summarize_bam_contig_depths --outputDepth depth.txt ${bam} - metabat2 -t "${task.cpus}" -i "${assembly}" -a depth.txt -o "MetaBAT2/${meta.assembler}-${meta.id}" -m ${params.min_contig_size} --unbinned --seed ${params.metabat_rng_seed} - - gzip depth.txt - mv depth.txt.gz "${meta.assembler}-${meta.id}-depth.txt.gz" - - # save unbinned contigs above thresholds into individual files, dump others in one file - split_fasta.py "MetaBAT2/${meta.assembler}-${meta.id}.unbinned.fa" ${params.min_length_unbinned_contigs} ${params.max_unbinned_contigs} ${params.min_contig_size} - - # delete splitted file so that it doesnt end up in following processes - rm "MetaBAT2/${meta.assembler}-${meta.id}.unbinned.fa" - - mkdir MetaBAT2/discarded - gzip "MetaBAT2/${meta.assembler}-${meta.id}.lowDepth.fa" \ - "MetaBAT2/${meta.assembler}-${meta.id}.tooShort.fa" \ - "MetaBAT2/${meta.assembler}-${meta.id}.unbinned.pooled.fa" \ - "MetaBAT2/${meta.assembler}-${meta.id}.unbinned.remaining.fa" - mv "MetaBAT2/${meta.assembler}-${meta.id}".*.fa.gz MetaBAT2/discarded/ - - echo \$(metabat2 --help 2>&1) | sed "s/^.*version 2\\://; s/ (Bioconda.*//" > ${software}.version.txt - """ -} diff --git a/modules/local/multiqc.nf b/modules/local/multiqc.nf index 6774e69a..6d8b2cbb 100644 --- a/modules/local/multiqc.nf +++ b/modules/local/multiqc.nf @@ -1,41 +1,30 @@ -// Import generic module functions -include { initOptions; saveFiles; getSoftwareName } from './functions' - -params.options = [:] -options = initOptions(params.options) - process MULTIQC { label 'process_medium' - publishDir "${params.outdir}", - mode: params.publish_dir_mode, - saveAs: { filename -> saveFiles(filename: filename, options: params.options, publish_dir: getSoftwareName(task.process), meta:[:], publish_by_meta:[]) } - conda (params.enable_conda ? 
"bioconda::multiqc=1.11" : null) - if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) { - container "https://depot.galaxyproject.org/singularity/multiqc:1.11--pyhdfd78af_0" - } else { - container "quay.io/biocontainers/multiqc:1.11--pyhdfd78af_0" - } + conda (params.enable_conda ? "bioconda::multiqc=1.12" : null) + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/multiqc:1.12--pyhdfd78af_0' : + 'quay.io/biocontainers/multiqc:1.12--pyhdfd78af_0' }" input: path multiqc_files path mqc_custom_config path 'fastqc_raw/*' - path 'fastp/*' path 'fastqc_trimmed/*' path host_removal path 'quast*/*' path 'bowtie2log/*' path short_summary + path additional output: path "*multiqc_report.html", emit: report path "*_data" , emit: data path "*_plots" , optional:true, emit: plots - path "*.version.txt" , emit: version + path "versions.yml" , emit: versions script: - def software = getSoftwareName(task.process) + def args = task.ext.args ?: '' custom_config_file = params.multiqc_config ? "--config $mqc_custom_config" : '' read_type = params.single_end ? "--single_end" : '' if ( params.host_fasta || params.host_genome ) { @@ -45,12 +34,20 @@ process MULTIQC { multiqc_to_custom_tsv.py ${read_type} # run multiqc using custom content file instead of original bowtie2 log files multiqc -f $custom_config_file --ignore "*.bowtie2.log" . - multiqc --version | sed -e "s/multiqc, version //g" > ${software}.version.txt + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + multiqc: \$( multiqc --version | sed -e "s/multiqc, version //g" ) + END_VERSIONS """ } else { """ - multiqc -f $options.args . - multiqc --version | sed -e "s/multiqc, version //g" > ${software}.version.txt + multiqc -f $args . + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + multiqc: \$( multiqc --version | sed -e "s/multiqc, version //g" ) + END_VERSIONS """ } } diff --git a/modules/local/nanolyse.nf b/modules/local/nanolyse.nf index 94faa56d..d7b16819 100644 --- a/modules/local/nanolyse.nf +++ b/modules/local/nanolyse.nf @@ -1,22 +1,10 @@ -// Import generic module functions -include { initOptions; saveFiles; getSoftwareName } from './functions' - -params.options = [:] -options = initOptions(params.options) - process NANOLYSE { tag "$meta.id" - publishDir "${params.outdir}", - mode: params.publish_dir_mode, - saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), meta:meta, publish_by_meta:['id']) } - conda (params.enable_conda ? "bioconda::nanolyse=1.1.0" : null) - if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) { - container "https://depot.galaxyproject.org/singularity/nanolyse:1.1.0--py36_1" - } else { - container "quay.io/biocontainers/nanolyse:1.1.0--py36_1" - } + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? 
+ 'https://depot.galaxyproject.org/singularity/nanolyse:1.1.0--py36_1' : + 'quay.io/biocontainers/nanolyse:1.1.0--py36_1' }" input: tuple val(meta), path(reads) @@ -25,16 +13,18 @@ process NANOLYSE { output: tuple val(meta), path("${meta.id}_nanolyse.fastq.gz"), emit: reads path "${meta.id}_nanolyse.log" , emit: log - path '*.version.txt' , emit: version + path "versions.yml" , emit: versions script: - def software = getSoftwareName(task.process) """ cat ${reads} | NanoLyse --reference $nanolyse_db | gzip > ${meta.id}_nanolyse.fastq.gz echo "NanoLyse reference: $params.lambda_reference" >${meta.id}_nanolyse.log cat ${reads} | echo "total reads before NanoLyse: \$((`wc -l`/4))" >>${meta.id}_nanolyse.log gunzip -c ${meta.id}_nanolyse.fastq.gz | echo "total reads after NanoLyse: \$((`wc -l`/4))" >> ${meta.id}_nanolyse.log - NanoLyse --version | sed -e "s/NanoLyse //g" > ${software}.version.txt + cat <<-END_VERSIONS > versions.yml + "${task.process}": + NanoLyse: \$(NanoLyse --version | sed -e "s/NanoLyse //g") + END_VERSIONS """ } diff --git a/modules/local/nanoplot.nf b/modules/local/nanoplot.nf index b966767c..f746e8c5 100644 --- a/modules/local/nanoplot.nf +++ b/modules/local/nanoplot.nf @@ -1,22 +1,10 @@ -// Import generic module functions -include { initOptions; saveFiles; getSoftwareName } from './functions' - -params.options = [:] -options = initOptions(params.options) - process NANOPLOT { tag "$meta.id" - publishDir "${params.outdir}", - mode: params.publish_dir_mode, - saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), meta:meta, publish_by_meta:['id']) } - conda (params.enable_conda ? "bioconda::nanoplot=1.26.3" : null) - if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) { - container "https://depot.galaxyproject.org/singularity/nanoplot:1.26.3--py_0" - } else { - container "quay.io/biocontainers/nanoplot:1.26.3--py_0" - } + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/nanoplot:1.26.3--py_0' : + 'quay.io/biocontainers/nanoplot:1.26.3--py_0' }" input: tuple val(meta), path(reads) @@ -25,18 +13,21 @@ process NANOPLOT { path '*.png' , emit: png path '*.html' , emit: html path '*.txt' , emit: txt - path '*.version.txt', emit: version + path "versions.yml" , emit: versions script: - def software = getSoftwareName(task.process) - def prefix = options.suffix ? "-p ${options.suffix}_" : '' - def title = options.suffix ? "${meta.id}_${options.suffix}" : "${meta.id}" + def prefix = task.ext.prefix ? "-p ${task.ext.prefix}_" : '' + def title = task.ext.prefix ? "${meta.id}_${task.ext.prefix}" : "${meta.id}" """ NanoPlot -t ${task.cpus} \ ${prefix} \ --title ${title} \ -c darkblue \ --fastq ${reads} - NanoPlot --version | sed -e "s/NanoPlot //g" > ${software}.version.txt + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + NanoPlot: \$(NanoPlot --version | sed -e "s/NanoPlot //g") + END_VERSIONS """ } diff --git a/modules/local/pool_paired_reads.nf b/modules/local/pool_paired_reads.nf index 9a9b6b53..27c5e0f3 100644 --- a/modules/local/pool_paired_reads.nf +++ b/modules/local/pool_paired_reads.nf @@ -1,28 +1,26 @@ -// Import generic module functions -include { initOptions; saveFiles; getSoftwareName } from './functions' - -params.options = [:] -options = initOptions(params.options) - process POOL_PAIRED_READS { tag "$meta.id" conda (params.enable_conda ? 
"conda-forge::sed=4.7" : null) - if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) { - container "https://containers.biocontainers.pro/s3/SingImgsRepo/biocontainers/v1.2.0_cv1/biocontainers_v1.2.0_cv1.img" - } else { - container "biocontainers/biocontainers:v1.2.0_cv1" - } + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/ubuntu:20.04' : + 'ubuntu:20.04' }" input: tuple val(meta), path(reads1), path(reads2) output: tuple val(meta), path("pooled_${meta.id}_*.fastq.gz"), emit: reads + path "versions.yml" , emit: versions script: """ cat ${reads1} > "pooled_${meta.id}_1.fastq.gz" cat ${reads2} > "pooled_${meta.id}_2.fastq.gz" + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + cat: \$(cat --version 2>&1 | sed -n 1p | sed 's/cat (GNU coreutils) //') + END_VERSIONS """ } diff --git a/modules/local/pool_single_reads.nf b/modules/local/pool_single_reads.nf index f091afd0..cc633d4a 100644 --- a/modules/local/pool_single_reads.nf +++ b/modules/local/pool_single_reads.nf @@ -1,27 +1,25 @@ -// Import generic module functions -include { initOptions; saveFiles; getSoftwareName } from './functions' - -params.options = [:] -options = initOptions(params.options) - process POOL_SINGLE_READS { tag "$meta.id" conda (params.enable_conda ? "conda-forge::sed=4.7" : null) - if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) { - container "https://containers.biocontainers.pro/s3/SingImgsRepo/biocontainers/v1.2.0_cv1/biocontainers_v1.2.0_cv1.img" - } else { - container "biocontainers/biocontainers:v1.2.0_cv1" - } + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/ubuntu:20.04' : + 'ubuntu:20.04' }" input: tuple val(meta), path(reads) output: tuple val(meta), path("pooled_${meta.id}.fastq.gz"), emit: reads + path "versions.yml" , emit: versions script: """ cat ${reads} > "pooled_${meta.id}.fastq.gz" + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + cat: \$(cat --version 2>&1 | sed -n 1p | sed 's/cat (GNU coreutils) //') + END_VERSIONS """ } diff --git a/modules/local/porechop.nf b/modules/local/porechop.nf index c5a3320e..f53f7cad 100644 --- a/modules/local/porechop.nf +++ b/modules/local/porechop.nf @@ -1,30 +1,25 @@ -// Import generic module functions -include { initOptions; saveFiles; getSoftwareName } from './functions' - -params.options = [:] -options = initOptions(params.options) - process PORECHOP { tag "$meta.id" conda (params.enable_conda ? "bioconda::porechop=0.2.3_seqan2.1.1" : null) - if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) { - container "https://depot.galaxyproject.org/singularity/porechop:0.2.3_seqan2.1.1--py36h2d50403_3" - } else { - container "quay.io/biocontainers/porechop:0.2.3_seqan2.1.1--py36h2d50403_3" - } + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? 
+ 'https://depot.galaxyproject.org/singularity/porechop:0.2.3_seqan2.1.1--py36h2d50403_3' : + 'quay.io/biocontainers/porechop:0.2.3_seqan2.1.1--py36h2d50403_3' }" input: tuple val(meta), path(reads) output: tuple val(meta), path("${meta.id}_porechop.fastq") , emit: reads - path '*.version.txt' , emit: version + path "versions.yml" , emit: versions script: - def software = getSoftwareName(task.process) """ porechop -i ${reads} -t ${task.cpus} -o ${meta.id}_porechop.fastq - porechop --version > ${software}.version.txt + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + porechop: \$(porechop --version) + END_VERSIONS """ } diff --git a/modules/local/quast.nf b/modules/local/quast.nf index 6c7de84a..8a4b3fab 100644 --- a/modules/local/quast.nf +++ b/modules/local/quast.nf @@ -1,34 +1,26 @@ -// Import generic module functions -include { initOptions; saveFiles; getSoftwareName } from './functions' - -params.options = [:] -options = initOptions(params.options) - process QUAST { tag "${meta.assembler}-${meta.id}" - publishDir "${params.outdir}", - mode: params.publish_dir_mode, - saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), meta:meta, publish_by_meta:['assembler', 'id']) } - conda (params.enable_conda ? "bioconda::quast=5.0.2" : null) - if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) { - container "https://depot.galaxyproject.org/singularity/quast:5.0.2--py37pl526hb5aa323_2" - } else { - container "quay.io/biocontainers/quast:5.0.2--py37pl526hb5aa323_2" - } + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/quast:5.0.2--py37pl526hb5aa323_2' : + 'quay.io/biocontainers/quast:5.0.2--py37pl526hb5aa323_2' }" input: tuple val(meta), path(assembly) output: - path "QUAST/*" , emit: qc - path '*.version.txt' , emit: version + path "QUAST/*" , emit: qc + path "versions.yml", emit: versions script: - def software = getSoftwareName(task.process) """ metaquast.py --threads "${task.cpus}" --rna-finding --max-ref-number 0 -l "${meta.assembler}-${meta.id}" "${assembly}" -o "QUAST" - metaquast.py --version | sed "s/QUAST v//; s/ (MetaQUAST mode)//" > ${software}.version.txt + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + python: \$(python --version 2>&1 | sed 's/Python //g') + metaquast: \$(metaquast.py --version | sed "s/QUAST v//; s/ (MetaQUAST mode)//") + END_VERSIONS """ } diff --git a/modules/local/quast_bins.nf b/modules/local/quast_bins.nf index 20bb4b15..a7e7d1b3 100644 --- a/modules/local/quast_bins.nf +++ b/modules/local/quast_bins.nf @@ -1,22 +1,10 @@ -// Import generic module functions -include { initOptions; saveFiles; getSoftwareName } from './functions' - -params.options = [:] -options = initOptions(params.options) - process QUAST_BINS { - tag "${meta.assembler}-${meta.id}" - - publishDir "${params.outdir}", - mode: params.publish_dir_mode, - saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), meta:meta, publish_by_meta:['assembler', 'id']) } + tag "${meta.assembler}-${meta.binner}-${meta.id}" conda (params.enable_conda ? 
"bioconda::quast=5.0.2" : null) - if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) { - container "https://depot.galaxyproject.org/singularity/quast:5.0.2--py37pl526hb5aa323_2" - } else { - container "quay.io/biocontainers/quast:5.0.2--py37pl526hb5aa323_2" - } + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/quast:5.0.2--py37pl526hb5aa323_2' : + 'quay.io/biocontainers/quast:5.0.2--py37pl526hb5aa323_2' }" input: tuple val(meta), path(bins) @@ -24,22 +12,25 @@ process QUAST_BINS { output: path "QUAST/*", type: 'dir' path "QUAST/*-quast_summary.tsv", emit: quast_bin_summaries - path '*.version.txt' , emit: version + path "versions.yml" , emit: versions script: - def software = getSoftwareName(task.process) """ BINS=\$(echo \"$bins\" | sed 's/[][]//g') IFS=', ' read -r -a bins <<< \"\$BINS\" for bin in \"\${bins[@]}\"; do metaquast.py --threads "${task.cpus}" --max-ref-number 0 --rna-finding --gene-finding -l "\${bin}" "\${bin}" -o "QUAST/\${bin}" - if ! [ -f "QUAST/${meta.assembler}-${meta.id}-quast_summary.tsv" ]; then - cp "QUAST/\${bin}/transposed_report.tsv" "QUAST/${meta.assembler}-${meta.id}-quast_summary.tsv" + if ! [ -f "QUAST/${meta.assembler}-${meta.binner}-${meta.id}-quast_summary.tsv" ]; then + cp "QUAST/\${bin}/transposed_report.tsv" "QUAST/${meta.assembler}-${meta.binner}-${meta.id}-quast_summary.tsv" else - tail -n +2 "QUAST/\${bin}/transposed_report.tsv" >> "QUAST/${meta.assembler}-${meta.id}-quast_summary.tsv" + tail -n +2 "QUAST/\${bin}/transposed_report.tsv" >> "QUAST/${meta.assembler}-${meta.binner}-${meta.id}-quast_summary.tsv" fi done - metaquast.py --version | sed "s/QUAST v//; s/ (MetaQUAST mode)//" > ${software}_bins.version.txt + cat <<-END_VERSIONS > versions.yml + "${task.process}": + python: \$(python --version 2>&1 | sed 's/Python //g') + metaquast: \$(metaquast.py --version | sed "s/QUAST v//; s/ (MetaQUAST mode)//") + END_VERSIONS """ } diff --git a/modules/local/quast_bins_summary.nf b/modules/local/quast_bins_summary.nf index 3fbf3f01..eae5ef78 100644 --- a/modules/local/quast_bins_summary.nf +++ b/modules/local/quast_bins_summary.nf @@ -1,27 +1,16 @@ -// Import generic module functions -include { initOptions; saveFiles; getSoftwareName } from './functions' - -params.options = [:] -options = initOptions(params.options) - process QUAST_BINS_SUMMARY { - publishDir "${params.outdir}", - mode: params.publish_dir_mode, - saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), meta:[:], publish_by_meta:[]) } - conda (params.enable_conda ? "conda-forge::sed=4.7" : null) - if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) { - container "https://containers.biocontainers.pro/s3/SingImgsRepo/biocontainers/v1.2.0_cv1/biocontainers_v1.2.0_cv1.img" - } else { - container "biocontainers/biocontainers:v1.2.0_cv1" - } + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? 
+ 'https://depot.galaxyproject.org/singularity/ubuntu:20.04' : + 'ubuntu:20.04' }" input: path(summaries) output: path("quast_summary.tsv"), emit: summary + path "versions.yml" , emit: versions script: """ @@ -34,5 +23,10 @@ process QUAST_BINS_SUMMARY { tail -n +2 "\${quast_file}" >> "quast_summary.tsv" fi done + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + sed: \$(sed --version 2>&1 | sed -n 1p | sed 's/sed (GNU sed) //') + END_VERSIONS """ } diff --git a/modules/local/rename_postdastool.nf b/modules/local/rename_postdastool.nf new file mode 100644 index 00000000..7c746d01 --- /dev/null +++ b/modules/local/rename_postdastool.nf @@ -0,0 +1,24 @@ +process RENAME_POSTDASTOOL { + tag "${meta.assembler}-${meta.id}" + label 'process_low' + + // Using container from multiqc since it'll be included anyway + conda (params.enable_conda ? "bioconda::multiqc=1.12" : null) + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/multiqc:1.12--pyhdfd78af_0' : + 'quay.io/biocontainers/multiqc:1.12--pyhdfd78af_0' }" + + input: + tuple val(meta), path(bins) + + output: + tuple val(meta), path("${meta.assembler}-*Refined-${meta.id}.*.fa", includeInputs: true), optional:true, emit: refined_bins + tuple val(meta), path("${meta.assembler}-DASToolUnbinned-${meta.id}.fa"), optional:true, emit: refined_unbins + + script: + """ + if [[ -f unbinned.fa ]]; then + mv unbinned.fa ${meta.assembler}-DASToolUnbinned-${meta.id}.fa + fi + """ +} diff --git a/modules/local/rename_predastool.nf b/modules/local/rename_predastool.nf new file mode 100644 index 00000000..2250de79 --- /dev/null +++ b/modules/local/rename_predastool.nf @@ -0,0 +1,32 @@ +process RENAME_PREDASTOOL { + tag "${meta.assembler}-${meta.binner}-${meta.id}" + label 'process_low' + + // Using container from multiqc since it'll be included anyway + conda (params.enable_conda ? "bioconda::multiqc=1.12" : null) + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/multiqc:1.12--pyhdfd78af_0' : + 'quay.io/biocontainers/multiqc:1.12--pyhdfd78af_0' }" + + input: + tuple val(meta), path(bins) + + output: + tuple val(meta), path("${meta.assembler}-${meta.binner}Refined-${meta.id}*"), emit: renamed_bins + + script: + """ + if [ -n "${bins}" ] + then + for bin in ${bins}; do + if [[ \${bin} =~ ${meta.assembler}-${meta.binner}-${meta.id}.([_[:alnum:]]+).fa ]]; then + num=\${BASH_REMATCH[1]} + mv \${bin} ${meta.assembler}-${meta.binner}Refined-${meta.id}.\${num}.fa + else + echo "ERROR: the bin filename \${bin} does not match the expected format '${meta.assembler}-${meta.binner}-${meta.id}.([_[:alnum:]]+).fa'!" + exit 1 + fi + done + fi + """ +} diff --git a/modules/local/spades.nf b/modules/local/spades.nf index 4c380dba..9ac93dce 100644 --- a/modules/local/spades.nf +++ b/modules/local/spades.nf @@ -1,55 +1,47 @@ -// Import generic module functions -include { initOptions; saveFiles; getSoftwareName } from './functions' - -params.options = [:] -options = initOptions(params.options) - process SPADES { tag "$meta.id" - publishDir "${params.outdir}", - mode: params.publish_dir_mode, - saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), meta:meta, publish_by_meta:['id']) } - conda (params.enable_conda ? 
"bioconda::spades=3.15.3" : null) - if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) { - container "https://depot.galaxyproject.org/singularity/spades:3.15.3--h95f258a_0" - } else { - container "quay.io/biocontainers/spades:3.15.3--h95f258a_0" - } + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/spades:3.15.3--h95f258a_0' : + 'quay.io/biocontainers/spades:3.15.3--h95f258a_0' }" input: tuple val(meta), path(reads) output: - tuple val(meta), path("${meta.id}_scaffolds.fasta"), emit: assembly - path "${meta.id}.log" , emit: log - path "${meta.id}_contigs.fasta.gz" , emit: contigs_gz - path "${meta.id}_scaffolds.fasta.gz" , emit: assembly_gz - path "${meta.id}_graph.gfa.gz" , emit: graph - path '*.version.txt' , emit: version + tuple val(meta), path("SPAdes-${meta.id}_scaffolds.fasta"), emit: assembly + path "SPAdes-${meta.id}.log" , emit: log + path "SPAdes-${meta.id}_contigs.fasta.gz" , emit: contigs_gz + path "SPAdes-${meta.id}_scaffolds.fasta.gz" , emit: assembly_gz + path "SPAdes-${meta.id}_graph.gfa.gz" , emit: graph + path "versions.yml" , emit: versions script: - def software = getSoftwareName(task.process) + def args = task.ext.args ?: '' maxmem = task.memory.toGiga() if ( params.spades_fix_cpus == -1 || task.cpus == params.spades_fix_cpus ) """ metaspades.py \ - ${params.spades_options} \ + $args \ --threads "${task.cpus}" \ --memory $maxmem \ --pe1-1 ${reads[0]} \ --pe1-2 ${reads[1]} \ -o spades - mv spades/assembly_graph_with_scaffolds.gfa ${meta.id}_graph.gfa - mv spades/scaffolds.fasta ${meta.id}_scaffolds.fasta - mv spades/contigs.fasta ${meta.id}_contigs.fasta - mv spades/spades.log ${meta.id}.log - gzip "${meta.id}_contigs.fasta" - gzip "${meta.id}_graph.gfa" - gzip -c "${meta.id}_scaffolds.fasta" > "${meta.id}_scaffolds.fasta.gz" - - metaspades.py --version | sed "s/SPAdes v//; s/ \\[.*//" > ${software}.version.txt + mv spades/assembly_graph_with_scaffolds.gfa SPAdes-${meta.id}_graph.gfa + mv spades/scaffolds.fasta SPAdes-${meta.id}_scaffolds.fasta + mv spades/contigs.fasta SPAdes-${meta.id}_contigs.fasta + mv spades/spades.log SPAdes-${meta.id}.log + gzip "SPAdes-${meta.id}_contigs.fasta" + gzip "SPAdes-${meta.id}_graph.gfa" + gzip -c "SPAdes-${meta.id}_scaffolds.fasta" > "SPAdes-${meta.id}_scaffolds.fasta.gz" + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + python: \$(python --version 2>&1 | sed 's/Python //g') + metaspades: \$(metaspades.py --version | sed "s/SPAdes genome assembler v//; s/ \\[.*//") + END_VERSIONS """ else error "ERROR: '--spades_fix_cpus' was specified, but not succesfully applied. Likely this is caused by changed process properties in a custom config file." diff --git a/modules/local/spadeshybrid.nf b/modules/local/spadeshybrid.nf index 19a1a943..29cf83b4 100644 --- a/modules/local/spadeshybrid.nf +++ b/modules/local/spadeshybrid.nf @@ -1,56 +1,48 @@ -// Import generic module functions -include { initOptions; saveFiles; getSoftwareName } from './functions' - -params.options = [:] -options = initOptions(params.options) - process SPADESHYBRID { tag "$meta.id" - publishDir "${params.outdir}", - mode: params.publish_dir_mode, - saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), meta:meta, publish_by_meta:['id']) } - conda (params.enable_conda ? 
"bioconda::spades=3.15.3" : null) - if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) { - container "https://depot.galaxyproject.org/singularity/spades:3.15.3--h95f258a_0" - } else { - container "quay.io/biocontainers/spades:3.15.3--h95f258a_0" - } + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/spades:3.15.3--h95f258a_0' : + 'quay.io/biocontainers/spades:3.15.3--h95f258a_0' }" input: tuple val(meta), path(long_reads), path(short_reads) output: - tuple val(meta), path("${meta.id}_scaffolds.fasta"), emit: assembly - path "${meta.id}.log" , emit: log - path "${meta.id}_contigs.fasta.gz" , emit: contigs_gz - path "${meta.id}_scaffolds.fasta.gz" , emit: assembly_gz - path "${meta.id}_graph.gfa.gz" , emit: graph - path '*.version.txt' , emit: version + tuple val(meta), path("SPAdesHybrid-${meta.id}_scaffolds.fasta"), emit: assembly + path "SPAdesHybrid-${meta.id}.log" , emit: log + path "SPAdesHybrid-${meta.id}_contigs.fasta.gz" , emit: contigs_gz + path "SPAdesHybrid-${meta.id}_scaffolds.fasta.gz" , emit: assembly_gz + path "SPAdesHybrid-${meta.id}_graph.gfa.gz" , emit: graph + path "versions.yml" , emit: versions script: - def software = getSoftwareName(task.process) + def args = task.ext.args ?: '' maxmem = task.memory.toGiga() if ( params.spadeshybrid_fix_cpus == -1 || task.cpus == params.spadeshybrid_fix_cpus ) """ metaspades.py \ - ${params.spades_options} \ + $args \ --threads "${task.cpus}" \ --memory $maxmem \ --pe1-1 ${short_reads[0]} \ --pe1-2 ${short_reads[1]} \ --nanopore ${long_reads} \ -o spades - mv spades/assembly_graph_with_scaffolds.gfa ${meta.id}_graph.gfa - mv spades/scaffolds.fasta ${meta.id}_scaffolds.fasta - mv spades/contigs.fasta ${meta.id}_contigs.fasta - mv spades/spades.log ${meta.id}.log - gzip "${meta.id}_contigs.fasta" - gzip "${meta.id}_graph.gfa" - gzip -c "${meta.id}_scaffolds.fasta" > "${meta.id}_scaffolds.fasta.gz" - - metaspades.py --version | sed "s/SPAdes v//; s/ \\[.*//" > ${software}.version.txt + mv spades/assembly_graph_with_scaffolds.gfa SPAdesHybrid-${meta.id}_graph.gfa + mv spades/scaffolds.fasta SPAdesHybrid-${meta.id}_scaffolds.fasta + mv spades/contigs.fasta SPAdesHybrid-${meta.id}_contigs.fasta + mv spades/spades.log SPAdesHybrid-${meta.id}.log + gzip "SPAdesHybrid-${meta.id}_contigs.fasta" + gzip "SPAdesHybrid-${meta.id}_graph.gfa" + gzip -c "SPAdesHybrid-${meta.id}_scaffolds.fasta" > "SPAdesHybrid-${meta.id}_scaffolds.fasta.gz" + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + python: \$(python --version 2>&1 | sed 's/Python //g') + metaspades: \$(metaspades.py --version | sed "s/SPAdes genome assembler v//; s/ \\[.*//") + END_VERSIONS """ else error "ERROR: '--spadeshybrid_fix_cpus' was specified, but not succesfully applied. Likely this is caused by changed process properties in a custom config file." diff --git a/modules/local/split_fasta.nf b/modules/local/split_fasta.nf new file mode 100644 index 00000000..841c08ce --- /dev/null +++ b/modules/local/split_fasta.nf @@ -0,0 +1,34 @@ +process SPLIT_FASTA { + tag "${meta.assembler}-${meta.binner}-${meta.id}" + label 'process_low' + + // Using container from metabat2 process, since this will be anyway already downloaded and contains biopython and pandas + conda (params.enable_conda ? 
"bioconda::metabat2=2.15 conda-forge::python=3.6.7 conda-forge::biopython=1.74 conda-forge::pandas=1.1.5" : null) + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/mulled-v2-e25d1fa2bb6cbacd47a4f8b2308bd01ba38c5dd7:75310f02364a762e6ba5206fcd11d7529534ed6e-0' : + 'quay.io/biocontainers/mulled-v2-e25d1fa2bb6cbacd47a4f8b2308bd01ba38c5dd7:75310f02364a762e6ba5206fcd11d7529534ed6e-0' }" + + input: + tuple val(meta), path(unbinned) + + output: + tuple val(meta), path("${meta.assembler}-${meta.binner}-${meta.id}.*.[1-9]*.fa.gz") , optional:true, emit: unbinned + tuple val(meta), path("${meta.assembler}-${meta.binner}-${meta.id}.*.pooled.fa.gz") , optional:true, emit: pooled + tuple val(meta), path("${meta.assembler}-${meta.binner}-${meta.id}.*.remaining.fa.gz"), optional:true, emit: remaining + path "versions.yml" , emit: versions + + script: + """ + # save unbinned contigs above thresholds into individual files, dump others in one file + split_fasta.py $unbinned ${params.min_length_unbinned_contigs} ${params.max_unbinned_contigs} ${params.min_contig_size} + + gzip *.fa + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + python: \$(python --version 2>&1 | sed 's/Python //g') + biopython: 1.7.4 + pandas: \$(python -c "import pkg_resources; print(pkg_resources.get_distribution('pandas').version)") + END_VERSIONS + """ +} diff --git a/modules/nf-core/modules/adapterremoval/main.nf b/modules/nf-core/modules/adapterremoval/main.nf new file mode 100644 index 00000000..9d16b9c9 --- /dev/null +++ b/modules/nf-core/modules/adapterremoval/main.nf @@ -0,0 +1,70 @@ +process ADAPTERREMOVAL { + tag "$meta.id" + label 'process_medium' + + conda (params.enable_conda ? "bioconda::adapterremoval=2.3.2" : null) + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/adapterremoval:2.3.2--hb7ba0dd_0' : + 'quay.io/biocontainers/adapterremoval:2.3.2--hb7ba0dd_0' }" + + input: + tuple val(meta), path(reads) + path(adapterlist) + + output: + tuple val(meta), path("${prefix}.truncated.gz") , optional: true, emit: singles_truncated + tuple val(meta), path("${prefix}.discarded.gz") , optional: true, emit: discarded + tuple val(meta), path("${prefix}.pair1.truncated.gz") , optional: true, emit: pair1_truncated + tuple val(meta), path("${prefix}.pair2.truncated.gz") , optional: true, emit: pair2_truncated + tuple val(meta), path("${prefix}.collapsed.gz") , optional: true, emit: collapsed + tuple val(meta), path("${prefix}.collapsed.truncated.gz") , optional: true, emit: collapsed_truncated + tuple val(meta), path("${prefix}.paired.gz") , optional: true, emit: paired_interleaved + tuple val(meta), path('*.log') , emit: log + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def args = task.ext.args ?: '' + def list = adapterlist ? "--adapter-list ${adapterlist}" : "" + prefix = task.ext.prefix ?: "${meta.id}" + + if (meta.single_end) { + """ + AdapterRemoval \\ + --file1 $reads \\ + $args \\ + $adapterlist \\ + --basename ${prefix} \\ + --threads ${task.cpus} \\ + --settings ${prefix}.log \\ + --seed 42 \\ + --gzip + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + adapterremoval: \$(AdapterRemoval --version 2>&1 | sed -e "s/AdapterRemoval ver. 
//g") + END_VERSIONS + """ + } else { + """ + AdapterRemoval \\ + --file1 ${reads[0]} \\ + --file2 ${reads[1]} \\ + $args \\ + $adapterlist \\ + --basename ${prefix} \\ + --threads $task.cpus \\ + --settings ${prefix}.log \\ + --seed 42 \\ + --gzip + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + adapterremoval: \$(AdapterRemoval --version 2>&1 | sed -e "s/AdapterRemoval ver. //g") + END_VERSIONS + """ + } + +} diff --git a/modules/nf-core/modules/adapterremoval/meta.yml b/modules/nf-core/modules/adapterremoval/meta.yml new file mode 100644 index 00000000..5faad043 --- /dev/null +++ b/modules/nf-core/modules/adapterremoval/meta.yml @@ -0,0 +1,90 @@ +name: adapterremoval +description: Trim sequencing adapters and collapse overlapping reads +keywords: + - trimming + - adapters + - merging + - fastq +tools: + - adapterremoval: + description: The AdapterRemoval v2 tool for merging and clipping reads. + homepage: https://github.com/MikkelSchubert/adapterremoval + documentation: https://adapterremoval.readthedocs.io + licence: ["GPL v3"] + +input: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - reads: + type: file + description: | + List of input FastQ files of size 1 and 2 for single-end and paired-end data, + respectively. + pattern: "*.{fq,fastq,fq.gz,fastq.gz}" + - adapterlist: + type: file + description: Optional text file containing list of adapters to look for for removal + with one adapter per line. Otherwise will look for default adapters (see + AdapterRemoval man page), or can be modified to remove user-specified + adapters via ext.args. + +output: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - singles_truncated: + type: file + description: | + Adapter trimmed FastQ files of either single-end reads, or singleton + 'orphaned' reads from merging of paired-end data (i.e., one of the pair + was lost due to filtering thresholds). + pattern: "*.truncated.gz" + - discarded: + type: file + description: | + Adapter trimmed FastQ files of reads that did not pass filtering + thresholds. + pattern: "*.discarded.gz" + - pair1_truncated: + type: file + description: | + Adapter trimmed R1 FastQ files of paired-end reads that did not merge + with their respective R2 pair due to long templates. The respective pair + is stored in 'pair2_truncated'. + pattern: "*.pair1.truncated.gz" + - pair2_truncated: + type: file + description: | + Adapter trimmed R2 FastQ files of paired-end reads that did not merge + with their respective R1 pair due to long templates. The respective pair + is stored in 'pair1_truncated'. + pattern: "*.pair2.truncated.gz" + - collapsed: + type: file + description: | + Collapsed FastQ of paired-end reads that successfully merged with their + respective R1 pair but were not trimmed. + pattern: "*.collapsed.gz" + - collapsed_truncated: + type: file + description: | + Collapsed FastQ of paired-end reads that successfully merged with their + respective R1 pair and were trimmed of adapter due to sufficient overlap. 
+ pattern: "*.collapsed.truncated.gz" + - log: + type: file + description: AdapterRemoval log file + pattern: "*.log" + - versions: + type: file + description: File containing software versions + pattern: "versions.yml" + +authors: + - "@maxibor" + - "@jfy133" diff --git a/modules/nf-core/modules/bcftools/consensus/main.nf b/modules/nf-core/modules/bcftools/consensus/main.nf new file mode 100644 index 00000000..a0c436e2 --- /dev/null +++ b/modules/nf-core/modules/bcftools/consensus/main.nf @@ -0,0 +1,36 @@ +process BCFTOOLS_CONSENSUS { + tag "$meta.id" + label 'process_medium' + + conda (params.enable_conda ? 'bioconda::bcftools=1.14' : null) + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/bcftools:1.14--h88f3f91_0' : + 'quay.io/biocontainers/bcftools:1.14--h88f3f91_0' }" + + input: + tuple val(meta), path(vcf), path(tbi), path(fasta) + + output: + tuple val(meta), path('*.fa'), emit: fasta + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def args = task.ext.args ?: '' + def prefix = task.ext.prefix ?: "${meta.id}" + """ + cat $fasta \\ + | bcftools \\ + consensus \\ + $vcf \\ + $args \\ + > ${prefix}.fa + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + bcftools: \$(bcftools --version 2>&1 | head -n1 | sed 's/^.*bcftools //; s/ .*\$//') + END_VERSIONS + """ +} diff --git a/modules/nf-core/modules/bcftools/consensus/meta.yml b/modules/nf-core/modules/bcftools/consensus/meta.yml new file mode 100644 index 00000000..05a93a56 --- /dev/null +++ b/modules/nf-core/modules/bcftools/consensus/meta.yml @@ -0,0 +1,49 @@ +name: bcftools_consensus +description: Compresses VCF files +keywords: + - variant calling + - consensus + - VCF +tools: + - consensus: + description: | + Create consensus sequence by applying VCF variants to a reference fasta file. + homepage: http://samtools.github.io/bcftools/bcftools.html + documentation: http://www.htslib.org/doc/bcftools.html + doi: 10.1093/bioinformatics/btp352 + licence: ["MIT"] +input: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - vcf: + type: file + description: VCF file + pattern: "*.{vcf}" + - tbi: + type: file + description: tabix index file + pattern: "*.{tbi}" + - fasta: + type: file + description: FASTA reference file + pattern: "*.{fasta,fa}" +output: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - fasta: + type: file + description: FASTA reference consensus file + pattern: "*.{fasta,fa}" + - versions: + type: file + description: File containing software versions + pattern: "versions.yml" +authors: + - "@joseespinosa" + - "@drpatelh" diff --git a/modules/nf-core/modules/bcftools/index/main.nf b/modules/nf-core/modules/bcftools/index/main.nf new file mode 100644 index 00000000..548a9277 --- /dev/null +++ b/modules/nf-core/modules/bcftools/index/main.nf @@ -0,0 +1,37 @@ +process BCFTOOLS_INDEX { + tag "$meta.id" + label 'process_low' + + conda (params.enable_conda ? 'bioconda::bcftools=1.14' : null) + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? 
+ 'https://depot.galaxyproject.org/singularity/bcftools:1.14--h88f3f91_0' : + 'quay.io/biocontainers/bcftools:1.14--h88f3f91_0' }" + + input: + tuple val(meta), path(vcf) + + output: + tuple val(meta), path("*.csi"), optional:true, emit: csi + tuple val(meta), path("*.tbi"), optional:true, emit: tbi + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def args = task.ext.args ?: '' + def prefix = task.ext.prefix ?: "${meta.id}" + + """ + bcftools \\ + index \\ + $args \\ + --threads $task.cpus \\ + $vcf + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + bcftools: \$(bcftools --version 2>&1 | head -n1 | sed 's/^.*bcftools //; s/ .*\$//') + END_VERSIONS + """ +} diff --git a/modules/nf-core/modules/bcftools/index/meta.yml b/modules/nf-core/modules/bcftools/index/meta.yml new file mode 100644 index 00000000..b883fa5f --- /dev/null +++ b/modules/nf-core/modules/bcftools/index/meta.yml @@ -0,0 +1,49 @@ +name: bcftools_index +description: Index VCF tools +keywords: + - vcf + - index + - bcftools + - csi + - tbi +tools: + - bcftools: + description: BCFtools is a set of utilities that manipulate variant calls in the Variant Call Format (VCF) and its binary counterpart BCF. All commands work transparently with both VCFs and BCFs, both uncompressed and BGZF-compressed. Most commands accept VCF, bgzipped VCF and BCF with filetype detected automatically even when streaming from a pipe. Indexed VCF and BCF will work in all situations. Un-indexed VCF and BCF and streams will work in most, but not all situations. + homepage: https://samtools.github.io/bcftools/ + documentation: https://samtools.github.io/bcftools/howtos/index.html + tool_dev_url: https://github.com/samtools/bcftools + doi: "10.1093/gigascience/giab008" + licence: ["MIT", "GPL-3.0-or-later"] + +input: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - VCF: + type: file + description: VCF file (optionally GZIPPED) + pattern: "*.{vcf,vcf.gz}" + +output: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - version: + type: file + description: File containing software version + pattern: "versions.yml" + - csi: + type: file + description: Default VCF file index file + pattern: "*.csi" + - tbi: + type: file + description: Alternative VCF file index file for larger files (activated with -t parameter) + pattern: "*.tbi" + +authors: + - "@jfy133" diff --git a/modules/nf-core/modules/bcftools/view/main.nf b/modules/nf-core/modules/bcftools/view/main.nf new file mode 100644 index 00000000..2a240f4a --- /dev/null +++ b/modules/nf-core/modules/bcftools/view/main.nf @@ -0,0 +1,44 @@ +process BCFTOOLS_VIEW { + tag "$meta.id" + label 'process_medium' + + conda (params.enable_conda ? 'bioconda::bcftools=1.14' : null) + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/bcftools:1.14--h88f3f91_0' : + 'quay.io/biocontainers/bcftools:1.14--h88f3f91_0' }" + + input: + tuple val(meta), path(vcf), path(index) + path(regions) + path(targets) + path(samples) + + output: + tuple val(meta), path("*.gz") , emit: vcf + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def args = task.ext.args ?: '' + def prefix = task.ext.prefix ?: "${meta.id}" + def regions_file = regions ? 
"--regions-file ${regions}" : "" + def targets_file = targets ? "--targets-file ${targets}" : "" + def samples_file = samples ? "--samples-file ${samples}" : "" + """ + bcftools view \\ + --output ${prefix}.vcf.gz \\ + ${regions_file} \\ + ${targets_file} \\ + ${samples_file} \\ + $args \\ + --threads $task.cpus \\ + ${vcf} + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + bcftools: \$(bcftools --version 2>&1 | head -n1 | sed 's/^.*bcftools //; s/ .*\$//') + END_VERSIONS + """ +} diff --git a/modules/nf-core/modules/bcftools/view/meta.yml b/modules/nf-core/modules/bcftools/view/meta.yml new file mode 100644 index 00000000..326fd1fa --- /dev/null +++ b/modules/nf-core/modules/bcftools/view/meta.yml @@ -0,0 +1,63 @@ +name: bcftools_view +description: View, subset and filter VCF or BCF files by position and filtering expression. Convert between VCF and BCF +keywords: + - variant calling + - view + - bcftools + - VCF + +tools: + - view: + description: | + View, subset and filter VCF or BCF files by position and filtering expression. Convert between VCF and BCF + homepage: http://samtools.github.io/bcftools/bcftools.html + documentation: http://www.htslib.org/doc/bcftools.html + doi: 10.1093/bioinformatics/btp352 + licence: ["MIT"] +input: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - vcf: + type: file + description: | + The vcf file to be inspected. + e.g. 'file.vcf' + - index: + type: file + description: | + The tab index for the VCF file to be inspected. + e.g. 'file.tbi' + - regions: + type: file + description: | + Optionally, restrict the operation to regions listed in this file. + e.g. 'file.vcf' + - targets: + type: file + description: | + Optionally, restrict the operation to regions listed in this file (doesn't rely upon index files) + e.g. 'file.vcf' + - samples: + type: file + description: | + Optional, file of sample names to be included or excluded. + e.g. 'file.tsv' +output: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - vcf: + type: file + description: VCF normalized output file + pattern: "*.{vcf.gz}" + - versions: + type: file + description: File containing software versions + pattern: "versions.yml" +authors: + - "@abhi18av" diff --git a/modules/nf-core/modules/custom/dumpsoftwareversions/main.nf b/modules/nf-core/modules/custom/dumpsoftwareversions/main.nf new file mode 100644 index 00000000..327d5100 --- /dev/null +++ b/modules/nf-core/modules/custom/dumpsoftwareversions/main.nf @@ -0,0 +1,24 @@ +process CUSTOM_DUMPSOFTWAREVERSIONS { + label 'process_low' + + // Requires `pyyaml` which does not have a dedicated container but is in the MultiQC container + conda (params.enable_conda ? "bioconda::multiqc=1.11" : null) + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? 
+        'https://depot.galaxyproject.org/singularity/multiqc:1.11--pyhdfd78af_0' :
+        'quay.io/biocontainers/multiqc:1.11--pyhdfd78af_0' }"
+
+    input:
+    path versions
+
+    output:
+    path "software_versions.yml"    , emit: yml
+    path "software_versions_mqc.yml", emit: mqc_yml
+    path "versions.yml"             , emit: versions
+
+    when:
+    task.ext.when == null || task.ext.when
+
+    script:
+    def args = task.ext.args ?: ''
+    template 'dumpsoftwareversions.py'
+}
diff --git a/modules/nf-core/modules/custom/dumpsoftwareversions/meta.yml b/modules/nf-core/modules/custom/dumpsoftwareversions/meta.yml
new file mode 100644
index 00000000..60b546a0
--- /dev/null
+++ b/modules/nf-core/modules/custom/dumpsoftwareversions/meta.yml
@@ -0,0 +1,34 @@
+name: custom_dumpsoftwareversions
+description: Custom module used to dump software versions within the nf-core pipeline template
+keywords:
+  - custom
+  - version
+tools:
+  - custom:
+      description: Custom module used to dump software versions within the nf-core pipeline template
+      homepage: https://github.com/nf-core/tools
+      documentation: https://github.com/nf-core/tools
+      licence: ["MIT"]
+input:
+  - versions:
+      type: file
+      description: YML file containing software versions
+      pattern: "*.yml"
+
+output:
+  - yml:
+      type: file
+      description: Standard YML file containing software versions
+      pattern: "software_versions.yml"
+  - mqc_yml:
+      type: file
+      description: MultiQC custom content YML file containing software versions
+      pattern: "software_versions_mqc.yml"
+  - versions:
+      type: file
+      description: File containing software versions
+      pattern: "versions.yml"
+
+authors:
+  - "@drpatelh"
+  - "@grst"
diff --git a/modules/nf-core/modules/custom/dumpsoftwareversions/templates/dumpsoftwareversions.py b/modules/nf-core/modules/custom/dumpsoftwareversions/templates/dumpsoftwareversions.py
new file mode 100644
index 00000000..d1390392
--- /dev/null
+++ b/modules/nf-core/modules/custom/dumpsoftwareversions/templates/dumpsoftwareversions.py
@@ -0,0 +1,89 @@
+#!/usr/bin/env python
+
+import yaml
+import platform
+from textwrap import dedent
+
+
+def _make_versions_html(versions):
+    html = [
+        dedent(
+            """\\
+            <style>
+            #nf-core-versions tbody:nth-child(even) {
+                background-color: #f2f2f2;
+            }
+            </style>
+            <table class="table" style="width:100%" id="nf-core-versions">
+                <thead>
+                    <tr>
+                        <th> Process Name </th>
+                        <th> Software </th>
+                        <th> Version  </th>
+                    </tr>
+                </thead>
+            """
+        )
+    ]
+    for process, tmp_versions in sorted(versions.items()):
+        html.append("<tbody>")
+        for i, (tool, version) in enumerate(sorted(tmp_versions.items())):
+            html.append(
+                dedent(
+                    f"""\\
+                    <tr>
+                        <td><samp>{process if (i == 0) else ''}</samp></td>
+                        <td><samp>{tool}</samp></td>
+                        <td><samp>{version}</samp></td>
+                    </tr>
+                    """
+                )
+            )
+        html.append("</tbody>")
+    html.append("</table>")
+    return "\\n".join(html)
+
+
+versions_this_module = {}
+versions_this_module["${task.process}"] = {
+    "python": platform.python_version(),
+    "yaml": yaml.__version__,
+}
+
+with open("$versions") as f:
+    versions_by_process = yaml.load(f, Loader=yaml.BaseLoader) | versions_this_module
+
+# aggregate versions by the module name (derived from fully-qualified process name)
+versions_by_module = {}
+for process, process_versions in versions_by_process.items():
+    module = process.split(":")[-1]
+    try:
+        assert versions_by_module[module] == process_versions, (
+            "We assume that software versions are the same between all modules. "
+            "If you see this error-message it means you discovered an edge-case "
+            "and should open an issue in nf-core/tools. "
+        )
+    except KeyError:
+        versions_by_module[module] = process_versions
+
+versions_by_module["Workflow"] = {
+    "Nextflow": "$workflow.nextflow.version",
+    "$workflow.manifest.name": "$workflow.manifest.version",
+}
+
+versions_mqc = {
+    "id": "software_versions",
+    "section_name": "${workflow.manifest.name} Software Versions",
+    "section_href": "https://github.com/${workflow.manifest.name}",
+    "plot_type": "html",
+    "description": "are collected at run time from the software output.",
+    "data": _make_versions_html(versions_by_module),
+}
+
+with open("software_versions.yml", "w") as f:
+    yaml.dump(versions_by_module, f, default_flow_style=False)
+with open("software_versions_mqc.yml", "w") as f:
+    yaml.dump(versions_mqc, f, default_flow_style=False)
+
+with open("versions.yml", "w") as f:
+    yaml.dump(versions_this_module, f, default_flow_style=False)
diff --git a/modules/nf-core/modules/dastool/dastool/main.nf b/modules/nf-core/modules/dastool/dastool/main.nf
new file mode 100644
index 00000000..a7d9c6f6
--- /dev/null
+++ b/modules/nf-core/modules/dastool/dastool/main.nf
@@ -0,0 +1,62 @@
+process DASTOOL_DASTOOL {
+    tag "$meta.id"
+    label 'process_medium'
+
+    conda (params.enable_conda ? "bioconda::das_tool=1.1.4" : null)
+    container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
+        'https://depot.galaxyproject.org/singularity/das_tool:1.1.4--r41hdfd78af_1' :
+        'quay.io/biocontainers/das_tool:1.1.4--r41hdfd78af_1' }"
+
+    input:
+    tuple val(meta), path(contigs), path(bins)
+    path(proteins)
+    path(db_directory)
+
+    output:
+    tuple val(meta), path("*.log")                    , emit: log
+    tuple val(meta), path("*_summary.tsv")            , optional: true, emit: summary
+    tuple val(meta), path("*_DASTool_contig2bin.tsv") , optional: true, emit: contig2bin
+    tuple val(meta), path("*.eval")                   , optional: true, emit: eval
+    tuple val(meta), path("*_DASTool_bins/*.fa")      , optional: true, emit: bins
+    tuple val(meta), path("*.pdf")                    , optional: true, emit: pdfs
+    tuple val(meta), path("*.candidates.faa")         , optional: true, emit: fasta_proteins
+    tuple val(meta), path("*.faa")                    , optional: true, emit: candidates_faa
+    tuple val(meta), path("*.archaea.scg")            , optional: true, emit: fasta_archaea_scg
+    tuple val(meta), path("*.bacteria.scg")           , optional: true, emit: fasta_bacteria_scg
+    tuple val(meta), path("*.b6")                     , optional: true, emit: b6
+    tuple val(meta), path("*.seqlength")              , optional: true, emit: seqlength
+    path "versions.yml"                               , emit: versions
+
+    when:
+    task.ext.when == null || task.ext.when
+
+    script:
+    def args = task.ext.args ?: ''
+    def prefix = task.ext.prefix ?: "${meta.id}"
+    def bin_list = bins instanceof List ? bins.join(",") : "$bins"
+    def db_dir = db_directory ?
"--db_directory $db_directory" : "" + def clean_contigs = contigs.toString() - ".gz" + def decompress_contigs = contigs.toString() == clean_contigs ? "" : "gunzip -q -f $contigs" + def clean_proteins = proteins ? proteins.toString() - ".gz" : "" + def decompress_proteins = proteins ? "gunzip -f $proteins" : "" + def proteins_pred = proteins ? "-p $clean_proteins" : "" + + """ + $decompress_proteins + $decompress_contigs + + DAS_Tool \\ + $args \\ + $proteins_pred \\ + $db_dir \\ + -t $task.cpus \\ + -i $bin_list \\ + -c $clean_contigs \\ + -o $prefix + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + dastool: \$( DAS_Tool --version 2>&1 | grep "DAS Tool" | sed 's/DAS Tool //' ) + END_VERSIONS + """ +} diff --git a/modules/nf-core/modules/dastool/dastool/meta.yml b/modules/nf-core/modules/dastool/dastool/meta.yml new file mode 100644 index 00000000..0889ca47 --- /dev/null +++ b/modules/nf-core/modules/dastool/dastool/meta.yml @@ -0,0 +1,103 @@ +name: dastool_dastool +description: DAS Tool binning step. +keywords: + - binning + - das tool + - table + - de novo + - bins + - contigs + - assembly + - das_tool +tools: + - dastool: + description: | + DAS Tool is an automated method that integrates the results + of a flexible number of binning algorithms to calculate an optimized, non-redundant + set of bins from a single assembly. + + homepage: https://github.com/cmks/DAS_Tool + documentation: https://github.com/cmks/DAS_Tool + tool_dev_url: https://github.com/cmks/DAS_Tool + doi: "10.1038/s41564-018-0171-1" + licence: ["BSD"] + +input: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - contigs: + type: file + description: fasta file + pattern: "*.{fa.gz,fas.gz,fasta.gz}" + - bins: + type: file + description: "FastaToContig2Bin tabular file generated with dastool/fastatocontig2bin" + pattern: "*.tsv" + - proteins: + type: file + description: Predicted proteins in prodigal fasta format (>scaffoldID_geneNo) + pattern: "*.{fa.gz,fas.gz,fasta.gz}" + - db_directory: + type: file + description: (optional) Directory of single copy gene database. + +output: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. 
[ id:'test', single_end:false ] + - version: + type: file + description: File containing software version + pattern: "versions.yml" + - log: + type: file + description: Log file of the run + pattern: "*.log" + - summary: + type: file + description: Summary of output bins including quality and completeness estimates + pattern: "*summary.txt" + - contig2bin: + type: file + description: Scaffolds to bin file of output bins + pattern: "*.contig2bin.txt" + - eval: + type: file + description: Quality and completeness estimates of input bin sets + pattern: "*.eval" + - bins: + description: Final refined bins in fasta format + pattern: "*.fa" + - pdfs: + type: file + description: Plots showing the amount of high quality bins and score distribution of bins per method + pattern: "*.pdf" + - fasta_proteins: + type: file + description: Output from prodigal if not already supplied + pattern: "*.proteins.faa" + - fasta_archaea_scg: + type: file + description: Results of archaeal single-copy-gene prediction + pattern: "*.archaea.scg" + - fasta_bacteria_scg: + type: file + description: Results of bacterial single-copy-gene prediction + pattern: "*.bacteria.scg" + - b6: + type: file + description: Results in b6 format + pattern: "*.b6" + - seqlength: + type: file + description: Summary of contig lengths + pattern: "*.seqlength" + +authors: + - "@maxibor" + - "@jfy133" diff --git a/modules/nf-core/modules/dastool/fastatocontig2bin/main.nf b/modules/nf-core/modules/dastool/fastatocontig2bin/main.nf new file mode 100644 index 00000000..8bb13380 --- /dev/null +++ b/modules/nf-core/modules/dastool/fastatocontig2bin/main.nf @@ -0,0 +1,41 @@ +process DASTOOL_FASTATOCONTIG2BIN { + tag "$meta.id" + label 'process_low' + + conda (params.enable_conda ? "bioconda::das_tool=1.1.4" : null) + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/das_tool:1.1.4--r41hdfd78af_1' : + 'quay.io/biocontainers/das_tool:1.1.4--r41hdfd78af_1' }" + + input: + tuple val(meta), path(fasta) + val(extension) + + output: + tuple val(meta), path("*.tsv"), emit: fastatocontig2bin + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def args = task.ext.args ?: '' + def prefix = task.ext.prefix ?: "${meta.id}" + def file_extension = extension ? extension : "fasta" + def clean_fasta = fasta.toString() - ".gz" + def decompress_fasta = fasta.toString() == clean_fasta ? "" : "gunzip -q -f $fasta" + """ + $decompress_fasta + + Fasta_to_Contig2Bin.sh \\ + $args \\ + -i . \\ + -e $file_extension \\ + > ${prefix}.tsv + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + dastool: \$( DAS_Tool --version 2>&1 | grep "DAS Tool" | sed 's/DAS Tool //' ) + END_VERSIONS + """ +} diff --git a/modules/nf-core/modules/dastool/fastatocontig2bin/meta.yml b/modules/nf-core/modules/dastool/fastatocontig2bin/meta.yml new file mode 100644 index 00000000..1176ae96 --- /dev/null +++ b/modules/nf-core/modules/dastool/fastatocontig2bin/meta.yml @@ -0,0 +1,56 @@ +name: dastool_fastatocontig2bin +description: Helper script to convert a set of bins in fasta format to tabular scaffolds2bin format +keywords: + - binning + - das tool + - table + - de novo + - bins + - contigs + - assembly + - das_tool +tools: + - dastool: + description: | + DAS Tool is an automated method that integrates the results + of a flexible number of binning algorithms to calculate an optimized, non-redundant + set of bins from a single assembly. 
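+  # Illustrative usage sketch (comment only, not part of the upstream module): in a
+  # DSL2 workflow this helper typically feeds DASTOOL_DASTOOL, roughly as follows,
+  # where ch_bins ( [ meta, bin fastas ] ) and ch_contigs ( [ meta, assembly ] ) are
+  # assumed upstream channels:
+  #   DASTOOL_FASTATOCONTIG2BIN ( ch_bins, "fa" )
+  #   DASTOOL_DASTOOL ( ch_contigs.join( DASTOOL_FASTATOCONTIG2BIN.out.fastatocontig2bin ), [], [] )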
+ + homepage: https://github.com/cmks/DAS_Tool + documentation: https://github.com/cmks/DAS_Tool + tool_dev_url: https://github.com/cmks/DAS_Tool + doi: "10.1038/s41564-018-0171-1" + licence: ["BSD"] + +input: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - fasta: + type: file + description: Fasta of list of fasta files recommended to be gathered via with .collect() of bins + pattern: "*.{fa,fa.gz,fas,fas.gz,fna,fna.gz,fasta,fasta.gz}" + - extension: + type: val + description: Fasta file extension (fa | fas | fasta | ...), without .gz suffix, if gzipped input. + +output: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - versions: + type: file + description: File containing software versions + pattern: "versions.yml" + - fastatocontig2bin: + type: file + description: tabular contig2bin file for DAS tool input + pattern: "*.tsv" + +authors: + - "@maxibor" + - "@jfy133" diff --git a/modules/nf-core/modules/fastp/functions.nf b/modules/nf-core/modules/fastp/functions.nf deleted file mode 100644 index da9da093..00000000 --- a/modules/nf-core/modules/fastp/functions.nf +++ /dev/null @@ -1,68 +0,0 @@ -// -// Utility functions used in nf-core DSL2 module files -// - -// -// Extract name of software tool from process name using $task.process -// -def getSoftwareName(task_process) { - return task_process.tokenize(':')[-1].tokenize('_')[0].toLowerCase() -} - -// -// Function to initialise default values and to generate a Groovy Map of available options for nf-core modules -// -def initOptions(Map args) { - def Map options = [:] - options.args = args.args ?: '' - options.args2 = args.args2 ?: '' - options.args3 = args.args3 ?: '' - options.publish_by_meta = args.publish_by_meta ?: [] - options.publish_dir = args.publish_dir ?: '' - options.publish_files = args.publish_files - options.suffix = args.suffix ?: '' - return options -} - -// -// Tidy up and join elements of a list to return a path string -// -def getPathFromList(path_list) { - def paths = path_list.findAll { item -> !item?.trim().isEmpty() } // Remove empty entries - paths = paths.collect { it.trim().replaceAll("^[/]+|[/]+\$", "") } // Trim whitespace and trailing slashes - return paths.join('/') -} - -// -// Function to save/publish module results -// -def saveFiles(Map args) { - if (!args.filename.endsWith('.version.txt')) { - def ioptions = initOptions(args.options) - def path_list = [ ioptions.publish_dir ?: args.publish_dir ] - if (ioptions.publish_by_meta) { - def key_list = ioptions.publish_by_meta instanceof List ? ioptions.publish_by_meta : args.publish_by_meta - for (key in key_list) { - if (args.meta && key instanceof String) { - def path = key - if (args.meta.containsKey(key)) { - path = args.meta[key] instanceof Boolean ? "${key}_${args.meta[key]}".toString() : args.meta[key] - } - path = path instanceof String ? 
path : '' - path_list.add(path) - } - } - } - if (ioptions.publish_files instanceof Map) { - for (ext in ioptions.publish_files) { - if (args.filename.endsWith(ext.key)) { - def ext_list = path_list.collect() - ext_list.add(ext.value) - return "${getPathFromList(ext_list)}/$args.filename" - } - } - } else if (ioptions.publish_files == null) { - return "${getPathFromList(path_list)}/$args.filename" - } - } -} diff --git a/modules/nf-core/modules/fastp/main.nf b/modules/nf-core/modules/fastp/main.nf index 6d703615..5c9e3b83 100644 --- a/modules/nf-core/modules/fastp/main.nf +++ b/modules/nf-core/modules/fastp/main.nf @@ -1,40 +1,35 @@ -// Import generic module functions -include { initOptions; saveFiles; getSoftwareName } from './functions' - -params.options = [:] -options = initOptions(params.options) - process FASTP { tag "$meta.id" label 'process_medium' - publishDir "${params.outdir}", - mode: params.publish_dir_mode, - saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), meta:meta, publish_by_meta:['id']) } - conda (params.enable_conda ? 'bioconda::fastp=0.20.1' : null) - if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) { - container 'https://depot.galaxyproject.org/singularity/fastp:0.20.1--h8b12597_0' - } else { - container 'quay.io/biocontainers/fastp:0.20.1--h8b12597_0' - } + conda (params.enable_conda ? 'bioconda::fastp=0.23.2' : null) + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/fastp:0.23.2--h79da9fb_0' : + 'quay.io/biocontainers/fastp:0.23.2--h79da9fb_0' }" input: tuple val(meta), path(reads) + val save_trimmed_fail + val save_merged output: - tuple val(meta), path('*.trim.fastq.gz'), emit: reads - tuple val(meta), path('*.json') , emit: json - tuple val(meta), path('*.html') , emit: html - tuple val(meta), path('*.log') , emit: log - path '*.version.txt' , emit: version - tuple val(meta), path('*.fail.fastq.gz'), optional:true, emit: reads_fail + tuple val(meta), path('*.trim.fastq.gz') , optional:true, emit: reads + tuple val(meta), path('*.json') , emit: json + tuple val(meta), path('*.html') , emit: html + tuple val(meta), path('*.log') , emit: log + path "versions.yml" , emit: versions + tuple val(meta), path('*.fail.fastq.gz') , optional:true, emit: reads_fail + tuple val(meta), path('*.merged.fastq.gz'), optional:true, emit: reads_merged + + when: + task.ext.when == null || task.ext.when script: + def args = task.ext.args ?: '' // Added soft-links to original fastqs for consistent naming in MultiQC - def software = getSoftwareName(task.process) - def prefix = options.suffix ? "${meta.id}${options.suffix}" : "${meta.id}" + def prefix = task.ext.prefix ?: "${meta.id}" if (meta.single_end) { - def fail_fastq = params.save_trimmed_fail ? "--failed_out ${prefix}.fail.fastq.gz" : '' + def fail_fastq = save_trimmed_fail ? "--failed_out ${prefix}.fail.fastq.gz" : '' """ [ ! 
-f ${prefix}.fastq.gz ] && ln -s $reads ${prefix}.fastq.gz fastp \\ @@ -44,12 +39,16 @@ process FASTP { --json ${prefix}.fastp.json \\ --html ${prefix}.fastp.html \\ $fail_fastq \\ - $options.args \\ + $args \\ 2> ${prefix}.fastp.log - echo \$(fastp --version 2>&1) | sed -e "s/fastp //g" > ${software}.version.txt + cat <<-END_VERSIONS > versions.yml + "${task.process}": + fastp: \$(fastp --version 2>&1 | sed -e "s/fastp //g") + END_VERSIONS """ } else { - def fail_fastq = params.save_trimmed_fail ? "--unpaired1 ${prefix}_1.fail.fastq.gz --unpaired2 ${prefix}_2.fail.fastq.gz" : '' + def fail_fastq = save_trimmed_fail ? "--unpaired1 ${prefix}_1.fail.fastq.gz --unpaired2 ${prefix}_2.fail.fastq.gz" : '' + def merge_fastq = save_merged ? "-m --merged_out ${prefix}.merged.fastq.gz" : '' """ [ ! -f ${prefix}_1.fastq.gz ] && ln -s ${reads[0]} ${prefix}_1.fastq.gz [ ! -f ${prefix}_2.fastq.gz ] && ln -s ${reads[1]} ${prefix}_2.fastq.gz @@ -61,12 +60,16 @@ process FASTP { --json ${prefix}.fastp.json \\ --html ${prefix}.fastp.html \\ $fail_fastq \\ + $merge_fastq \\ --thread $task.cpus \\ --detect_adapter_for_pe \\ - $options.args \\ + $args \\ 2> ${prefix}.fastp.log - echo \$(fastp --version 2>&1) | sed -e "s/fastp //g" > ${software}.version.txt + cat <<-END_VERSIONS > versions.yml + "${task.process}": + fastp: \$(fastp --version 2>&1 | sed -e "s/fastp //g") + END_VERSIONS """ } } diff --git a/modules/nf-core/modules/fastp/meta.yml b/modules/nf-core/modules/fastp/meta.yml index 1fc3dfb6..3274e41b 100644 --- a/modules/nf-core/modules/fastp/meta.yml +++ b/modules/nf-core/modules/fastp/meta.yml @@ -10,6 +10,7 @@ tools: A tool designed to provide fast all-in-one preprocessing for FastQ files. This tool is developed in C++ with multithreading supported to afford high performance. documentation: https://github.com/OpenGene/fastp doi: https://doi.org/10.1093/bioinformatics/bty560 + licence: ["MIT"] input: - meta: type: map @@ -21,6 +22,12 @@ input: description: | List of input FastQ files of size 1 and 2 for single-end and paired-end data, respectively. + - save_trimmed_fail: + type: boolean + description: Specify true to save files that failed to pass trimming thresholds ending in `*.fail.fastq.gz` + - save_merged: + type: boolean + description: Specify true to save all merged reads to the a file ending in `*.merged.fastq.gz` output: - meta: @@ -30,7 +37,7 @@ output: e.g. 
[ id:'test', single_end:false ] - reads: type: file - description: The trimmed/modified fastq reads + description: The trimmed/modified/unmerged fastq reads pattern: "*trim.fastq.gz" - json: type: file @@ -39,19 +46,23 @@ output: - html: type: file description: Results in HTML format - pattern: "*.thml" + pattern: "*.html" - log: type: file description: fastq log file pattern: "*.log" - - version: + - versions: type: file - description: File containing software version - pattern: "*.{version.txt}" + description: File containing software versions + pattern: "versions.yml" - reads_fail: type: file description: Reads the failed the preprocessing pattern: "*fail.fastq.gz" + - reads_merged: + type: file + description: Reads that were successfully merged + pattern: "*.{merged.fastq.gz}" authors: - "@drpatelh" - "@kevinmenden" diff --git a/modules/nf-core/modules/fastqc/functions.nf b/modules/nf-core/modules/fastqc/functions.nf deleted file mode 100644 index da9da093..00000000 --- a/modules/nf-core/modules/fastqc/functions.nf +++ /dev/null @@ -1,68 +0,0 @@ -// -// Utility functions used in nf-core DSL2 module files -// - -// -// Extract name of software tool from process name using $task.process -// -def getSoftwareName(task_process) { - return task_process.tokenize(':')[-1].tokenize('_')[0].toLowerCase() -} - -// -// Function to initialise default values and to generate a Groovy Map of available options for nf-core modules -// -def initOptions(Map args) { - def Map options = [:] - options.args = args.args ?: '' - options.args2 = args.args2 ?: '' - options.args3 = args.args3 ?: '' - options.publish_by_meta = args.publish_by_meta ?: [] - options.publish_dir = args.publish_dir ?: '' - options.publish_files = args.publish_files - options.suffix = args.suffix ?: '' - return options -} - -// -// Tidy up and join elements of a list to return a path string -// -def getPathFromList(path_list) { - def paths = path_list.findAll { item -> !item?.trim().isEmpty() } // Remove empty entries - paths = paths.collect { it.trim().replaceAll("^[/]+|[/]+\$", "") } // Trim whitespace and trailing slashes - return paths.join('/') -} - -// -// Function to save/publish module results -// -def saveFiles(Map args) { - if (!args.filename.endsWith('.version.txt')) { - def ioptions = initOptions(args.options) - def path_list = [ ioptions.publish_dir ?: args.publish_dir ] - if (ioptions.publish_by_meta) { - def key_list = ioptions.publish_by_meta instanceof List ? ioptions.publish_by_meta : args.publish_by_meta - for (key in key_list) { - if (args.meta && key instanceof String) { - def path = key - if (args.meta.containsKey(key)) { - path = args.meta[key] instanceof Boolean ? "${key}_${args.meta[key]}".toString() : args.meta[key] - } - path = path instanceof String ? 
path : '' - path_list.add(path) - } - } - } - if (ioptions.publish_files instanceof Map) { - for (ext in ioptions.publish_files) { - if (args.filename.endsWith(ext.key)) { - def ext_list = path_list.collect() - ext_list.add(ext.value) - return "${getPathFromList(ext_list)}/$args.filename" - } - } - } else if (ioptions.publish_files == null) { - return "${getPathFromList(path_list)}/$args.filename" - } - } -} diff --git a/modules/nf-core/modules/fastqc/main.nf b/modules/nf-core/modules/fastqc/main.nf index 39c327b2..ed6b8c50 100644 --- a/modules/nf-core/modules/fastqc/main.nf +++ b/modules/nf-core/modules/fastqc/main.nf @@ -1,22 +1,11 @@ -// Import generic module functions -include { initOptions; saveFiles; getSoftwareName } from './functions' - -params.options = [:] -options = initOptions(params.options) - process FASTQC { tag "$meta.id" label 'process_medium' - publishDir "${params.outdir}", - mode: params.publish_dir_mode, - saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), meta:meta, publish_by_meta:['id']) } conda (params.enable_conda ? "bioconda::fastqc=0.11.9" : null) - if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) { - container "https://depot.galaxyproject.org/singularity/fastqc:0.11.9--0" - } else { - container "quay.io/biocontainers/fastqc:0.11.9--0" - } + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/fastqc:0.11.9--0' : + 'quay.io/biocontainers/fastqc:0.11.9--0' }" input: tuple val(meta), path(reads) @@ -24,24 +13,35 @@ process FASTQC { output: tuple val(meta), path("*.html"), emit: html tuple val(meta), path("*.zip") , emit: zip - path "*.version.txt" , emit: version + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when script: + def args = task.ext.args ?: '' // Add soft-links to original FastQs for consistent naming in pipeline - def software = getSoftwareName(task.process) - def prefix = options.suffix ? "${meta.id}${options.suffix}" : "${meta.id}" + def prefix = task.ext.prefix ?: "${meta.id}" if (meta.single_end) { """ [ ! -f ${prefix}.fastq.gz ] && ln -s $reads ${prefix}.fastq.gz - fastqc $options.args --threads $task.cpus ${prefix}.fastq.gz - fastqc --version | sed -e "s/FastQC v//g" > ${software}.version.txt + fastqc $args --threads $task.cpus ${prefix}.fastq.gz + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + fastqc: \$( fastqc --version | sed -e "s/FastQC v//g" ) + END_VERSIONS """ } else { """ [ ! -f ${prefix}_1.fastq.gz ] && ln -s ${reads[0]} ${prefix}_1.fastq.gz [ ! 
-f ${prefix}_2.fastq.gz ] && ln -s ${reads[1]} ${prefix}_2.fastq.gz - fastqc $options.args --threads $task.cpus ${prefix}_1.fastq.gz ${prefix}_2.fastq.gz - fastqc --version | sed -e "s/FastQC v//g" > ${software}.version.txt + fastqc $args --threads $task.cpus ${prefix}_1.fastq.gz ${prefix}_2.fastq.gz + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + fastqc: \$( fastqc --version | sed -e "s/FastQC v//g" ) + END_VERSIONS """ } } diff --git a/modules/nf-core/modules/fastqc/meta.yml b/modules/nf-core/modules/fastqc/meta.yml index 8eb9953d..4da5bb5a 100644 --- a/modules/nf-core/modules/fastqc/meta.yml +++ b/modules/nf-core/modules/fastqc/meta.yml @@ -1,51 +1,52 @@ name: fastqc description: Run FastQC on sequenced reads keywords: - - quality control - - qc - - adapters - - fastq + - quality control + - qc + - adapters + - fastq tools: - - fastqc: - description: | - FastQC gives general quality metrics about your reads. - It provides information about the quality score distribution - across your reads, the per base sequence content (%A/C/G/T). - You get information about adapter contamination and other - overrepresented sequences. - homepage: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/ - documentation: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/ + - fastqc: + description: | + FastQC gives general quality metrics about your reads. + It provides information about the quality score distribution + across your reads, the per base sequence content (%A/C/G/T). + You get information about adapter contamination and other + overrepresented sequences. + homepage: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/ + documentation: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/ + licence: ["GPL-2.0-only"] input: - - meta: - type: map - description: | - Groovy Map containing sample information - e.g. [ id:'test', single_end:false ] - - reads: - type: file - description: | - List of input FastQ files of size 1 and 2 for single-end and paired-end data, - respectively. + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - reads: + type: file + description: | + List of input FastQ files of size 1 and 2 for single-end and paired-end data, + respectively. output: - - meta: - type: map - description: | - Groovy Map containing sample information - e.g. [ id:'test', single_end:false ] - - html: - type: file - description: FastQC report - pattern: "*_{fastqc.html}" - - zip: - type: file - description: FastQC report archive - pattern: "*_{fastqc.zip}" - - version: - type: file - description: File containing software version - pattern: "*.{version.txt}" + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - html: + type: file + description: FastQC report + pattern: "*_{fastqc.html}" + - zip: + type: file + description: FastQC report archive + pattern: "*_{fastqc.zip}" + - versions: + type: file + description: File containing software versions + pattern: "versions.yml" authors: - - "@drpatelh" - - "@grst" - - "@ewels" - - "@FelixKrueger" + - "@drpatelh" + - "@grst" + - "@ewels" + - "@FelixKrueger" diff --git a/modules/nf-core/modules/freebayes/main.nf b/modules/nf-core/modules/freebayes/main.nf new file mode 100644 index 00000000..73b1da96 --- /dev/null +++ b/modules/nf-core/modules/freebayes/main.nf @@ -0,0 +1,73 @@ +process FREEBAYES { + tag "$meta.id" + label 'process_low' + + conda (params.enable_conda ? 
"bioconda::freebayes=1.3.5" : null) + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/freebayes:1.3.5--py38ha193a2f_3' : + 'quay.io/biocontainers/freebayes:1.3.5--py38ha193a2f_3' }" + + input: + tuple val(meta), path(input_1), path(input_1_index), path(input_2), path(input_2_index), path(target_bed) + path fasta + path fasta_fai + path samples + path populations + path cnv + + output: + tuple val(meta), path("*.vcf.gz"), emit: vcf + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def args = task.ext.args ?: '' + def prefix = task.ext.prefix ?: "${meta.id}" + def input = input_2 ? "${input_1} ${input_2}" : "${input_1}" + def targets_file = target_bed ? "--target ${target_bed}" : "" + def samples_file = samples ? "--samples ${samples}" : "" + def populations_file = populations ? "--populations ${populations}" : "" + def cnv_file = cnv ? "--cnv-map ${cnv}" : "" + + if (task.cpus > 1) { + """ + freebayes-parallel \\ + <(fasta_generate_regions.py $fasta_fai 10000) $task.cpus \\ + -f $fasta \\ + $targets_file \\ + $samples_file \\ + $populations_file \\ + $cnv_file \\ + $args \\ + $input > ${prefix}.vcf + + bgzip ${prefix}.vcf + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + freebayes: \$(echo \$(freebayes --version 2>&1) | sed 's/version:\s*v//g' ) + END_VERSIONS + """ + + } else { + """ + freebayes \\ + -f $fasta \\ + $targets_file \\ + $samples_file \\ + $populations_file \\ + $cnv_file \\ + $args \\ + $input > ${prefix}.vcf + + bgzip ${prefix}.vcf + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + freebayes: \$(echo \$(freebayes --version 2>&1) | sed 's/version:\s*v//g' ) + END_VERSIONS + """ + } +} diff --git a/modules/nf-core/modules/freebayes/meta.yml b/modules/nf-core/modules/freebayes/meta.yml new file mode 100644 index 00000000..cbbd297e --- /dev/null +++ b/modules/nf-core/modules/freebayes/meta.yml @@ -0,0 +1,82 @@ +name: freebayes +description: A haplotype-based variant detector +keywords: + - variant caller + - SNP + - genotyping + - somatic variant calling + - germline variant calling + - bacterial variant calling + - bayesian + +tools: + - freebayes: + description: Bayesian haplotype-based polymorphism discovery and genotyping + homepage: https://github.com/freebayes/freebayes + documentation: https://github.com/freebayes/freebayes + tool_dev_url: https://github.com/freebayes/freebayes + doi: "arXiv:1207.3907" + licence: ["MIT"] + +input: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - input: + type: file + description: BAM/CRAM/SAM file + pattern: "*.{bam,cram,sam}" + - input_index: + type: file + description: BAM/CRAM/SAM index file + pattern: "*.{bai,crai}" + - target_bed: + type: file + description: Optional - Limit analysis to targets listed in this BED-format FILE. + pattern: "*.bed" + - fasta: + type: file + description: reference fasta file + pattern: ".{fa,fa.gz,fasta,fasta.gz}" + - fasta_fai: + type: file + description: reference fasta file index + pattern: "*.{fa,fasta}.fai" + - samples: + type: file + description: Optional - Limit analysis to samples listed (one per line) in the FILE. + pattern: "*.txt" + - populations: + type: file + description: Optional - Each line of FILE should list a sample and a population which it is part of. 
+ pattern: "*.txt" + - cnv: + type: file + description: | + A copy number map BED file, which has either a sample-level ploidy: + sample_name copy_number + or a region-specific format: + seq_name start end sample_name copy_number + pattern: "*.bed" + +output: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - version: + type: file + description: File containing software version + pattern: "*.{version.txt}" + - vcf: + type: file + description: Compressed VCF file + pattern: "*.vcf.gz" + +authors: + - "@maxibor" + - "@FriederikeHanssen" + - "@maxulysse" diff --git a/modules/nf-core/modules/gunzip/main.nf b/modules/nf-core/modules/gunzip/main.nf new file mode 100644 index 00000000..61bf1afa --- /dev/null +++ b/modules/nf-core/modules/gunzip/main.nf @@ -0,0 +1,34 @@ +process GUNZIP { + tag "$archive" + label 'process_low' + + conda (params.enable_conda ? "conda-forge::sed=4.7" : null) + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/ubuntu:20.04' : + 'ubuntu:20.04' }" + + input: + tuple val(meta), path(archive) + + output: + tuple val(meta), path("$gunzip"), emit: gunzip + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def args = task.ext.args ?: '' + gunzip = archive.toString() - '.gz' + """ + gunzip \\ + -f \\ + $args \\ + $archive + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + gunzip: \$(echo \$(gunzip --version 2>&1) | sed 's/^.*(gzip) //; s/ Copyright.*\$//') + END_VERSIONS + """ +} diff --git a/modules/nf-core/modules/gunzip/meta.yml b/modules/nf-core/modules/gunzip/meta.yml new file mode 100644 index 00000000..4d2ebc84 --- /dev/null +++ b/modules/nf-core/modules/gunzip/meta.yml @@ -0,0 +1,34 @@ +name: gunzip +description: Compresses and decompresses files. +keywords: + - gunzip + - compression +tools: + - gunzip: + description: | + gzip is a file format and a software application used for file compression and decompression. + documentation: https://www.gnu.org/software/gzip/manual/gzip.html + licence: ["GPL-3.0-or-later"] +input: + - meta: + type: map + description: | + Optional groovy Map containing meta information + e.g. [ id:'test', single_end:false ] + - archive: + type: file + description: File to be compressed/uncompressed + pattern: "*.*" +output: + - gunzip: + type: file + description: Compressed/uncompressed file + pattern: "*.*" + - versions: + type: file + description: File containing software versions + pattern: "versions.yml" +authors: + - "@joseespinosa" + - "@drpatelh" + - "@jfy133" diff --git a/modules/nf-core/modules/maxbin2/main.nf b/modules/nf-core/modules/maxbin2/main.nf new file mode 100644 index 00000000..a48df43f --- /dev/null +++ b/modules/nf-core/modules/maxbin2/main.nf @@ -0,0 +1,47 @@ +process MAXBIN2 { + tag "$meta.id" + label 'process_medium' + + conda (params.enable_conda ? "bioconda::maxbin2=2.2.7" : null) + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? 
+ 'https://depot.galaxyproject.org/singularity/maxbin2:2.2.7--he1b5a44_2' : + 'quay.io/biocontainers/maxbin2:2.2.7--he1b5a44_2' }" + + input: + tuple val(meta), path(contigs), path(reads), path(abund) + + output: + tuple val(meta), path("*.fasta.gz") , emit: binned_fastas + tuple val(meta), path("*.summary") , emit: summary + tuple val(meta), path("*.log.gz") , emit: log + tuple val(meta), path("*.marker.gz") , emit: marker_counts + tuple val(meta), path("*.noclass.gz") , emit: unbinned_fasta + tuple val(meta), path("*.tooshort.gz"), emit: tooshort_fasta + tuple val(meta), path("*_bin.tar.gz") , emit: marker_bins , optional: true + tuple val(meta), path("*_gene.tar.gz"), emit: marker_genes, optional: true + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def args = task.ext.args ?: '' + def prefix = task.ext.prefix ?: "${meta.id}" + def associate_files = reads ? "-reads $reads" : "-abund $abund" + """ + mkdir input/ && mv $contigs input/ + run_MaxBin.pl \\ + -contig input/$contigs \\ + $associate_files \\ + -thread $task.cpus \\ + $args \\ + -out $prefix + + gzip *.fasta *.noclass *.tooshort *log *.marker + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + maxbin2: \$( run_MaxBin.pl -v | head -n 1 | sed 's/MaxBin //' ) + END_VERSIONS + """ +} diff --git a/modules/nf-core/modules/maxbin2/meta.yml b/modules/nf-core/modules/maxbin2/meta.yml new file mode 100644 index 00000000..7971d481 --- /dev/null +++ b/modules/nf-core/modules/maxbin2/meta.yml @@ -0,0 +1,79 @@ +name: maxbin2 +description: MaxBin is a software that is capable of clustering metagenomic contigs +keywords: + - metagenomics + - assembly + - binning + - maxbin2 + - de novo assembly + - mags + - metagenome-assembled genomes + - contigs +tools: + - maxbin2: + description: MaxBin is software for binning assembled metagenomic sequences based on an Expectation-Maximization algorithm. + homepage: https://sourceforge.net/projects/maxbin/ + documentation: https://sourceforge.net/projects/maxbin/ + tool_dev_url: https://sourceforge.net/projects/maxbin/ + doi: "10.1093/bioinformatics/btv638" + licence: ["BSD 3-clause"] + +input: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - contigs: + type: file + description: Multi FASTA file containing assembled contigs of a given sample + pattern: "*.fasta" + - reads: + type: file + description: Reads used to assemble contigs in FASTA or FASTQ format. Do not supply at the same time as abundance files. + pattern: "*.fasta" + - abund: + type: file + description: Contig abundance files, i.e. reads against each contig. See MaxBin2 README for details. Do not supply at the same time as read files. + +output: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. 
[ id:'test', single_end:false ] + - versions: + type: file + description: File containing software versions + pattern: "versions.yml" + - binned_fastas: + type: file + description: Binned contigs, one per bin designated with numeric IDs + pattern: "*.fasta.gz" + - summary: + type: file + description: Summary file describing which contigs are being classified into which bin + pattern: "*.summary" + - log: + type: file + description: Log file recording the core steps of MaxBin algorithm + pattern: "*.log.gz" + - marker: + type: file + description: Marker gene presence numbers for each bin + pattern: "*.marker.gz" + - unbinned_fasta: + type: file + description: All sequences that pass the minimum length threshold but are not classified successfully. + pattern: "*.noclass.gz" + - tooshort_fasta: + type: file + description: All sequences that do not meet the minimum length threshold. + pattern: "*.tooshort.gz" + - marker_genes: + type: file + description: All sequences that do not meet the minimum length threshold. + pattern: "*.marker_of_each_gene.tar.gz" + +authors: + - "@jfy133" diff --git a/modules/nf-core/modules/metabat2/jgisummarizebamcontigdepths/main.nf b/modules/nf-core/modules/metabat2/jgisummarizebamcontigdepths/main.nf new file mode 100644 index 00000000..7125eeb2 --- /dev/null +++ b/modules/nf-core/modules/metabat2/jgisummarizebamcontigdepths/main.nf @@ -0,0 +1,38 @@ +process METABAT2_JGISUMMARIZEBAMCONTIGDEPTHS { + tag "$meta.id" + label 'process_medium' + + conda (params.enable_conda ? "bioconda::metabat2=2.15" : null) + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/metabat2:2.15--h986a166_1' : + 'quay.io/biocontainers/metabat2:2.15--h986a166_1' }" + + input: + tuple val(meta), path(bam), path(bai) + + output: + tuple val(meta), path("*.txt.gz"), emit: depth + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def args = task.ext.args ?: '' + def prefix = task.ext.prefix ?: "${meta.id}" + """ + export OMP_NUM_THREADS=$task.cpus + + jgi_summarize_bam_contig_depths \\ + --outputDepth ${prefix}.txt \\ + $args \\ + $bam + + bgzip --threads $task.cpus ${prefix}.txt + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + metabat2: \$( metabat2 --help 2>&1 | head -n 2 | tail -n 1| sed 's/.*\\:\\([0-9]*\\.[0-9]*\\).*/\\1/' ) + END_VERSIONS + """ +} diff --git a/modules/nf-core/modules/metabat2/jgisummarizebamcontigdepths/meta.yml b/modules/nf-core/modules/metabat2/jgisummarizebamcontigdepths/meta.yml new file mode 100644 index 00000000..ff0ab40e --- /dev/null +++ b/modules/nf-core/modules/metabat2/jgisummarizebamcontigdepths/meta.yml @@ -0,0 +1,50 @@ +name: metabat2_jgisummarizebamcontigdepths +description: Depth computation per contig step of metabat2 +keywords: + - sort + - binning + - depth + - bam + - coverage + - de novo assembly +tools: + - metabat2: + description: Metagenome binning + homepage: https://bitbucket.org/berkeleylab/metabat/src/master/ + documentation: https://bitbucket.org/berkeleylab/metabat/src/master/ + tool_dev_url: https://bitbucket.org/berkeleylab/metabat/src/master/ + doi: "10.7717/peerj.7359" + licence: ["BSD-3-clause-LBNL"] + +input: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. 
[ id:'test', single_end:false ] + - bam: + type: file + description: Sorted BAM file of reads aligned on the assembled contigs + pattern: "*.bam" + - bai: + type: file + description: BAM index file + pattern: "*.bam.bai" + +output: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - versions: + type: file + description: File containing software versions + pattern: "versions.yml" + - depth: + type: file + description: Text file listing the coverage per contig + pattern: ".txt.gz" + +authors: + - "@maxibor" diff --git a/modules/nf-core/modules/metabat2/metabat2/main.nf b/modules/nf-core/modules/metabat2/metabat2/main.nf new file mode 100644 index 00000000..23ebe19a --- /dev/null +++ b/modules/nf-core/modules/metabat2/metabat2/main.nf @@ -0,0 +1,52 @@ +process METABAT2_METABAT2 { + tag "$meta.id" + label 'process_medium' + + conda (params.enable_conda ? "bioconda::metabat2=2.15" : null) + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/metabat2:2.15--h986a166_1' : + 'quay.io/biocontainers/metabat2:2.15--h986a166_1' }" + + input: + tuple val(meta), path(fasta), path(depth) + + output: + tuple val(meta), path("*.tooShort.fa.gz") , optional:true , emit: tooshort + tuple val(meta), path("*.lowDepth.fa.gz") , optional:true , emit: lowdepth + tuple val(meta), path("*.unbinned.fa.gz") , optional:true , emit: unbinned + tuple val(meta), path("*.tsv.gz") , optional:true , emit: membership + tuple val(meta), path("bins/*.fa.gz") , optional:true , emit: fasta + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def args = task.ext.args ?: '' + def prefix = task.ext.prefix ?: "${meta.id}" + def decompress_depth = depth ? "gzip -d -f $depth" : "" + def depth_file = depth ? "-a ${depth.baseName}" : "" + """ + $decompress_depth + + metabat2 \\ + $args \\ + -i $fasta \\ + $depth_file \\ + -t $task.cpus \\ + --saveCls \\ + -o metabat2/${prefix} + + mv metabat2/${prefix} ${prefix}.tsv + mv metabat2 bins + + gzip ${prefix}.tsv + find ./bins/ -name "*.fa" -type f | xargs -t -n 1 bgzip -@ ${task.cpus} + find ./bins/ -name "*[lowDepth,tooShort,unbinned].fa.gz" -type f -exec mv {} . \\; + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + metabat2: \$( metabat2 --help 2>&1 | head -n 2 | tail -n 1| sed 's/.*\\:\\([0-9]*\\.[0-9]*\\).*/\\1/' ) + END_VERSIONS + """ +} diff --git a/modules/nf-core/modules/metabat2/metabat2/meta.yml b/modules/nf-core/modules/metabat2/metabat2/meta.yml new file mode 100644 index 00000000..04b8df4f --- /dev/null +++ b/modules/nf-core/modules/metabat2/metabat2/meta.yml @@ -0,0 +1,68 @@ +name: metabat2_metabat2 +keywords: + - sort + - binning + - depth + - bam + - coverage + - de novo assembly +tools: + - metabat2: + description: Metagenome binning + homepage: https://bitbucket.org/berkeleylab/metabat/src/master/ + documentation: https://bitbucket.org/berkeleylab/metabat/src/master/ + tool_dev_url: https://bitbucket.org/berkeleylab/metabat/src/master/ + doi: "10.7717/peerj.7359" + licence: ["BSD-3-clause-LBNL"] + +input: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. 
[ id:'test', single_end:false ] + - fasta: + type: file + description: Fasta file of the assembled contigs + pattern: "*.{fa,fas,fasta,fna,fa.gz,fas.gz,fasta.gz,fna.gz}" + - depth: + type: file + description: | + Optional text file listing the coverage per contig pre-generated + by metabat2_jgisummarizebamcontigdepths + pattern: "*.txt" + +output: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - versions: + type: file + description: File containing software versions + pattern: "versions.yml" + - fasta: + type: file + description: Bins created from assembled contigs in fasta file + pattern: "*.fa.gz" + - tooshort: + type: file + description: Contigs that did not pass length filtering + pattern: "*.tooShort.fa.gz" + - lowdepth: + type: file + description: Contigs that did not have sufficient depth for binning + pattern: "*.lowDepth.fa.gz" + - unbinned: + type: file + description: Contigs that pass length and depth filtering but could not be binned + pattern: "*.unbinned.fa.gz" + - membership: + type: file + description: cluster memberships as a matrix format. + pattern: "*.tsv.gz" + +authors: + - "@maxibor" + - "@jfy133" diff --git a/modules/nf-core/modules/prodigal/functions.nf b/modules/nf-core/modules/prodigal/functions.nf deleted file mode 100644 index 85628ee0..00000000 --- a/modules/nf-core/modules/prodigal/functions.nf +++ /dev/null @@ -1,78 +0,0 @@ -// -// Utility functions used in nf-core DSL2 module files -// - -// -// Extract name of software tool from process name using $task.process -// -def getSoftwareName(task_process) { - return task_process.tokenize(':')[-1].tokenize('_')[0].toLowerCase() -} - -// -// Extract name of module from process name using $task.process -// -def getProcessName(task_process) { - return task_process.tokenize(':')[-1] -} - -// -// Function to initialise default values and to generate a Groovy Map of available options for nf-core modules -// -def initOptions(Map args) { - def Map options = [:] - options.args = args.args ?: '' - options.args2 = args.args2 ?: '' - options.args3 = args.args3 ?: '' - options.publish_by_meta = args.publish_by_meta ?: [] - options.publish_dir = args.publish_dir ?: '' - options.publish_files = args.publish_files - options.suffix = args.suffix ?: '' - return options -} - -// -// Tidy up and join elements of a list to return a path string -// -def getPathFromList(path_list) { - def paths = path_list.findAll { item -> !item?.trim().isEmpty() } // Remove empty entries - paths = paths.collect { it.trim().replaceAll("^[/]+|[/]+\$", "") } // Trim whitespace and trailing slashes - return paths.join('/') -} - -// -// Function to save/publish module results -// -def saveFiles(Map args) { - def ioptions = initOptions(args.options) - def path_list = [ ioptions.publish_dir ?: args.publish_dir ] - - // Do not publish versions.yml unless running from pytest workflow - if (args.filename.equals('versions.yml') && !System.getenv("NF_CORE_MODULES_TEST")) { - return null - } - if (ioptions.publish_by_meta) { - def key_list = ioptions.publish_by_meta instanceof List ? ioptions.publish_by_meta : args.publish_by_meta - for (key in key_list) { - if (args.meta && key instanceof String) { - def path = key - if (args.meta.containsKey(key)) { - path = args.meta[key] instanceof Boolean ? "${key}_${args.meta[key]}".toString() : args.meta[key] - } - path = path instanceof String ? 
path : '' - path_list.add(path) - } - } - } - if (ioptions.publish_files instanceof Map) { - for (ext in ioptions.publish_files) { - if (args.filename.endsWith(ext.key)) { - def ext_list = path_list.collect() - ext_list.add(ext.value) - return "${getPathFromList(ext_list)}/$args.filename" - } - } - } else if (ioptions.publish_files == null) { - return "${getPathFromList(path_list)}/$args.filename" - } -} diff --git a/modules/nf-core/modules/prodigal/main.nf b/modules/nf-core/modules/prodigal/main.nf index 572ffe92..5768952b 100644 --- a/modules/nf-core/modules/prodigal/main.nf +++ b/modules/nf-core/modules/prodigal/main.nf @@ -1,39 +1,32 @@ -// Import generic module functions -include { initOptions; saveFiles; getSoftwareName; getProcessName } from './functions' - -params.options = [:] -options = initOptions(params.options) - process PRODIGAL { tag "$meta.id" label 'process_low' - publishDir "${params.outdir}", - mode: params.publish_dir_mode, - saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), meta:meta, publish_by_meta:['id']) } - conda (params.enable_conda ? "bioconda::prodigal=2.6.3" : null) - if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) { - container "https://depot.galaxyproject.org/singularity/prodigal:2.6.3--h516909a_2" - } else { - container "quay.io/biocontainers/prodigal:2.6.3--h516909a_2" - } + conda (params.enable_conda ? "prodigal=2.6.3 pigz=2.6" : null) + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/mulled-v2-2e442ba7b07bfa102b9cf8fac6221263cd746ab8:57f05cfa73f769d6ed6d54144cb3aa2a6a6b17e0-0' : + 'quay.io/biocontainers/mulled-v2-2e442ba7b07bfa102b9cf8fac6221263cd746ab8:57f05cfa73f769d6ed6d54144cb3aa2a6a6b17e0-0' }" input: tuple val(meta), path(genome) val(output_format) output: - tuple val(meta), path("${prefix}.${output_format}"), emit: gene_annotations - tuple val(meta), path("${prefix}.fna"), emit: nucleotide_fasta - tuple val(meta), path("${prefix}.faa"), emit: amino_acid_fasta - tuple val(meta), path("${prefix}_all.txt"), emit: all_gene_annotations - path "versions.yml" , emit: versions + tuple val(meta), path("${prefix}.${output_format}"), emit: gene_annotations + tuple val(meta), path("${prefix}.fna"), emit: nucleotide_fasta + tuple val(meta), path("${prefix}.faa"), emit: amino_acid_fasta + tuple val(meta), path("${prefix}_all.txt"), emit: all_gene_annotations + path "versions.yml", emit: versions + + when: + task.ext.when == null || task.ext.when script: - prefix = options.suffix ? 
"${meta.id}${options.suffix}" : "${meta.id}" + def args = task.ext.args ?: '' + prefix = task.ext.prefix ?: "${meta.id}" """ - prodigal -i "${genome}" \\ - $options.args \\ + pigz -cdf ${genome} | prodigal \\ + $args \\ -f $output_format \\ -d "${prefix}.fna" \\ -o "${prefix}.${output_format}" \\ @@ -41,8 +34,9 @@ process PRODIGAL { -s "${prefix}_all.txt" cat <<-END_VERSIONS > versions.yml - ${getProcessName(task.process)}: - ${getSoftwareName(task.process)}: \$(prodigal -v 2>&1 | sed -n 's/Prodigal V\\(.*\\):.*/\\1/p') + "${task.process}": + prodigal: \$(prodigal -v 2>&1 | sed -n 's/Prodigal V\\(.*\\):.*/\\1/p') + pigz: \$(pigz -V 2>&1 | sed 's/pigz //g') END_VERSIONS """ } diff --git a/modules/nf-core/modules/prodigal/meta.yml b/modules/nf-core/modules/prodigal/meta.yml index 5bcc4e77..8cb3d12e 100644 --- a/modules/nf-core/modules/prodigal/meta.yml +++ b/modules/nf-core/modules/prodigal/meta.yml @@ -5,10 +5,10 @@ keywords: tools: - prodigal: description: Prodigal (Prokaryotic Dynamic Programming Genefinding Algorithm) is a microbial (bacterial and archaeal) gene finding program - homepage: {} - documentation: {} - tool_dev_url: {} - doi: "" + homepage: https://github.com/hyattpd/Prodigal + documentation: https://github.com/hyattpd/prodigal/wiki + tool_dev_url: https://github.com/hyattpd/Prodigal + doi: "10.1186/1471-2105-11-119" licence: ["GPL v3"] input: @@ -17,10 +17,12 @@ input: description: | Groovy Map containing sample information e.g. [ id:'test', single_end:false ] - - bam: + - genome: type: file - description: BAM/CRAM/SAM file - pattern: "*.{bam,cram,sam}" + description: fasta/fasta.gz file + - output_format: + type: string + description: Output format ("gbk"/"gff"/"sqn"/"sco") output: - meta: @@ -32,10 +34,22 @@ output: type: file description: File containing software versions pattern: "versions.yml" - - bam: + - nucleotide_fasta: type: file - description: Sorted BAM/CRAM/SAM file - pattern: "*.{bam,cram,sam}" + description: nucleotide sequences file + pattern: "*.{fna}" + - amino_acid_fasta: + type: file + description: protein translations file + pattern: "*.{faa}" + - all_gene_annotations: + type: file + description: complete starts file + pattern: "*.{_all.txt}" + - gene_annotations: + type: file + description: gene annotations in output_format given as input + pattern: "*.{output_format}" authors: - "@grst" diff --git a/modules/nf-core/modules/prokka/functions.nf b/modules/nf-core/modules/prokka/functions.nf deleted file mode 100644 index 85628ee0..00000000 --- a/modules/nf-core/modules/prokka/functions.nf +++ /dev/null @@ -1,78 +0,0 @@ -// -// Utility functions used in nf-core DSL2 module files -// - -// -// Extract name of software tool from process name using $task.process -// -def getSoftwareName(task_process) { - return task_process.tokenize(':')[-1].tokenize('_')[0].toLowerCase() -} - -// -// Extract name of module from process name using $task.process -// -def getProcessName(task_process) { - return task_process.tokenize(':')[-1] -} - -// -// Function to initialise default values and to generate a Groovy Map of available options for nf-core modules -// -def initOptions(Map args) { - def Map options = [:] - options.args = args.args ?: '' - options.args2 = args.args2 ?: '' - options.args3 = args.args3 ?: '' - options.publish_by_meta = args.publish_by_meta ?: [] - options.publish_dir = args.publish_dir ?: '' - options.publish_files = args.publish_files - options.suffix = args.suffix ?: '' - return options -} - -// -// Tidy up and join elements of a list to return a 
path string -// -def getPathFromList(path_list) { - def paths = path_list.findAll { item -> !item?.trim().isEmpty() } // Remove empty entries - paths = paths.collect { it.trim().replaceAll("^[/]+|[/]+\$", "") } // Trim whitespace and trailing slashes - return paths.join('/') -} - -// -// Function to save/publish module results -// -def saveFiles(Map args) { - def ioptions = initOptions(args.options) - def path_list = [ ioptions.publish_dir ?: args.publish_dir ] - - // Do not publish versions.yml unless running from pytest workflow - if (args.filename.equals('versions.yml') && !System.getenv("NF_CORE_MODULES_TEST")) { - return null - } - if (ioptions.publish_by_meta) { - def key_list = ioptions.publish_by_meta instanceof List ? ioptions.publish_by_meta : args.publish_by_meta - for (key in key_list) { - if (args.meta && key instanceof String) { - def path = key - if (args.meta.containsKey(key)) { - path = args.meta[key] instanceof Boolean ? "${key}_${args.meta[key]}".toString() : args.meta[key] - } - path = path instanceof String ? path : '' - path_list.add(path) - } - } - } - if (ioptions.publish_files instanceof Map) { - for (ext in ioptions.publish_files) { - if (args.filename.endsWith(ext.key)) { - def ext_list = path_list.collect() - ext_list.add(ext.value) - return "${getPathFromList(ext_list)}/$args.filename" - } - } - } else if (ioptions.publish_files == null) { - return "${getPathFromList(path_list)}/$args.filename" - } -} diff --git a/modules/nf-core/modules/prokka/main.nf b/modules/nf-core/modules/prokka/main.nf index fb86078c..3e46d1a1 100644 --- a/modules/nf-core/modules/prokka/main.nf +++ b/modules/nf-core/modules/prokka/main.nf @@ -1,21 +1,11 @@ -include { initOptions; saveFiles; getSoftwareName; getProcessName } from './functions' - -params.options = [:] -options = initOptions(params.options) - process PROKKA { tag "$meta.id" label 'process_low' - publishDir "${params.outdir}", - mode: params.publish_dir_mode, - saveAs: { filename -> saveFiles(filename:filename, options:params.options, publish_dir:getSoftwareName(task.process), meta:meta, publish_by_meta:['id']) } conda (params.enable_conda ? "bioconda::prokka=1.14.6" : null) - if (workflow.containerEngine == 'singularity' && !params.singularity_pull_docker_container) { - container "https://depot.galaxyproject.org/singularity/prokka:1.14.6--pl526_0" - } else { - container "quay.io/biocontainers/prokka:1.14.6--pl526_0" - } + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/prokka:1.14.6--pl526_0' : + 'quay.io/biocontainers/prokka:1.14.6--pl526_0' }" input: tuple val(meta), path(fasta) @@ -37,13 +27,17 @@ process PROKKA { tuple val(meta), path("${prefix}/*.tsv"), emit: tsv path "versions.yml" , emit: versions + when: + task.ext.when == null || task.ext.when + script: - prefix = options.suffix ? "${meta.id}${options.suffix}" : "${meta.id}" + def args = task.ext.args ?: '' + prefix = task.ext.prefix ?: "${meta.id}" def proteins_opt = proteins ? "--proteins ${proteins[0]}" : "" def prodigal_opt = prodigal_tf ? 
"--prodigaltf ${prodigal_tf[0]}" : "" """ prokka \\ - $options.args \\ + $args \\ --cpus $task.cpus \\ --prefix $prefix \\ $proteins_opt \\ @@ -51,8 +45,8 @@ process PROKKA { $fasta cat <<-END_VERSIONS > versions.yml - ${getProcessName(task.process)}: - ${getSoftwareName(task.process)}: \$(echo \$(prokka --version 2>&1) | sed 's/^.*prokka //') + "${task.process}": + prokka: \$(echo \$(prokka --version 2>&1) | sed 's/^.*prokka //') END_VERSIONS """ } diff --git a/modules/nf-core/modules/prokka/meta.yml b/modules/nf-core/modules/prokka/meta.yml index 87446694..7fc9e185 100644 --- a/modules/nf-core/modules/prokka/meta.yml +++ b/modules/nf-core/modules/prokka/meta.yml @@ -9,7 +9,7 @@ tools: description: Rapid annotation of prokaryotic genomes homepage: https://github.com/tseemann/prokka doi: "10.1093/bioinformatics/btu153" - licence: ['GPL v2'] + licence: ["GPL v2"] input: - meta: diff --git a/modules/nf-core/modules/pydamage/analyze/main.nf b/modules/nf-core/modules/pydamage/analyze/main.nf new file mode 100644 index 00000000..3463b0e5 --- /dev/null +++ b/modules/nf-core/modules/pydamage/analyze/main.nf @@ -0,0 +1,35 @@ +process PYDAMAGE_ANALYZE { + tag "$meta.id" + label 'process_medium' + + conda (params.enable_conda ? "bioconda::pydamage=0.70" : null) + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/pydamage:0.70--pyhdfd78af_0' : + 'quay.io/biocontainers/pydamage:0.70--pyhdfd78af_0' }" + + input: + tuple val(meta), path(bam), path(bai) + + output: + tuple val(meta), path("pydamage_results/pydamage_results.csv"), emit: csv + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def args = task.ext.args ?: '' + def prefix = task.ext.prefix ?: "${meta.id}" + """ + pydamage \\ + analyze \\ + $args \\ + -p $task.cpus \\ + $bam + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + pydamage: \$(echo \$(pydamage --version 2>&1) | sed -e 's/pydamage, version //g') + END_VERSIONS + """ +} diff --git a/modules/nf-core/modules/pydamage/analyze/meta.yml b/modules/nf-core/modules/pydamage/analyze/meta.yml new file mode 100644 index 00000000..09dd25eb --- /dev/null +++ b/modules/nf-core/modules/pydamage/analyze/meta.yml @@ -0,0 +1,55 @@ +name: pydamage_analyze +description: Damage parameter estimation for ancient DNA +keywords: + - ancient DNA + - aDNA + - de novo assembly + - filtering + - damage + - deamination + - miscoding lesions + - C to T + - palaeogenomics + - archaeogenomics + - palaeogenetics + - archaeogenetics +tools: + - pydamage: + description: Damage parameter estimation for ancient DNA + homepage: https://github.com/maxibor/pydamage + documentation: https://pydamage.readthedocs.io/ + tool_dev_url: https://github.com/maxibor/pydamage + licence: ["GPL v3"] + +input: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - bam: + type: file + description: BAM/CRAM/SAM file + pattern: "*.{bam,cram,sam}" + - bai: + type: file + description: BAM/CRAM/SAM index file + pattern: "*.{bai,crai,sai}" + +output: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. 
[ id:'test', single_end:false ] + - versions: + type: file + description: File containing software versions + pattern: "versions.yml" + - csv: + type: file + description: PyDamage results as csv files + pattern: "*.csv" + +authors: + - "@maxibor" diff --git a/modules/nf-core/modules/pydamage/filter/main.nf b/modules/nf-core/modules/pydamage/filter/main.nf new file mode 100644 index 00000000..14fbf1c5 --- /dev/null +++ b/modules/nf-core/modules/pydamage/filter/main.nf @@ -0,0 +1,35 @@ +process PYDAMAGE_FILTER { + tag "$meta.id" + label 'process_low' + + conda (params.enable_conda ? "bioconda::pydamage=0.70" : null) + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? + 'https://depot.galaxyproject.org/singularity/pydamage:0.70--pyhdfd78af_0' : + 'quay.io/biocontainers/pydamage:0.70--pyhdfd78af_0' }" + + input: + tuple val(meta), path(csv) + + output: + tuple val(meta), path("pydamage_results/pydamage_filtered_results.csv"), emit: csv + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def args = task.ext.args ?: '' + def prefix = task.ext.prefix ?: "${meta.id}" + """ + + pydamage \\ + filter \\ + $args \\ + $csv + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + pydamage: \$(echo \$(pydamage --version 2>&1) | sed -e 's/pydamage, version //g') + END_VERSIONS + """ +} diff --git a/modules/nf-core/modules/pydamage/filter/meta.yml b/modules/nf-core/modules/pydamage/filter/meta.yml new file mode 100644 index 00000000..c732ab9b --- /dev/null +++ b/modules/nf-core/modules/pydamage/filter/meta.yml @@ -0,0 +1,51 @@ +name: pydamage_filter +description: Damage parameter estimation for ancient DNA +keywords: + - ancient DNA + - aDNA + - de novo assembly + - filtering + - damage + - deamination + - miscoding lesions + - C to T + - palaeogenomics + - archaeogenomics + - palaeogenetics + - archaeogenetics +tools: + - pydamage: + description: Damage parameter estimation for ancient DNA + homepage: https://github.com/maxibor/pydamage + documentation: https://pydamage.readthedocs.io/ + tool_dev_url: https://github.com/maxibor/pydamage + licence: ["GPL v3"] + +input: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - csv: + type: file + description: csv file from pydamage analyze + pattern: "*.csv" + +output: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - versions: + type: file + description: File containing software versions + pattern: "versions.yml" + - csv: + type: file + description: PyDamage filtered results as csv file + pattern: "*.csv" + +authors: + - "@maxibor" diff --git a/modules/nf-core/modules/samtools/faidx/main.nf b/modules/nf-core/modules/samtools/faidx/main.nf new file mode 100644 index 00000000..7732a4ec --- /dev/null +++ b/modules/nf-core/modules/samtools/faidx/main.nf @@ -0,0 +1,32 @@ +process SAMTOOLS_FAIDX { + tag "$fasta" + label 'process_low' + + conda (params.enable_conda ? "bioconda::samtools=1.15" : null) + container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? 
+ 'https://depot.galaxyproject.org/singularity/samtools:1.15--h1170115_1' : + 'quay.io/biocontainers/samtools:1.15--h1170115_1' }" + + input: + tuple val(meta), path(fasta) + + output: + tuple val(meta), path ("*.fai"), emit: fai + path "versions.yml" , emit: versions + + when: + task.ext.when == null || task.ext.when + + script: + def args = task.ext.args ?: '' + """ + samtools \\ + faidx \\ + $fasta + + cat <<-END_VERSIONS > versions.yml + "${task.process}": + samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//') + END_VERSIONS + """ +} diff --git a/modules/nf-core/modules/samtools/faidx/meta.yml b/modules/nf-core/modules/samtools/faidx/meta.yml new file mode 100644 index 00000000..e9767764 --- /dev/null +++ b/modules/nf-core/modules/samtools/faidx/meta.yml @@ -0,0 +1,43 @@ +name: samtools_faidx +description: Index FASTA file +keywords: + - index + - fasta +tools: + - samtools: + description: | + SAMtools is a set of utilities for interacting with and post-processing + short DNA sequence read alignments in the SAM, BAM and CRAM formats, written by Heng Li. + These files are generated as output by short read aligners like BWA. + homepage: http://www.htslib.org/ + documentation: http://www.htslib.org/doc/samtools.html + doi: 10.1093/bioinformatics/btp352 + licence: ["MIT"] +input: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - fasta: + type: file + description: FASTA file + pattern: "*.{fa,fasta}" +output: + - meta: + type: map + description: | + Groovy Map containing sample information + e.g. [ id:'test', single_end:false ] + - fai: + type: file + description: FASTA index file + pattern: "*.{fai}" + - versions: + type: file + description: File containing software versions + pattern: "versions.yml" +authors: + - "@drpatelh" + - "@ewels" + - "@phue" diff --git a/nextflow.config b/nextflow.config index a49cfb7f..43926a05 100644 --- a/nextflow.config +++ b/nextflow.config @@ -1,7 +1,7 @@ /* -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ nf-core/mag Nextflow config file -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Default config options for all compute environments ---------------------------------------------------------------------------------------- */ @@ -10,123 +10,142 @@ params { // Input options - input = null - single_end = false + input = null + single_end = false // short read preprocessing options - save_trimmed_fail = false - fastp_qualified_quality = 15 - fastp_cut_mean_quality = 15 - keep_phix = false - // phix_reference = "ftp://ftp.ncbi.nlm.nih.gov/genomes/genbank/viral/Enterobacteria_phage_phiX174_sensu_lato/all_assembly_versions/GCA_002596845.1_ASM259684v1/GCA_002596845.1_ASM259684v1_genomic.fna.gz" - phix_reference = "${baseDir}/assets/data/GCA_002596845.1_ASM259684v1_genomic.fna.gz" - host_fasta = null - host_genome = null - host_removal_verysensitive = false - host_removal_save_ids = false + clip_tool = 'fastp' + reads_minlength = 15 + fastp_save_trimmed_fail = false + fastp_qualified_quality = 15 + fastp_cut_mean_quality = 15 + adapterremoval_minquality = 2 + adapterremoval_adapter1 = 'AGATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNATCTCGTATGCCGTCTTCTGCTTG' + adapterremoval_adapter2 = 
'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT' + adapterremoval_trim_quality_stretch = false + keep_phix = false + // phix_reference = "ftp://ftp.ncbi.nlm.nih.gov/genomes/genbank/viral/Enterobacteria_phage_phiX174_sensu_lato/all_assembly_versions/GCA_002596845.1_ASM259684v1/GCA_002596845.1_ASM259684v1_genomic.fna.gz" + phix_reference = "${baseDir}/assets/data/GCA_002596845.1_ASM259684v1_genomic.fna.gz" + host_fasta = null + host_genome = null + host_removal_verysensitive = false + host_removal_save_ids = false // binning options - binning_map_mode = 'group' - skip_binning = false - min_contig_size = 1500 - min_length_unbinned_contigs = 1000000 - max_unbinned_contigs = 100 - skip_prokka = false + bowtie2_mode = null + binning_map_mode = 'group' + skip_binning = false + min_contig_size = 1500 + min_length_unbinned_contigs = 1000000 + max_unbinned_contigs = 100 + skip_prokka = false // assembly options - coassemble_group = false - spades_options = '' - megahit_options = '' - skip_spades = false - skip_spadeshybrid = false - skip_megahit = false - skip_quast = false - skip_prodigal = false + coassemble_group = false + spades_options = null + megahit_options = null + skip_spades = false + skip_spadeshybrid = false + skip_megahit = false + skip_quast = false + skip_prodigal = false + + // ancient DNA assembly validation options + ancient_dna = false + freebayes_ploidy = 1 + freebayes_min_basequality = 20 + freebayes_minallelefreq = 0.33 + bcftools_view_high_variant_quality = 30 + bcftools_view_medium_variant_quality = 20 + bcftools_view_minimal_allelesupport = 3 + pydamage_accuracy = 0.5 // taxonomy options - centrifuge_db = null - kraken2_db = null - skip_krona = false - cat_db = null - cat_db_generate = false - save_cat_db = false - gtdb = "https://data.ace.uq.edu.au/public/gtdb/data/releases/release202/202.0/auxillary_files/gtdbtk_r202_data.tar.gz" - gtdbtk_min_completeness = 50.0 - gtdbtk_max_contamination = 10.0 - gtdbtk_min_perc_aa = 10 - gtdbtk_min_af = 0.65 - gtdbtk_pplacer_cpus = 1 - gtdbtk_pplacer_scratch = true + centrifuge_db = null + kraken2_db = null + skip_krona = false + cat_db = null + cat_db_generate = false + save_cat_db = false + gtdb = "https://data.ace.uq.edu.au/public/gtdb/data/releases/release202/202.0/auxillary_files/gtdbtk_r202_data.tar.gz" + gtdbtk_min_completeness = 50.0 + gtdbtk_max_contamination = 10.0 + gtdbtk_min_perc_aa = 10 + gtdbtk_min_af = 0.65 + gtdbtk_pplacer_cpus = 1 + gtdbtk_pplacer_scratch = true // long read preprocessing options - skip_adapter_trimming = false - keep_lambda = false - longreads_min_length = 1000 - longreads_keep_percent = 90 - longreads_length_weight = 10 - // lambda_reference = "ftp://ftp.ncbi.nlm.nih.gov/genomes/genbank/viral/Escherichia_virus_Lambda/all_assembly_versions/GCA_000840245.1_ViralProj14204/GCA_000840245.1_ViralProj14204_genomic.fna.gz" - lambda_reference = "${baseDir}/assets/data/GCA_000840245.1_ViralProj14204_genomic.fna.gz" + skip_adapter_trimming = false + keep_lambda = false + longreads_min_length = 1000 + longreads_keep_percent = 90 + longreads_length_weight = 10 + // lambda_reference = "ftp://ftp.ncbi.nlm.nih.gov/genomes/genbank/viral/Escherichia_virus_Lambda/all_assembly_versions/GCA_000840245.1_ViralProj14204/GCA_000840245.1_ViralProj14204_genomic.fna.gz" + lambda_reference = "${baseDir}/assets/data/GCA_000840245.1_ViralProj14204_genomic.fna.gz" + + // binning options + skip_metabat2 = false + skip_maxbin2 = false + refine_bins_dastool = false + refine_bins_dastool_threshold = 0.5 + postbinning_input 
= 'raw_bins_only' // Bin QC - skip_busco = false - busco_reference = null - busco_download_path = null - busco_auto_lineage_prok = false - save_busco_reference = false + skip_busco = false + busco_reference = null + busco_download_path = null + busco_auto_lineage_prok = false + save_busco_reference = false // Reproducibility options - megahit_fix_cpu_1 = false - spades_fix_cpus = -1 - spadeshybrid_fix_cpus = -1 - metabat_rng_seed = 1 + megahit_fix_cpu_1 = false + spades_fix_cpus = -1 + spadeshybrid_fix_cpus = -1 + metabat_rng_seed = 1 // References - igenomes_base = 's3://ngi-igenomes/igenomes' - igenomes_ignore = false + igenomes_base = 's3://ngi-igenomes/igenomes' + igenomes_ignore = false // MultiQC options - multiqc_config = null - multiqc_title = null - max_multiqc_email_size = '25.MB' + multiqc_config = null + multiqc_title = null + max_multiqc_email_size = '25.MB' // Boilerplate options - outdir = './results' - tracedir = "${params.outdir}/pipeline_info" - publish_dir_mode = 'copy' - email = null - email_on_fail = null - plaintext_email = false - monochrome_logs = false - help = false - validate_params = true - show_hidden_params = false - schema_ignore_params = 'genomes,modules' - enable_conda = false - singularity_pull_docker_container = false + outdir = null + publish_dir_mode = 'copy' + tracedir = "${params.outdir}/pipeline_info" + email = null + email_on_fail = null + plaintext_email = false + monochrome_logs = false + help = false + validate_params = true + show_hidden_params = false + schema_ignore_params = 'genomes' + enable_conda = false // Config options - custom_config_version = 'master' - custom_config_base = "https://raw.githubusercontent.com/nf-core/configs/${params.custom_config_version}" - hostnames = [:] - config_profile_description = null - config_profile_contact = null - config_profile_url = null - config_profile_name = null + custom_config_version = 'master' + custom_config_base = "https://raw.githubusercontent.com/nf-core/configs/${params.custom_config_version}" + config_profile_description = null + config_profile_contact = null + config_profile_url = null + config_profile_name = null // Max resource options // Defaults only, expecting to be overwritten - max_memory = '128.GB' - max_cpus = 16 - max_time = '240.h' + max_memory = '128.GB' + max_cpus = 16 + max_time = '240.h' } // Load base.config by default for all pipelines includeConfig 'conf/base.config' -// Load modules.config for DSL2 module specific options -includeConfig 'conf/modules.config' - // Load nf-core custom profiles from different Institutions try { includeConfig "${params.custom_config_base}/nfcore_custom.config" @@ -134,13 +153,15 @@ try { System.err.println("WARNING: Could not load nf-core/config profiles: ${params.custom_config_base}/nfcore_custom.config") } -// Load igenomes.config if required -if (!params.igenomes_ignore) { - includeConfig 'conf/igenomes.config' -} else { - params.genomes = [:] +// Load nf-core/mag custom profiles from different institutions. +// Warning: Uncomment only if a pipeline-specific institutional config already exists on nf-core/configs!
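+// (Editor's note, illustrative only and not part of this patch: such a pipeline-specific institutional config would typically just override pipeline defaults for a given cluster, +// e.g. `params { gtdbtk_pplacer_cpus = 4 }` on a machine with enough memory per pplacer CPU; adjust to whatever the institution actually provides.)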
+try { + includeConfig "${params.custom_config_base}/pipeline/mag.config" +} catch (Exception e) { + System.err.println("WARNING: Could not load nf-core/config/mag profiles: ${params.custom_config_base}/pipeline/mag.config") } + profiles { debug { process.beforeScript = 'echo $HOSTNAME' } conda { @@ -194,13 +215,28 @@ profiles { test_hybrid_host_rm { includeConfig 'conf/test_hybrid_host_rm.config' } test_busco_auto { includeConfig 'conf/test_busco_auto.config' } test_full { includeConfig 'conf/test_full.config' } + test_ancient_dna { includeConfig 'conf/test_ancient_dna.config' } + test_adapterremoval { includeConfig 'conf/test_adapterremoval.config' } + test_binrefinement { includeConfig 'conf/test_binrefinement.config' } + +} + +// Load igenomes.config if required +if (!params.igenomes_ignore) { + includeConfig 'conf/igenomes.config' +} else { + params.genomes = [:] } // Export these variables to prevent local Python/R libraries from conflicting with those in the container +// The JULIA depot path has been adjusted to a fixed path `/usr/local/share/julia` that needs to be used for packages in the container. +// See https://apeltzer.github.io/post/03-julia-lang-nextflow/ for details on that. Once we have a common agreement on where to keep Julia packages, this is adjustable. + env { PYTHONNOUSERSITE = 1 R_PROFILE_USER = "/.Rprofile" R_ENVIRON_USER = "/.Renviron" + JULIA_DEPOT_PATH = "/usr/local/share/julia" } // Capture exit codes from upstream processes when piping @@ -221,7 +257,7 @@ trace { } dag { enabled = true - file = "${params.tracedir}/pipeline_dag_${trace_timestamp}.svg" + file = "${params.tracedir}/pipeline_dag_${trace_timestamp}.html" } manifest { @@ -230,10 +266,13 @@ manifest { homePage = 'https://github.com/nf-core/mag' description = 'Assembly, binning and annotation of metagenomes' mainScript = 'main.nf' - nextflowVersion = '!>=21.04.0' - version = '2.1.1' + nextflowVersion = '!>=21.10.3' + version = '2.2.0' } +// Load modules.config for DSL2 module specific options +includeConfig 'conf/modules.config' + // Function to ensure that resource requirements don't go beyond // a maximum limit def check_max(obj, type) { diff --git a/nextflow_schema.json b/nextflow_schema.json index 2f8ee344..9cdaf1ec 100644 --- a/nextflow_schema.json +++ b/nextflow_schema.json @@ -10,6 +10,7 @@ "type": "object", "fa_icon": "fas fa-terminal", "description": "Define where the pipeline should find input data and save output data.", + "required": ["input", "outdir"], "properties": { "input": { "type": "string", @@ -26,8 +27,8 @@ }, "outdir": { "type": "string", - "description": "Path to the output directory where the results will be saved.", - "default": "./results", + "format": "directory-path", + "description": "The output directory where the results will be saved. You have to use absolute paths to storage on Cloud infrastructure.", "fa_icon": "fas fa-folder-open" }, "email": { @@ -89,12 +90,6 @@ "help_text": "If you're running offline, Nextflow will not be able to fetch the institutional config files from the internet. If you don't need them, then this is not a problem. 
If you do need them, you should download the files from the repo and tell Nextflow where to find them with this parameter.", "fa_icon": "fas fa-users-cog" }, - "hostnames": { - "type": "string", - "description": "Institutional configs hostname.", - "hidden": true, - "fa_icon": "fas fa-users-cog" - }, "config_profile_name": { "type": "string", "description": "Institutional config name.", @@ -175,14 +170,7 @@ "description": "Method used to save pipeline results to output directory.", "help_text": "The Nextflow `publishDir` option specifies which intermediate files should be saved to the output directory. This option tells the pipeline what method should be used to move these files. See [Nextflow docs](https://www.nextflow.io/docs/latest/process.html#publishdir) for details.", "fa_icon": "fas fa-copy", - "enum": [ - "symlink", - "rellink", - "link", - "copy", - "copyNoFollow", - "move" - ], + "enum": ["symlink", "rellink", "link", "copy", "copyNoFollow", "move"], "hidden": true }, "email_on_fail": { @@ -245,13 +233,6 @@ "description": "Run this workflow with Conda. You can also use '-profile conda' instead of providing this parameter.", "hidden": true, "fa_icon": "fas fa-bacon" - }, - "singularity_pull_docker_container": { - "type": "boolean", - "description": "Instead of directly downloading Singularity images for use with Singularity, force the workflow to pull and convert Docker containers instead.", - "hidden": true, - "fa_icon": "fas fa-toolbox", - "help_text": "This may be useful for example if you are unable to directly pull Singularity containers to run the pipeline due to http/https proxy issues." } } }, @@ -293,16 +274,21 @@ "description": "", "default": "", "properties": { - "save_trimmed_fail": { - "type": "boolean", - "fa_icon": "fas fa-save", - "description": "Save the by fastp trimmed FastQ files in the results directory.", - "help_text": "By default, trimmed FastQ files will not be saved to the results directory. Specify this flag (or set to true in your config file) to copy these files to the results directory when complete." + "clip_tool": { + "type": "string", + "default": "fastp", + "description": "Specify which adapter clipping tool to use. Options: 'fastp', 'adapterremoval'", + "enum": ["fastp", "adapterremoval"] + }, + "reads_minlength": { + "type": "integer", + "default": 15, + "description": "The minimum length that reads must have to be retained for downstream analysis." }, "fastp_qualified_quality": { "type": "integer", "default": 15, - "description": "Minimum phred quality value of a base to be qualified.", + "description": "Minimum phred quality value of a base to be qualified in fastp.", "help": "Reads with more than 40% of unqualified bases will be discarded." }, "fastp_cut_mean_quality": { @@ -311,6 +297,30 @@ "description": "The mean quality requirement used for per read sliding window cutting by fastp.", "help": "Used in combination with the fastp options '--cut_front' and '--cut_tail'. If the mean quality within a window (of size 4) is below `--fastp_cut_mean_quality`, the bases are dropped and the sliding window is moved further, otherwise it stops." }, + "fastp_save_trimmed_fail": { + "type": "boolean", + "description": "Save reads that fail fastp filtering in a separate file. Not used downstream." + }, + "adapterremoval_minquality": { + "type": "integer", + "default": 2, + "description": "The minimum base quality for low-quality base trimming by AdapterRemoval."
+ }, + "adapterremoval_trim_quality_stretch": { + "type": "boolean", + "description": "Turn on quality trimming by consecutive stretch of low quality bases, rather than by window.", + "help_text": "Default base-quality trimming is set to trim by 'windows', as in FastP. Specifying this flag will use trim via contiguous stretch of low quality bases (Ns) instead.\n\n> Replaces --trimwindows 4 with --trimqualities in AdapterRemoval" + }, + "adapterremoval_adapter1": { + "type": "string", + "default": "AGATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNATCTCGTATGCCGTCTTCTGCTTG", + "description": "Forward read adapter to be trimmed by AdapterRemoval." + }, + "adapterremoval_adapter2": { + "type": "string", + "default": "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT", + "description": "Reverse read adapter to be trimmed by AdapterRemoval for paired end data." + }, "host_genome": { "type": "string", "help_text": "This parameter is mutually exclusive with `--host_genome`. Host read removal is done with Bowtie2. \nBoth the iGenomes FASTA file as well as corresponding, already pre-built Bowtie 2 index files will be used.", @@ -528,7 +538,15 @@ }, "skip_binning": { "type": "boolean", - "description": "Skip metagenome binning." + "description": "Skip metagenome binning entirely" + }, + "skip_metabat2": { + "type": "boolean", + "description": "Skip MetaBAT2 Binning" + }, + "skip_maxbin2": { + "type": "boolean", + "description": "Skip MaxBin2 Binning" }, "min_contig_size": { "type": "integer", @@ -548,6 +566,11 @@ "description": "Maximal number of contigs that are not part of any bin but treated as individual genome.", "help_text": "Contigs that do not fulfill the thresholds of `--min_length_unbinned_contigs` and `--max_unbinned_contigs` are pooled for downstream analysis and reporting, except contigs that also do not fullfill `--min_contig_size` are not considered further." }, + "bowtie2_mode": { + "type": "string", + "description": "Bowtie2 alignment mode", + "help_text": "Bowtie2 alignment mode options, for example: `--very-fast` , `--very-sensitive-local -N 1` , ..." + }, "skip_prokka": { "type": "boolean", "description": "Skip Prokka genome annotation." @@ -582,6 +605,70 @@ "type": "boolean", "description": "Save the used BUSCO lineage datasets provided via --busco_reference or downloaded when not using --busco_reference or --busco_download_path.", "help_text": "Useful to allow reproducibility, as BUSCO datasets are frequently updated and old versions do not always remain accessible." + }, + "refine_bins_dastool": { + "type": "boolean", + "description": "Turn on bin refinement using DAS Tool." + }, + "refine_bins_dastool_threshold": { + "type": "number", + "default": 0.5, + "description": "Specify single-copy gene score threshold for bin refinement.", + "help_text": "Score threshold for single-copy gene selection algorithm to keep selecting bins, with a value ranging from 0-1.\n\nFor description of scoring algorithm, see: Sieber, Christian M. K., et al. 2018. Nature Microbiology 3 (7): 836\u201343. 
https://doi.org/10.1038/s41564-018-0171-1.\n\n> Modifies DAS Tool parameter --score_threshold\n" + }, + "postbinning_input": { + "type": "string", + "default": "raw_bins_only", + "description": "Specify which binning output is sent for downstream annotation, taxonomic classification, bin quality control etc.", + "help_text": "`raw_bins_only`: only bins (and unbinned contigs) from the binners.\n`refined_bins_only`: only bins (and unbinned contigs) from the bin refinement step.\n`both`: bins and unbinned contigs from both the binning and bin refinement steps.", + "enum": ["raw_bins_only", "refined_bins_only", "both"] + } + } + }, + "ancient_dna_assembly": { + "title": "Ancient DNA assembly", + "type": "object", + "description": "Performs ancient DNA assembly validation and contig consensus sequence recalling.", + "default": "", + "properties": { + "ancient_dna": { + "type": "boolean", + "description": "Turn on/off the ancient DNA subworkflow" + }, + "freebayes_ploidy": { + "type": "integer", + "default": 1, + "description": "Ploidy for variant calling" + }, + "freebayes_min_basequality": { + "type": "integer", + "default": 20, + "description": "minimum base quality required for variant calling" + }, + "freebayes_minallelefreq": { + "type": "number", + "default": 0.33, + "description": "minimum minor allele frequency for considering variants" + }, + "bcftools_view_high_variant_quality": { + "type": "integer", + "default": 30, + "description": "minimum genotype quality for considering a variant high quality" + }, + "bcftools_view_medium_variant_quality": { + "type": "integer", + "default": 20, + "description": "minimum genotype quality for considering a variant medium quality" + }, + "bcftools_view_minimal_allelesupport": { + "type": "integer", + "default": 3, + "description": "minimum number of bases supporting the alternative allele" + }, + "pydamage_accuracy": { + "type": "number", + "default": 0.5, + "description": "PyDamage accuracy threshold" } } } @@ -625,6 +712,9 @@ }, { "$ref": "#/definitions/bin_quality_check_options" + }, + { + "$ref": "#/definitions/ancient_dna_assembly" } ] -} \ No newline at end of file +} diff --git a/subworkflows/local/ancient_dna.nf b/subworkflows/local/ancient_dna.nf new file mode 100644 index 00000000..fae88f82 --- /dev/null +++ b/subworkflows/local/ancient_dna.nf @@ -0,0 +1,42 @@ +include { BCFTOOLS_CONSENSUS } from '../../modules/nf-core/modules/bcftools/consensus/main' +include { BCFTOOLS_INDEX as BCFTOOLS_INDEX_PRE ; BCFTOOLS_INDEX as BCFTOOLS_INDEX_POST } from '../../modules/nf-core/modules/bcftools/index/main' +include { BCFTOOLS_VIEW } from '../../modules/nf-core/modules/bcftools/view/main' +include { FREEBAYES } from '../../modules/nf-core/modules/freebayes/main' +include { PYDAMAGE_ANALYZE } from '../../modules/nf-core/modules/pydamage/analyze/main' +include { PYDAMAGE_FILTER } from '../../modules/nf-core/modules/pydamage/filter/main' +include { SAMTOOLS_FAIDX as FAIDX} from '../../modules/nf-core/modules/samtools/faidx/main' + +workflow ANCIENT_DNA_ASSEMLY_VALIDATION { + take: + input //channel: [val(meta), path(contigs), path(bam), path(bam_index)] + main: + PYDAMAGE_ANALYZE(input.map {item -> [item[0], item[2], item[3]]}) + PYDAMAGE_FILTER(PYDAMAGE_ANALYZE.out.csv) + FAIDX(input.map { item -> [ item[0], item[1] ] }) + freebayes_input = input.join(FAIDX.out.fai) // [val(meta), path(contigs), path(bam), path(bam_index), path(fai)] + FREEBAYES (freebayes_input.map { item -> [item[0], item[2], item[3], [], [], []] }, + freebayes_input.map { item -> 
item[1] }, + freebayes_input.map { item -> item[4] }, + [], + [], + [] ) + + BCFTOOLS_INDEX_PRE(FREEBAYES.out.vcf) + BCFTOOLS_VIEW(FREEBAYES.out.vcf.join(BCFTOOLS_INDEX_PRE.out.tbi), [], [], []) + BCFTOOLS_INDEX_POST(BCFTOOLS_VIEW.out.vcf) + BCFTOOLS_CONSENSUS(BCFTOOLS_VIEW.out.vcf + .join(BCFTOOLS_INDEX_POST.out.tbi) + .join(input.map { item -> [ item[0], item[1] ] })) + + ch_versions = Channel.empty() + ch_versions = PYDAMAGE_ANALYZE.out.versions.first() + ch_versions = ch_versions.mix(FAIDX.out.versions.first()) + ch_versions = ch_versions.mix(FREEBAYES.out.versions.first()) + ch_versions = ch_versions.mix(BCFTOOLS_CONSENSUS.out.versions.first()) + emit: + contigs_recalled = BCFTOOLS_CONSENSUS.out.fasta // channel: [ val(meta), path(fasta) ] + pydamage_results = PYDAMAGE_ANALYZE.out.csv // channel: [ val(meta), path(csv) ] + pydamage_filtered_results = PYDAMAGE_FILTER.out.csv // channel: [ val(meta), path(csv) ] + versions = ch_versions // channel: [ versions.yml ] +} + diff --git a/subworkflows/local/binning.nf b/subworkflows/local/binning.nf new file mode 100644 index 00000000..0c7f4c28 --- /dev/null +++ b/subworkflows/local/binning.nf @@ -0,0 +1,180 @@ +/* + * Binning with MetaBAT2 and MaxBin2 + */ + +params.mag_depths_options = [:] +params.mag_depths_plot_options = [:] +params.mag_depths_summary_options = [:] + +include { METABAT2_METABAT2 } from '../../modules/nf-core/modules/metabat2/metabat2/main' +include { METABAT2_JGISUMMARIZEBAMCONTIGDEPTHS } from '../../modules/nf-core/modules/metabat2/jgisummarizebamcontigdepths/main' +include { MAXBIN2 } from '../../modules/nf-core/modules/maxbin2/main' +include { GUNZIP as GUNZIP_BINS } from '../../modules/nf-core/modules/gunzip/main' +include { GUNZIP as GUNZIP_UNBINS } from '../../modules/nf-core/modules/gunzip/main' + +include { CONVERT_DEPTHS } from '../../modules/local/convert_depths' +include { ADJUST_MAXBIN2_EXT } from '../../modules/local/adjust_maxbin2_ext' +include { SPLIT_FASTA } from '../../modules/local/split_fasta' +include { MAG_DEPTHS } from '../../modules/local/mag_depths' addParams( options: params.mag_depths_options ) +include { MAG_DEPTHS_PLOT } from '../../modules/local/mag_depths_plot' addParams( options: params.mag_depths_plot_options ) +include { MAG_DEPTHS_SUMMARY } from '../../modules/local/mag_depths_summary' addParams( options: params.mag_depths_summary_options ) + +/* + * Get number of columns in file (first line) + */ +def getColNo(filename) { + lines = file(filename).readLines() + return lines[0].split('\t').size() +} + +workflow BINNING { + take: + assemblies // channel: [ val(meta), path(assembly), path(bams), path(bais) ] + reads // channel: [ val(meta), [ reads ] ] + + main: + + ch_versions = Channel.empty() + + // generate coverage depths for each contig + ch_summarizedepth_input = assemblies + .map { meta, assembly, bams, bais -> + def meta_new = meta.clone() + [ meta_new, bams, bais ] + } + + METABAT2_JGISUMMARIZEBAMCONTIGDEPTHS ( ch_summarizedepth_input ) + + ch_metabat_depths = METABAT2_JGISUMMARIZEBAMCONTIGDEPTHS.out.depth + .map { meta, depths -> + def meta_new = meta.clone() + meta_new['binner'] = 'MetaBAT2' + + [ meta_new, depths ] + } + + ch_versions = ch_versions.mix(METABAT2_JGISUMMARIZEBAMCONTIGDEPTHS.out.versions.first()) + + // combine depths back with assemblies + ch_metabat2_input = assemblies + .map { meta, assembly, bams, bais -> + def meta_new = meta.clone() + meta_new['binner'] = 'MetaBAT2' + + [ meta_new, assembly, bams, bais ] + } + .join( ch_metabat_depths, by: 0 ) + .map { meta, 
assembly, bams, bais, depths -> + [ meta, assembly, depths ] + } + + // convert metabat2 depth files to maxbin2 + if ( !params.skip_maxbin2 ) { + CONVERT_DEPTHS ( ch_metabat2_input ) + ch_maxbin2_input = CONVERT_DEPTHS.out.output + .map { meta, assembly, reads, depth -> + def meta_new = meta.clone() + meta_new['binner'] = 'MaxBin2' + + [ meta_new, assembly, reads, depth ] + } + ch_versions = ch_versions.mix(CONVERT_DEPTHS.out.versions.first()) + } + + // main bins for decompressing for MAG_DEPTHS + ch_final_bins_for_gunzip = Channel.empty() + // final gzipped bins + ch_binning_results_gzipped_final = Channel.empty() + // run binning + if ( !params.skip_metabat2 ) { + METABAT2_METABAT2 ( ch_metabat2_input ) + // before decompressing first have to separate and re-group due to limitation of GUNZIP module + ch_final_bins_for_gunzip = ch_final_bins_for_gunzip.mix( METABAT2_METABAT2.out.fasta.transpose() ) + ch_binning_results_gzipped_final = ch_binning_results_gzipped_final.mix( METABAT2_METABAT2.out.fasta ) + ch_versions = ch_versions.mix(METABAT2_METABAT2.out.versions.first()) + } + if ( !params.skip_maxbin2 ) { + MAXBIN2 ( ch_maxbin2_input ) + ADJUST_MAXBIN2_EXT ( MAXBIN2.out.binned_fastas ) + ch_final_bins_for_gunzip = ch_final_bins_for_gunzip.mix( ADJUST_MAXBIN2_EXT.out.renamed_bins.transpose() ) + ch_binning_results_gzipped_final = ch_binning_results_gzipped_final.mix( ADJUST_MAXBIN2_EXT.out.renamed_bins ) + ch_versions = ch_versions.mix(MAXBIN2.out.versions) + } + + // select unbinned contig fasta files for splitting, depending on which binners were run + if ( !params.skip_metabat2 & params.skip_maxbin2 ) { + ch_input_splitfasta = METABAT2_METABAT2.out.unbinned + } else if ( params.skip_metabat2 & !params.skip_maxbin2 ) { + ch_input_splitfasta = MAXBIN2.out.unbinned_fasta + } else { + ch_input_splitfasta = METABAT2_METABAT2.out.unbinned.mix(MAXBIN2.out.unbinned_fasta) + } + + SPLIT_FASTA ( ch_input_splitfasta ) + // large unbinned contigs from SPLIT_FASTA for decompressing for MAG_DEPTHS, + // first have to separate and re-group due to limitation of GUNZIP module + ch_split_fasta_results_transposed = SPLIT_FASTA.out.unbinned.transpose() + ch_versions = ch_versions.mix(SPLIT_FASTA.out.versions) + + GUNZIP_BINS ( ch_final_bins_for_gunzip ) + ch_binning_results_gunzipped = GUNZIP_BINS.out.gunzip + ch_versions = ch_versions.mix(GUNZIP_BINS.out.versions.first()) + + GUNZIP_UNBINS ( ch_split_fasta_results_transposed ) + ch_splitfasta_results_gunzipped = GUNZIP_UNBINS.out.gunzip + ch_versions = ch_versions.mix(GUNZIP_UNBINS.out.versions.first()) + + // Compute bin depths for different samples (according to `binning_map_mode`) + // Have to remove binner meta before joining with the corresponding depths files, + // as required for MAG_DEPTHS, but we can add 'binner' + // info again based on file name and finally group by + // 'assembler', 'id', 'binner' + ch_depth_input = ch_binning_results_gunzipped + .mix(ch_splitfasta_results_gunzipped ) + .map { meta, bin -> + def meta_new = meta.clone() + meta_new.remove('binner') + [ meta_new, bin ] + } + .groupTuple (by: 0 ) + .join( METABAT2_JGISUMMARIZEBAMCONTIGDEPTHS.out.depth, by: 0 ) + .transpose() + .map { meta, bin, contig_depths_file -> + def meta_new = meta.clone() + meta_new['binner'] = bin.name.split("-")[1] + [ meta_new, bin, contig_depths_file ] + } + .groupTuple (by: [0,2] ) + + MAG_DEPTHS ( ch_depth_input ) + ch_versions = ch_versions.mix(MAG_DEPTHS.out.versions) + + // Plot bin depths heatmap for each assembly and mapped samples (according to `binning_map_mode`) + // create file containing group 
information for all samples + ch_sample_groups = reads + .collectFile(name:'sample_groups.tsv'){ meta, reads -> meta.id + '\t' + meta.group + '\n' } + + // Filter MAG depth files: use only those for plotting that contain depths for > 2 samples + ch_mag_depths_plot = MAG_DEPTHS.out.depths + .map { meta, bin_depths_file -> + if (getColNo(bin_depths_file) > 2) [ meta, bin_depths_file ] + } + + MAG_DEPTHS_PLOT ( ch_mag_depths_plot, ch_sample_groups.collect() ) + MAG_DEPTHS_SUMMARY ( MAG_DEPTHS.out.depths.map{it[1]}.collect() ) + ch_versions = ch_versions.mix( MAG_DEPTHS_PLOT.out.versions ) + ch_versions = ch_versions.mix( MAG_DEPTHS_SUMMARY.out.versions ) + + // Group final binned contigs per sample for final output + ch_binning_results_gunzipped_final = ch_binning_results_gunzipped.groupTuple(by: 0) + ch_binning_results_gzipped_final = ch_binning_results_gzipped_final.groupTuple(by: 0) + + emit: + bins = ch_binning_results_gunzipped_final + bins_gz = ch_binning_results_gzipped_final + unbinned = ch_splitfasta_results_gunzipped.groupTuple() + unbinned_gz = SPLIT_FASTA.out.unbinned + depths_summary = MAG_DEPTHS_SUMMARY.out.summary + metabat2depths = METABAT2_JGISUMMARIZEBAMCONTIGDEPTHS.out.depth + versions = ch_versions +} diff --git a/subworkflows/local/metabat2_binning.nf b/subworkflows/local/binning_preparation.nf similarity index 52% rename from subworkflows/local/metabat2_binning.nf rename to subworkflows/local/binning_preparation.nf index 7e1d03e9..95e24da3 100644 --- a/subworkflows/local/metabat2_binning.nf +++ b/subworkflows/local/binning_preparation.nf @@ -1,30 +1,14 @@ /* - * Binning with MetaBAT2 + * Binning preparation with Bowtie2 */ params.bowtie2_build_options = [:] params.bowtie2_align_options = [:] -params.metabat2_options = [:] -params.mag_depths_options = [:] -params.mag_depths_plot_options = [:] -params.mag_depths_summary_options = [:] include { BOWTIE2_ASSEMBLY_BUILD } from '../../modules/local/bowtie2_assembly_build' addParams( options: params.bowtie2_build_options ) include { BOWTIE2_ASSEMBLY_ALIGN } from '../../modules/local/bowtie2_assembly_align' addParams( options: params.bowtie2_align_options ) -include { METABAT2 } from '../../modules/local/metabat2' addParams( options: params.metabat2_options ) -include { MAG_DEPTHS } from '../../modules/local/mag_depths' addParams( options: params.mag_depths_options ) -include { MAG_DEPTHS_PLOT } from '../../modules/local/mag_depths_plot' addParams( options: params.mag_depths_plot_options ) -include { MAG_DEPTHS_SUMMARY } from '../../modules/local/mag_depths_summary' addParams( options: params.mag_depths_summary_options ) -/* - * Get number of columns in file (first line) - */ -def getColNo(filename) { - lines = file(filename).readLines() - return lines[0].split('\t').size() -} - -workflow METABAT2_BINNING { +workflow BINNING_PREPARATION { take: assemblies // channel: [ val(meta), path(assembly) ] reads // channel: [ val(meta), [ reads ] ] @@ -58,33 +42,10 @@ workflow METABAT2_BINNING { // group mappings for one assembly ch_grouped_mappings = BOWTIE2_ASSEMBLY_ALIGN.out.mappings .groupTuple(by: 0) - .map { meta, assembly, bams, bais -> [ meta, assembly[0], bams, bais ] } // multiple symlinks to the same assembly -> use first - - METABAT2 ( ch_grouped_mappings ) - - // Compute bin depths for different samples (according to `binning_map_mode`) - MAG_DEPTHS ( - METABAT2.out.bins, - METABAT2.out.depths - ) - // Plot bin depths heatmap for each assembly and mapped samples (according to `binning_map_mode`) - // create file containg group 
information for all samples - ch_sample_groups = reads - .collectFile(name:'sample_groups.tsv'){ meta, reads -> meta.id + '\t' + meta.group + '\n' } - // filter MAG depth files: use only those for plotting that contain depths for > 2 samples - ch_mag_depths_plot = MAG_DEPTHS.out.depths - .map { meta, depth_file -> if (getColNo(depth_file) > 2) [meta, depth_file] } - MAG_DEPTHS_PLOT ( - ch_mag_depths_plot, - ch_sample_groups.collect() - ) - - MAG_DEPTHS_SUMMARY ( MAG_DEPTHS.out.depths.map{it[1]}.collect() ) + .map { meta, assembly, bams, bais -> [ meta, assembly.sort()[0], bams, bais ] } // multiple symlinks to the same assembly -> use first of sorted list emit: bowtie2_assembly_multiqc = BOWTIE2_ASSEMBLY_ALIGN.out.log.map { assembly_meta, reads_meta, log -> if (assembly_meta.id == reads_meta.id) {return [ log ]} } - bowtie2_version = BOWTIE2_ASSEMBLY_ALIGN.out.version - bins = METABAT2.out.bins - depths_summary = MAG_DEPTHS_SUMMARY.out.summary - metabat2_version = METABAT2.out.version + bowtie2_version = BOWTIE2_ASSEMBLY_ALIGN.out.versions + grouped_mappings = ch_grouped_mappings } diff --git a/subworkflows/local/binning_refinement.nf b/subworkflows/local/binning_refinement.nf new file mode 100644 index 00000000..84e0e142 --- /dev/null +++ b/subworkflows/local/binning_refinement.nf @@ -0,0 +1,140 @@ +/* + * Bin refinement with DAS Tool + */ + +include { DASTOOL_FASTATOCONTIG2BIN as DASTOOL_FASTATOCONTIG2BIN_METABAT2 } from '../../modules/nf-core/modules/dastool/fastatocontig2bin/main.nf' +include { DASTOOL_FASTATOCONTIG2BIN as DASTOOL_FASTATOCONTIG2BIN_MAXBIN2 } from '../../modules/nf-core/modules/dastool/fastatocontig2bin/main.nf' +include { DASTOOL_DASTOOL } from '../../modules/nf-core/modules/dastool/dastool/main.nf' +include { RENAME_PREDASTOOL } from '../../modules/local/rename_predastool' +include { RENAME_POSTDASTOOL } from '../../modules/local/rename_postdastool' +include { MAG_DEPTHS as MAG_DEPTHS_REFINED } from '../../modules/local/mag_depths' +include { MAG_DEPTHS_PLOT as MAG_DEPTHS_PLOT_REFINED } from '../../modules/local/mag_depths_plot' +include { MAG_DEPTHS_SUMMARY as MAG_DEPTHS_SUMMARY_REFINED } from '../../modules/local/mag_depths_summary' + +/* + * Get number of columns in file (first line) + */ +def getColNo(filename) { + lines = file(filename).readLines() + return lines[0].split('\t').size() +} + +workflow BINNING_REFINEMENT { + take: + contigs + bins // channel: [ val(meta), path(bins) ] + depths + reads + + main: + ch_versions = Channel.empty() + + // Drop unnecessary files + ch_contigs_for_dastool = contigs + .map { + meta, assembly, bams, bais -> + def meta_new = meta.clone() + [ meta_new, assembly ] + } + + ch_bins_for_fastatocontig2bin = RENAME_PREDASTOOL(bins).renamed_bins + .branch { + metabat2: it[0]['binner'] == 'MetaBAT2' + maxbin2: it[0]['binner'] == 'MaxBin2' + } + + // Generate DASTool auxiliary files + DASTOOL_FASTATOCONTIG2BIN_METABAT2 ( ch_bins_for_fastatocontig2bin.metabat2, "fa") + // MaxBin2 bin extension was changed to 'fa' as well in RENAME_PREDASTOOL + DASTOOL_FASTATOCONTIG2BIN_MAXBIN2 ( ch_bins_for_fastatocontig2bin.maxbin2, "fa") + + // Prepare input for DAS Tool + ch_fastatocontig2bin_for_dastool = Channel.empty() + ch_fastatocontig2bin_for_dastool = ch_fastatocontig2bin_for_dastool + .mix(DASTOOL_FASTATOCONTIG2BIN_METABAT2.out.fastatocontig2bin) + .mix(DASTOOL_FASTATOCONTIG2BIN_MAXBIN2.out.fastatocontig2bin) + .map { + meta, fastatocontig2bin -> + def meta_new = meta.clone() + meta_new.remove('binner') + [ meta_new, fastatocontig2bin ] + } + 
.groupTuple(by: 0) + + ch_input_for_dastool = ch_contigs_for_dastool.join(ch_fastatocontig2bin_for_dastool, by: 0) + + ch_versions = ch_versions.mix(DASTOOL_FASTATOCONTIG2BIN_METABAT2.out.versions.first()) + ch_versions = ch_versions.mix(DASTOOL_FASTATOCONTIG2BIN_MAXBIN2.out.versions.first()) + + // Run DAStool + DASTOOL_DASTOOL(ch_input_for_dastool, [], []) + ch_versions = ch_versions.mix(DASTOOL_DASTOOL.out.versions.first()) + + // Prepare bins for downstream analysis (separate from unbins, add 'binner' info and group) + // use DASTool as 'binner' info allowing according grouping of refined bin sets, + // while keeping information about original binning method in filenames and used binnames, e.g. "*-MaxBin2Refined-*.fa" + // (alternatively one could think of adding, for example, meta.orig_binner, if this would simplify code) + ch_dastool_bins_newmeta = DASTOOL_DASTOOL.out.bins.transpose() + .map { + meta, bin -> + if (bin.name != "unbinned.fa") { + def meta_new = meta.clone() + meta_new['binner'] = 'DASTool' + [ meta_new, bin ] + } + } + .groupTuple() + + ch_input_for_renamedastool = DASTOOL_DASTOOL.out.bins + .map { + meta, bins -> + def meta_new = meta.clone() + meta_new['binner'] = 'DASTool' + [ meta_new, bins ] + } + + RENAME_POSTDASTOOL ( ch_input_for_renamedastool ) + + // We have to strip the meta to be able to combine with the original + // depths file to run MAG_DEPTH + ch_input_for_magdepth = ch_dastool_bins_newmeta + .mix( RENAME_POSTDASTOOL.out.refined_unbins ) + .map { + meta, refinedbins -> + def meta_new = meta.clone() + meta_new.remove('binner') + [ meta_new, refinedbins ] + } + .transpose() + .groupTuple (by: 0 ) + .join( depths, by: 0 ) + .map { + meta, bins, contig_depths_file -> + def meta_new = meta.clone() + meta_new['binner'] = 'DASTool' + [ meta_new, bins, contig_depths_file ] + } + + MAG_DEPTHS_REFINED ( ch_input_for_magdepth ) + + // Plot bin depths heatmap for each assembly and mapped samples (according to `binning_map_mode`) + // create file containing group information for all samples + ch_sample_groups = reads + .collectFile(name:'sample_groups.tsv'){ meta, reads -> meta.id + '\t' + meta.group + '\n' } + + // Filter MAG depth files: use only those for plotting that contain depths for > 2 samples + ch_mag_depths_plot_refined = MAG_DEPTHS_REFINED.out.depths + .map { meta, bin_depths_file -> + if (getColNo(bin_depths_file) > 2) [ meta, bin_depths_file ] + } + + MAG_DEPTHS_PLOT_REFINED ( ch_mag_depths_plot_refined, ch_sample_groups.collect() ) + MAG_DEPTHS_SUMMARY_REFINED ( MAG_DEPTHS_REFINED.out.depths.map{it[1]}.collect() ) + + emit: + refined_bins = ch_dastool_bins_newmeta + refined_unbins = RENAME_POSTDASTOOL.out.refined_unbins + refined_depths = MAG_DEPTHS_REFINED.out.depths + refined_depths_summary = MAG_DEPTHS_SUMMARY_REFINED.out.summary + versions = ch_versions +} diff --git a/subworkflows/local/busco_qc.nf b/subworkflows/local/busco_qc.nf index eb7aa92c..0c93437a 100644 --- a/subworkflows/local/busco_qc.nf +++ b/subworkflows/local/busco_qc.nf @@ -55,5 +55,5 @@ workflow BUSCO_QC { summary = BUSCO_SUMMARY.out.summary failed_bin = BUSCO.out.failed_bin.map{it[1]} multiqc = BUSCO.out.summary_domain.map{it[1]} - version = BUSCO.out.version + versions = BUSCO.out.versions } diff --git a/subworkflows/local/gtdbtk.nf b/subworkflows/local/gtdbtk.nf index 8b30e6b1..fd971a33 100644 --- a/subworkflows/local/gtdbtk.nf +++ b/subworkflows/local/gtdbtk.nf @@ -19,7 +19,7 @@ workflow GTDBTK { // Filter bins: classify only medium & high quality MAGs // Collect completness 
and contamination metrics from busco summary def bin_metrics = [:] - busco_summary + ch_busco_metrics = busco_summary .splitCsv(header: true, sep: '\t') .map { row -> def completeness = -1 @@ -36,10 +36,9 @@ workflow GTDBTK { if (duplicated != '') contamination = Double.parseDouble(duplicated) [row.'GenomeBin', completeness, contamination] } - .set { ch_busco_metrics } // Filter bins based on collected metrics: completeness, contamination - bins + ch_filtered_bins = bins .transpose() .map { meta, bin -> [bin.getName(), bin, meta]} .join(ch_busco_metrics, failOnDuplicate: true, failOnMismatch: true) @@ -50,7 +49,6 @@ workflow GTDBTK { discarded: (it[2] == -1 || it[2] < params.gtdbtk_min_completeness || it[3] == -1 || it[3] > params.gtdbtk_max_contamination) return [it[0], it[1]] } - .set { ch_filtered_bins } GTDBTK_DB_PREPARATION ( gtdb ) GTDBTK_CLASSIFY ( @@ -67,5 +65,5 @@ workflow GTDBTK { emit: summary = GTDBTK_SUMMARY.out.summary - version = GTDBTK_CLASSIFY.out.version + versions = GTDBTK_CLASSIFY.out.versions } diff --git a/subworkflows/local/input_check.nf b/subworkflows/local/input_check.nf index e634a638..53a5103c 100644 --- a/subworkflows/local/input_check.nf +++ b/subworkflows/local/input_check.nf @@ -2,15 +2,15 @@ // Check input samplesheet and get read channels // -include { hasExtension } from '../../modules/local/functions' - -params.options = [:] +def hasExtension(it, extension) { + it.toString().toLowerCase().endsWith(extension.toLowerCase()) +} workflow INPUT_CHECK { main: if(hasExtension(params.input, "csv")){ // extracts read files from samplesheet CSV and distribute into channels - Channel + ch_input_rows = Channel .from(file(params.input)) .splitCsv(header: true) .map { row -> @@ -30,9 +30,8 @@ workflow INPUT_CHECK { exit 1, "Input samplesheet contains row with ${row.size()} column(s). Expects 5." } } - .set { ch_input_rows } // separate short and long reads - ch_input_rows + ch_raw_short_reads = ch_input_rows .map { id, group, sr1, sr2, lr -> def meta = [:] meta.id = id @@ -43,8 +42,7 @@ workflow INPUT_CHECK { else return [ meta, [ sr1, sr2 ] ] } - .set { ch_raw_short_reads } - ch_input_rows + ch_raw_long_reads = ch_input_rows .map { id, group, sr1, sr2, lr -> if (lr) { def meta = [:] @@ -53,9 +51,8 @@ workflow INPUT_CHECK { return [ meta, lr ] } } - .set { ch_raw_long_reads } } else { - Channel + ch_raw_short_reads = Channel .fromFilePairs(params.input, size: params.single_end ? 1 : 2) .ifEmpty { exit 1, "Cannot find any reads matching: ${params.input}\nNB: Path needs to be enclosed in quotes!\nIf this is single-end data, please specify --single_end on the command line." 
} .map { row -> @@ -65,7 +62,6 @@ workflow INPUT_CHECK { meta.single_end = params.single_end return [ meta, row[1] ] } - .set { ch_raw_short_reads } ch_input_rows = Channel.empty() ch_raw_long_reads = Channel.empty() } diff --git a/workflows/mag.nf b/workflows/mag.nf index 5f639cca..1d7bff73 100644 --- a/workflows/mag.nf +++ b/workflows/mag.nf @@ -1,13 +1,15 @@ /* -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ VALIDATE INPUTS -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ */ def summary_params = NfcoreSchema.paramsSummaryMap(workflow, params) // Check already if long reads are provided -include { hasExtension } from '../modules/local/functions' +def hasExtension(it, extension) { + it.toString().toLowerCase().endsWith(extension.toLowerCase()) +} def hybrid = false if(hasExtension(params.input, "csv")){ Channel @@ -34,81 +36,82 @@ for (param in checkPathParamList) { if (param) { file(param, checkIfExists: true if (params.input) { ch_input = file(params.input) } else { exit 1, 'Input samplesheet not specified!' } /* -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ CONFIG FILES -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ */ -ch_multiqc_config = file("$projectDir/assets/multiqc_config.yaml", checkIfExists: true) +ch_multiqc_config = file("$projectDir/assets/multiqc_config.yml", checkIfExists: true) ch_multiqc_custom_config = params.multiqc_config ? Channel.fromPath(params.multiqc_config) : Channel.empty() /* -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ IMPORT LOCAL MODULES/SUBWORKFLOWS -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ */ -// Don't overwrite global params.modules, create a copy instead and use that within the main script. -def modules = params.modules.clone() -def multiqc_options = modules['multiqc'] -multiqc_options.args += params.multiqc_title ? 
Utils.joinModuleArgs(["--title \"$params.multiqc_title\""]) : '' - // // MODULE: Local to the pipeline // -include { GET_SOFTWARE_VERSIONS } from '../modules/local/get_software_versions' addParams( options: [publish_files : ['csv':'']] ) include { BOWTIE2_REMOVAL_BUILD as BOWTIE2_HOST_REMOVAL_BUILD } from '../modules/local/bowtie2_removal_build' -include { BOWTIE2_REMOVAL_ALIGN as BOWTIE2_HOST_REMOVAL_ALIGN } from '../modules/local/bowtie2_removal_align' addParams( options: modules['bowtie2_host_removal_align'] ) +include { BOWTIE2_REMOVAL_ALIGN as BOWTIE2_HOST_REMOVAL_ALIGN } from '../modules/local/bowtie2_removal_align' include { BOWTIE2_REMOVAL_BUILD as BOWTIE2_PHIX_REMOVAL_BUILD } from '../modules/local/bowtie2_removal_build' -include { BOWTIE2_REMOVAL_ALIGN as BOWTIE2_PHIX_REMOVAL_ALIGN } from '../modules/local/bowtie2_removal_align' addParams( options: modules['bowtie2_phix_removal_align'] ) +include { BOWTIE2_REMOVAL_ALIGN as BOWTIE2_PHIX_REMOVAL_ALIGN } from '../modules/local/bowtie2_removal_align' include { PORECHOP } from '../modules/local/porechop' -include { NANOLYSE } from '../modules/local/nanolyse' addParams( options: modules['nanolyse'] ) +include { NANOLYSE } from '../modules/local/nanolyse' include { FILTLONG } from '../modules/local/filtlong' -include { NANOPLOT as NANOPLOT_RAW } from '../modules/local/nanoplot' addParams( options: modules['nanoplot_raw'] ) -include { NANOPLOT as NANOPLOT_FILTERED } from '../modules/local/nanoplot' addParams( options: modules['nanoplot_filtered'] ) +include { NANOPLOT as NANOPLOT_RAW } from '../modules/local/nanoplot' +include { NANOPLOT as NANOPLOT_FILTERED } from '../modules/local/nanoplot' include { CENTRIFUGE_DB_PREPARATION } from '../modules/local/centrifuge_db_preparation' -include { CENTRIFUGE } from '../modules/local/centrifuge' addParams( options: modules['centrifuge'] ) +include { CENTRIFUGE } from '../modules/local/centrifuge' include { KRAKEN2_DB_PREPARATION } from '../modules/local/kraken2_db_preparation' -include { KRAKEN2 } from '../modules/local/kraken2' addParams( options: modules['kraken2'] ) +include { KRAKEN2 } from '../modules/local/kraken2' include { KRONA_DB } from '../modules/local/krona_db' -include { KRONA } from '../modules/local/krona' addParams( options: modules['krona'] ) +include { KRONA } from '../modules/local/krona' include { POOL_SINGLE_READS } from '../modules/local/pool_single_reads' include { POOL_PAIRED_READS } from '../modules/local/pool_paired_reads' include { POOL_SINGLE_READS as POOL_LONG_READS } from '../modules/local/pool_single_reads' -include { MEGAHIT } from '../modules/local/megahit' addParams( options: modules['megahit'] ) -include { SPADES } from '../modules/local/spades' addParams( options: modules['spades'] ) -include { SPADESHYBRID } from '../modules/local/spadeshybrid' addParams( options: modules['spadeshybrid'] ) -include { QUAST } from '../modules/local/quast' addParams( options: modules['quast'] ) -include { QUAST_BINS } from '../modules/local/quast_bins' addParams( options: modules['quast_bins'] ) -include { QUAST_BINS_SUMMARY } from '../modules/local/quast_bins_summary' addParams( options: modules['quast_bins_summary'] ) +include { MEGAHIT } from '../modules/local/megahit' +include { SPADES } from '../modules/local/spades' +include { SPADESHYBRID } from '../modules/local/spadeshybrid' +include { QUAST } from '../modules/local/quast' +include { QUAST_BINS } from '../modules/local/quast_bins' +include { QUAST_BINS_SUMMARY } from '../modules/local/quast_bins_summary' include { 
CAT_DB } from '../modules/local/cat_db' -include { CAT_DB_GENERATE } from '../modules/local/cat_db_generate' addParams( options: modules['cat_db_generate'] ) -include { CAT } from '../modules/local/cat' addParams( options: modules['cat'] ) -include { BIN_SUMMARY } from '../modules/local/bin_summary' addParams( options: modules['bin_summary'] ) -include { MULTIQC } from '../modules/local/multiqc' addParams( options: multiqc_options ) +include { CAT_DB_GENERATE } from '../modules/local/cat_db_generate' +include { CAT } from '../modules/local/cat' +include { BIN_SUMMARY } from '../modules/local/bin_summary' +include { COMBINE_TSV } from '../modules/local/combine_tsv' +include { MULTIQC } from '../modules/local/multiqc' // // SUBWORKFLOW: Consisting of a mix of local and nf-core/modules // include { INPUT_CHECK } from '../subworkflows/local/input_check' -include { METABAT2_BINNING } from '../subworkflows/local/metabat2_binning' addParams( bowtie2_align_options: modules['bowtie2_assembly_align'], metabat2_options: modules['metabat2'], mag_depths_options: modules['mag_depths'], mag_depths_plot_options: modules['mag_depths_plot'], mag_depths_summary_options: modules['mag_depths_summary']) -include { BUSCO_QC } from '../subworkflows/local/busco_qc' addParams( busco_db_options: modules['busco_db_preparation'], busco_options: modules['busco'], busco_save_download_options: modules['busco_save_download'], busco_plot_options: modules['busco_plot'], busco_summary_options: modules['busco_summary']) -include { GTDBTK } from '../subworkflows/local/gtdbtk' addParams( gtdbtk_classify_options: modules['gtdbtk_classify'], gtdbtk_summary_options: modules['gtdbtk_summary']) +include { BINNING_PREPARATION } from '../subworkflows/local/binning_preparation' +include { BINNING } from '../subworkflows/local/binning' +include { BINNING_REFINEMENT } from '../subworkflows/local/binning_refinement' +include { BUSCO_QC } from '../subworkflows/local/busco_qc' +include { GTDBTK } from '../subworkflows/local/gtdbtk' +include { ANCIENT_DNA_ASSEMLY_VALIDATION } from '../subworkflows/local/ancient_dna' /* -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ IMPORT NF-CORE MODULES/SUBWORKFLOWS -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ */ // // MODULE: Installed directly from nf-core/modules // -include { FASTQC as FASTQC_RAW } from '../modules/nf-core/modules/fastqc/main' addParams( options: modules['fastqc_raw'] ) -include { FASTQC as FASTQC_TRIMMED } from '../modules/nf-core/modules/fastqc/main' addParams( options: modules['fastqc_trimmed'] ) -include { FASTP } from '../modules/nf-core/modules/fastp/main' addParams( options: modules['fastp'] ) -include { PRODIGAL } from '../modules/nf-core/modules/prodigal/main' addParams( options: modules['prodigal'] ) -include { PROKKA } from '../modules/nf-core/modules/prokka/main' addParams( options: modules['prokka'] ) +include { FASTQC as FASTQC_RAW } from '../modules/nf-core/modules/fastqc/main' +include { FASTQC as FASTQC_TRIMMED } from '../modules/nf-core/modules/fastqc/main' +include { FASTP } from '../modules/nf-core/modules/fastp/main' +include { ADAPTERREMOVAL as ADAPTERREMOVAL_PE } from '../modules/nf-core/modules/adapterremoval/main' +include { ADAPTERREMOVAL as ADAPTERREMOVAL_SE } from 
'../modules/nf-core/modules/adapterremoval/main' +include { PRODIGAL } from '../modules/nf-core/modules/prodigal/main' +include { PROKKA } from '../modules/nf-core/modules/prokka/main' +include { CUSTOM_DUMPSOFTWAREVERSIONS } from '../modules/nf-core/modules/custom/dumpsoftwareversions/main' //////////////////////////////////////////////////// /* -- Create channel for reference databases -- */ @@ -116,86 +119,74 @@ include { PROKKA } from '../modules/nf-core/modules/prokka/mai if ( params.host_genome ) { host_fasta = params.genomes[params.host_genome].fasta ?: false - Channel + ch_host_fasta = Channel .value(file( "${host_fasta}" )) - .set { ch_host_fasta } - host_bowtie2index = params.genomes[params.host_genome].bowtie2 ?: false - Channel + ch_host_bowtie2index = Channel .value(file( "${host_bowtie2index}/*" )) - .set { ch_host_bowtie2index } } else if ( params.host_fasta ) { - Channel + ch_host_fasta = Channel .value(file( "${params.host_fasta}" )) - .set { ch_host_fasta } } else { ch_host_fasta = Channel.empty() } if(params.busco_reference){ - Channel + ch_busco_db_file = Channel .value(file( "${params.busco_reference}" )) - .set { ch_busco_db_file } } else { ch_busco_db_file = Channel.empty() } if (params.busco_download_path) { - Channel + ch_busco_download_folder = Channel .value(file( "${params.busco_download_path}" )) - .set { ch_busco_download_folder } } else { ch_busco_download_folder = Channel.empty() } if(params.centrifuge_db){ - Channel + ch_centrifuge_db_file = Channel .value(file( "${params.centrifuge_db}" )) - .set { ch_centrifuge_db_file } } else { ch_centrifuge_db_file = Channel.empty() } if(params.kraken2_db){ - Channel + ch_kraken2_db_file = Channel .value(file( "${params.kraken2_db}" )) - .set { ch_kraken2_db_file } } else { ch_kraken2_db_file = Channel.empty() } if(params.cat_db){ - Channel + ch_cat_db_file = Channel .value(file( "${params.cat_db}" )) - .set { ch_cat_db_file } } else { ch_cat_db_file = Channel.empty() } if(!params.keep_phix) { - Channel + ch_phix_db_file = Channel .value(file( "${params.phix_reference}" )) - .set { ch_phix_db_file } } if (!params.keep_lambda) { - Channel + ch_nanolyse_db = Channel .value(file( "${params.lambda_reference}" )) - .set { ch_nanolyse_db } } gtdb = params.skip_busco ? 
false : params.gtdb if (gtdb) { - Channel + ch_gtdb = Channel .value(file( "${gtdb}" )) - .set { ch_gtdb } } else { ch_gtdb = Channel.empty() } /* -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ RUN MAIN WORKFLOW -======================================================================================== +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ */ // Info required for completion email and summary @@ -204,7 +195,7 @@ def busco_failed_bins = [:] workflow MAG { - ch_software_versions = Channel.empty() + ch_versions = Channel.empty() // // SUBWORKFLOW: Read in samplesheet, validate and stage input files @@ -222,13 +213,45 @@ workflow MAG { FASTQC_RAW ( ch_raw_short_reads ) - ch_software_versions = ch_software_versions.mix(FASTQC_RAW.out.version.first().ifEmpty(null)) + ch_versions = ch_versions.mix(FASTQC_RAW.out.versions.first()) - FASTP ( - ch_raw_short_reads - ) - ch_short_reads = FASTP.out.reads - ch_software_versions = ch_software_versions.mix(FASTP.out.version.first().ifEmpty(null)) + if ( params.clip_tool == 'fastp' ) { + ch_clipmerge_out = FASTP ( + ch_raw_short_reads, + params.fastp_save_trimmed_fail, + [] + ) + ch_short_reads = FASTP.out.reads + ch_versions = ch_versions.mix(FASTP.out.versions.first()) + + } else if ( params.clip_tool == 'adapterremoval' ) { + + // due to strange output file scheme in AR2, have to manually separate + // SE/PE to allow correct pulling of reads after. + ch_adapterremoval_in = ch_raw_short_reads + .branch { + single: it[0]['single_end'] + paired: !it[0]['single_end'] + } + + ADAPTERREMOVAL_PE ( ch_adapterremoval_in.paired, [] ) + ADAPTERREMOVAL_SE ( ch_adapterremoval_in.single, [] ) + + // pair1 and 2 come for PE data from separate output channels, so bring + // this back together again here + ch_adapterremoval_pe_out = Channel.empty() + ch_adapterremoval_pe_out = ADAPTERREMOVAL_PE.out.pair1_truncated + .join(ADAPTERREMOVAL_PE.out.pair2_truncated) + .map { + [ it[0], [it[1], it[2]] ] + } + + ch_short_reads = Channel.empty() + ch_short_reads = ch_short_reads.mix(ADAPTERREMOVAL_SE.out.singles_truncated, ch_adapterremoval_pe_out) + + ch_versions = ch_versions.mix(ADAPTERREMOVAL_PE.out.versions.first(), ADAPTERREMOVAL_SE.out.versions.first()) + + } if (params.host_fasta){ BOWTIE2_HOST_REMOVAL_BUILD ( @@ -244,7 +267,7 @@ workflow MAG { ) ch_short_reads = BOWTIE2_HOST_REMOVAL_ALIGN.out.reads ch_bowtie2_removal_host_multiqc = BOWTIE2_HOST_REMOVAL_ALIGN.out.log - ch_software_versions = ch_software_versions.mix(BOWTIE2_HOST_REMOVAL_ALIGN.out.version.first().ifEmpty(null)) + ch_versions = ch_versions.mix(BOWTIE2_HOST_REMOVAL_ALIGN.out.versions.first()) } if(!params.keep_phix) { @@ -256,11 +279,13 @@ workflow MAG { BOWTIE2_PHIX_REMOVAL_BUILD.out.index ) ch_short_reads = BOWTIE2_PHIX_REMOVAL_ALIGN.out.reads + ch_versions = ch_versions.mix(BOWTIE2_PHIX_REMOVAL_ALIGN.out.versions.first()) } FASTQC_TRIMMED ( ch_short_reads ) + ch_versions = ch_versions.mix(FASTQC_TRIMMED.out.versions) /* ================================================================================ @@ -270,7 +295,7 @@ workflow MAG { NANOPLOT_RAW ( ch_raw_long_reads ) - ch_software_versions = ch_software_versions.mix(NANOPLOT_RAW.out.version.first().ifEmpty(null)) + ch_versions = ch_versions.mix(NANOPLOT_RAW.out.versions.first()) ch_long_reads = ch_raw_long_reads if (!params.skip_adapter_trimming) { @@ -278,7 +303,7 @@ 
workflow MAG { ch_raw_long_reads ) ch_long_reads = PORECHOP.out.reads - ch_software_versions = ch_software_versions.mix(PORECHOP.out.version.first().ifEmpty(null)) + ch_versions = ch_versions.mix(PORECHOP.out.versions.first()) } if (!params.keep_lambda) { @@ -287,25 +312,23 @@ workflow MAG { ch_nanolyse_db ) ch_long_reads = NANOLYSE.out.reads - ch_software_versions = ch_software_versions.mix(NANOLYSE.out.version.first().ifEmpty(null)) + ch_versions = ch_versions.mix(NANOLYSE.out.versions.first()) } // join long and short reads by sample name - ch_short_reads + ch_short_reads_tmp = ch_short_reads .map { meta, sr -> [ meta.id, meta, sr ] } - .set {ch_short_reads_tmp} - ch_long_reads + ch_short_and_long_reads = ch_long_reads .map { meta, lr -> [ meta.id, meta, lr ] } .join(ch_short_reads_tmp, by: 0) .map { id, meta_lr, lr, meta_sr, sr -> [ meta_lr, lr, sr[0], sr[1] ] } // should not occur for single-end, since SPAdes (hybrid) does not support single-end - .set{ ch_short_and_long_reads } FILTLONG ( ch_short_and_long_reads ) ch_long_reads = FILTLONG.out.reads - ch_software_versions = ch_software_versions.mix(FILTLONG.out.version.first().ifEmpty(null)) + ch_versions = ch_versions.mix(FILTLONG.out.versions.first()) NANOPLOT_FILTERED ( ch_long_reads @@ -321,7 +344,7 @@ workflow MAG { ch_short_reads, CENTRIFUGE_DB_PREPARATION.out.db ) - ch_software_versions = ch_software_versions.mix(CENTRIFUGE.out.version.first().ifEmpty(null)) + ch_versions = ch_versions.mix(CENTRIFUGE.out.versions.first()) KRAKEN2_DB_PREPARATION ( ch_kraken2_db_file @@ -330,22 +353,21 @@ workflow MAG { ch_short_reads, KRAKEN2_DB_PREPARATION.out.db ) - ch_software_versions = ch_software_versions.mix(KRAKEN2.out.version.first().ifEmpty(null)) + ch_versions = ch_versions.mix(KRAKEN2.out.versions.first()) if (( params.centrifuge_db || params.kraken2_db ) && !params.skip_krona){ KRONA_DB () - CENTRIFUGE.out.results_for_krona.mix(KRAKEN2.out.results_for_krona) + ch_tax_classifications = CENTRIFUGE.out.results_for_krona.mix(KRAKEN2.out.results_for_krona) . 
     NANOPLOT_FILTERED (
         ch_long_reads
@@ -321,7 +344,7 @@ workflow MAG {
         ch_short_reads,
         CENTRIFUGE_DB_PREPARATION.out.db
     )
-    ch_software_versions = ch_software_versions.mix(CENTRIFUGE.out.version.first().ifEmpty(null))
+    ch_versions = ch_versions.mix(CENTRIFUGE.out.versions.first())

     KRAKEN2_DB_PREPARATION (
         ch_kraken2_db_file
@@ -330,22 +353,21 @@ workflow MAG {
         ch_short_reads,
         KRAKEN2_DB_PREPARATION.out.db
     )
-    ch_software_versions = ch_software_versions.mix(KRAKEN2.out.version.first().ifEmpty(null))
+    ch_versions = ch_versions.mix(KRAKEN2.out.versions.first())

     if (( params.centrifuge_db || params.kraken2_db ) && !params.skip_krona){
         KRONA_DB ()
-        CENTRIFUGE.out.results_for_krona.mix(KRAKEN2.out.results_for_krona)
+        ch_tax_classifications = CENTRIFUGE.out.results_for_krona.mix(KRAKEN2.out.results_for_krona)
             .map { classifier, meta, report ->
                 def meta_new = meta.clone()
                 meta_new.classifier = classifier
                 [ meta_new, report ]
             }
-            .set { ch_tax_classifications }
         KRONA (
             ch_tax_classifications,
             KRONA_DB.out.db.collect()
         )
-        ch_software_versions = ch_software_versions.mix(KRONA.out.version.first().ifEmpty(null))
+        ch_versions = ch_versions.mix(KRONA.out.versions.first())
     }

     /*
@@ -358,7 +380,7 @@ workflow MAG {
     if (params.coassemble_group) {
         // short reads
         // group and set group as new id
-        ch_short_reads
+        ch_short_reads_grouped = ch_short_reads
             .map { meta, reads -> [ meta.group, meta, reads ] }
             .groupTuple(by: 0)
             .map { group, metas, reads ->
@@ -369,10 +391,9 @@ workflow MAG {
                 if (!params.single_end) [ meta, reads.collect { it[0] }, reads.collect { it[1] } ]
                 else [ meta, reads.collect { it }, [] ]
             }
-            .set { ch_short_reads_grouped }
         // long reads
         // group and set group as new id
-        ch_long_reads
+        ch_long_reads_grouped = ch_long_reads
             .map { meta, reads -> [ meta.group, meta, reads ] }
             .groupTuple(by: 0)
             .map { group, metas, reads ->
@@ -381,27 +402,24 @@ workflow MAG {
                 meta.group = group
                 [ meta, reads.collect { it } ]
             }
-            .set { ch_long_reads_grouped }
     } else {
-        ch_short_reads
+        ch_short_reads_grouped = ch_short_reads
             .map { meta, reads ->
                 if (!params.single_end){ [ meta, [reads[0]], [reads[1]] ] }
                 else [ meta, [reads], [] ] }
-            .set { ch_short_reads_grouped }
     }

     ch_assemblies = Channel.empty()
     if (!params.skip_megahit){
         MEGAHIT ( ch_short_reads_grouped )
-        MEGAHIT.out.assembly
+        ch_megahit_assemblies = MEGAHIT.out.assembly
             .map { meta, assembly ->
                 def meta_new = meta.clone()
                 meta_new.assembler = "MEGAHIT"
                 [ meta_new, assembly ]
             }
-            .set { ch_megahit_assemblies }
         ch_assemblies = ch_assemblies.mix(ch_megahit_assemblies)
-        ch_software_versions = ch_software_versions.mix(MEGAHIT.out.version.first().ifEmpty(null))
+        ch_versions = ch_versions.mix(MEGAHIT.out.versions.first())
     }

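// ------------------------------------------------------------------------------
// Illustrative, standalone sketch (not part of the diff above): the co-assembly
// grouping pattern. Reads are re-keyed on meta.group, collected with
// groupTuple, and emitted as one element per group with R1 and R2 file lists,
// the shape the grouped assembly steps expect. The meta construction here is
// simplified and the inputs are mock values.
// ------------------------------------------------------------------------------
nextflow.enable.dsl = 2

workflow {
    ch_short_reads = Channel.of(
        [ [ id: 's1', group: 0, single_end: false ], [ 's1_R1.fq.gz', 's1_R2.fq.gz' ] ],
        [ [ id: 's2', group: 0, single_end: false ], [ 's2_R1.fq.gz', 's2_R2.fq.gz' ] ]
    )

    ch_short_reads
        .map { meta, reads -> [ meta.group, meta, reads ] }
        .groupTuple(by: 0)
        .map { group, metas, reads ->
            def meta = [:]
            meta.id         = "group-$group"
            meta.group      = group
            meta.single_end = false
            // paired-end case: collect all R1s and all R2s per group
            [ meta, reads.collect { it[0] }, reads.collect { it[1] } ]
        }
        .view()   // -> [ [id:group-0, ...], [s1_R1, s2_R1], [s1_R2, s2_R2] ]
}
// ------------------------------------------------------------------------------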
     // Co-assembly: pool reads for SPAdes
@@ -423,50 +441,45 @@ workflow MAG {
         }
     } else {
         ch_short_reads_spades = ch_short_reads
-        ch_long_reads
+        ch_long_reads_spades = ch_long_reads
             .map { meta, reads -> [ meta, [reads] ] }
-            .set { ch_long_reads_spades }
     }

     if (!params.single_end && !params.skip_spades){
         SPADES ( ch_short_reads_spades )
-        SPADES.out.assembly
+        ch_spades_assemblies = SPADES.out.assembly
             .map { meta, assembly ->
                 def meta_new = meta.clone()
                 meta_new.assembler = "SPAdes"
                 [ meta_new, assembly ]
             }
-            .set { ch_spades_assemblies }
         ch_assemblies = ch_assemblies.mix(ch_spades_assemblies)
-        ch_software_versions = ch_software_versions.mix(SPADES.out.version.first().ifEmpty(null))
+        ch_versions = ch_versions.mix(SPADES.out.versions.first())
     }

     if (!params.single_end && !params.skip_spadeshybrid){
-        ch_short_reads_spades
+        ch_short_reads_spades_tmp = ch_short_reads_spades
             .map { meta, reads -> [ meta.id, meta, reads ] }
-            .set {ch_short_reads_spades_tmp}
-        ch_long_reads_spades
+        ch_reads_spadeshybrid = ch_long_reads_spades
             .map { meta, reads -> [ meta.id, meta, reads ] }
             .combine(ch_short_reads_spades_tmp, by: 0)
             .map { id, meta_long, long_reads, meta_short, short_reads -> [ meta_short, long_reads, short_reads ] }
-            .set { ch_reads_spadeshybrid }
         SPADESHYBRID ( ch_reads_spadeshybrid )
-        SPADESHYBRID.out.assembly
+        ch_spadeshybrid_assemblies = SPADESHYBRID.out.assembly
             .map { meta, assembly ->
                 def meta_new = meta.clone()
                 meta_new.assembler = "SPAdesHybrid"
                 [ meta_new, assembly ]
             }
-            .set { ch_spadeshybrid_assemblies }
         ch_assemblies = ch_assemblies.mix(ch_spadeshybrid_assemblies)
-        ch_software_versions = ch_software_versions.mix(SPADESHYBRID.out.version.first().ifEmpty(null))
+        ch_versions = ch_versions.mix(SPADESHYBRID.out.versions.first())
     }

     ch_quast_multiqc = Channel.empty()
     if (!params.skip_quast){
         QUAST ( ch_assemblies )
         ch_quast_multiqc = QUAST.out.qc
-        ch_software_versions = ch_software_versions.mix(QUAST.out.version.first().ifEmpty(null))
+        ch_versions = ch_versions.mix(QUAST.out.versions.first())
     }

     /*
@@ -478,42 +491,108 @@ workflow MAG {
     if (!params.skip_prodigal){
         PRODIGAL (
             ch_assemblies,
-            modules['prodigal']['output_format']
+            'gff'
         )
-        ch_software_versions = ch_software_versions.mix(PRODIGAL.out.versions.first().ifEmpty(null))
+        ch_versions = ch_versions.mix(PRODIGAL.out.versions.first())
     }

     /*
     ================================================================================
-                                     Binning
+                                Binning preparation
     ================================================================================
     */

+    ch_bowtie2_assembly_multiqc = Channel.empty()
     ch_busco_summary = Channel.empty()
     ch_busco_multiqc = Channel.empty()
+
+
+
+    BINNING_PREPARATION (
+        ch_assemblies,
+        ch_short_reads
+    )
+
+    /*
+    ================================================================================
+                                     Ancient DNA
+    ================================================================================
+    */
+
+    if (params.ancient_dna){
+        ANCIENT_DNA_ASSEMLY_VALIDATION(BINNING_PREPARATION.out.grouped_mappings)
+        ch_versions = ch_versions.mix(ANCIENT_DNA_ASSEMLY_VALIDATION.out.versions.first())
+    }
+
+    /*
+    ================================================================================
+                                       Binning
+    ================================================================================
+    */
+
     if (!params.skip_binning){
-        METABAT2_BINNING (
-            ch_assemblies,
-            ch_short_reads
-        )
-        ch_bowtie2_assembly_multiqc = METABAT2_BINNING.out.bowtie2_assembly_multiqc
-        ch_software_versions = ch_software_versions.mix(METABAT2_BINNING.out.bowtie2_version.first().ifEmpty(null))
-        ch_software_versions = ch_software_versions.mix(METABAT2_BINNING.out.metabat2_version.first().ifEmpty(null))
+        if (params.ancient_dna) {
+            BINNING (
+                BINNING_PREPARATION.out.grouped_mappings
+                    .join(ANCIENT_DNA_ASSEMLY_VALIDATION.out.contigs_recalled)
+                    .map{ it -> [ it[0], it[4], it[2], it[3] ] }, // [meta, contigs_recalled, bam, bais]
+                ch_short_reads
+            )
+        } else {
+            BINNING (
+                BINNING_PREPARATION.out.grouped_mappings,
+                ch_short_reads
+            )
+        }
+
+        ch_bowtie2_assembly_multiqc = BINNING_PREPARATION.out.bowtie2_assembly_multiqc
+        ch_versions = ch_versions.mix(BINNING_PREPARATION.out.bowtie2_version.first())
+        ch_versions = ch_versions.mix(BINNING.out.versions)
+
+        /*
+        * DAS Tool: binning refinement
+        */
+
+        if ( params.refine_bins_dastool && !params.skip_metabat2 && !params.skip_maxbin2 ) {
+
+            BINNING_REFINEMENT ( BINNING_PREPARATION.out.grouped_mappings, BINNING.out.bins, BINNING.out.metabat2depths, ch_short_reads )
+            ch_versions = ch_versions.mix(BINNING_REFINEMENT.out.versions)
+
+            if ( params.postbinning_input == 'raw_bins_only' ) {
+                ch_input_for_postbinning_bins = BINNING.out.bins
+                ch_input_for_postbinning_bins_unbins = BINNING.out.bins.mix(BINNING.out.unbinned)
+                ch_input_for_binsummary = BINNING.out.depths_summary
+            } else if ( params.postbinning_input == 'refined_bins_only' ) {
+                ch_input_for_postbinning_bins = BINNING_REFINEMENT.out.refined_bins
+                ch_input_for_postbinning_bins_unbins = BINNING_REFINEMENT.out.refined_bins.mix(BINNING_REFINEMENT.out.refined_unbins)
+                ch_input_for_binsummary = BINNING_REFINEMENT.out.refined_depths_summary
+            } else if (params.postbinning_input == 'both') {
+                ch_input_for_postbinning_bins = BINNING.out.bins.mix(BINNING_REFINEMENT.out.refined_bins)
+                ch_input_for_postbinning_bins_unbins = BINNING.out.bins.mix(BINNING.out.unbinned,BINNING_REFINEMENT.out.refined_bins,BINNING_REFINEMENT.out.refined_unbins)
+                ch_combinedepthtsvs_for_binsummary = BINNING.out.depths_summary.mix(BINNING_REFINEMENT.out.refined_depths_summary)
+                ch_input_for_binsummary = COMBINE_TSV ( ch_combinedepthtsvs_for_binsummary.collect() ).combined
+            }
+        } else {
+            ch_input_for_postbinning_bins = BINNING.out.bins
+            ch_input_for_postbinning_bins_unbins = BINNING.out.bins.mix(BINNING.out.unbinned)
+            ch_input_for_binsummary = BINNING.out.depths_summary
+        }

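// ------------------------------------------------------------------------------
// Illustrative, standalone sketch (not part of the diff above): the
// params.postbinning_input switch. Depending on the setting, downstream QC and
// annotation see raw bins, DAS Tool-refined bins, or a mix of both. The
// channels are mock placeholders for the BINNING / BINNING_REFINEMENT outputs.
// ------------------------------------------------------------------------------
nextflow.enable.dsl = 2

params.postbinning_input = 'both'

workflow {
    ch_raw_bins       = Channel.of( 'MEGAHIT-sample1.001.fa' )
    ch_raw_unbinned   = Channel.of( 'MEGAHIT-sample1.unbinned.fa' )
    ch_refined_bins   = Channel.of( 'DASTool-sample1.001.fa' )
    ch_refined_unbins = Channel.of( 'DASTool-sample1.unbinned.fa' )

    if ( params.postbinning_input == 'raw_bins_only' ) {
        ch_postbinning = ch_raw_bins.mix(ch_raw_unbinned)
    } else if ( params.postbinning_input == 'refined_bins_only' ) {
        ch_postbinning = ch_refined_bins.mix(ch_refined_unbins)
    } else if ( params.postbinning_input == 'both' ) {
        ch_postbinning = ch_raw_bins.mix(ch_raw_unbinned, ch_refined_bins, ch_refined_unbins)
    }

    ch_postbinning.view()
}
// ------------------------------------------------------------------------------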
         if (!params.skip_busco){
             /*
             * BUSCO subworkflow: Quantitative measures for the assessment of genome assembly
             */
+            ch_input_bins_busco = ch_input_for_postbinning_bins_unbins.transpose()
             BUSCO_QC (
                 ch_busco_db_file,
                 ch_busco_download_folder,
-                METABAT2_BINNING.out.bins.transpose()
+                ch_input_bins_busco
             )
             ch_busco_summary = BUSCO_QC.out.summary
             ch_busco_multiqc = BUSCO_QC.out.multiqc
-            ch_software_versions = ch_software_versions.mix(BUSCO_QC.out.version.first().ifEmpty(null))
+            ch_versions = ch_versions.mix(BUSCO_QC.out.versions.first())
             // process information if BUSCO analysis failed for individual bins due to no matching genes
             BUSCO_QC.out
                 .failed_bin
@@ -523,8 +602,15 @@ workflow MAG {
         ch_quast_bins_summary = Channel.empty()
         if (!params.skip_quast){
-            QUAST_BINS ( METABAT2_BINNING.out.bins )
-            ch_software_versions = ch_software_versions.mix(QUAST_BINS.out.version.first().ifEmpty(null))
+            ch_input_for_quast_bins = ch_input_for_postbinning_bins_unbins
+                .groupTuple()
+                .map{
+                    meta, reads ->
+                        def new_reads = reads.flatten()
+                        [meta, new_reads]
+                    }
+            QUAST_BINS ( ch_input_for_quast_bins )
+            ch_versions = ch_versions.mix(QUAST_BINS.out.versions.first())
             QUAST_BINS_SUMMARY ( QUAST_BINS.out.quast_bin_summaries.collect() )
             ch_quast_bins_summary = QUAST_BINS_SUMMARY.out.summary
         }
@@ -541,10 +627,10 @@ workflow MAG {
             ch_cat_db = CAT_DB_GENERATE.out.db
         }
         CAT (
-            METABAT2_BINNING.out.bins,
+            ch_input_for_postbinning_bins,
            ch_cat_db
         )
-        ch_software_versions = ch_software_versions.mix(CAT.out.version.first().ifEmpty(null))
+        ch_versions = ch_versions.mix(CAT.out.versions.first())

         /*
         * GTDB-tk: taxonomic classifications using GTDB reference
@@ -552,17 +638,17 @@ workflow MAG {
         ch_gtdbtk_summary = Channel.empty()
         if ( gtdb ){
             GTDBTK (
-                METABAT2_BINNING.out.bins,
+                ch_input_for_postbinning_bins_unbins,
                 ch_busco_summary,
                 ch_gtdb
             )
-            ch_software_versions = ch_software_versions.mix(GTDBTK.out.version.first().ifEmpty(null))
+            ch_versions = ch_versions.mix(GTDBTK.out.versions.first())
             ch_gtdbtk_summary = GTDBTK.out.summary
         }

         if (!params.skip_busco || !params.skip_quast || gtdb){
             BIN_SUMMARY (
-                METABAT2_BINNING.out.depths_summary,
+                ch_input_for_binsummary,
                 ch_busco_summary.ifEmpty([]),
                 ch_quast_bins_summary.ifEmpty([]),
                 ch_gtdbtk_summary.ifEmpty([])
@@ -572,13 +658,12 @@ workflow MAG {
         /*
         * Prokka: Genome annotation
         */
-        METABAT2_BINNING.out.bins.transpose()
+        ch_bins_for_prokka = ch_input_for_postbinning_bins_unbins.transpose()
             .map { meta, bin ->
                 def meta_new = meta.clone()
                 meta_new.id = bin.getBaseName()
                 [ meta_new, bin ]
             }
-            .set { ch_bins_for_prokka }

         if (!params.skip_prokka){
             PROKKA (
@@ -586,23 +671,12 @@ workflow MAG {
                 [],
                 []
             )
-            ch_software_versions = ch_software_versions.mix(PROKKA.out.versions.first().ifEmpty(null))
+            ch_versions = ch_versions.mix(PROKKA.out.versions.first())
         }
     }

-    //
-    // MODULE: Pipeline reporting
-    //
-    ch_software_versions
-        .map { it -> if (it) [ it.baseName, it ] }
-        .groupTuple()
-        .map { it[1][0] }
-        .flatten()
-        .collect()
-        .set { ch_software_versions }
-
-    GET_SOFTWARE_VERSIONS (
-        ch_software_versions.map { it }.collect()
+    CUSTOM_DUMPSOFTWAREVERSIONS (
+        ch_versions.unique().collectFile(name: 'collated_versions.yml')
     )

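// ------------------------------------------------------------------------------
// Illustrative, standalone sketch (not part of the diff above): the versions
// handling introduced here. Per-module version snippets are mixed into a single
// channel, de-duplicated, and concatenated with collectFile() before being
// passed on to CUSTOM_DUMPSOFTWAREVERSIONS. The YAML fragments below are
// written inline purely for illustration.
// ------------------------------------------------------------------------------
nextflow.enable.dsl = 2

workflow {
    ch_versions = Channel.empty()

    // stand-ins for the TOOL.out.versions emissions collected through the workflow
    ch_versions = ch_versions.mix(
        Channel.of( '"FASTQC_RAW":\n    fastqc: 0.11.9\n' ),
        Channel.of( '"FASTP":\n    fastp: 0.23.2\n' )
    )

    ch_versions
        .unique()
        .collectFile(name: 'collated_versions.yml')
        .view { it.text }   // one concatenated YAML document
}
// ------------------------------------------------------------------------------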
     //
@@ -614,27 +688,48 @@ workflow MAG {
     ch_multiqc_files = Channel.empty()
     ch_multiqc_files = ch_multiqc_files.mix(Channel.from(ch_multiqc_config))
     ch_multiqc_files = ch_multiqc_files.mix(ch_workflow_summary.collectFile(name: 'workflow_summary_mqc.yaml'))
-    ch_multiqc_files = ch_multiqc_files.mix(GET_SOFTWARE_VERSIONS.out.yaml.collect())
+    ch_multiqc_files = ch_multiqc_files.mix(CUSTOM_DUMPSOFTWAREVERSIONS.out.mqc_yml.collect())

+    /* //This is the template input with the nf-core module
+    ch_multiqc_files = ch_multiqc_files.mix(FASTQC_RAW.out.zip.collect{it[1]}.ifEmpty([]))
+    ch_multiqc_files = ch_multiqc_files.mix(FASTP.out.json.collect{it[1]}.ifEmpty([]))
+    ch_multiqc_files = ch_multiqc_files.mix(FASTQC_TRIMMED.out.zip.collect{it[1]}.ifEmpty([]))
+    ch_multiqc_files = ch_multiqc_files.mix(ch_bowtie2_removal_host_multiqc.collect{it[1]}.ifEmpty([]))
+    ch_multiqc_files = ch_multiqc_files.mix(ch_quast_multiqc.collect().ifEmpty([]))
+    ch_multiqc_files = ch_multiqc_files.mix(ch_bowtie2_assembly_multiqc.collect().ifEmpty([]))
+    ch_multiqc_files = ch_multiqc_files.mix(ch_busco_multiqc.collect().ifEmpty([]))
+
+    MULTIQC (
+        ch_multiqc_files.collect()
+    )
+    */
+
+    ch_multiqc_readprep = Channel.empty()
+
+    if ( params.clip_tool == "fastp") {
+        ch_multiqc_readprep = ch_multiqc_readprep.mix(FASTP.out.json.collect{it[1]}.ifEmpty([]))
+    } else if ( params.clip_tool == "adapterremoval" ) {
+        ch_multiqc_readprep = ch_multiqc_readprep.mix(ADAPTERREMOVAL_PE.out.log.collect{it[1]}.ifEmpty([]), ADAPTERREMOVAL_SE.out.log.collect{it[1]}.ifEmpty([]))
+    }

     MULTIQC (
         ch_multiqc_files.collect(),
         ch_multiqc_custom_config.collect().ifEmpty([]),
         FASTQC_RAW.out.zip.collect{it[1]}.ifEmpty([]),
-        FASTP.out.json.collect{it[1]}.ifEmpty([]),
         FASTQC_TRIMMED.out.zip.collect{it[1]}.ifEmpty([]),
         ch_bowtie2_removal_host_multiqc.collect{it[1]}.ifEmpty([]),
         ch_quast_multiqc.collect().ifEmpty([]),
         ch_bowtie2_assembly_multiqc.collect().ifEmpty([]),
-        ch_busco_multiqc.collect().ifEmpty([])
+        ch_busco_multiqc.collect().ifEmpty([]),
+        ch_multiqc_readprep.collect().ifEmpty([]),
     )
-    multiqc_report = MULTIQC.out.report.toList()
-    ch_software_versions = ch_software_versions.mix(MULTIQC.out.version.ifEmpty(null))
+    multiqc_report = MULTIQC.out.report.toList()
+    ch_versions = ch_versions.mix(MULTIQC.out.versions)
 }

 /*
-========================================================================================
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     COMPLETION EMAIL AND SUMMARY
-========================================================================================
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 */

 workflow.onComplete {
@@ -645,7 +740,7 @@ workflow.onComplete {
 }

 /*
-========================================================================================
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     THE END
-========================================================================================
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 */
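// ------------------------------------------------------------------------------
// Illustrative, standalone sketch (not part of the diff above): the MultiQC
// input pattern used in the MULTIQC call. [ meta, file ] tuples are reduced to
// bare files with collect{ it[1] }, and ifEmpty([]) keeps the call valid when
// an optional step produced no reports. Report names are mock values.
// ------------------------------------------------------------------------------
nextflow.enable.dsl = 2

workflow {
    ch_fastqc_zip = Channel.of(
        [ [ id: 's1' ], 's1_fastqc.zip' ],
        [ [ id: 's2' ], 's2_fastqc.zip' ]
    )
    ch_optional_logs = Channel.empty()   // e.g. output of a skipped tool

    ch_fastqc_zip.collect { it[1] }.ifEmpty([]).view()   // [s1_fastqc.zip, s2_fastqc.zip]
    ch_optional_logs.collect().ifEmpty([]).view()        // []
}
// ------------------------------------------------------------------------------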