Merge pull request #218 from eastgenomics/release_v3.2.0
Release v3.2.0 -> main (#218)

Co-Authored-By: growland2 <[email protected]>
Co-Authored-By: Yu-jinKim <[email protected]>
Co-Authored-By: Katherine Winfield <[email protected]>
4 people authored Aug 1, 2024
2 parents b63a04e + 1ef7459 commit eea364e
Showing 11 changed files with 888 additions and 38 deletions.
2 changes: 2 additions & 0 deletions .coveragerc
@@ -0,0 +1,2 @@
[run]
omit = resources/home/dnanexus/dias_batch/tests/*
10 changes: 9 additions & 1 deletion dxapp.json
@@ -1,7 +1,7 @@
{
"name": "eggd_dias_batch",
"title": "eggd_dias_batch",
-"version": "3.1.0",
+"version": "3.2.0",
"summary": "Launches downstream analyses for Dias",
"dxapi": "1.0.0",
"inputSpec": [
@@ -166,6 +166,14 @@
"optional": true,
"default": false,
"help": "controls whether to automatically unarchive any required files that are archived. Default is to fail the app with a list of files required to unarchive. If set to true, all required files will start to be unarchived and the job will exit with a zero exit code and the job tagged to state no jobs were launched"
},
{
"name": "unarchive_only",
"label": "unarchive_only",
"class": "boolean",
"optional": true,
"default": false,
"help": "controls if to only run the app to check for archived files and unarchive (i.e no launching of jobs), if all files are found in an unarchived state the app will exit with a zero exit code"
}
],
"outputSpec": [
7 changes: 7 additions & 0 deletions readme.md
@@ -38,6 +38,8 @@ DNAnexus app for launching CNV calling, one or more of SNV, CNV and mosaic repor
- `-iexclude_controls` (`bool`): controls if to automatically exclude control samples from CNV calling based on the regex pattern `'^\w+-\w+Q\w+-'` (default: `true`)
- `-isplit_tests` (`bool`): controls if to split multiple panels / genes in a manifest to individual reports instead of being combined into one
- `-iunarchive` (`bool`): controls whether to automatically unarchive any required files that are archived. Default is to fail the app with a list of files required to unarchive. If set to true, all required files will start to be unarchived and the job will exit with a zero exit code and the job tagged to state no jobs were launched
- `-iunarchive_only` (`bool`): controls whether to run the app only to check for archived files and unarchive them (i.e. no jobs are launched). If all files are found in an unarchived state, the app will exit with a zero exit code.
  - n.b. in this mode, `unarchive` defaults to True and unarchiving will always be run
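
The control-sample pattern quoted for `-iexclude_controls` can be checked in isolation. The sketch below applies that regex to two sample names that are invented for illustration; it is not taken from the app's own code.

```python
import re

# Pattern quoted in the readme for -iexclude_controls
CONTROL_PATTERN = re.compile(r'^\w+-\w+Q\w+-')

# Hypothetical sample names, invented for illustration only
samples = [
    "X000001-GM0000Q01-23NGCEN1-9526-F-99347387",
    "X000002-GM000002-23NGCEN1-9526-M-99347387",
]

# A name is treated as a control when its second hyphen-separated
# field contains an upper case Q with word characters either side
controls = [name for name in samples if CONTROL_PATTERN.search(name)]
non_controls = [name for name in samples if name not in controls]
```

Only the first name has a `Q` inside its second field, so only that one would be excluded from CNV calling under this pattern.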


#### Running modes
@@ -59,6 +61,8 @@ DNAnexus app for launching CNV calling, one or more of SNV, CNV and mosaic repor

The app takes as a minimum input a path to Dias single output, an assay config, and at least one of the above listed running modes. The default behaviour is to pass an assay string specified to run for (with `-iassay`), which will search DNAnexus for the highest version config file in `-iassay_config_dir` (default: `001_Reference:/dynamic_files/dias_batch_configs/`) and use this for analysis. Alternatively, an assay config file may be specified to use instead with `-iassay_config_file`. If running a reports workflow a manifest file must also be specified.

Before any jobs are launched, the archival state of all required files is checked. This uses the file pattern mappings defined in `utils.defaults`, or those from the assay config file (if specified), to search for the required per sample and per run files, and will raise an error on any archived files if `unarchive=True` is not set.
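
The decision described above can be sketched without any DNAnexus calls. This is a minimal illustration, not the app's `check_all_files_archival_state`: the function names and the file records are hypothetical, and it assumes each record carries the `archivalState` field that DNAnexus returns when describing a file (`live`, `archival`, `archived` or `unarchiving`). Treating every non-live state as blocking is also an assumption.

```python
def files_blocking_launch(files):
    """Return IDs of files that are not in the 'live' archival state."""
    return [f["id"] for f in files if f["archivalState"] != "live"]


def check_archival_state(files, unarchive=False):
    """
    Return the IDs to request unarchiving for, or raise if any file is
    not live and unarchiving was not requested.
    """
    blocking = files_blocking_launch(files)
    if blocking and not unarchive:
        raise RuntimeError(
            f"{len(blocking)} required file(s) not live: {', '.join(blocking)}"
        )
    return blocking


# Hypothetical describe-style records, invented for illustration
files = [
    {"id": "file-aaa", "archivalState": "live"},
    {"id": "file-bbb", "archivalState": "archived"},
]
to_unarchive = check_archival_state(files, unarchive=True)
```

With `unarchive=False` the same input raises instead of returning, which mirrors the default of failing with a list of the files that need unarchiving.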

The general behaviour of each mode is as follows:

### CNV calling
@@ -221,6 +225,7 @@ The top level section should be structured as follows:
- `{cnv_call_app|_report_workflow}_id` (`str`) : the IDs of CNV calling and reports workflows to use
- `reference_files` (`dict`) : mapping of reference file name : DNAnexus file ID, reference file name _must_ be given as shown above, and DNAnexus file ID should be provided as `project-xxx:file-xxx`
- `name_patterns` (`dict`) : mapping of the manifest source and a regex pattern to use for filtering sample names and files etc.
- `mode_file_patterns` (`dict` | optional): mapping for each running mode to sample and run file patterns for which to search and check the archival state of before launching any jobs. Defaults are defined in `utils.defaults`, and a mapping of the same structure may be added to the assay config file to override the defaults.
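
As a rough illustration of the `mode_file_patterns` override, the sketch below falls back to a defaults mapping when the assay config does not define one. The `sample` / `run` keys, the regex patterns and the function name are all assumptions made for this example; the real structure is whatever `utils.defaults` defines, and whether an override replaces the whole mapping (as here) or is merged per mode is not stated in the readme.

```python
# Stand-in for the mapping in utils.defaults: the mode names match the
# app's running modes, but the keys and patterns are invented
DEFAULT_MODE_FILE_PATTERNS = {
    "cnv_reports": {
        "sample": [r"_segments\.vcf$"],
        "run": [r"_excluded_intervals\.bed$"],
    },
    "snv_reports": {
        "sample": [r"\.vcf\.gz$"],
        "run": [],
    },
}


def resolve_mode_file_patterns(assay_config):
    """Use the assay config mapping if present, otherwise the defaults."""
    return assay_config.get("mode_file_patterns") or DEFAULT_MODE_FILE_PATTERNS


# Hypothetical assay config overriding the patterns for one mode
assay_config = {
    "assay": "CEN",
    "mode_file_patterns": {
        "snv_reports": {"sample": [r"_markdup\.vcf\.gz$"], "run": []},
    },
}

patterns = resolve_mode_file_patterns(assay_config)
fallback = resolve_mode_file_patterns({"assay": "CEN"})
```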

The definitions of inputs for CNV calling and each reports workflow should be defined under the key `modes`, containing a mapping of all inputs and other inputs for controlling running of analyses.

@@ -311,6 +316,8 @@ The definitions of inputs for CNV calling and each reports workflow should be de
- `INPUT-test_codes` : `'&&'` separated string of test codes
- `INPUT-sample_name` : string of sample name from manifest

These are added to the config via [`utils.add_dynamic_inputs`](https://github.com/eastgenomics/eggd_dias_batch/blob/b63a04e2d421a246017e984efcc2a9eef85fbeaf/resources/home/dnanexus/dias_batch/utils/utils.py#L1073) from kwargs generated at run time specified [here](https://github.com/eastgenomics/eggd_dias_batch/blob/b63a04e2d421a246017e984efcc2a9eef85fbeaf/resources/home/dnanexus/dias_batch/utils/dx_requests.py#L1170).
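
The idea of filling `INPUT-` placeholders from run time kwargs can be sketched as below. This is not the code of `utils.add_dynamic_inputs`: the function name, the stage input names and the whole-string matching rule are assumptions made for illustration only.

```python
def fill_placeholders(config, **kwargs):
    """
    Recursively replace string values of the form 'INPUT-<name>' with
    kwargs[<name>]; placeholders with no matching kwarg are left as-is.
    """
    if isinstance(config, dict):
        return {k: fill_placeholders(v, **kwargs) for k, v in config.items()}
    if isinstance(config, list):
        return [fill_placeholders(v, **kwargs) for v in config]
    if isinstance(config, str) and config.startswith("INPUT-"):
        return kwargs.get(config[len("INPUT-"):], config)
    return config


# Hypothetical stage inputs, invented for illustration
inputs = {
    "stage-xxx.clinical_indication": "INPUT-test_codes",
    "stage-xxx.sample_name": "INPUT-sample_name",
    "stage-xxx.keep_tmp": False,
}

filled = fill_placeholders(
    inputs,
    test_codes="R134.1&&R134.2",
    sample_name="sample1",
)
```

Returning a new structure rather than editing the config in place keeps the original placeholders available for the next sample.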

---

## What does this app output
2 changes: 1 addition & 1 deletion requirements.txt
@@ -1,5 +1,5 @@
dxpy==0.318.1
-packaging==20.3
+packaging==24.1
pandas==1.4.1
pytest==7.0.1
pytest-cov==4.0.0
60 changes: 57 additions & 3 deletions resources/home/dnanexus/dias_batch/dias_batch.py
@@ -19,7 +19,6 @@
make_path,
parse_manifest,
parse_genepanels,
-prettier_print,
time_stamp,
write_summary_report
)
@@ -32,7 +31,6 @@
make_path,
parse_manifest,
parse_genepanels,
-prettier_print,
time_stamp,
write_summary_report
)
@@ -61,9 +59,11 @@ def __init__(self, **inputs) -> None:

        self.inputs = inputs
        self.errors = []
        self.strip_string_inputs()
        self.check_assay()
        self.check_assay_config_dir()
        self.check_mode_set()
        self.check_unarchive_set()
        self.check_single_output_dir()
        self.check_cnv_call_and_cnv_call_job_id_mutually_exclusive()
        self.check_cnv_calling_for_cnv_reports()
@@ -168,6 +168,18 @@ def check_mode_set(self):
'Reports argument specified with no manifest file'
)

    def check_unarchive_set(self):
        """
        Checks that if unarchive_only specified that unarchive will
        default to also being specified
        """
        if self.inputs.get('unarchive_only') and not self.inputs.get('unarchive'):
            print(
                "-iunarchive_only specified but -iunarchive not specified, "
                "setting unarchive to True"
            )
            self.inputs['unarchive'] = True

def check_cnv_call_and_cnv_call_job_id_mutually_exclusive(self):
"""
Check that both cnv_call and cnv_call_job_id have not been
@@ -251,6 +263,24 @@ def check_exclude_samples_file_id(self):
f"{self.inputs.get('exclude_samples')}"
)

    def strip_string_inputs(self):
        """
        Strip string type inputs to ensure no leading or trailing
        whitespace are retained
        """
        string_inputs = [
            'assay',
            'assay_config_dir',
            'exclude_samples',
            'manifest_subset',
            'single_output_dir',
            'cnv_call_job_id'
        ]

        for string in string_inputs:
            if self.inputs.get(string) and isinstance(self.inputs.get(string), str):
                self.inputs[string] = self.inputs[string].strip()


@dxpy.entry_point('main')
def main(
@@ -274,7 +304,8 @@ def main(
multiqc_report=None,
testing=False,
sample_limit=None,
-unarchive=None
+unarchive=None,
+unarchive_only=None
):
dxpy.set_workspace_id(os.environ.get('DX_PROJECT_CONTEXT_ID'))

@@ -283,6 +314,9 @@ def main(
    # assign single out dir in case of missing / output prefix to path
    single_output_dir = check.inputs['single_output_dir']

    # ensure unarchive is set from CheckInputs.check_unarchive_set
    unarchive = check.inputs['unarchive']

    # time of running for naming output folders
    start_time = time_stamp()

@@ -341,6 +375,9 @@ def main(
print("Parsed manifest(s):")
print('⠀⠀', '\n⠀⠀⠀'.join({f"{k}: {v}" for k, v in manifest.items()}))

# record what we had provided before excluding anything
provided_manifest_samples = manifest.keys()

# filter manifest tests against genepanels to ensure what has been
# requested are test codes or HGNC IDs we recognise
manifest = check_manifest_valid_test_codes(
@@ -360,6 +397,22 @@ def main(
for sample in manifest
}

    # check up front if any files for any of the selected running modes
    # are in an archived state which would cause jobs to fail to launch
    DXManage().check_all_files_archival_state(
        patterns=assay_config.get('mode_file_patterns'),
        samples=manifest.keys(),
        path=single_output_dir,
        unarchive=unarchive,
        unarchive_only=unarchive_only,
        modes={
            'cnv_reports': cnv_reports,
            'snv_reports': snv_reports,
            'mosaic_reports': mosaic_reports,
            'artemis': artemis
        }
    )

launched_jobs = {}
cnv_report_errors = snv_report_errors = mosaic_report_errors = \
cnv_call_excluded_files = cnv_report_summary = snv_report_summary = \
@@ -514,6 +567,7 @@ def main(
app=app_details,
assay_config=assay_config,
manifest=manifest,
provided_manifest_samples=provided_manifest_samples,
launched_jobs=launched_jobs,
excluded=exclude_samples,
cnv_call_excluded=cnv_call_excluded_files,
65 changes: 65 additions & 0 deletions resources/home/dnanexus/dias_batch/tests/test_dias_batch.py
@@ -4,6 +4,7 @@
import os
import sys
from unittest.mock import patch
import unittest


sys.path.append(os.path.abspath(
@@ -129,6 +130,39 @@ def test_check_no_mode_set(self, mocker):
'Error not raised for no running mode set'
)

    def test_check_unarchive_behaviour_as_expected(self, mocker):
        """
        Check behaviour for if unarchive_only is set that unarchive also
        defaults to being set to True
        """
        mocker.patch.object(CheckInputs, "__init__", return_value=None)
        mocker.return_value = None

        with unittest.TestCase().subTest('unarchive_only set to True'):
            # Test when unarchive_only set to True we force unarchive
            # to True also
            check = CheckInputs()
            check.inputs = {
                'unarchive_only': True,
                'unarchive': False
            }
            check.check_unarchive_set()

            assert check.inputs['unarchive'] == True

        with unittest.TestCase().subTest('unarchive_only set to False'):
            # Test when unarchive_only set to False we do not force
            # unarchive to True also
            check = CheckInputs()
            check.inputs = {
                'unarchive_only': False,
                'unarchive': False
            }
            check.check_unarchive_set()

            assert check.inputs['unarchive'] == False


def test_error_raised_for_no_manifest_with_reports_mode(self, mocker):
"""
Test error is raised when a reports mode is set and no manifest given
@@ -258,6 +292,37 @@ def test_qc_status_file_is_valid(self, mock_file, mocker):
"Error not raised when non .xlsx file provided to check_qc_file()"
)

    def test_string_inputs_with_whitespace_stripped(self, mocker):
        """
        Test that string inputs are correctly stripped
        """
        mocker.patch.object(CheckInputs, "__init__", return_value=None)
        check = CheckInputs()

        check.inputs = {
            'assay': 'CEN ',
            'assay_config_dir': ' some_dir',
            'exclude_samples': ' sample1 ',
            'manifest_subset': ' ',
            'single_output_dir': '/output/foo/bar',
            'cnv_call_job_id': 'job-xxx '
        }

        check.strip_string_inputs()

        expected_inputs = {
            'assay': 'CEN',
            'assay_config_dir': 'some_dir',
            'exclude_samples': 'sample1',
            'manifest_subset': '',
            'single_output_dir': '/output/foo/bar',
            'cnv_call_job_id': 'job-xxx'
        }

        assert check.inputs == expected_inputs, (
            'String inputs not correctly stripped'
        )


class TestMain():
"""