forked from galaxyproject/galaxy
Implement a subset of the CWL Draft 3 tool format. #1
This will be rebased frequently, but this standing PR should represent the current progress of the cwl branch. Any help would be appreciated; I will work on building a TODO list of what needs to be done.
jmchilton force-pushed the cwl branch 5 times, most recently from 1315239 to 3bd8aad (October 22, 2015 18:00)
mr-c added a commit that referenced this pull request (Oct 22, 2015): Revert most of the 79-char linewrapping in objectstore
jmchilton force-pushed the cwl branch 2 times, most recently from 21666c9 to 6d66270 (November 2, 2015 22:20)
jmchilton pushed a commit that referenced this pull request (Nov 2, 2015): Remove old XSD linting code.
jmchilton force-pushed the cwl branch 3 times, most recently from c2f64ae to 22f4234 (November 6, 2015 04:50)
jmchilton force-pushed the cwl branch 6 times, most recently from 714442e to 8706584 (November 16, 2015 01:28)
jmchilton force-pushed the cwl branch 10 times, most recently from 6c814b2 to 9befc72 (December 2, 2015 03:41)
This special class of tools leverages the infrastructure for tool inputs, tool state tracking, the tool module for workflows, the tool API, etc. without actually producing command-line jobs. Instead, these tools are provided the input model objects and are expected to produce output model objects directly. This provides an opportunity

The first driving use case for these tools is also included: tools that allow zipping and unzipping paired collections. These tools can be mapped over lists (e.g. list:paired to (list, list) or the inverse) using much of the existing infrastructure for tools. Test cases are included that validate that these work with mapping operations and in workflows.

The most obvious advantage of these over traditional tools that do the same thing is that the data isn't copied on disk: new HDAs are created directly from the source datasets.

Testing: This PR includes various API test cases for this functionality. They can be run with the following commands:

```
./run_tests.sh -api test/api/test_tools.py:ToolsTestCase.test_unzip_collection
./run_tests.sh -api test/api/test_tools.py:ToolsTestCase.test_zip_inputs
./run_tests.sh -api test/api/test_tools.py:ToolsTestCase.test_zip_list_inputs
./run_tests.sh -api test/api/test_workflows.py:WorkflowsApiTestCase.test_workflow_run_zip_collections
```
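As a rough illustration of the zip/unzip idea described above, here is a minimal sketch. It does not use Galaxy's model classes: `Dataset`, `HDA`, `copy_hda`, `zip_pair`, `unzip_pair`, and `unzip_list_of_pairs` are all hypothetical stand-ins, and the only point being shown is that new HDAs can share the source dataset instead of copying data on disk.

```python
from dataclasses import dataclass, field
from itertools import count

_hda_ids = count(1)  # hypothetical id source, only for this sketch


@dataclass(frozen=True)
class Dataset:
    """Stand-in for the data on disk; it is shared, never copied."""
    path: str


@dataclass
class HDA:
    """Stand-in for a history dataset association pointing at a Dataset."""
    name: str
    dataset: Dataset
    id: int = field(default_factory=lambda: next(_hda_ids))


def copy_hda(hda):
    # A new HDA record that references the same underlying Dataset.
    return HDA(name=hda.name, dataset=hda.dataset)


def zip_pair(forward, reverse):
    """Build a paired collection from two HDAs."""
    return {"forward": copy_hda(forward), "reverse": copy_hda(reverse)}


def unzip_pair(pair):
    """Split a paired collection back into two HDAs."""
    return copy_hda(pair["forward"]), copy_hda(pair["reverse"])


def unzip_list_of_pairs(pairs):
    """Map unzip over a list:paired collection, giving (list, list)."""
    unzipped = [unzip_pair(pair) for pair in pairs]
    forwards = [fwd for fwd, _ in unzipped]
    reverses = [rev for _, rev in unzipped]
    return forwards, reverses
```

In this sketch, every HDA produced by `zip_pair` or `unzip_pair` gets a fresh id but keeps a reference to the original `Dataset` object, which is the property the commit message highlights.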
Based on work by Peter Amstutz in cwltool (https://github.com/common-workflow-language/cwltool).
This differs from a traditional tool in that its inputs don't need to be in an 'ok' state and instead of creating new datasets and duplicating data on disk, new HDAs are created from the existing datasets.
Testing:

```
./run_tests.sh -framework -id __FLATTEN__
```
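The flattening behaviour described above might look roughly like the following sketch. Nested collections are modelled as plain dicts, and joining the nested identifiers with `_` is an assumption made for illustration; `flatten` is a hypothetical helper, not the code behind `__FLATTEN__`.

```python
def flatten(collection, join="_", prefix=()):
    """Yield (identifier, element) pairs for every leaf of a nested collection.

    Nested collections are modelled as dicts; anything else is treated as a
    leaf (standing in for an existing dataset) and is passed through as-is,
    not copied.
    """
    for identifier, element in collection.items():
        path = prefix + (identifier,)
        if isinstance(element, dict):
            # Recurse into the inner collection, remembering the outer identifier.
            yield from flatten(element, join, path)
        else:
            yield join.join(path), element
```

For example, a `list:paired` collection such as `{"s1": {"forward": a, "reverse": b}}` would flatten to the pairs `("s1_forward", a)` and `("s1_reverse", b)` under these assumptions.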
The user is prompted for a JavaScript expression, which is in turn run once per dataset in a list and used as a filter. If the JavaScript evaluates to a Python truthy value, the HDA is copied into the output dataset (without duplicating the data on disk). The JavaScript expression is supplied various HDA attributes in the environment (currently all metadata values, file_size, file_ext, and dbkey).

The supplied test case filters out datasets that do not contain an even number of lines.

Testing:

```
./run_tests.sh -api test/api/test_tools.py:ToolsTestCase.test_filter_0
```
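A minimal sketch of the filtering logic described above, with a Python callable standing in for the user's JavaScript expression. The dict layout for an HDA, the `build_environment` and `filter_list` helpers, and the `data_lines` metadata key used in the usage note are assumptions for illustration only.

```python
def build_environment(hda):
    """Collect the attributes exposed to the expression for one dataset.

    Mirrors the description above: all metadata values plus file_size,
    file_ext, and dbkey.
    """
    env = dict(hda["metadata"])
    env.update(
        file_size=hda["file_size"],
        file_ext=hda["file_ext"],
        dbkey=hda["dbkey"],
    )
    return env


def filter_list(hdas, expression):
    """Keep the HDAs for which the expression is truthy.

    The expression is evaluated once per dataset. Kept entries are the same
    objects as the inputs, so no data is duplicated.
    """
    return [hda for hda in hdas if expression(build_environment(hda))]
```

Under these assumptions, the "even number of lines" test case corresponds to something like `filter_list(hdas, lambda env: env["data_lines"] % 2 == 0)`.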
... for loading tool actions that require it.
Takes in a list dataset collection and produces a list of lists, keying the outer list on a user-supplied function. This reuses the JavaScript expression code used by the filter model tool.

Testing:

```
./run_tests.sh -framework -id __GROUP__
```
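The grouping step described above can be sketched as follows, again with a Python callable standing in for the JavaScript expression. `group_list` and the `(identifier, element)` pair layout are hypothetical; converting the key to a string so it can serve as an outer identifier is also an assumption.

```python
def group_list(elements, key):
    """Group a flat list into a list of lists keyed by a user-supplied function.

    `elements` is an iterable of (identifier, element) pairs. The result maps
    each key (converted to a string so it can act as an outer identifier) to
    the pairs that produced it, preserving the original order.
    """
    groups = {}
    for identifier, element in elements:
        groups.setdefault(str(key(element)), []).append((identifier, element))
    return groups
```

For instance, grouping on a dataset's extension would place all `txt` elements under the outer identifier `"txt"`, in their original order.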
- Introduce models and an API for creating tools dynamically.
- Use Galaxy's testing-only YAML-based representation of tools to prototype this.
- Extend Format 2 workflow definitions to allow embedding tools directly into workflows, either directly or using a CWL-style @import syntax.

Testing: Test cases demonstrating that tools can be imported (only by admins) and are runnable are included with this commit. More test cases regarding workflow use of dynamic tools and Format 2 workflow definition extensions are also included. These tests can be run with the following commands:

```
./run_tests.sh -api test/api/test_tools.py:ToolsTestCase.test_nonadmin_users_cannot_create_tools
./run_tests.sh -api test/api/test_tools.py:ToolsTestCase.test_dynamic_tool_1
./run_tests.sh -api test/api/test_workflows.py:WorkflowsApiTestCase.test_import_export_dynamic
./run_tests.sh -api test/api/test_workflows_from_yaml.py:WorkflowsFromYamlApiTestCase.test_workflow_embed_tool
./run_tests.sh -api test/api/test_workflows_from_yaml.py:WorkflowsFromYamlApiTestCase.test_workflow_import_tool
```
- Tool definition language, plumbing, and datatype for expressing expressions as jobs.
- Allow connecting expression tools to parameters in workflows, will delay evaluation of workflow so calculated value
- Example test expression tools for testing and demonstration.
- [WIP] Workflow expression module to allow users to specify arbitrary expressions.
CWL Support:
--------------

- Implemented integer, long, float, double, boolean, and File parameters, and arrays thereof, as well as some simple unions of these parameters (``["null", <simple_type>]``) and Any-type parameters. More complex unions of datatypes are still unsupported (unions of two or more non-null parameters, unions of ``["null", Any]``, etc.).
- Draft 3 ``CreateFileRequirement``s are supported (see the ``test_rename`` test case).
- Draft 3 ``InlineJavascriptRequirement``s are supported for defining output files (see the ``test_cat3`` test case).
- ``EnvVarRequirement``s are supported (see the ``test_env_tool1`` and ``test_env_tool2`` test cases).
- Secondary files are supported at least partially; see the ``index1`` and ``showindex1`` CWL tools created to verify this, as well as the ``test_index1`` test case.
- Docker integration is only partial (simple docker pull is supported), so ``cat3-tool.cwl`` works, for example. The full semantics of CWL Docker support have yet to be implemented. The remaining work is straightforward and tracked in the meta-issue galaxyproject#1684.
- Expression tools are supported (see the ``parseInt-tool`` test case).
- Non-File CWL outputs are represented as ``expression.json`` files. Traditionally Galaxy hasn't supported non-File outputs from tools, but CWL Galaxy has work in progress on bringing native Galaxy support for such outputs (#27).

Implementation Notes:
----------------------

- CWL secondary files are stored in a ``__secondary_files__`` directory in the dataset's extra_files_path directory.
- The tool execution API has been extended with an ``inputs_representation`` parameter that can now be set to "cwl". The ``cwl`` representation for running tools corresponds to the CWL job JSON format with ``{"class": "File", "path": "/path/to/file"}`` inputs replaced with ``{"src": "hda", "id": "<dataset_id>"}``. Code for building these requests from CWL job JSON is available in the test class.
- Since the CWL <-> Galaxy parameter translation may change over time (for instance, if Galaxy develops or refines parameter classes), CWL state and the CWL state version are tracked in the database, so that hopefully, for reruns etc., the Galaxy state could be updated from an older version to a new one.
- CWL allows output parameters to be either ``File`` or non-``File``, determined at runtime, so ``galaxy.json`` is used to dynamically adjust the output extension as needed for non-``File`` parameters.

Implementation Description:
-----------------------------

The reference implementation Python library (mainly developed by Peter Amstutz - https://github.com/common-workflow-language/common-workflow-language/tree/master/reference) is used to load tool files ending with ``.json`` or ``.cwl``, and proxy objects are created to adapt these tools to Galaxy representations. In particular, input and output descriptions are loaded from the tool.

When the tool is submitted, a specialized tool class is used to build a cwltool-compatible job description from the supplied Galaxy inputs, and the CWL reference implementation is used to generate a CWL reference implementation Job object. A command line is generated from this Job object.

As a result, Galaxy largely does not need to worry about the details of command-line adapters, expressions, etc.

Galaxy writes a description of the CWL job, which it can later reload, to the job working directory. After the process is complete (on the Galaxy compute server, but outside the Docker container), this representation is reloaded and the dynamic outputs are discovered and moved to fixed locations as expected by Galaxy. CWL allows for much more expressive output locations than Galaxy, for better or worse, and this step uses cwltool to adapt CWL to Galaxy outputs.

Currently all ``File`` outputs are sniffed to determine a Galaxy datatype. CWL draft 3 allows refinement on this, and this remains work to be done.

1) CWL should support EDAM declaration of types and Galaxy should provide a mapping to core datasets to skip sniffing if types are found.
2) For finer-grained control within Galaxy, extensions to CWL should allow setting actual Galaxy output types on outputs. (The distinction between fastq and fastqsanger in Galaxy is very important, for instance.)

Testing:
---------------------

    % git clone https://github.com/common-workflow-language/galaxy.git
    % cd galaxy
    % git checkout cwl
    % virtualenv .venv
    % . .venv/bin/activate
    % pip install cwltool

Start Galaxy.

    % GALAXY_RUN_WITH_TEST_TOOLS=1 run.sh --reload

Open http://localhost:8080/ and see the CWL test tools (along with all Galaxy test tools) in the left-hand tool panel.

To go a step further and actually run CWL jobs within their designated Docker containers, copy the following minimal Galaxy job configuration file to ``config/job_conf.xml``. (Adjust the ``docker_sudo`` parameter based on how you execute Docker.)

https://gist.github.com/jmchilton/3997fa471d1b4c556966

Run API tests demonstrating the various CWL demo tools with the following command.

```
./run_tests.sh -api test/api/test_tools_cwl.py
```

Issues
---------------------------------

Work remaining on CWL support for Galaxy is tracked at https://github.com/common-workflow-language/galaxy/issues.

Refactor toward workflow support.
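The ``cwl`` input representation mentioned in the implementation notes above can be sketched as follows. This is not the code from the test class; `to_galaxy_cwl_inputs` and `build_payload` are hypothetical helpers, the mapping from file paths to dataset ids is assumed to come from uploading the files beforehand, and the payload keys other than ``inputs_representation`` are assumptions about the shape of a tool execution request.

```python
import json


def to_galaxy_cwl_inputs(cwl_job, path_to_hda_id):
    """Replace CWL File objects in a job description with HDA references.

    `path_to_hda_id` maps each local file path to the id of the dataset it
    was uploaded as (an assumption of this sketch).
    """
    def convert(value):
        if isinstance(value, dict):
            if value.get("class") == "File":
                return {"src": "hda", "id": path_to_hda_id[value["path"]]}
            return {k: convert(v) for k, v in value.items()}
        if isinstance(value, list):
            # Arrays of parameters are converted element by element.
            return [convert(v) for v in value]
        return value  # integers, strings, booleans, etc. pass through

    return convert(cwl_job)


def build_payload(tool_id, history_id, cwl_job, path_to_hda_id):
    """Assemble a request body that asks for the "cwl" representation."""
    return {
        "tool_id": tool_id,
        "history_id": history_id,
        "inputs_representation": "cwl",
        "inputs": json.dumps(to_galaxy_cwl_inputs(cwl_job, path_to_hda_id)),
    }
```

Non-File values are left untouched, so the converted structure stays as close to the original CWL job JSON as possible.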
mr-c changed the title from "Implementat a subset of the CWL Draft 3 tool format." to "Implement a subset of the CWL Draft 3 tool format." (Sep 30, 2016)
jmchilton pushed a commit that referenced this pull request (Mar 6, 2017): Add fastq(*).bz2 datatypes and converters
This PR has been superseded by #47; work will now be tracked in the cwl-1.0 branch.
jmchilton pushed a commit that referenced this pull request (Jul 13, 2017)
jmchilton pushed a commit that referenced this pull request (Aug 14, 2017): Swap two more spots initializing TagManagers to new session
jmchilton pushed a commit that referenced this pull request (Jan 14, 2018): Small javascript fixes for askomics IE integration
jmchilton pushed a commit that referenced this pull request (Apr 20, 2018): Update from Galaxyproject repo
jmchilton pushed a commit that referenced this pull request (Jul 1, 2018): Need to set default HOME/TMP before env_setup_commands
jmchilton pushed a commit that referenced this pull request (Mar 16, 2019): Add tool test for metadata_in_range
nsoranzo pushed a commit that referenced this pull request (Dec 29, 2020): Add select and <filter> example
nsoranzo pushed a commit that referenced this pull request (Jan 19, 2021): Pull John's updates into my branch
nsoranzo pushed a commit that referenced this pull request (Apr 2, 2024)
nsoranzo pushed a commit that referenced this pull request (Dec 5, 2024): Change JobState Retry-Strategy