Merge pull request #17422 from wm75/data_source_tools_fixes
Fix data_source and data_source_async bugs
mvdbeek authored Feb 15, 2024
2 parents a41b578 + 6ee7663 commit 9415df8
Showing 31 changed files with 144 additions and 121 deletions.
22 changes: 16 additions & 6 deletions lib/galaxy/tool_util/xsd/galaxy.xsd
@@ -58,12 +58,17 @@ List of behavior changes associated with profile versions:
### 21.09
- Do not strip leading and trailing whitespaces in `from_work_dir` attribute.
- Do not use Galaxy Python virtual environment for `data_source` tools. `data_source` tools should explicitly use the `galaxy-util` package.
### 23.0
- Text parameters that are inferred to be optional (i.e the `optional` tag is not set, but the tool parameter accepts an empty string)
are set to `None` for templating in Cheetah. Older tools receive the empty string `""` as the templated value.
### 24.0
- Do not use Galaxy python environment for `data_source_async` tools.
### Examples
A normal tool:
@@ -265,9 +270,11 @@ this tool is usable within a workflow (defaults to ``true`` for normal tools and
</xs:attribute>
<xs:attribute name="URL_method" type="URLmethodType">
<xs:annotation>
<xs:documentation xml:lang="en">Only used if ``tool_type`` attribute value
is ``data_source`` or ``data_source_async`` - this attribute defines the HTTP request method to use when
communicating with an external data source application (the default is ``get``).</xs:documentation>
<xs:documentation xml:lang="en">*Deprecated* and ignored,
use a [request_param](#tool-request-param-translation-request-param) element with ``galaxy_name="URL_method"`` instead.
Was only used if ``tool_type`` attribute value is ``data_source`` or ``data_source_async`` -
this attribute defined the HTTP request method to use when communicating with an external data source application
(default: ``get``).</xs:documentation>
</xs:annotation>
</xs:attribute>
</xs:complexType>
@@ -1721,7 +1728,7 @@ useful for non-deterministic output.
<xs:documentation xml:lang="en"><![CDATA[
If specified, the target output's MD5 hash should match the value specified
here. For large static files it may be inconvenient to upload the entiry file
here. For large static files it may be inconvenient to upload the entire file
and this can be used instead.
]]></xs:documentation>
@@ -1734,7 +1741,7 @@ and this can be used instead.
If specified, the target output's checksum should match the value specified
here. This value should have the form ``hash_type$hash_value``
(e.g. ``sha1$8156d7ca0f46ed7abac98f82e36cfaddb2aca041``). For large static files
it may be inconvenient to upload the entiry file and this can be used instead.
it may be inconvenient to upload the entire file and this can be used instead.
]]></xs:documentation>
</xs:annotation>
@@ -2852,7 +2859,9 @@ tools will not need to specify any attributes on this tag itself.]]>
</xs:attribute>
<xs:attribute name="method" type="URLmethodType">
<xs:annotation>
<xs:documentation xml:lang="en">Data source HTTP action (e.g. ``get`` or ``put``) to use.</xs:documentation>
<xs:documentation xml:lang="en">*Deprecated* and ignored,
use a [request_param](#tool-request-param-translation-request-param) element with ``galaxy_name="URL_method"`` instead.
Data source HTTP action (e.g. ``get`` or ``put``) to use.</xs:documentation>
</xs:annotation>
</xs:attribute>
<xs:attribute name="target" type="TargetType">
@@ -7251,6 +7260,7 @@ and ``bibtex`` are the only supported options.</xs:documentation>
</xs:annotation>
<xs:restriction base="xs:string">
<xs:enumeration value="data_source"/>
<xs:enumeration value="data_source_async"/>
<xs:enumeration value="manage_data"/>
<xs:enumeration value="interactive"/>
<xs:enumeration value="expression"/>
5 changes: 4 additions & 1 deletion lib/galaxy/tools/__init__.py
@@ -910,7 +910,7 @@ def requires_galaxy_python_environment(self):
# seem to require Galaxy's Python.
# FIXME: the (instantiated) tool class should emit this behavior, and not
# use inspection by string check
if self.tool_type not in ["default", "manage_data", "interactive", "data_source"]:
if self.tool_type not in ["default", "manage_data", "interactive", "data_source", "data_source_async"]:
return True

if self.tool_type == "manage_data" and self.profile < 18.09:
@@ -919,6 +919,9 @@ def requires_galaxy_python_environment(self):
if self.tool_type == "data_source" and self.profile < 21.09:
return True

if self.tool_type == "data_source_async" and self.profile < 24.0:
return True

config = self.app.config
preserve_python_environment = config.preserve_python_environment
if preserve_python_environment == "always":
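The profile gating in this hunk can be sketched as a standalone function. This is a simplified illustration, not Galaxy's implementation: `needs_galaxy_python` and `PROFILE_CUTOFFS` are hypothetical names, profiles are plain floats, and the `preserve_python_environment` config handling that follows in the real method is left out (the sketch returns `False` where the real method goes on to consult that setting).

```python
# Hypothetical, simplified sketch of the profile checks shown in this hunk.
# Tool types that may run outside Galaxy's Python environment, mapped to the
# first profile at which they do (None = no profile check in this hunk).
PROFILE_CUTOFFS = {
    "default": None,
    "manage_data": 18.09,
    "interactive": None,
    "data_source": 21.09,
    "data_source_async": 24.0,
}


def needs_galaxy_python(tool_type: str, profile: float) -> bool:
    # Any tool type not listed always gets Galaxy's Python environment.
    if tool_type not in PROFILE_CUTOFFS:
        return True
    cutoff = PROFILE_CUTOFFS[tool_type]
    # Listed types keep Galaxy's Python only below their cutoff profile.
    # (The real method would next consult preserve_python_environment.)
    return cutoff is not None and profile < cutoff
```

Under these rules, a `data_source` tool declaring `profile="20.09"` (as the bundled tools touched by this commit now do) stays on Galaxy's Python environment, since 20.09 is below the 21.09 cutoff.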
72 changes: 57 additions & 15 deletions lib/galaxy/webapps/galaxy/controllers/async.py
@@ -1,5 +1,5 @@
"""
Upload class
Controller to handle communication of tools of type data_source_async
"""

import logging
@@ -39,8 +39,6 @@ def index(self, trans, tool_id=None, data_secret=None, **kwd):
return trans.response.send_redirect("/index")

params = Params(kwd, sanitize=False)
STATUS = params.STATUS
URL = params.URL
data_id = params.data_id

log.debug(f"async dataid -> {data_id}")
@@ -52,30 +50,62 @@ def index(self, trans, tool_id=None, data_secret=None, **kwd):
if not tool:
return f"Tool with id {tool_id} not found"

#
# we have an incoming data_id
#
if data_id:
if not URL:
return f"No URL parameter was submitted for data {data_id}"
#
# we have an incoming data_id
#
data = trans.sa_session.query(trans.model.HistoryDatasetAssociation).get(data_id)

if not data:
return f"Data {data_id} does not exist or has already been deleted"
if data.state in data.dataset.terminal_states:
log.debug(f"Tool {tool.id}: execution stopped as data {data_id} has entered terminal state prematurely")
trans.log_event(
f"Tool {tool.id}: execution stopped as data {data_id} has entered terminal state prematurely"
)
return f"Data {data_id} has finished processing before job could be completed"

# map params from the tool's <request_param_translation> section;
# ignore any other params that may have been passed by the remote
# server with the exception of STATUS and URL;
# if name, info, dbkey and data_type are not handled via incoming params,
# use the metadata from the already existing dataset;
# preserve original params under nested dict
params_dict = dict(
STATUS=params.STATUS,
URL=params.URL,
name=data.name,
info=data.info,
dbkey=data.dbkey,
data_type=data.ext,
incoming_request_params=params.__dict__.copy(),
)
if tool.input_translator:
tool.input_translator.translate(params)
tool_declared_params = {
translator.galaxy_name for translator in tool.input_translator.param_trans_dict.values()
}
for param in params:
if param in tool_declared_params:
params_dict[param] = params.get(param, None)
params = params_dict

if not params.get("URL"):
return f"No URL parameter was submitted for data {data_id}"

STATUS = params.get("STATUS")

if STATUS == "OK":
key = hmac_new(trans.app.config.tool_secret, "%d:%d" % (data.id, data.history_id))
if key != data_secret:
return f"You do not have permission to alter data {data_id}."
if not params.get("GALAXY_URL"):
# provide a fallback for GALAXY_URL
params["GALAXY_URL"] = f"{trans.request.url_path}/async/{tool_id}/{data.id}/{key}"
# push the job into the queue
data.state = data.blurb = data.states.RUNNING
log.debug(f"executing tool {tool.id}")
trans.log_event(f"Async executing tool {tool.id}", tool_id=tool.id)
galaxy_url = f"{trans.request.url_path}/async/{tool_id}/{data.id}/{key}"
galaxy_url = params.get("GALAXY_URL", galaxy_url)
params = dict(
URL=URL, GALAXY_URL=galaxy_url, name=data.name, info=data.info, dbkey=data.dbkey, data_type=data.ext
)

# Assume there is exactly one output file possible
TOOL_OUTPUT_TYPE = None
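The parameter mapping introduced in this hunk can be sketched with plain dictionaries. This is an illustration under stated assumptions, not the controller's code: `build_params` is a hypothetical helper, dicts stand in for Galaxy's `Params` object and the dataset, `declared_names` stands in for the `galaxy_name` values collected from the tool's `<request_param_translation>` section, and the translation step itself is omitted.

```python
# Hypothetical sketch: plain dicts stand in for Galaxy's Params object,
# the existing dataset, and the tool's declared request parameters.
def build_params(incoming, dataset_meta, declared_names):
    # Defaults: STATUS/URL from the request, metadata from the existing dataset.
    params = {
        "STATUS": incoming.get("STATUS"),
        "URL": incoming.get("URL"),
        **dataset_meta,
        # Keep an untouched copy of everything the remote server sent.
        "incoming_request_params": dict(incoming),
    }
    # Only parameters the tool declared may override the defaults.
    for name, value in incoming.items():
        if name in declared_names:
            params[name] = value
    return params


incoming = {
    "STATUS": "OK",
    "URL": "http://example.org/data",
    "dbkey": "hg38",
    "extra": "ignored",
}
dataset_meta = {"name": "query", "info": "", "dbkey": "?", "data_type": "tabular"}
result = build_params(incoming, dataset_meta, declared_names={"dbkey"})
```

Here the declared `dbkey` replaces the dataset's value, the undeclared `extra` is dropped from the top level, and both remain available under `incoming_request_params`.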
@@ -104,9 +134,21 @@ def index(self, trans, tool_id=None, data_secret=None, **kwd):

return f"Data {data_id} with status {STATUS} received. OK"
else:
#
# no data_id must be parameter submission
#
# create new dataset, put it into running state,
# send request for data to remote server and see if the response
# ends in ok;
# the request that's getting sent goes to the URL found in
# params.URL or, in its absence, to the one found as the value of
# the "action" attribute of the data source tool's "inputs" tag.
# Included in the request are the parameters:
# - data_id, which indicates to the remote server that Galaxy is
# ready to accept data
# - GALAXY_URL, which takes the form:
# {base_url}/async/{tool_id}/{data_id}/{data_secret}, and which
# when used by the remote server to send a data download link,
# will trigger the if branch above.
GALAXY_TYPE = None
if params.data_type:
GALAXY_TYPE = params.data_type
@@ -171,7 +213,7 @@ def index(self, trans, tool_id=None, data_secret=None, **kwd):
params.update({"data_id": data.id})

# Use provided URL or fallback to tool action
url = URL or tool.action
url = params.URL or tool.action
# Does url already have query params?
if "?" in url:
url_join_char = "&"
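The query-string joining shown at the end of this hunk could look roughly like the following. It is a minimal sketch: `build_request_url` is a hypothetical helper, and the use of `urllib.parse.urlencode` is an assumption, since the hunk is cut off before the request URL is actually assembled.

```python
from urllib.parse import urlencode


def build_request_url(url: str, params: dict) -> str:
    # Append with "&" if the URL already carries a query string,
    # otherwise start one with "?".
    url_join_char = "&" if "?" in url else "?"
    return f"{url}{url_join_char}{urlencode(params)}"


plain = build_request_url("http://example.org/export", {"data_id": 42})
with_query = build_request_url("http://example.org/export?a=1", {"data_id": 42})
```

The same helper would carry `GALAXY_URL` alongside `data_id`, percent-encoded so the callback URL survives as a single query value.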
7 changes: 6 additions & 1 deletion lib/galaxy/webapps/galaxy/controllers/tool_runner.py
@@ -75,8 +75,13 @@ def __tool_404__():
if tool.tool_type in ["default", "interactivetool"]:
return trans.response.send_redirect(url_for(controller="root", tool_id=tool_id))

# execute tool without displaying form (used for datasource tools)
# execute tool without displaying form
# (used for datasource tools, but note that data_source_async tools
# are handled separately by the async controller)
params = galaxy.util.Params(kwd, sanitize=False)
if tool.tool_type == "data_source":
# preserve original params sent by the remote server as extra dict
params.update({"incoming_request_params": params.__dict__.copy()})
# do param translation here, used by datasource tools
if tool.input_translator:
tool.input_translator.translate(params)
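Why the snapshot is taken before translation can be shown with a small stand-in. Everything here is hypothetical: a dict plays the role of Galaxy's `Params` object, and `translate` merely mimics a `<request_param_translation>` that is assumed to rename one remote parameter to its Galaxy name.

```python
# Hypothetical stand-ins: a dict plays the role of Galaxy's Params object and
# `translate` mimics a translation that renames a single parameter.
def translate(params, remote_name, galaxy_name):
    if remote_name in params:
        params[galaxy_name] = params.pop(remote_name)


params = {"URL": "http://example.org/data", "db": "hg38"}
# Snapshot first, so the original remote-side names survive translation.
# The right-hand side is evaluated before the key is added, so the copy
# does not contain itself.
params["incoming_request_params"] = dict(params)
translate(params, remote_name="db", galaxy_name="dbkey")
```

After translation, only the nested dict still holds the parameter under the name the remote server used.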
4 changes: 2 additions & 2 deletions test/functional/tools/test_data_source.xml
@@ -4,11 +4,11 @@
the initial response. If value of 'URL_method' is 'post', any additional params coming back in the
initial response ( in addition to 'URL' ) will be encoded and appended to URL and a post will be performed.
-->
<tool name="test_data_source" id="test_data_source" tool_type="data_source" version="1.0.0">
<tool name="test_data_source" id="test_data_source" tool_type="data_source" version="1.0.0" profile="20.09">
<command><![CDATA[
python '$__tool_directory__/data_source.py' '$output' $__app__.config.output_size_limit
]]></command>
<inputs action="http://ratmine.mcw.edu/ratmine/begin.do" check_values="false" method="get">
<inputs action="http://ratmine.mcw.edu/ratmine/begin.do" check_values="false" method="get">
<display>go to Ratmine server $GALAXY_URL</display>
<param name="GALAXY_URL" type="baseurl" value="/tool_runner?tool_id=ratmine" />
</inputs>
2 changes: 1 addition & 1 deletion tools/data_source/biomart.xml
@@ -7,7 +7,7 @@
TODO: Hack to get biomart to work - the 'add_to_URL' param can be eliminated when the Biomart team encodes URL prior to sending, meanwhile
everything including and beyond the first '&' is truncated from URL. They said they'll let us know when this is fixed at their end.
-->
<tool name="BioMart" id="biomart" tool_type="data_source" version="1.0.1">
<tool name="BioMart" id="biomart" tool_type="data_source" version="1.0.1" profile="20.09">
<description>Ensembl server</description>
<edam_operations>
<edam_operation>operation_0224</edam_operation>
2 changes: 1 addition & 1 deletion tools/data_source/biomart_test.xml
@@ -7,7 +7,7 @@
TODO: Hack to get biomart to work - the 'add_to_URL' param can be eliminated when the Biomart team encodes URL prior to sending, meanwhile
everything including and beyond the first '&' is truncated from URL. They said they'll let us know when this is fixed at their end.
-->
<tool name="BioMart" id="biomart_test" tool_type="data_source" version="1.0.1">
<tool name="BioMart" id="biomart_test" tool_type="data_source" version="1.0.1" profile="20.09">
<description>Test server</description>
<edam_operations>
<edam_operation>operation_0224</edam_operation>
2 changes: 1 addition & 1 deletion tools/data_source/cbi_rice_mart.xml
@@ -4,7 +4,7 @@
the initial response. If value of 'URL_method' is 'post', any additional params coming back in the
initial response ( in addition to 'URL' ) will be encoded and appended to URL and a post will be performed.
-->
<tool name="CBI Rice Mart" id="cbi_rice_mart" tool_type="data_source" version="1.0.1">
<tool name="CBI Rice Mart" id="cbi_rice_mart" tool_type="data_source" version="1.0.1" profile="20.09">
<description>rice mart</description>
<edam_operations>
<edam_operation>operation_0224</edam_operation>