[24.1] Fill in missing help for cross product tools. #18698

60 changes: 57 additions & 3 deletions lib/galaxy/tools/cross_product_flat.xml
@@ -72,12 +72,66 @@
Synopsis
========

@CROSS_PRODUCT_INTRO@

====================
How to use this tool
====================

===========
Description
===========
@GALAXY_DOT_PRODUCT_SEMANTICS@

Running input lists through this tool produces new dataset lists (described in detail below). When these outputs
are used with the same natural element-wise "map over" semantics described above, every combination of the
elements of the two lists is compared. Running a tool with these two outputs instead of the initial
two inputs produces a list containing the comparison of each pair of elements drawn from the respective inputs.

.. image:: ${static_path}/images/tools/collection_ops/flat_crossproduct_output.png
:alt: The Flat Cartesian Product of Two Collections
:width: 500

The result of running a subsequent tool with the outputs produced by this tool will be a much larger list
whose element identifiers are concatenations of the element identifiers from the
two input lists, one for each combination.

.. image:: ${static_path}/images/tools/collection_ops/flat_crossproduct_separator.png
:alt: Flat Cross Product Identifier Separator
:width: 500

============================================
What this tool does (technical details)
============================================

This tool consumes two lists - we will call them ``input_a`` and ``input_b``. If ``input_a``
has length ``n`` and dataset elements identified as ``a1``, ``a2``, ... ``an`` and ``input_b``
has length ``m`` and dataset elements identified as ``b1``, ``b2``, ... ``bm``, then this tool
produces a pair of larger lists - each of size ``n*m``.

Both output lists will be the same length and contain the same set of element identifiers in the
same order. If the position ``k`` of an output element (counting from zero) is written as ``k = (i-1)*m + (j-1)`` where ``1 <= i <= n`` and ``1 <= j <= m``
then the element identifier for this kth element is the concatenation of the element identifier for
the ith item of ``input_a`` and the jth item of ``input_b``.

In the first output list, this kth element will be the ith element of ``input_a``. In the second
output list, the kth element will be the jth element of ``input_b``.

.. image:: ${static_path}/images/tools/collection_ops/flat_cross_product_outputs.png
:alt: Flat Cross Product Outputs
:width: 500
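
The two outputs described above can be sketched in plain Python. This is only an illustration, not Galaxy's implementation: the ``flat_cross_product`` function, the ``DatasetList`` type, and the ``"_"`` separator are all assumptions made for the example, and the sketch assumes ``input_a`` varies slowest so that the zero-based position is ``k = (i-1)*m + (j-1)``.

```python
from typing import List, Tuple

# Hypothetical stand-in for a Galaxy dataset list:
# ordered (element identifier, dataset) pairs.
DatasetList = List[Tuple[str, str]]


def flat_cross_product(
    input_a: DatasetList, input_b: DatasetList, separator: str = "_"
) -> Tuple[DatasetList, DatasetList]:
    """Return two flat lists of length n*m that share identifiers and order."""
    output_a: DatasetList = []
    output_b: DatasetList = []
    for id_a, dataset_a in input_a:  # assumed: input_a varies slowest
        for id_b, dataset_b in input_b:
            identifier = f"{id_a}{separator}{id_b}"
            output_a.append((identifier, dataset_a))  # "copy" of the ith element of input_a
            output_b.append((identifier, dataset_b))  # "copy" of the jth element of input_b
    return output_a, output_b


input_a = [("a1", "A1"), ("a2", "A2")]  # n = 2
input_b = [("b1", "B1"), ("b2", "B2"), ("b3", "B3")]  # m = 3
output_a, output_b = flat_cross_product(input_a, input_b)

# Check the index relationship for every (i, j) combination.
n, m = len(input_a), len(input_b)
for i in range(1, n + 1):
    for j in range(1, m + 1):
        k = (i - 1) * m + (j - 1)  # zero-based position in either output
        assert output_a[k] == (f"a{i}_b{j}", input_a[i - 1][1])
        assert output_b[k] == (f"a{i}_b{j}", input_b[j - 1][1])
```

Both outputs carry the same identifiers in the same order, which is what lets a later element-wise step pair them up.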

These list structures might appear to be a little odd, but they have the very useful property
that if you match up corresponding elements of the lists, each combination of
elements in ``input_a`` and ``input_b`` is matched up exactly once.

.. image:: ${static_path}/images/tools/collection_ops/flat_cross_product_matched.png
:alt: Flat Cross Product Matching Datasets
:width: 500

Running a downstream comparison tool that compares two datasets with these two lists produces a
new list with every combination of comparisons.

.. image:: ${static_path}/images/tools/collection_ops/flat_cross_product_downstream.png
:alt: Flat Cross Product All-vs-All Result
:width: 500
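
As a rough sketch of this downstream step, the snippet below pairs the two flat outputs element-wise and applies a comparison to each pair. The ``compare`` function, the ``"_"`` separator, and the tuple representation of list elements are hypothetical placeholders, not part of Galaxy.

```python
from itertools import product


def compare(dataset_1: str, dataset_2: str) -> str:
    """Hypothetical comparison tool operating on two individual datasets."""
    return f"{dataset_1} vs {dataset_2}"


# Hypothetical inputs: ordered (element identifier, dataset) pairs.
input_a = [("a1", "A1"), ("a2", "A2")]
input_b = [("b1", "B1"), ("b2", "B2")]

# Build the two flat outputs of the cross product (same identifiers, same order).
pairs = list(product(input_a, input_b))
output_a = [(f"{id_a}_{id_b}", data_a) for (id_a, data_a), (id_b, _data_b) in pairs]
output_b = [(f"{id_a}_{id_b}", data_b) for (id_a, _data_a), (id_b, data_b) in pairs]

# Element-wise "map over": the kth element of one list meets the kth of the other.
all_vs_all = [
    (identifier, compare(data_a, data_b))
    for (identifier, data_a), (_identifier, data_b) in zip(output_a, output_b)
]
```

The resulting list has one entry per combination, and each identifier records which two inputs were compared.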

----

63 changes: 60 additions & 3 deletions lib/galaxy/tools/cross_product_nested.xml
@@ -76,12 +76,69 @@
Synopsis
========

@CROSS_PRODUCT_INTRO@

====================
How to use this tool
====================

===========
Description
===========
@GALAXY_DOT_PRODUCT_SEMANTICS@

Running input lists through this tool produces new list structures (described in detail below). When these outputs
are used with the same natural element-wise "map over" semantics described above, every combination of the
elements of the two lists is compared. Running a tool with these two outputs instead of the initial
two inputs produces a nested list structure where the jth element of the inner list of the ith element of the outer
list is a comparison of the ith element of the first list to the jth element of the second list.
Put more simply, the result is a nested list where the identifiers of an element describe which inputs were
matched to produce the comparison output found at that element.

.. image:: ${static_path}/images/tools/collection_ops/nested_crossproduct_output.png
:alt: The Cartesian Product of Two Collections
:width: 500

============================================
What this tool does (technical details)
============================================

This tool consumes two flat lists. We will call the input collections ``input_a`` and ``input_b``. If ``input_a``
has length ``n`` and dataset elements identified as ``a1``, ``a2``, ... ``an`` and ``input_b``
has length ``m`` and dataset elements identified as ``b1``, ``b2``, ... ``bm``, then this tool
produces a pair of output nested lists (specifically of the ``list:list`` collection type) where
the outer list is of length ``n`` and each inner list has a length of ``m`` (an ``n x m`` nested list). The jth element
inside the outer list's ith element is a pseudo copy of the ith dataset of ``input_a``. One
way to think about the output nested lists is as matrices. Here is a diagram of the first output
showing the element identifiers of the outer and inner lists along with which dataset is being
"copied" into this new collection.

.. image:: ${static_path}/images/tools/collection_ops/nested_cross_product_out_1.png
:alt: Nested Cross Product First Output
:width: 500

The second output is a nested list of pseudo copies of the elements of ``input_b`` instead of
``input_a``. In particular, the outer list is again of length ``n`` and each inner list is again
of length ``m``, but this time the jth element inside the outer list's ith element is a pseudo copy
of the jth dataset of ``input_b``. Here is the matrix of these outputs.

.. image:: ${static_path}/images/tools/collection_ops/nested_cross_product_out_2.png
:alt: Nested Cross Product Second Output
:width: 500
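
The two nested outputs can be pictured as the following Python sketch. It is an illustration under stated assumptions rather than Galaxy's code: ``nested_cross_product`` is a hypothetical name, and a ``list:list`` collection is modelled here as a dictionary of dictionaries keyed by outer and inner element identifiers.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-ins: a flat list is ordered (identifier, dataset) pairs,
# and a list:list collection is an ordered dict of ordered dicts.
DatasetList = List[Tuple[str, str]]
NestedList = Dict[str, Dict[str, str]]


def nested_cross_product(
    input_a: DatasetList, input_b: DatasetList
) -> Tuple[NestedList, NestedList]:
    """Return two n x m nested lists with identical identifier structure."""
    # First output: every cell in row i holds the ith dataset of input_a.
    output_a = {
        id_a: {id_b: data_a for id_b, _data_b in input_b}
        for id_a, data_a in input_a
    }
    # Second output: every cell in column j holds the jth dataset of input_b.
    output_b = {
        id_a: {id_b: data_b for id_b, data_b in input_b}
        for id_a, _data_a in input_a
    }
    return output_a, output_b


input_a = [("a1", "A1"), ("a2", "A2")]  # n = 2
input_b = [("b1", "B1"), ("b2", "B2"), ("b3", "B3")]  # m = 3
output_a, output_b = nested_cross_product(input_a, input_b)
```

Read as matrices, the first output repeats each ``input_a`` dataset along a row and the second repeats each ``input_b`` dataset down a column.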

These nested list structures might appear to be a little odd, but they have the very useful property
that if you match up corresponding elements of the nested lists, each combination of
elements in ``input_a`` and ``input_b`` is matched up exactly once. The following diagram describes these matching
datasets.

.. image:: ${static_path}/images/tools/collection_ops/nested_cross_product_matching.png
:alt: Matching Inputs
:width: 500

Running a tool that compares two datasets with these two nested lists produces a new nested list
as described above. The following diagram shows the structure of this output and how the element
identifiers are preserved and indicate what comparison was performed.

.. image:: ${static_path}/images/tools/collection_ops/nested_cross_product_output.png
:alt: Nested Cross Product Output
:width: 500
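
A minimal sketch of this downstream step follows, again modelling a ``list:list`` collection as a dictionary of dictionaries. The ``compare`` and ``map_over_nested`` functions are hypothetical names used only for illustration.

```python
from typing import Callable, Dict

NestedList = Dict[str, Dict[str, str]]


def compare(dataset_1: str, dataset_2: str) -> str:
    """Hypothetical comparison tool operating on two individual datasets."""
    return f"{dataset_1} vs {dataset_2}"


def map_over_nested(
    nested_1: NestedList, nested_2: NestedList, tool: Callable[[str, str], str]
) -> NestedList:
    """Apply `tool` to cells that share the same outer and inner identifiers."""
    return {
        outer: {inner: tool(data_1, nested_2[outer][inner]) for inner, data_1 in row.items()}
        for outer, row in nested_1.items()
    }


# The two nested outputs of the cross product for a 2 x 2 example.
output_a = {"a1": {"b1": "A1", "b2": "A1"}, "a2": {"b1": "A2", "b2": "A2"}}
output_b = {"a1": {"b1": "B1", "b2": "B2"}, "a2": {"b1": "B1", "b2": "B2"}}

result = map_over_nested(output_a, output_b, compare)
```

The outer and inner identifiers of each result cell name the two inputs that were compared, so ``result["a2"]["b1"]`` holds the comparison of ``a2`` with ``b1``.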

----

32 changes: 32 additions & 0 deletions lib/galaxy/tools/model_operation_macros.xml
@@ -4,6 +4,38 @@
class="ModelOperationToolAction"/>
</xml>
<token name="@QUOTA_USAGE_NOTE@">This tool will create new history datasets copied from your input collections but your quota usage will not increase.</token>
<token name="@CROSS_PRODUCT_INTRO@"><![CDATA[
This tool organizes two dataset lists so that Galaxy's normal collection processing produces
an all-vs-all style analysis of the initial inputs when applied to the outputs of this tool.

While a standalone description of what it does is technical and math heavy, how
it works within an ad-hoc analysis or workflow can be quite straightforward and is hopefully easier
to understand. For this reason, the next section describes how to use this tool in context and
the technical details follow after that. Hopefully, the "how it works" details aren't necessary to
understand the "how to use it" details of this tool - at least for simple things.
]]>
</token>
<token name="@GALAXY_DOT_PRODUCT_SEMANTICS@"><![CDATA[

This tool can be used in and out of workflows, but workflows will be used to illustrate the ordering of
tools and connections between them. Imagine a tool that compares two individual datasets and how
that might be connected to list inputs in a workflow. This simple case is shown below:

.. image:: ${static_path}/images/tools/collection_ops/dot_product.png
:alt: The Dot Product of Two Collections
:width: 500

In this configuration, the two lists will be matched and compared element-wise. So the first dataset
of "Input List 1" will be compared to the first dataset in "Input List 2" and the resulting
dataset will be the first dataset in the output list generated using this comparison tool. In this configuration
the lists need to have the same number of elements and ideally matching element identifiers.
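
The element-wise matching described here behaves much like Python's ``zip``. The sketch below is only an analogy under stated assumptions: ``map_over_elementwise`` and ``compare`` are hypothetical names, and a list is modelled as ordered (element identifier, dataset) pairs.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-in for a Galaxy dataset list.
DatasetList = List[Tuple[str, str]]


def compare(dataset_1: str, dataset_2: str) -> str:
    """Hypothetical comparison tool operating on two individual datasets."""
    return f"{dataset_1} vs {dataset_2}"


def map_over_elementwise(
    list_1: DatasetList, list_2: DatasetList, tool: Callable[[str, str], str]
) -> DatasetList:
    """Compare the kth element of list_1 with the kth element of list_2."""
    if len(list_1) != len(list_2):
        raise ValueError("both lists need the same number of elements")
    return [
        (identifier, tool(data_1, data_2))
        for (identifier, data_1), (_identifier, data_2) in zip(list_1, list_2)
    ]


input_list_1 = [("sample1", "A1"), ("sample2", "A2")]
input_list_2 = [("sample1", "B1"), ("sample2", "B2")]
output_list = map_over_elementwise(input_list_1, input_list_2, compare)
```

Only two comparisons are produced for two lists of length two; no element of the first list is ever compared with a differently positioned element of the second.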

This matching up of elements is a very natural way to "map" an operation (or in Galaxy parlance, a tool)
over two lists. However, sometimes the desire is to compare each element of the first list to each element of the
Comment on lines +33 to +34
Member

Suggested change
This matching up of elements is a very natural way to "map" an operation (or in Galaxy parlance, a tool)
over two lists. However, sometimes the desire is to compare each element of the first list to each element of the
This matching up of elements is a very natural way to perform a "map over" operation, which in Galaxy means to map an operation over the elements of multiple lists. However, sometimes the desire is to compare each element of the first list to each element of the

I think I would want to establish the term "map over" (and quote it) since that's what (I think) we use when we're talking about these operations. It has probably entered the conversation in chats and the help forum at this point.

Feel free to dismiss this though, it's great that we finally put it into writing.

Member Author

I have opened a PR that I think would enable us to do this well in dev #18722.

second list. This tool enables that.

]]></token>

<xml name="annotate_as_aggregation_operation">
<edam_operations>
<edam_operation>operation_3436</edam_operation> <!-- DataHandling -> Aggregation -->