Allow rule based operations on existing collections. #5819

Merged (9 commits) on Apr 25, 2018

Conversation

jmchilton
Member

@jmchilton jmchilton commented Mar 30, 2018

Allow the rules DSL & GUI component (#5365) to operate on existing collections - this enables very flexible filtering, sorting, relabelling, grouping, flattening, and general re-organization of existing collections (e.g. the outputs of tools). This is implemented as a collection operation tool so that it should be executable interactively in the tool form and in a batch fashion as part of workflow executions.
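
To make the idea concrete, here is a minimal Python sketch of applying a list of rules to the element identifiers of an existing collection. It is not Galaxy's implementation: the rule type names are borrowed from the checklist in this PR, but the dictionary keys (`target_column`, `expression`, `invert`), the function `apply_rules`, and the sample identifiers are all assumptions made for illustration.

```python
import re


def apply_rules(rows, rules):
    """Apply toy rules to a table of rows (each row is a list of strings).

    Hypothetical sketch: every row starts out as [element_identifier].
    """
    for rule in rules:
        kind = rule["type"]
        col = rule.get("target_column", 0)
        if kind == "add_column_regex":
            # Append the first capture group of the pattern as a new column.
            pattern = re.compile(rule["expression"])
            new_rows = []
            for row in rows:
                match = pattern.search(row[col])
                new_rows.append(row + [match.group(1) if match else ""])
            rows = new_rows
        elif kind == "add_filter_regex":
            # Keep rows whose target column matches (or does not, if inverted).
            pattern = re.compile(rule["expression"])
            invert = rule.get("invert", False)
            rows = [r for r in rows if bool(pattern.search(r[col])) != invert]
        elif kind == "sort":
            rows = sorted(rows, key=lambda r: r[col])
        else:
            raise ValueError(f"unsupported rule type: {kind}")
    return rows


# Assumed sample data: identifiers of a flat list collection.
identifiers = ["treated_rep2.fastq", "control_rep1.fastq", "treated_rep1.fastq"]
rows = [[name] for name in identifiers]
rules = [
    {"type": "add_filter_regex", "target_column": 0, "expression": r"^treated_"},
    {"type": "add_column_regex", "target_column": 0, "expression": r"_(rep\d+)\."},
    {"type": "sort", "target_column": 1},
]
result = apply_rules(rows, rules)
```

In this sketch the filter drops the control sample, the regex rule derives a replicate column, and the sort orders the remaining rows by that column; a real tool would then map columns back onto collection identifiers.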

For this to be tracked properly as a tool execution and to work properly in the tool form, I've implemented a new tool framework parameter type (including tool form integration) called "rules". Right now there is just one tool that uses this parameter type - but one can imagine other operations using the rule builder - e.g. creating collections from samplesheets as the first step in an automated workflow using the tool framework instead of the upload API.

This can be thought of as a more GUI-friendly alternative to some of my proposed collection operations (#2496, #1313) in the past that consumed JavaScript expressions. This alternative approach is much easier to visualize for GUI-based manipulation and is simpler than learning JavaScript if one does not know how to program, but the ubiquity of JavaScript would provide its own benefits. Hopefully in the long run both approaches prove useful within the Galaxy ecosystem.

  • Outline the collection operation tool and parameter plumbing. Backend basics. (1 Day)
  • Implement simplest tool form parameter. (1 Day)
  • Connect tool form parameter to rule based builder component, including callback. (1 Day)
  • Simplest rule working in API to create flat and nested collections. (1 Day)
  • Simplest rule working GUI. (1/2 Day)
  • Preload collection identifiers as columns the first time the rules widget is loaded.
  • Selenium test case for no-op tool. (1 Day)
  • Flatten use case working API (1/2 Day)
  • Flatten use case - GUI + Selenium. (1/2 Day)
  • Swap the frontend to use Python-like regexes so these two approaches give the same results. (2 Days)
  • Develop specification framework for rules DSL that can be used to test frontend JS and backend Python.
  • Backend implementation of remaining rule types: (2 Days)
    • add_column_basename
    • add_column_rownum
    • add_column_value
    • add_column_metadata
    • add_column_regex
    • add_column_concatenate
    • add_column_substr
    • remove_columns
    • add_filter_regex
    • add_filter_count
    • add_filter_empty
    • add_filter_matches
    • add_filter_compare
    • sort
    • swap_columns
    • split_columns
  • Rework output collection type of this tool - it can be anything but it is determinable based on the content of the rules parameter - the workflow editor needs to be able to leverage this information. (2 Days)
    • Changing the collection type in the editor breaks the workflow connection - this is better than nothing but it'd be good if there was a warning or if it would attempt to re-establish the connection and alter the mapping if needed. (Added an async warning about this in the editor - obviously it'd be better if we had a more permanent warning mechanism for workflow changes but the UI framework isn't there yet.)
  • Testing rules in workflows in the API. 1 Day.
  • Testing rules editor in workflow editor in the GUI (selenium). 1 Day.
  • Conformance tests for the rules language and a way to run them against both the Python and ES6 implementations (attempt to get regexes working the same in JS and Python). This is a tricky aspect of doing this on the backend this way. 2 Days
  • Testing run rules tool in workflows in the GUI (selenium). 1 Day.
  • Prevent workflow runtime edit of this parameter.
  • Pretty validation on the backend and frontend. 1 Day.
  • Pretty display of rules in the...
    • Tool form.
    • Workflow summary.
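
The conformance-testing items in the checklist above could be approached by keeping test cases as plain data that both implementations load and replay. The following is a minimal Python sketch under stated assumptions: the case format (`doc`, `rules`, `initial`, `final`), the rule parameters (`target_column_0`, `target_column_1`, `start`), and the helpers `apply_rule` and `run_spec` are hypothetical and are not the format or code this PR actually uses.

```python
import json

# Hypothetical shared specification; in practice this would live in a data
# file that the Python and ES6 test suites both read.
SPEC = json.loads("""
[
  {"doc": "swap two columns",
   "rules": [{"type": "swap_columns", "target_column_0": 0, "target_column_1": 1}],
   "initial": [["a", "1"], ["b", "2"]],
   "final": [["1", "a"], ["2", "b"]]},
  {"doc": "append row numbers starting at 1",
   "rules": [{"type": "add_column_rownum", "start": 1}],
   "initial": [["a"], ["b"]],
   "final": [["a", "1"], ["b", "2"]]}
]
""")


def apply_rule(rule, rows):
    """Toy implementation covering only the two rule types used in SPEC."""
    if rule["type"] == "swap_columns":
        i, j = rule["target_column_0"], rule["target_column_1"]
        out = []
        for row in rows:
            row = list(row)  # copy so the spec data is never mutated
            row[i], row[j] = row[j], row[i]
            out.append(row)
        return out
    if rule["type"] == "add_column_rownum":
        start = rule["start"]
        return [list(row) + [str(start + n)] for n, row in enumerate(rows)]
    raise ValueError(f"unsupported rule type: {rule['type']}")


def run_spec(spec, implementation):
    """Replay every case and return the docs of the cases that failed."""
    failures = []
    for case in spec:
        rows = case["initial"]
        for rule in case["rules"]:
            rows = implementation(rule, rows)
        if rows != case["final"]:
            failures.append(case["doc"])
    return failures


failures = run_spec(SPEC, apply_rule)
```

Because the cases are data rather than code, the same file could drive a JavaScript runner with an equivalent loop, which is one way to catch divergence such as regex dialect differences.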

xref #5381 (third bullet point)

@jmchilton jmchilton force-pushed the apply_rules branch 4 times, most recently from f301de2 to e46d6e1 Compare April 19, 2018 14:17
@jmchilton jmchilton changed the title [WIP] Allow rule based operations on existing collections. Allow rule based operations on existing collections. Apr 19, 2018
@galaxybot galaxybot added this to the 18.05 milestone Apr 19, 2018
@jmchilton jmchilton force-pushed the apply_rules branch 2 times, most recently from e35c473 to 7f64a99 Compare April 23, 2018 14:39
may be used in workflows as well where no such preview can be generated.

This tool is an advanced feature but has a lot of flexibility - it can be used to process collections
with arbitrary nesting and can do many be used to do many kinds of filtering, re-sorting, nesting,
Member

s/many be used to do// ?

@mvdbeek
Member

mvdbeek commented Apr 24, 2018

This is very nice, I really like the flexibility this offers for reshaping collections. Certainly the flexible filtering will be very helpful for nested collections and workflows where there is a common part and a sample / condition specific part. This all works very nicely!

Member

@martenson martenson left a comment

This PR presents a powerful functionality that takes dataset hierarchy and history manipulation to the next level and I am eager to see what impressive things the community can do with it.

The thing I am a bit concerned about is the user experience since the interface is unlike any we use currently or that is widely adopted. Now we have it in three places in Galaxy.

I propose we focus on gathering feedback on this throughout the 18.05 release with the aim to improve user intuition of building and managing these rules.

Together with galaxyproject/training-material#676 this is already very useful and ready to be merged.

jmchilton and others added 7 commits April 24, 2018 11:08
Allow the rules DSL & GUI component to operate on existing collections to allow filtering, sorting, modifying identifiers and general re-organization of existing collections (e.g. the outputs of tools). Implementing this as a collection operation tool so that it should be executable interactively in the tool form and in a batch fashion as part of workflow executions.

For this to be tracked properly as a tool execution and to work properly in the tool form, I've implemented a new tool framework and tool form parameter type called "rules".

This can be thought of as a more GUI-friendly alternative to my proposed collection operations that consumed JavaScript expressions.

This includes API tests for both tool and workflow execution of the new tool as well as Selenium tests for tool form execution, workflow editor interactions, and workflow running.
An exception in generating a "text representation" of a None value was preventing the rest of to_dict from running properly for that parameter value. This restores the rule preview table in the tool form.
@martenson
Member

conflicts because of #5969 merge - sorry 😭

@@ -713,7 +713,7 @@ const MAPPING_TARGETS = {
 help: _l(
 "If this is set, all rows with the same collection name will be joined into a collection and it is possible to create multiple collections at once."
 ),
-modes: ["raw", "ftp"], // TODO: allow this in datasets mode & tool builder modes
+modes: ["raw", "ftp", "datasets", "library_datasets"],
Member

you deleted this whole line in #5969, is this the proper merge?

Member Author

Yes... thanks for looking so closely though! Think of modes like a filter - in #5969 I made it so all the modes of the rule builder that produce collections could produce multiple collections so I didn't need that any more. This PR adds the ability to apply rules to existing collections using a tool that only has one output - so I needed to re-list all the valid modes and exclude that one new mode ("collection_contents").

[Screenshot: 2018-04-24, 4:14:41 PM]
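
The "modes like a filter" explanation can be illustrated with a small Python sketch (the real code is the JavaScript `MAPPING_TARGETS` object in the diff above). The target names, labels, and the helper `targets_for_mode` are assumptions for illustration; only the mode strings come from this conversation.

```python
# Hypothetical, simplified stand-in for the JavaScript MAPPING_TARGETS object.
MAPPING_TARGETS = {
    # No "modes" key: assumed to mean the target is offered in every mode.
    "list_identifiers": {"label": "List Identifier(s)"},
    # Explicit allow-list that leaves out the new "collection_contents" mode,
    # since the apply-rules tool has a single output collection.
    "collection_name": {
        "label": "Collection Name",
        "modes": ["raw", "ftp", "datasets", "library_datasets"],
    },
}


def targets_for_mode(targets, mode):
    """Return the names of mapping targets that should be offered in a mode."""
    return [
        name
        for name, spec in targets.items()
        if "modes" not in spec or mode in spec["modes"]
    ]


upload_targets = targets_for_mode(MAPPING_TARGETS, "ftp")
apply_rules_targets = targets_for_mode(MAPPING_TARGETS, "collection_contents")
```

Under these assumptions, dropping the `modes` line entirely would re-enable the target in "collection_contents" mode, which is why the list is spelled out rather than removed.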

@martenson martenson merged commit e715865 into galaxyproject:dev Apr 25, 2018
@nsoranzo nsoranzo deleted the apply_rules branch July 14, 2020 16:22
4 participants