Dynamic Models for Tool Test Validation #18679
Conversation
I love it, and I think realistically the alternative tool syntax will allow more precise validation. One comment on the validation report (https://gist.github.com/jmchilton/e84f91cf2646057e5cd25781ba7d422b#file-gistfile1-txt-L76): that's not unqualified, and much shorter than writing

```xml
<repeat name="tables">
    <conditional name="input_opts">
        <repeat name="linefilters">
            <conditional name="filter">
                <param name="filter_type" value="something"/>
            </conditional>
        </repeat>
    </conditional>
</repeat>
```

I guess I wouldn't be too concerned if we can automatically rewrite this, but if not I would say that's a lot of tool author work for little gain (a cleaner dynamic test case XSD, I assume?). We have a helper that can turn this kind of string into a path tuple (galaxy/lib/galaxy/tools/parameters/wrapped.py, line 192 in 5f635a3).
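For illustration only - a rough sketch, not the actual helper in wrapped.py - of what turning such a flat test-parameter key into a path tuple might look like:

```python
def flat_key_to_path(flat_key: str) -> tuple:
    """Split a flat key like
    'tables_0|input_opts|linefilters_0|filter|filter_type' into
    ('tables', 0, 'input_opts', 'linefilters', 0, 'filter', 'filter_type').
    """
    path = []
    for part in flat_key.split("|"):
        name, sep, index = part.rpartition("_")
        if sep and index.isdigit():
            # a repeat instance: "<name>_<n>" contributes (name, n)
            path.extend((name, int(index)))
        else:
            path.append(part)
    return tuple(path)
```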
@mvdbeek That test looks fine to my eye - I don't know why it didn't validate. Probably a bug... I bet I didn't implement sections 😅😭. Let me check it out and get back to you. I did intend for those kinds of parameters to validate though. As for automatically changing the tools - I would be okay, I think, if we were certain we weren't going to lose comments or touch the rest of the tool. Do we have any examples of doing that well? I certainly never meant for…
Now revised with section support 😅. I added a test case that exhibits this failure. The updated validation results for that query_tabular tool look like this:
The remaining failure is basically point 3 above - select options should be selected by value and not display name.
I'm not sure which strawman I'm supposed to defend here 😅 I appreciate stricter typing; it will help the IUC. The IUC, and especially @bernt-matthias, have done a lot to improve linting and the overall quality of tools, which also makes our tools more sustainable. In the language-server plugin we autogenerate test sections, and if we can improve this and make it more stable that would be great. The only thing I'm worried about is the readability aspect. In the end, a human needs to review a tool, and here readability counts. So my only request would be to take readability seriously and find a good balance. E.g. if we agree that more people should use IDEs for Galaxy tool dev, a more verbose syntax would be OK if it improves readability, etc. If there are large syntactic changes, I think we should run them by https://github.com/galaxy-iuc/standards
I'm quite happy with the "reformat" option of the language server; maybe we can try it at this level. It seems to keep comments etc.
The situation to avoid is making it look like maintaining or developing a Galaxy tool is hard. At some point I bet we'll need to enforce a profile version that requires rewriting the test section. We should avoid putting this work onto the occasional contributor who just needs a feature flag added or a bug fixed; it needs to be handled by the tool maintainers. If we had an easy way to fix up trivial-ish linting things like a different test syntax (internally I would hope test cases in both syntaxes are parsed into the same pydantic models), it would be easy for tool maintainers to stay ahead of the curve.
Not in Galaxy, but CWL maintains comments. Could our parameter models hold comments?
😮 - I didn't expect LXML to preserve so much of the whitespace and comments. It would only mess with the material before and after the root element. If we do opt to write a tool upgrader - are there other features we think should be auto-upgraded?
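For reference, a minimal lxml round-trip sketch (my illustration; it assumes a file named tool.xml) showing how comments and whitespace inside the root element survive a parse-and-write cycle:

```python
from lxml import etree

# lxml keeps comments and whitespace by default; the fragile part is
# material outside the root element (e.g. the XML declaration and any
# leading comments).
parser = etree.XMLParser(remove_comments=False, remove_blank_text=False)
tree = etree.parse("tool.xml", parser)

# ... an upgrader would rewrite the <tests> section in place here ...

tree.write("tool.xml", xml_declaration=True, encoding="utf-8")
```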
stdio cruft that is redundant with a new profile version, unnecessary name parameters… I agree though this isn't something for right now, and it doesn't have to be you. It would make transitioning to a newer syntax a much lighter decision though.
Work scoped out in #18536.
Re: upgrading tools automatically. I've integrated this with the tool upgrade advisor (jmchilton@5820ed1) implemented separately as #18728. Once that commit becomes green I will include it in this PR along with #18728, if this PR is still open. I've outlined why I think upgrading automatically is more challenging than I thought, but also how all the semantic data included in #18728…
```python
# In an effort to squeeze all the ambiguity out of test cases - at some point Marius and John
# agreed tools should be using value_json for typed inputs to parameters, but John has come around on
# this now that we're validating the parameters as a whole on load. The models are ensuring only
# unambiguous test cases are being loaded.
```
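To make the trade-off concrete, a hypothetical illustration (the parameter name is invented; the attribute spellings are the ones discussed above):

```python
import json

# An XML attribute value is always a string in the source document:
#   <param name="threshold" value="7"/>       -> parsed as the string "7"
#   <param name="threshold" value_json="7"/>  -> json.loads("7"), an exact int
assert json.loads("7") == 7 and isinstance(json.loads("7"), int)

# With whole-test-case validation, the plain value="7" form is already
# unambiguous for a parameter the model knows to be an integer.
```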
💯 ... I think what I care about is that where exact types are possible (yaml, json) we should use them. Being able to accurately coerce types is such an important part of the whole tool state work IMO.
Awesome, this looks really nice!
- `24.2` or newer tools will not allow a test case to load at all if that test case does not validate.
- `24.2` or newer tools disable unqualified access to conditional and repeat parameters - these lead to ambiguities. Tool developers should be using `<conditional>` and `<repeat>` tags to avoid ambiguity or using the fully qualified state path (see @mvdbeek's first comment below).
- `24.2` or newer tools are more strongly typed - `select` parameters must be specified in the test parameter specification by option value and not option display, `boolean` parameters must be specified as booleans and not using the `truevalue`/`falsevalue`, and column parameters must be specified as integers (illustrated in the sketch after the next paragraph).

I imagine everyone is vaguely on board with validating the test cases immediately. It will catch many classes of tool test problems immediately and unquestionably speed up new tool development. I think what I've declared best practices and am validating vs. not validating is probably a lot more contentious. I will set up strawman arguments here and argue against them, but I will also try to be open to changing things around if people think my path on the particulars is wrong.
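To illustrate the typing rules above (all parameter names here are invented):

```python
# boolean - a boolean literal, not the tool's truevalue/falsevalue string:
#     <param name="all_fields" value="true"/>           accepted
#     <param name="all_fields" value="--all-fields"/>   rejected
#
# select - the option's value, not its display label
# (given <option value="bam">BAM format</option>):
#     <param name="format" value="bam"/>                accepted
#     <param name="format" value="BAM format"/>         rejected
#
# column - an integer:
#     <param name="col" value="2"/>                     accepted
#     <param name="col" value="c2"/>                    rejected
```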
In my head, there is going to be a pragmatic camp led by IUC folks (maybe @bgruening?) that have been burnt by ever-increasing linting stringency making it harder to upgrade older existing tools. I would argue that the increased type safety and simplicity will reduce potential problems with tools and make tools more uniform and safer to reason about. I think it also helps to simplify the toolchain and, ideally, to train developers in how they should use their tools with the API. The benefits are worth a little disruption compared to some of the more picky linting we've done in the past. I've also included utilities to help developers with the migration (see Command-line Validation below).
In my head, the other side is going to be the purist camp led by @mvdbeek that would like even stricter validation - further migration toward CWL purity. For instance, we can ensure we're loading integers right from the tool source using `value_json` instead of `value`. @mvdbeek and I had discussed making this the best practice at some point. Working through this project, I've come around on this though. By doing the validation and ensuring there is a single accepted representation for each parameter type (or combination of parameter type and whether it is multiple or not), there is no ambiguity any more. There is no reason to think `<parameter name="threshold" value="7">` might be a string after this PR - we now know it is an integer when validating the generated API request for the test. And all that logic is in galaxy-tool-util and available without a Galaxy runtime. There is a flag in the code to enable this requirement - at least at the warning level - but I don't think it should be switched on.

For hundreds of real-world examples to evaluate these models against, the tool test validation applied to the IUC looks like this: https://gist.github.com/jmchilton/e84f91cf2646057e5cd25781ba7d422b.
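A minimal sketch of the load-time idea, using pydantic's `create_model` as a stand-in for the PR's dynamic models (the parameter here is hypothetical):

```python
from pydantic import ValidationError, create_model

# A stand-in for a per-tool dynamic model with one integer parameter.
TestCaseModel = create_model("TestCaseModel", threshold=(int, ...))

# value="7" from the XML is unambiguous for an integer parameter ...
print(TestCaseModel(threshold="7").threshold)  # -> 7, coerced to int

# ... while a non-numeric string fails validation at load time.
try:
    TestCaseModel(threshold="high")
except ValidationError as exc:
    print(exc.errors()[0]["type"])  # "int_parsing"
```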
Command-line Validation
Ideally, this will end up in Planemo after we've worked through the right path forward on best practices and what should validate and what shouldn't. But I have still included a little helper utility to check existing tools. This can be used to generate details about validation errors (and some classes of warnings for older tools) on the command line, as either plain text or JSON. The JSON output might be useful for IDE helpers, for instance.
- Run validation on the test cases and collect validation failures and type-related warnings.
- Run validation on the test cases and collect validation failures and type-related warnings, but use the `24.2` profile and ignore the tool's own profile. This gives some indication of whether the tool's test cases would be a blocker to upgrading the tool to the latest profile.
- Run validation but generate JSON - for CLI tooling and non-Python/non-Planemo tooling chains (IDE support).
I've run the tool on the IUC repository to produce this report: https://gist.github.com/jmchilton/e84f91cf2646057e5cd25781ba7d422b.
Limitations and Future Work
All of this is kind of... bleh in terms of implementation and utility, because the thing being validated isn't the developer's tool source but instead a weird JSON dictionary used by Galaxy to communicate the tool test to the client tooling. For this reason I've renamed the parameter model representation from `test_case` to `test_case_xml`. I don't think there is anything wrong with validating things at this level, but ideally we would be actually validating the source file contents also.

My next steps toward that end are to implement a model representation - maybe called `test_case_source` - that would be similar to this but targeting the tool source. It would be future-facing and do things like replace comma-separated lists with actual JSON lists. Then we would use that dynamic model to either: …

This would allow line-number-based issues, and since we can easily generate jsonschema for these dynamic models, there are many ways a toolchain could validate this part of the tool regardless of language and runtime (see the sketch below).
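Since pydantic models can emit JSON Schema directly, a sketch of that language-agnostic angle (again with a hypothetical one-parameter model; the real models would be generated from the tool's inputs):

```python
import json

from pydantic import create_model

# Hypothetical per-tool model; real models would be derived from the
# tool's <inputs> section.
TestCaseModel = create_model("TestCaseModel", threshold=(int, ...))

# Any toolchain that speaks JSON Schema could validate this part of the
# tool without a Python or Galaxy runtime.
print(json.dumps(TestCaseModel.model_json_schema(), indent=2))
```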
A note about included workflow models
I started on the workflow state work simultaneously, but the test case work advanced more rapidly given that it is easier and doesn't require a Tool Shed upgrade. I have stripped out all of the validation and conversion plumbing (it is its own commit in #17393 now), but there is still a bunch of validation tests setting up a shot at #18536 in this PR and I can't really decouple them from the test case work. I would prefer we defer any conversation about those test cases until I open a PR for the workflow state validation stuff. I promise to foreground all the examples and be very open to having them picked apart at that time if we can stay focused on the tool test stuff here. Is that okay?