
Dynamic Models for Tool Test Validation #18679

Merged 9 commits into galaxyproject:dev on Aug 23, 2024

Conversation

@jmchilton (Member) commented Aug 10, 2024

  • Profile 24.2 or newer tools will not allow a test case to load at all if that test case does not validate.
  • Profile 24.2 or newer tools disable unqualified access to conditional and repeat parameters - these lead to ambiguities. Tool developers should be using <conditional> and <repeat> tags to avoid ambiguity or using the fully qualified state path (see @mvdbeek's first comment below).
  • Profile 24.2 or newer tools are more strongly typed - select parameters must be specified in test parameter specification by option value and not option display, boolean parameters must be specified as booleans and not using the truevalue / falsevalue, and column parameters must be specified as integers.
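The stricter typing in the last bullet can be illustrated with a small sketch. This is hypothetical code, not the actual galaxy-tool-util models (which are Pydantic-based); the function name and structure here are invented for illustration only:

```python
# Hypothetical sketch of the 24.2 typing rules for test parameters:
# selects by option *value*, booleans as real booleans, columns as integers.

def validate_test_param(param_type, value, option_values=()):
    """Return a list of validation error strings for one test parameter."""
    errors = []
    if param_type == "select":
        # Must match an option value, never the option's display text.
        if value not in option_values:
            errors.append(f"{value!r} is not a valid option value")
    elif param_type == "boolean":
        # truevalue/falsevalue strings are rejected; only real booleans pass.
        if not isinstance(value, bool):
            errors.append(f"expected a boolean, got {value!r}")
    elif param_type == "data_column":
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not isinstance(value, int) or isinstance(value, bool):
            errors.append(f"expected an integer column index, got {value!r}")
    return errors
```

A tool test that passes a select option's display label, a `truevalue` string, or a quoted column number would fail checks of this shape at load time rather than mid-test.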

I imagine everyone is vaguely on board with validating the test cases immediately. It will catch many classes of tool test problems immediately and unquestionably speed up new tool development. What I've declared best practices, and what I am validating versus not validating, is probably a lot more contentious. I will set up strawman arguments here and argue against them, but I will also try to be open to changing things around if people think my path on the particulars is wrong.

In my head, there is going to be a pragmatic camp led by IUC folks (maybe @bgruening?) that have been burnt by ever-increasing linting stringency making it harder to upgrade older existing tools. I would argue that the increased type safety and simplicity will reduce potential problems with tools and make tools more uniform and safer to reason about. I think it also helps simplify the tool chain and, ideally, trains developers in how their tools map onto the API. The benefits are worth a little more disruption than some of the pickier linting we've done in the past. I've also included utilities to help developers with the migration (see Command-line Validation below).

In my head, the other side is going to be the purist camp led by @mvdbeek that would like even stricter validation - further migration toward CWL purity. For instance, we could ensure we're loading integers right from the tool source by using value_json instead of value. @mvdbeek and I had discussed making this the best practice at some point. Working through this project, though, I've come around on this. By validating against each parameter type (or combination of parameter type and whether or not it is multiple), there is no ambiguity any more. No reason to think <parameter name="threshold" value="7"> might be a string after this PR - we now know it is an integer when validating the generated API request for the test. And all that logic is in galaxy-tool-util and available without a Galaxy runtime. There is a flag in the code to enable this requirement - at least at the warning level - but I don't think it should be switched on.

For hundreds of real world examples to evaluate these models against - the tool test validation applied to the IUC looks like this https://gist.github.com/jmchilton/e84f91cf2646057e5cd25781ba7d422b.

Command-line Validation

Ideally, this will end up in Planemo after we've worked through the right path forward on best practices and what should and shouldn't validate. But I have still included a little helper utility to check existing tools. It reports validation errors (and some classes of warnings for older tools) on the command line as either plain text or JSON. The JSON output might be useful for IDE helpers, for instance.

galaxy-tool-test-case-validate [path_to_tool_or_directory_of_tools] 

Run validation on the test cases and collect validation failures and type-related warnings.

galaxy-tool-test-case-validate --latest [path_to_tool_or_directory_of_tools] 

Run validation on the test cases and collect validation failures and type-related warnings, using the 24.2 profile and ignoring the tool's declared profile. This gives some indication of whether the tool's test cases would be a blocker to upgrading the tool to the latest profile.

galaxy-tool-test-case-validate --json [path_to_tool_or_directory_of_tools] 

Run validation but emit JSON - for CLI tooling and non-Python/non-Planemo tool chains (e.g. IDE support).
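As an illustration of how downstream tooling might use the `--json` mode: assuming the output is a list of per-tool records carrying test indices and error strings (the actual schema is not pinned down in this PR, so the record shape below is hypothetical), an IDE helper could consume it like this:

```python
import json

# Hypothetical record shape - the actual schema emitted by
# galaxy-tool-test-case-validate --json is not documented here.
REPORT = json.loads("""
[
  {"tool_id": "query_tabular", "test_index": 8,
   "errors": ["Invalid parameter name found tables_0|input_opts|linefilters_0|filter|filter_type"]}
]
""")

def summarize(report):
    """Flatten the report into (tool_id, test_index, message) tuples for display."""
    return [
        (record["tool_id"], record["test_index"], message)
        for record in report
        for message in record["errors"]
    ]
```

An editor plugin could map each tuple back to the tool XML and surface it as a diagnostic.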

I've run the tool on the IUC repository to produce this report https://gist.github.com/jmchilton/e84f91cf2646057e5cd25781ba7d422b.

Limitations and Future Work

All of this is kind of... bleh in terms of implementation and utility because the thing being validated isn't the developer's tool source but instead a weird JSON dictionary used by Galaxy to communicate the tool test to the client tooling. For this reason I've renamed the parameter model representation from test_case to test_case_xml. I don't think there is anything wrong with validating things at this level, but ideally we would also be validating the source file contents.

My next steps toward that end are to implement a model representation - maybe called test_case_source - that would be similar to this but target the tool source. It would be forward-facing and do things like replace comma-separated lists with actual JSON lists. Then we would use that dynamic model to either:

  1. Generate an XSD on the fly for the XML tool test. -or-
  2. Implement an alternative tool syntax for specifying tool tests that mirrors the syntax used by Planemo testing for workflows (https://planemo.readthedocs.io/en/latest/test_format.html). (my preference)
<tool>
  <inputs>...</inputs>
  <outputs>...</outputs>
  ...
  <tests><![CDATA[
- doc: this is my test case
  job:
    input:
      value: 1.bed
      class: File
    options:
      - opt1
      - opt2
      - opt4
  outputs:
    my_output:
      checksum: "sha1$a0b65939670bc2c010f4d5d6a0b3e4e4590fb92b"
]]></tests>
</tool>

This would allow line-number-based issue reporting, and since we can easily generate JSON Schema for these dynamic models, there are many ways a tool chain could validate this part of the tool regardless of language and runtime.
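To make the idea concrete, a consumer in any language could check a job description against such a generated schema. A minimal hand-rolled sketch follows; the schema shape and checks here are invented for illustration, and real code would emit and check proper JSON Schema (e.g. via Pydantic):

```python
# Hand-rolled sketch: validate a YAML-ish test job dict against a schema
# derived from a dynamic parameter model. Field names are hypothetical.

SCHEMA = {
    "input": {"type": "file"},    # expects {"value": ..., "class": "File"}
    "options": {"type": "list"},  # a true JSON list, not "opt1,opt2"
}

def validate_job(job, schema):
    """Return a list of error strings for a test job dict."""
    errors = []
    for name, spec in schema.items():
        if name not in job:
            errors.append(f"missing input {name!r}")
        elif spec["type"] == "file" and job[name].get("class") != "File":
            errors.append(f"{name!r} must be a File")
        elif spec["type"] == "list" and not isinstance(job[name], list):
            errors.append(f"{name!r} must be a JSON list")
    return errors

job = {"input": {"value": "1.bed", "class": "File"},
       "options": ["opt1", "opt2", "opt4"]}
```

The comma-separated-string case fails fast here, which is exactly the ambiguity the typed source model is meant to squeeze out.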

A note about included workflow models.

I started on the workflow state work simultaneously but the test case work advanced more rapidly given that it is easier and doesn't require a Tool Shed upgrade. I have stripped out all of the validation and conversion plumbing (it is its own commit in #17393 now) but there is still a bunch of validation tests setting up a shot at #18536 in this PR, and I can't really decouple them from the test case work. I would prefer we defer any conversation about those test cases until I open a PR for the workflow state validation stuff. I promise to foreground all the examples and be very open to having them picked apart at that time, if we can stay focused on the tool test stuff here. Is that okay?

How to test the changes?

(Select all options that apply)

  • I've included appropriate automated tests.
  • This is a refactoring of components with existing test coverage.

License

  • I agree to license these and all my past contributions to the core galaxy codebase under the MIT license.

@mvdbeek (Member) commented Aug 11, 2024

I love it, and I think realistically the alternative tool syntax will allow more precise validation.
Do you think it is possible to provide a utility that can rewrite the test cases to the new syntax?

One comment on the validation report: https://gist.github.com/jmchilton/e84f91cf2646057e5cd25781ba7d422b#file-gistfile1-txt-L76:

Test Case 8: validation failed - Invalid parameter name found tables_0|input_opts|linefilters_0|filter|filter_type

that's not unqualified and much shorter than writing

<repeat name="tables">
    <conditional name="input_opts">
        <repeat name="linefilters">
              <conditional name="filter">
                  <param name="filter_type" value="something"/>
              </conditional>
        </repeat>
    </conditional>
</repeat>

I guess I wouldn't be too concerned if we can automatically rewrite this, but if not I would say that's a lot of tool author work for little gain (a cleaner dynamic test case XSD, I assume?). We have a helper that can turn this kind of string into a path tuple, in case that helps:

def nested_key_to_path(key: str) -> Sequence[Union[str, int]]:
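For readers following along, here is a rough approximation of what such a helper does - a sketch, not the actual Galaxy implementation: trailing `_N` suffixes on repeat instance names become integer indices in the resulting path.

```python
from typing import List, Union

def nested_key_to_path(key: str) -> List[Union[str, int]]:
    """Approximate sketch: split a flat test parameter name such as
    'tables_0|input_opts|linefilters_0|filter|filter_type' into a path,
    turning repeat instance components like 'tables_0' into 'tables', 0."""
    path: List[Union[str, int]] = []
    for component in key.split("|"):
        prefix, underscore, suffix = component.rpartition("_")
        if underscore and suffix.isdigit():
            # A repeat instance: name plus integer index.
            path.extend([prefix, int(suffix)])
        else:
            # A plain conditional/section/param name ('input_opts' stays whole
            # because its suffix is not numeric).
            path.append(component)
    return path
```

The real helper's return type and edge-case handling may differ; this only shows the shape of the transformation.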

@jmchilton (Member Author)

@mvdbeek That test looks fine to my eye - I don't know why it didn't validate. Probably a bug... I bet I didn't implement sections 😅😭. Let me check it out and get back to you. I did intend for those kinds of parameters to validate though.

As for automatically changing the tools - I would be okay, I think, if we were certain we weren't going to lose comments or touch the rest of the tool. Do we have any examples of doing that well? I certainly never meant for planemo normalize to be used by tool authors as a best practice, but people were doing that for a while, and it left me anxious about messing with people's XML. This is clearly a much more specific application, though.

@jmchilton force-pushed the tool_test_validation branch from e6ff57c to 477a1a1 on August 11, 2024 14:04
@jmchilton (Member Author)

Now revised with section support 😅. I added a test case that exhibits this failure. The updated validation results for that query_tabular tool look like this:

galaxy-tool-test-case-validate ~/workspace/tools-iuc/tools/query_tabular/
Found 12 test cases to validate for tool query_tabular / 3.3.0 (@ /Users/jxc755/workspace/tools-iuc/tools/query_tabular/query_tabular.xml)
Test Case 1: validated
Test Case 2: validated
Test Case 3: validated
Test Case 4: validated
Test Case 5: validated
Test Case 6: validated
Test Case 7: validated
Test Case 8: validated
Test Case 9: validation failed - 18 validation errors for DynamicModelForTool
query_result.yes.header_prefix.literal['']
  Input should be '' [type=literal_error, input_value='#', input_type=str]
    For further information visit https://errors.pydantic.dev/2.8/v/literal_error
query_result.yes.header_prefix.literal['62']
  Input should be '62' [type=literal_error, input_value='#', input_type=str]
    For further information visit https://errors.pydantic.dev/2.8/v/literal_error
query_result.yes.header_prefix.literal['64']
  Input should be '64' [type=literal_error, input_value='#', input_type=str]
    For further information visit https://errors.pydantic.dev/2.8/v/literal_error
query_result.yes.header_prefix.literal['43']
  Input should be '43' [type=literal_error, input_value='#', input_type=str]
    For further information visit https://errors.pydantic.dev/2.8/v/literal_error
query_result.yes.header_prefix.literal['60']
  Input should be '60' [type=literal_error, input_value='#', input_type=str]
    For further information visit https://errors.pydantic.dev/2.8/v/literal_error
query_result.yes.header_prefix.literal['42']
  Input should be '42' [type=literal_error, input_value='#', input_type=str]
    For further information visit https://errors.pydantic.dev/2.8/v/literal_error
query_result.yes.header_prefix.literal['45']
  Input should be '45' [type=literal_error, input_value='#', input_type=str]
    For further information visit https://errors.pydantic.dev/2.8/v/literal_error
query_result.yes.header_prefix.literal['61']
  Input should be '61' [type=literal_error, input_value='#', input_type=str]
    For further information visit https://errors.pydantic.dev/2.8/v/literal_error
query_result.yes.header_prefix.literal['124']
  Input should be '124' [type=literal_error, input_value='#', input_type=str]
    For further information visit https://errors.pydantic.dev/2.8/v/literal_error
query_result.yes.header_prefix.literal['63']
  Input should be '63' [type=literal_error, input_value='#', input_type=str]
    For further information visit https://errors.pydantic.dev/2.8/v/literal_error
query_result.yes.header_prefix.literal['36']
  Input should be '36' [type=literal_error, input_value='#', input_type=str]
    For further information visit https://errors.pydantic.dev/2.8/v/literal_error
query_result.yes.header_prefix.literal['46']
  Input should be '46' [type=literal_error, input_value='#', input_type=str]
    For further information visit https://errors.pydantic.dev/2.8/v/literal_error
query_result.yes.header_prefix.literal['58']
  Input should be '58' [type=literal_error, input_value='#', input_type=str]
    For further information visit https://errors.pydantic.dev/2.8/v/literal_error
query_result.yes.header_prefix.literal['38']
  Input should be '38' [type=literal_error, input_value='#', input_type=str]
    For further information visit https://errors.pydantic.dev/2.8/v/literal_error
query_result.yes.header_prefix.literal['37']
  Input should be '37' [type=literal_error, input_value='#', input_type=str]
    For further information visit https://errors.pydantic.dev/2.8/v/literal_error
query_result.yes.header_prefix.literal['94']
  Input should be '94' [type=literal_error, input_value='#', input_type=str]
    For further information visit https://errors.pydantic.dev/2.8/v/literal_error
query_result.yes.header_prefix.literal['35']
  Input should be '35' [type=literal_error, input_value='#', input_type=str]
    For further information visit https://errors.pydantic.dev/2.8/v/literal_error
query_result.yes.header_prefix.literal['33']
  Input should be '33' [type=literal_error, input_value='#', input_type=str]
    For further information visit https://errors.pydantic.dev/2.8/v/literal_error
Test Case 10: validated
Test Case 11: validated
Test Case 12: validated
Found 4 test cases to validate for tool filter_tabular / 3.3.0 (@ /Users/jxc755/workspace/tools-iuc/tools/query_tabular/filter_tabular.xml)
Test Case 1: validated
Test Case 2: validated
Test Case 3: validated
Test Case 4: validated
Found 1 test cases to validate for tool sqlite_to_tabular / 3.2.0 (@ /Users/jxc755/workspace/tools-iuc/tools/query_tabular/sqlite_to_tabular.xml)
Test Case 1: validated

The remaining failure is basically point 3 above - select options should be specified by value, not display name.

@jmchilton force-pushed the tool_test_validation branch 6 times, most recently from ab71ec1 to 82b297e, on August 11, 2024 15:18
@bgruening (Member)

I'm not sure which strawman I'm supposed to defend here 😅

I appreciate stricter typing; it will help the IUC. The IUC, and especially @bernt-matthias, has done a lot to improve linting and the overall quality of tools, which also makes our tools more sustainable.

In the language-server plugin, we autogenerate test-sections and if we can improve this and make it more stable that would be great.

The only thing I'm worried about is the readability aspect. In the end, a human needs to review a tool, and here readability counts. So my only request would be to take readability seriously and find a good balance. E.g. if we agree that more people should use IDEs for Galaxy tool dev, a more verbose syntax would be OK if it improves readability, etc.

If there are large syntactic changes I think we should run them by https://github.com/galaxy-iuc/standards

@bgruening (Member)

As for automatically changing the tools - I would be okay I think if we were certain we weren't going to lose comments or touch the rest of the tool. Do we have any examples of doing that well?

I'm quite happy with the "reformat" option of the language server; maybe we can try it at this level. It seems to keep comments, etc.

@jmchilton force-pushed the tool_test_validation branch from 82b297e to d529820 on August 11, 2024 17:36
@mvdbeek (Member) commented Aug 11, 2024

The situation to avoid is making it look like maintaining or developing a Galaxy tool is hard. At some point, I bet we'll need to enforce a profile version that requires rewriting the test section. We should avoid putting this work onto the occasional contributor who just needs a feature flag added or a bug fixed; it needs to be handled by the tool maintainers. If we had an easy way to fix up trivial-ish linting things like a different test syntax (internally, I would hope test cases in both syntaxes are parsed into the same Pydantic models), it would be easy for tool maintainers to stay ahead of the curve.

Do we have any examples of doing that well?

Not in Galaxy, but CWL maintains comments. Could our parameter models hold comments?

@jmchilton (Member Author)

import lxml.etree as ET


XML_OLD = """<tool>
    <tests>
           <test name="cow" value="6"><!-- a comment -->
 </test>
    </tests>
</tool>"""


XML_NEW = """<tool>
    <tests>
           <test name="qualified|cow" value="6"><!-- a comment -->
 </test>
    </tests>
</tool>"""


def test_rewrite():
    root = ET.fromstring(XML_OLD)
    element = root.find(".//test[@name='cow']")
    element.attrib["name"] = "qualified|cow"
    xml_new = ET.tostring(root).decode("utf-8")
    assert xml_new == XML_NEW

😮 - I didn't expect lxml to preserve so much of the whitespace and comments. It would only mess with comments before and after the root tool element (https://bugs.launchpad.net/lxml/+bug/526799). I guess we could use explicit string handling for those. It definitely seems viable to auto-upgrade tools given that. 🤔 I am not convinced this is a good use of my time when we could be developing a YAML syntax that validates directly with Pydantic and that could be used with the API much more directly (or any number of other tool syntax upgrades that would pay dividends for new tools - user-installable tools, a type-safe subset of Cheetah, more fine-grained syntax for input parameter specification that would catch more problems quicker, etc.). Especially in light of (1) the work toward disabling old linters (#17081) and (2) no existing culture of requiring or pushing for tool profile upgrades when contributors change tools. I don't see why any tool author enhancing old tools would "have to" bump into these issues. This project has introduced dozens of tools to pretty explicitly make sure we're not breaking anything old.

If we do opt to write a tool upgrader - are there other features we think should be auto-upgraded?

@jmchilton force-pushed the tool_test_validation branch 3 times, most recently from 8c23038 to cd32382, on August 11, 2024 21:27
@jmchilton marked this pull request as ready for review August 11, 2024 23:20
@github-actions bot added this to the 24.2 milestone Aug 11, 2024
@mvdbeek (Member) commented Aug 12, 2024

are there other features we think should be auto-upgraded?

Stdio cruft that is redundant with a new profile version, unnecessary name parameters if --argument is present, future items like rewriting simple Python filter expressions to something we can reason about statically, transforming RST help to Markdown, and probably a bunch more things that the linter complains about.

I agree though, this isn't something for right now, and it doesn't have to be you. It would make transitioning to a newer syntax a much lighter decision, though.

@jmchilton force-pushed the tool_test_validation branch from cd32382 to 8124755 on August 20, 2024 19:02
@jmchilton force-pushed the tool_test_validation branch from 8124755 to 0a35eea on August 23, 2024 13:47
@jmchilton (Member Author)

re: Upgrading tools automatically.

I've integrated this with the tool upgrade advisor (jmchilton@5820ed1) implemented separately as #18728. Once that commit becomes green I will include it in this PR, along with #18728 if this PR is still open. In the PR description for #18728 I've outlined why I think upgrading automatically is more challenging than I thought, but also how all the semantic data included there would be helpful for someone taking on that task. The help text for the validation upgrade could be improved, but taking everything together I think we're providing nice tooling to assist upgrading tools to 24.2.

# In an effort to squeeze all the ambiguity out of test cases - at some point Marius and John
# agreed tools should be using value_json for typed inputs to parameters, but John has come around on
# this now that we're validating the parameters as a whole on load. The models are ensuring only
# unambiguous test cases are being loaded.
💯 ... I think what I care about is that where exact types are possible (yaml, json) we should use them. Being able to accurately coerce types is such an important part of the whole tool state work IMO.

@mvdbeek (Member) left a comment

Awesome, this looks really nice!

@jmchilton merged commit e9c6a07 into galaxyproject:dev on Aug 23, 2024
50 of 54 checks passed