OpenEO processes validation suite #204
@m-mohr @christophfriedrich Any thoughts on this? |
@sinergise-anze I've not forgotten your issue(s) and am grateful for having them. After my vacation I have a bit of a backlog of issues to work through. I'll get to these later for sure. |
@m-mohr Thank you for your response, I understand completely... Later then. Enjoy your vacation! |
Thanks for writing this down! First of all, I agree that it is useful to have some kind of process validation suite. There were several attempts already:
Your solution looks useful to me. I don't know what other back-ends do; maybe they have other solutions we should also investigate. Some remarks regarding your solution:
So while I think we should work towards a test suite, it doesn't necessarily need to be in JSON and/or in the form of full process graphs. Ideas are very welcome. I'll dig deeper into the topic and also try to come up with suggestions and maybe a full proposal. Any help is appreciated. |
Thanks, inspiring ideas here! At VITO we already have a bunch of unit/integration tests to check our process graph processing pipelines:
But it would indeed be great if this could be standardized better and reused more. The idea of an assert_equals-like process sounds useful. Like @m-mohr, I'm not sure that JSON is the ideal authoring level to define a lot of tests with potentially large data cube arrays. In the VITO tests I mentioned before, we define the tests in Python: sometimes using the Python client library to build a process graph, sometimes by directly constructing a JSON structure. Sometimes we use hard-coded simple small arrays as input, sometimes we load data straight from GeoTIFFs. But in any case, by working in Python we can more easily share/reuse/parameterize input data and output expectations without too much boilerplate and duplication. |
This is really an interesting discussion and I would also be in favor of a shared solution for this. At EODC we recently started to write tests for the processes we support, but we do it one level "deeper", in the package where we map an openEO PG to the syntax needed internally. There, we actually use a test file as input and manually define a minimal PG including the process we want to test, so it is not a very general solution. Also, most of our processes are imported from https://github.com/Open-EO/openeo-processes-python/ which have their own tests. I think the main issue for a suite of tests (a list of PGs) that every backend can use to test processes is to have a common input, since it's hard to ensure every backend has the exact same test collection. I like @sinergise-anze's approach of having an assert_equals-like process. |
Good thoughts!
Yes - we used
Maybe also, yes - though for small cubes (like those needed for "unit tests" of each of the processes) this is not such a problem.
We don't actually need it per se; it is just a mandatory part of any process graph if we want to execute it through the normal jobs pipeline. This could probably be changed, we just didn't find it important. We just ignore it.
Maybe.

@soxofaan: I agree that the tests can become overwhelming if the cubes are not small, but fortunately:
@lforesta: |
Conceptually it is orthogonal indeed, but I think it is best to keep things together practically (e.g. in the same repository). Generating JSON from code will be more maintainable in the long term than "manually" created JSON files. Also note that it is trivial to put "manual" JSON in Python code, so it would be feasible to enforce that all cases should be in code. Test parameterization in code has proven to be very useful for us. For example: say you want to test aggregate_spatial with different kinds of geometries (Polygon, MultiPolygon, Points, LineStrings, ...) and with different reducers (mean, median, max, ...). In Python+pytest we can do that with a single parameterized test function that will test all combinations of the matrix. With "manual" JSON you have to maintain tens of almost identical files. FYI: I'm using Python and pytest as an example. I'm not implying that Python+pytest should be used for the final solution (although it would be my preference). |
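For illustration, a minimal sketch of the kind of parameterized test described above. It is not taken from the actual VITO test suite; the collection id, helper name and graph structure are placeholders:

```python
import pytest

def build_aggregate_spatial_graph(geometry: dict, reducer: str) -> dict:
    """Build a minimal aggregate_spatial process graph (illustrative only)."""
    return {
        "load": {
            "process_id": "load_collection",
            # collection id is a placeholder
            "arguments": {"id": "SENTINEL2_L2A", "spatial_extent": None, "temporal_extent": None},
        },
        "agg": {
            "process_id": "aggregate_spatial",
            "arguments": {
                "data": {"from_node": "load"},
                "geometries": geometry,
                "reducer": {"process_graph": {
                    "r": {
                        "process_id": reducer,
                        "arguments": {"data": {"from_parameter": "data"}},
                        "result": True,
                    },
                }},
            },
            "result": True,
        },
    }

GEOMETRIES = [
    {"type": "Polygon", "coordinates": [[[5.0, 51.0], [5.1, 51.0], [5.1, 51.1], [5.0, 51.0]]]},
    {"type": "Point", "coordinates": [5.05, 51.05]},
]

@pytest.mark.parametrize("geometry", GEOMETRIES)
@pytest.mark.parametrize("reducer", ["mean", "median", "max"])
def test_aggregate_spatial(geometry, reducer):
    # One parameterized test function covers the whole geometry x reducer matrix,
    # instead of maintaining tens of almost identical JSON files.
    pg = build_aggregate_spatial_graph(geometry, reducer)
    # In a real test the graph would be executed against a back-end and the
    # numeric result compared against an expectation; here we only check structure.
    assert pg["agg"]["arguments"]["reducer"]["process_graph"]["r"]["process_id"] == reducer
```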
So I guess we have different approaches we can discuss:
For me 2B sounds most interesting and thus I'm elaborating it a bit more.
Opinions? |
@m-mohr Very good ideas! I think it is becoming clear that there would need to be a way to declare any value that can be an input or output to any process (similar to our proposal). But I very much like the dedicated endpoint idea, and especially your last point. Maybe a combination of both ideas would work?
Note that these are not strong opinions, more preferences. Any solution is OK, the main end goal (for us) is clarifying the process definitions beyond just documentation. |
@sinergise-anze That could also work if we can work around the floating point number issues. Comparing on the client side was mainly chosen so that we have a bit broader access to comparators, which include things like comparing floating point numbers based on an allowed difference specified by a delta (like the delta in the eq/neq openEO processes). If we can mitigate that somehow (e.g. specify a required precision for each result), server-side is also an option. What I also liked about client-side is that users could use it to actually debug a bit better. If they see the outcome of some operation they want to "test", it's often easier to understand than just getting an error or success. Also, it makes the tests more transparent if anybody can try it. My biggest question yet is the file format for transmitting input and output. I just found that there's JSON5 at https://json5.org , but it seems that it has only a stable JS library so far and a very early Python library at https://pypi.org/project/json5/ . Other languages might be stuck. XML could be an option due to the widespread support. Then there are also things like YAML (expressing multi-dimensional arrays feels quite error-prone), Protobuf (not sure yet) and MessagePack (not so friendly for humans). I'm very happy if somebody has a good idea here or has good arguments for a format... |
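As a rough illustration of the delta idea (a sketch, not a specification), a client-side comparator with an allowed absolute difference could look like this:

```python
import math

def approx_equal(actual, expected, delta=1e-6):
    """Compare nested lists/numbers with an allowed absolute difference (delta).

    Illustrative sketch only; a real comparator would also need to handle
    NaN/Infinity, no-data values, dimension labels, etc.
    """
    if isinstance(actual, (list, tuple)) and isinstance(expected, (list, tuple)):
        return len(actual) == len(expected) and all(
            approx_equal(a, e, delta) for a, e in zip(actual, expected)
        )
    if isinstance(actual, (int, float)) and isinstance(expected, (int, float)):
        return math.isclose(actual, expected, abs_tol=delta)
    return actual == expected

print(approx_equal([[0.0, 0.2500000001]], [[0.0, 0.25]], delta=1e-6))  # True
```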
Why is it necessary to have a separate endpoint?
indeed, not to be underestimated
I think we should at least support standard JSON for simple test cases (where there is no need to worry about decimals, null-vs-NaN, ...) because it's easy and familiar to write. In that sense JSON5 looks interesting because it's a superset of standard JSON, but it's indeed not ideal that it's not widely supported yet. I had a quick scan through https://en.wikipedia.org/wiki/Comparison_of_data-serialization_formats for some more inspiration, but could not find something that stands out as the ideal solution. |
@soxofaan The reasons for the separate endpoint are the following:
Unfortunately, you relatively easily run into use cases where JSON is not enough.
Great resource, thanks! What about ION? http://amzn.github.io/ion-docs/ That looks very similar to JSON5, but has broader support across languages! It has its own media type (application/ion+json) so we could accept both JSON and ION through content negotiation. |
I'm not sure what you mean here. Processes and formats are defined orthogonally to endpoints, right? Moreover, why would you want to shield users from testing? Isn't it added value for normal users to have tools that help with verifying reproducibility?
Shouldn't billing be orthogonal to endpoints too? You don't want a backdoor to run real queries for free on
That's true, it's easier to bootstrap resource usage estimations based on endpoint. But estimating it from the input data source shouldn't be a lot harder (is it a real load_collection call or a small "create_cube" input array?)
I'm not sure that it should be separated that strongly (also see my comment on 1). Also: you can also separate processes through namespaces.
I somehow missed that one, never heard of it. Looks interesting indeed. |
In a way, yes. In any case, I very much like the idea of having this functionality; this would be great for users too. |
Giving it some more thought, I have changed my mind a bit - I am more and more convinced that the original suggestion (with some improvements) is better because of its simplicity. I believe it is a) easier to implement on the backend side, b) more versatile, c) easier to use and d) results in more understandable tests. I will try to write more about why I think this is so, but the TL;DR version is: if I want to test my process graph, I can simply add a few processes and copy it into the editor. It doesn't get any simpler than this... :)

a) easier to implement on the backend side (and no client side at all)

The suggested solution means creating two relatively simple processes (one to create synthetic data, one to compare data for equality). This is all that is needed. The first process somewhat matches the parsing of input data (ION?), but I would argue that a JSON solution would be easier to use. The main simplification comes from the fact that the backend doesn't need to output serialized data or implement another endpoint. And there is no client side needed at all. I would also argue that from the assert family, only assert_equals is really needed. If parametrized tests are needed, it should be trivial to change them so that they execute process graphs against a backend.

b) more versatile

The beauty of using process graphs is that if a user has some problem with a process graph (e.g. they are getting unexpected results), they can substitute load_collection with synthetic data and narrow down the problem. Bonus: such examples can then be sent to backends as complete bug reports.

c) easier to use

The whole process can be done within the openEO Web Editor (or using curl); no special tools need to be installed. Of course, special debugging tools can be used on the client side if so inclined; they are just not mandatory. Users can still download the TIFFs and simply inspect results (even partial ones) for equality.

d) results in more understandable tests

Not everyone is familiar with Python / ION / any other chosen technology. Using the existing framework would mean that anyone who can read process graphs can also understand the tests.

In other words, I would suggest doing this:
EDIT: reading this again - I hope the tone is right. I don't want to impose on the vision, as I was not involved in this project as much as any of you. So please take this as the constructive suggestion that it is. :) |
I don't want to shield users from testing, I actually want the opposite. I meant that having, for example, a file format "JSON" or a process create_cube that is meant for testing only may confuse users, as they may think it's something they can use for "normal" process chains. I'm not sure whether that "fear" is actually reasonable or not a problem at all.
Of course not, therefore it would be really limited to the use case. I must admit I wouldn't really want to pay for requests that are so small and just for debugging. It may need a separate billing plan for testing, which could be counter-intuitive.
Yes, in the PG, but not yet in process discovery. So that doesn't help a lot at the moment.
Yes, but there's still a difference between
I don't understand that. What fields are you thinking about? Splitting my answer in two parts; I will answer @sinergise-anze in the second post. |
When thinking about a process validation suite, I'd expect that I can test any process independently, so for example simply min() or is_nan(), which have no relation at all to data cubes. I don't want to wrap these processes into cube processes like reduce_dimension to test them. I hope we have the same understanding here.

Some unordered remarks in response to your post:

In addition to create_cube, you'd also need a function to create normal arrays, as proposed in #202.

Regarding a JSON-only solution: How would you test the is_nan or is_infinite process, for example, if you can't pass +/-inf and nan through JSON? Of course you can check

Generating a GeoTiff for tests to check data on the client side sounds bothersome to me. At least I have no tooling except QGIS directly at hand to check the content of the file. As a user, I think I would prefer to just get a "human-readable" representation of the result to see what is going on at the back-end.

How would the back-end handle the case if my test does not contain a save_result and load_collection? At the moment it seems no back-end is prepared for this, so is it really so much easier to implement compared to a new endpoint? Not saying one or the other is better, but I think in general it could be some work for others. If we think about a solution now, I'd not like to require workarounds like adding unnecessary processes such as save_result to the tests. Also, I can't pass the result of a non-cube operation to save_result, so the GeoTiff solution fails at the moment for non-cube operations.

Regarding parametrized tests: I don't understand the solution yet. Is it using process parameters?

It is indeed correct that client support is currently better for your approach but...
... I'd very likely integrate any new solution in the Web Editor, too. ;-)
Isn't that possible with all solutions discussed so far?
We are not bound to Python with a new endpoint; it can be any language or the Web Editor. (Of course our core set of tests would likely be in a single language, but I'm pretty sure the tests used in your proposal would also be generated from a language of choice, right?) Additionally, ION (and other implementation details) would/should likely be hidden anyway. Actually, reading process graphs is nothing we'd expect users to do. We actually try to hide them. We just expect them to use any of our tools (libraries, Web Editor, ...).
For me the tone is right 👍, I hope mine is fine as well. It's all meant as a constructive discussion. |
FYI: the VITO backend already supports process graphs without load_collection or save_result. For example: add 2 numbers (calculator as a service FTW 😄 )
example using array_element and an input array through the
(not clear from these examples, but result is returned as JSON) |
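The exact request is not shown above; as an illustration of the first example (adding two numbers), a synchronous call of this kind could look roughly like the following sketch, with a placeholder back-end URL and authentication omitted:

```python
import requests

# Minimal "calculator as a service" sketch: a process graph without
# load_collection or save_result that just adds two numbers.
# The back-end URL is a placeholder and authentication is omitted.
payload = {
    "process": {
        "process_graph": {
            "add1": {
                "process_id": "add",
                "arguments": {"x": 2, "y": 3},
                "result": True,
            }
        }
    }
}

response = requests.post("https://openeo.example.org/v1.0/result", json=payload)
# The VITO back-end returns such a result as JSON (see the note above), so:
print(response.json())  # expected: 5
```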
@m-mohr Thank you for your in-depth response! The idea I had was to use a data-generating process. While writing the tests, I noticed that my idea was actually not good enough: having a process output data is cumbersome and in some cases it is near impossible to use. An additional benefit is that the tests are now even shorter (and thus easier to read) than they were before. Also, this means that data can be fed directly into any process, regardless of the datatype, so this is quite a powerful addition. But the con is that another "object notation" object needs to be defined. And of course, defining all possible datatypes is a whole mini project by itself... Note that I understand this is just one of the possible solutions and that it is a trade-off. |
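To make that trade-off concrete, here is a purely hypothetical sketch of such an "object notation" fed directly into a process argument (written as a Python dict; all field names are invented and nothing here is standardized):

```python
# Purely hypothetical sketch of the "object notation" idea: values that plain
# JSON cannot express (NaN, +/-Infinity, data cubes, ...) are wrapped in a
# made-up, type-tagged object and fed directly into any process argument.
# Neither this notation nor assert_equals are standardized openEO features.
test_graph = {
    "nan_check": {
        "process_id": "is_nan",
        "arguments": {
            "x": {"type": "number", "value": "NaN"},  # inline special value (invented notation)
        },
    },
    "check": {
        "process_id": "assert_equals",  # proposed process, not standardized
        "arguments": {"a": {"from_node": "nan_check"}, "b": True},
        "result": True,
    },
}
```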
Thank you, @sinergise-anze. That is quite a bit to look through, I'll get back to that later, maybe even with an alternative solution based on your POC. |
@soxofaan Great! I'm sure I heard others can't handle that yet (e.g. GEE and at least one more).
I think the issue is that we never specified what the outcome of a sync call without save_result is. So you are using JSON, which makes sense, but it is likely not implemented broadly yet. Also, what would the format of the JSON be like? For non-datacubes it seems pretty straightforward though. |
We created Pydantic classes from the OpenEO API JSON process graph specification and tested them against the standard processes (from https://raw.githubusercontent.com/Open-EO/openeo-processes/1.x.0/). So,
Running the tests against other lists of standard processes reveals other inconsistencies. Not sure how to proceed here in a way that benefits all. (PS: the reason we used Pydantic is that FastAPI gives us automatic checking of the API parameters.) |
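As a rough sketch of that approach (illustrative only; these are not the actual models referenced above), validating process definitions with Pydantic could look like this:

```python
from typing import Any, Dict, List, Optional
from pydantic import BaseModel, Field

# Illustrative sketch: minimal Pydantic models for (a subset of) an openEO
# process definition, used to validate published JSON process definitions.

class Parameter(BaseModel):
    name: str
    description: str
    # "schema" is aliased because it clashes with BaseModel attributes
    schema_: Any = Field(alias="schema")
    optional: bool = False

class Process(BaseModel):
    id: str
    description: str
    parameters: List[Parameter] = []
    returns: Optional[Dict[str, Any]] = None

# A well-formed definition validates; inconsistent ones raise a ValidationError.
Process(
    id="add",
    description="Sum of two numbers.",
    parameters=[
        {"name": "x", "description": "First summand.", "schema": {"type": ["number", "null"]}},
        {"name": "y", "description": "Second summand.", "schema": {"type": ["number", "null"]}},
    ],
)
```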
Hey @danielFlemstrom, I've moved your issue over to #318 so that we can clearly separate the issues of having a process validation suite (this issue) vs. specific issues that we find (most of your issue). Thank you for your contribution. |
Note: I have added this issue here instead of creating an assert_equals process proposal because I think a more general agreement should be reached before adding process definitions. I would be happy to add PRs for the processes later if this is what we agree on.

Problem statement:
While implementing processes in our backend and especially when converting them from v0.4 to v1.0, we have discovered that it is in many cases difficult to understand what needs to be developed because of:
Because of this, we are not 100% sure that the processes we have written (even the ones we recently wrote from scratch and covered with unit tests) really conform to the standard, or that other backends' implementations match ours.
We see this as a problem because:
Proposed solution:
Our goal in this issue is primarily to open a discussion on this; however, we also have a proposal for a mechanism that would help solve this problem:
- an assert_equals process definition in the core standard, and
- test process graphs which use assert_* processes to validate a backend's implementation of a process (such test process graphs could be (logically) included into the backend-validator efforts, since they try to address the same issue at a different level of granularity).

Short description of assert_equals: the process takes inputs a and b. If they are the same (type, attributes, values, ...), it does nothing. If they differ, it throws an exception describing the difference.

We already use a similar approach for our integration tests. For example, this is a process graph which tests the linear_scale_range process (a sketch of such a graph is shown below).

Of course, this is not a comprehensive example - many more test process graphs should be added. In our case we didn't need it because we cover processes with separate unit tests in Python, but those can't be used with other backends and as such are not helpful for ensuring all the backends' conformance to the standard.
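As the original example graph is not reproduced above, the following is a purely illustrative sketch of what such a test graph could look like, assuming the proposed (non-standard) create_cube and assert_equals processes:

```python
# Illustrative sketch of such a test process graph (written as a Python dict;
# as an actual process graph it would be plain JSON). "create_cube" and
# "assert_equals" are only proposed in this discussion, not standardized
# openEO processes; the input values are made up.
test_graph = {
    "cube": {
        "process_id": "create_cube",  # hypothetical: builds a small synthetic data cube
        "arguments": {"data": [[0, 50], [100, 200]]},
    },
    "scaled": {
        "process_id": "apply",
        "arguments": {
            "data": {"from_node": "cube"},
            "process": {"process_graph": {
                "lsr": {
                    "process_id": "linear_scale_range",
                    "arguments": {
                        "x": {"from_parameter": "x"},
                        "inputMin": 0,
                        "inputMax": 200,
                        # outputMin/outputMax default to 0 and 1
                    },
                    "result": True,
                },
            }},
        },
    },
    "expected": {
        "process_id": "create_cube",
        "arguments": {"data": [[0.0, 0.25], [0.5, 1.0]]},
    },
    "check": {
        "process_id": "assert_equals",  # raises an error if the two cubes differ
        "arguments": {"a": {"from_node": "scaled"}, "b": {"from_node": "expected"}},
        "result": True,
    },
}
```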
Benefits for the users:
We are aware of the limited utility this gives to the end user, though there is still some benefit in adding this:
Users could check their own process graphs against expected results (they would have to replace load_collection and add assert_equals for this though).

Note that this is just a suggestion. There might be other, better ways of solving the same problem - we just think that this is a problem worth solving. Also, assert_equals is already implemented in our backend, as we found it useful even just for our own purposes, though it could probably be improved (currently it only works with raster datacubes).

We would appreciate some thoughts on this.