[Roadmap/Planning] Test Infrastructure #4252

ffreyer · 2023-07-31T16:02:34Z

ffreyer
Jul 31, 2023
Maintainer

I wanted to summarize some goals and issues with Makie's testing here.

We currently rely quite heavily on cross-backend reference image tests, which need a relatively high tolerance to deal with differences between OpenGL, WebGL and Cairo. As a result, small visual changes are regularly missed by testing and in PR reviews. To improve this, I think we broadly need to do two things: reduce our reliance on reference image tests by improving the coverage of (code-based) unit tests, and improve the reliability of reference image tests.

Code based unit tests

Currently most of Makie's unit tests are part of Makie.jl, with few or none in the backends. This is understandable since the backends are mostly responsible for rendering, but there is also some code there that can and should be tested. Makie.jl itself has fairly spotty testing. Ultimately we will need to spend some time here to figure out what is missing and what is reasonable to test. For now, here is an incomplete list of possible additions:

Exhaustive testing of convert_arguments(), i.e. every possible conversion path. This may also help us find redundant or missing methods, if there are any.
Passthrough of generic plot attributes in recipes, e.g. "Does every child plot inherit visible?"
Correct construction of transformation matrices (translation, rotation, scale, camera matrices)
Lots of small functions like angle(p1, p2), to_ndim(), etc.
Getters like parent(scene), events(), etc.
Most of MakieCore?
Correct construction of renderobjects in GLMakie (buffers, uniforms)
Most of CairoMakie/utils.jl
Display system
Existence of docstrings for exported functions
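
As a rough illustration of the kind of narrow, code-based test meant by the list above, the sketch below checks a translation matrix and an angle helper directly, without creating a figure. Both translation_matrix and angle_between are hypothetical stand-ins defined in the snippet itself, not Makie functions; real tests would call the corresponding Makie internals instead.

```julia
using Test
using LinearAlgebra

# Hypothetical stand-in for a transformation helper (not Makie's API).
function translation_matrix(t::NTuple{3, Float64})
    M = Matrix{Float64}(I, 4, 4)
    M[1:3, 4] .= t
    return M
end

# Hypothetical stand-in for a small geometry helper like angle(p1, p2).
angle_between(p1, p2) = atan(p2[2] - p1[2], p2[1] - p1[1])

@testset "small helpers" begin
    M = translation_matrix((1.0, 2.0, 3.0))
    # A homogeneous point (w = 1) is shifted by the translation vector.
    @test M * [0.0, 0.0, 0.0, 1.0] == [1.0, 2.0, 3.0, 1.0]
    # A direction (w = 0) is unaffected by translation.
    @test M * [1.0, 0.0, 0.0, 0.0] == [1.0, 0.0, 0.0, 0.0]
    @test angle_between((0.0, 0.0), (1.0, 1.0)) ≈ π / 4
    @test angle_between((0.0, 0.0), (-1.0, 0.0)) ≈ π
end
```

Tests of this shape only touch the function under test, so the coverage they report is not inflated by everything a full plotting call happens to execute.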

I also want to note that test coverage can be quite deceptive. A while ago I tried just f, a, p = scatter(Point2f(0)); @test p[1][] == Point2f(0) and got 26% coverage from that alone. It may make sense to avoid and/or exclude these kinds of full-stack calls to get more useful results.

Reference image tests

For reference images we should start including (more) backend-specific tests. These tests should run with a low tolerance and be very specific, like unit tests. I.e. they should not include an Axis, they should not rely on camera placement, they should be as independent of Makie.jl as they can be. They should test the specific things implemented by the backend and aim to check edge cases as well. There is some previous discussion on this in #298.

For cross-backend tests we likely need to do some cleanup. Some questions we can ask ourselves here are:

Can this test be handled with code? (e.g. "Test RGB heatmaps")
Is this test redundant, i.e. is it part of another test?
Should this test include a more niche scenario?

And more generally:

Should tests that do not work for all "stable" backends be included or be backend-specific tests?
Should we drop animated tests so we can lower overall tolerance? Or give them a different tolerance than other tests?

Another group of reference image tests that would be good to include are the images from documentation. These, I think, would not aim to test functionality as much, but rather make sure that our documentation looks correct.

General remarks:

As noted in Refimage test accuracy #1387, we also need to make sure our tests are visually busy enough to fail when they break. Maybe we can build a simple check for this when generating reference images, for example one that removes all user-made plots from axis objects?
Sort of the same point, but we need to emphasize details to make sure our testing can catch them. This may require creating "ugly" or unnatural reference images. See 2D streamplot tests succeed even if markers are too small to see #1383
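
To make the "visually busy enough" idea concrete, here is a minimal sketch of such a check. It assumes images are plain matrices of grayscale values in [0, 1] and uses a mean absolute pixel difference as the score, which is not the scoring Makie's reference tests actually use. The names mean_abs_diff and is_busy_enough, the margin factor and the tolerance value are all made up for illustration.

```julia
# Images are modelled as matrices of grayscale values in [0, 1];
# real reference images would be loaded from disk instead.

# Mean absolute per-pixel difference between two equally sized images.
function mean_abs_diff(a::AbstractMatrix, b::AbstractMatrix)
    size(a) == size(b) || throw(DimensionMismatch("image sizes differ"))
    return sum(abs, a .- b) / length(a)
end

# Hypothetical check: a reference image is "busy enough" if removing the
# plotted content changes the score by clearly more than the tolerance.
is_busy_enough(with_plot, without_plot; tolerance, margin = 2.0) =
    mean_abs_diff(with_plot, without_plot) > margin * tolerance

background = ones(100, 100)            # empty white canvas

tiny = copy(background)
tiny[50:51, 50:51] .= 0.0              # 2x2 marker: 4 of 10_000 pixels

large = copy(background)
large[20:80, 20:80] .= 0.0             # filled square: 3721 of 10_000 pixels

tolerance = 0.03                       # assumed value, for illustration only
is_busy_enough(tiny, background; tolerance)    # false (score 0.0004)
is_busy_enough(large, background; tolerance)   # true (score 0.3721)
```

With a score like this, the tiny marker could vanish entirely and the test would still pass, which is the situation described in #1383.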

Some other failures of reference images:

jkrumbiegel · 2023-08-01T09:22:05Z

jkrumbiegel
Aug 1, 2023
Maintainer

> For reference images we should start including (more) backend-specific tests. These tests should run with a low tolerance and be very specific, like unit tests. I.e. they should not include an Axis, they should not rely on camera placement, they should be as independent of Makie.jl as they can be.

Agree 100%. I've already done a couple of tests like that here and there, but they're nowhere near covering all the functionality we have.


jkrumbiegel · 2023-08-01T09:24:29Z

jkrumbiegel
Aug 1, 2023
Maintainer

> Should we drop animated tests so we can lower overall tolerance?

We shouldn't really need to if we don't store these as videos (because of the lossy compression), although that will increase refimage size. But each animation can usually be reduced to 2-4 frames anyway for what we care about, so it's more like Stepper in that sense.


jkrumbiegel · 2023-08-01T13:29:50Z

jkrumbiegel
Aug 1, 2023
Maintainer

One thing I think is not mentioned here is update tests. We have a couple of those here and there, but we should systematically check that plot observables can be updated with different lengths without anything breaking. Even if we have to use the val strategy, we should at least ensure that that workaround is always available.
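
A toy sketch of what such an update test could look like, assuming "the val strategy" refers to changing one input silently through its val field and then triggering the update through another input (in Observables.jl, obs[] = v notifies listeners while assigning to obs.val does not). ToyObservable, set! and make_points are hypothetical stand-ins written for this snippet; they are neither Observables.jl nor Makie code.

```julia
using Test

# Toy stand-in for an observable (NOT Observables.jl): `set!` notifies
# listeners, while writing to `.val` directly does not.
mutable struct ToyObservable{T}
    val::T
    listeners::Vector{Function}
end
ToyObservable(val) = ToyObservable(val, Function[])

function set!(o::ToyObservable, val)
    o.val = val
    foreach(f -> f(val), o.listeners)
    return o
end

# Hypothetical "plot": pairs x and y whenever either input is updated,
# and errors if their lengths disagree at that moment.
function make_points(x::ToyObservable, y::ToyObservable)
    points = ToyObservable(collect(zip(x.val, y.val)))
    update = _ -> begin
        length(x.val) == length(y.val) ||
            throw(DimensionMismatch("x and y have different lengths"))
        points.val = collect(zip(x.val, y.val))
    end
    push!(x.listeners, update)
    push!(y.listeners, update)
    return points
end

@testset "updating inputs with a new length" begin
    x, y = ToyObservable([1, 2, 3]), ToyObservable([4, 5, 6])
    points = make_points(x, y)

    # Updating one input at a time leaves the other with the old length.
    @test_throws DimensionMismatch set!(x, [1, 2])

    # Workaround: change one input silently, then trigger via the other.
    x.val = [1, 2]
    set!(y, [4, 5])
    @test points.val == [(1, 4), (2, 5)]
end
```

A systematic version would run the same two checks for every plot type and every pair of length-coupled arguments.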


ffreyer · 2023-08-01T13:37:43Z

ffreyer
Aug 1, 2023
Maintainer Author

I think we'll discover lots of stuff to add in unit tests if we actually sit down and check coverage without high level calls. But I agree that the synchronous updates should be tested.


ffreyer · 2023-08-03T13:05:14Z

ffreyer
Aug 3, 2023
Maintainer Author

> Sort of the same point, but we need to emphasize details to make sure our testing can catch them. This may require creating "ugly" or unnatural reference images. See 2D streamplot tests succeed even if markers are too small to see #1383

Another example from #3102:


jkrumbiegel · 2023-08-03T14:32:29Z

jkrumbiegel
Aug 3, 2023
Maintainer

Maybe we should check for literature on reference image testing. It really seems like the default method we're using is not suitable.


ffreyer · 2023-08-12T15:04:33Z

ffreyer
Aug 12, 2023
Maintainer Author

Another minor suggestion for reference images: I think it would be good to sort these into folders based on the file they come from. I sometimes look at the generated images directly and it's always a pain because it's just a big pile of images.


ffreyer · 2023-08-28T18:25:13Z

ffreyer
Aug 28, 2023
Maintainer Author

Maybe we should also add timings to our tests so we can track which tests eat up a lot of time.


jkrumbiegel · 2023-08-29T04:14:57Z

jkrumbiegel
Aug 29, 2023
Maintainer

Oh yeah, been meaning to do something like that. TimerOutputs.jl might make this relatively painless? I also wanted to have it for the docs, to understand which pages or examples take the most time.


ffreyer · 2023-08-29T10:25:33Z

ffreyer
Aug 29, 2023
Maintainer Author

@testset has a showtiming option. For tests that is probably enough. Maybe with another manual timing of startup code?
https://docs.julialang.org/en/v1/stdlib/Test/#Test.@testset

For docs, maybe TimerOutputs could be useful if @time gets messy. You'd have to set up some way to generate unique, useful names for each block of code though.
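
A minimal sketch of the test-side idea, using only the Test standard library: the showtiming option on @testset for per-testset durations, plus a manual @elapsed around startup code. The sleep calls and testset names are placeholders, not anything from Makie's test suite.

```julia
using Test

# Time setup code separately; `sleep` stands in for expensive startup work.
startup_seconds = @elapsed sleep(0.1)
println("startup: ", round(startup_seconds; digits = 2), " s")

# `showtiming = true` prints the duration of each displayed test set;
# `verbose = true` makes the nested test sets show up in the summary.
@testset showtiming = true verbose = true "timed tests" begin
    @testset "fast" begin
        @test 1 + 1 == 2
    end
    @testset "slow" begin
        sleep(0.2)   # placeholder for a slow test body
        @test true
    end
end
```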


jkrumbiegel · 2023-08-30T04:31:44Z

jkrumbiegel
Aug 30, 2023
Maintainer

> You'd have to set up some way to generate unique, useful names for each block of code though.

For the docs it might be enough to use a running index for each code block. Naming is tough; internally I use hashes of the code.
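
One way to combine the two ideas (running index plus code hash) is sketched below. The block_label helper, the label format and the example blocks are made up for illustration and are not how the docs pipeline actually names things. It only uses the SHA standard library and Base's @elapsed.

```julia
using SHA

# Hypothetical helper: label a docs code block by its running index plus a
# short content hash, so labels stay unique and change when the code changes.
function block_label(index::Integer, code::AbstractString)
    digest = bytes2hex(sha256(code))
    return string("block-", lpad(index, 3, '0'), "-", digest[1:8])
end

blocks = ["x = 1 + 1", "sum(1:10)"]   # stand-ins for extracted code blocks

timings = Dict{String, Float64}()
for (i, code) in enumerate(blocks)
    # Evaluate each block in a fresh module and record the elapsed time.
    timings[block_label(i, code)] = @elapsed include_string(Module(), code)
end
```

The index keeps the labels in page order, while the hash part makes it easy to see whether a timing change coincides with a change to the code.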

