Rationalize tests for CI vs. releases #1117

ViralBShah · 2024-11-27T14:20:14Z

Is it possible to rationalize the way we test in CI to use much lesser time than right now?

Maybe have a set of CI tests and a set of complete tests. Perhaps the complete tests can be run during releases of LinearAlgebra.jl into julia master.

I suppose the harder part of all of this is the workflow.

@DilumAluthge Any ideas?

andreasnoack · 2024-11-27T14:29:57Z

Since most time is spent on compilation, it significantly increases to total CPU time when the tests are run as many separate jobs. However, the user time might take a significant hit if run sequentially. Might also be interesting to see the effect to running tests with a low optimization level.

KristofferC · 2024-11-27T14:34:03Z

Here is sorted test time.

Test (Worker)	Time (s)	GC (s)	GC %	Alloc (MB)	RSS (MB)
ldlt (16)	1.95	0.02	1.2	64.84	688.12
factorization (14)	10.86	0.13	1.2	393.02	666.03
abstractq (12)	16.26	0.18	1.1	617.43	711.82
pinv (16)	21.28	0.43	2.0	1107.97	688.12
givens (16)	21.83	0.26	1.2	839.98	623.48
symmetriceigen (16)	42.04	0.78	1.9	1384.64	694.36
schur (14)	53.11	0.56	1.1	2009.62	666.03
adjtrans (16)	73.02	1.19	1.6	3105.21	623.48
lapack (14)	106.15	1.19	1.1	3470.32	650.14
lq (15)	108.49	1.7	1.6	4975.61	618.0
blas (12)	119.79	1.85	1.5	5485.69	525.55
generic (10)	127.14	1.72	1.4	4609.6	715.38
svd (17)	132.8	2.35	1.8	6257.72	543.32
uniformscaling (14)	153.87	2.15	1.4	5505.12	634.76
structuredbroadcast (15)	161.54	2.73	1.7	8148.26	575.81
eigen (12)	193.75	2.65	1.4	6849.69	711.82
hessenberg (16)	199.8	2.92	1.5	8216.44	541.0
qr (10)	231.48	3.63	1.6	9941.32	585.98
tridiag (17)	238.97	3.52	1.5	10335.22	859.79
bunchkaufman (15)	327.26	10.93	3.3	34795.95	1162.19
cholesky (11)	480.62	5.93	1.2	16072.22	655.04
matmul (5)	492.49	6.04	1.2	18103.55	679.82
symmetric (7)	535.37	8.6	1.6	23216.27	802.7
special (9)	535.51	9.67	1.8	31511.59	920.5
lu (13)	536.12	5.34	1.0	15547.96	651.05
dense (6)	579.19	8.61	1.5	22657.19	812.67
diagonal (8)	646.46	9.09	1.4	24207.6	833.98
addmul (3)	801.18	6.25	0.8	16774.03	589.55
bidiag (4)	1222.44	19.73	1.6	59146.33	1316.59
triangular (2)	2910.48	68.6	2.4	206813.92	2560.22

The triangular and bidiag have like triply nested loops over a large set of types which just explodes compilation time in a combinatorial manner.

dkarrasch · 2024-11-27T14:36:55Z

The triangular and bidiag have like triply nested loops over a large set of types which just explodes compilation time in a combinatorial manner.

They also have unitful tests. I had already thought about hoisting them out in an extra test file, but haven't got to it.

andreasnoack · 2024-11-27T15:25:39Z

An extra data point #1118 (comment)

ViralBShah · 2024-11-27T16:12:28Z

Do we really need to test Triangular this much? Perhaps we can just test fewer types as well.

andreasnoack · 2024-11-27T16:32:23Z

Might be possible to reduce the number of combinations but looping over types did reveal several issues when it was introduced.

ViralBShah · 2024-11-27T17:16:21Z

Might be possible to reduce the number of combinations but looping over types did reveal several issues when it was introduced.

Yes, that's what I was thinking. It was greatly helpful early on, but presumably for catching regressions, we may only need fewer combinations.

jishnub · 2024-11-27T17:17:55Z

It should be possible to ensure that code coverage isn't reduced by trimming the combinations.

andreasnoack · 2024-11-27T17:33:28Z

Code coverage is just an incomplete measure when it comes to some of the possible issues that can show up when mixing types. I'm not saying that we can't reduce the tests. I'm just saying that it is possible to make them less useful while maintaining code coverage.

KristofferC · 2024-11-27T18:38:01Z

There should still be some thought behind it instead of "spray and pray" so it is established what is unique about every combination. There are no comments now which makes it hard to know what the thought was.

andreasnoack · 2024-11-27T19:46:12Z

Just browsed test/triangular.jl again. It's been a while. I actually don't think it is unreasonable. You can argue if it is reasonable to have all four triangular types fully parametric and allow for operations on mixed element types with "correct" promotion handling. But that was the choice we made, and given that choice, I think the testing here is roughly appropriate. The load balancing is off, though, so we can reduce the test time a lot by splitting the triangular tests in three or four.

KristofferC · 2024-11-27T20:07:22Z

One should then test things "orthogonally":

Test that types promote to each other as they should
Test that when mixed types are used the promotion happens

There is absolutely no way you should be required to test a semirandom 5x5x5x5 type combination. You are still missing 99% of custom types from the ecosystem.

Again, I completely reject the notion that you have to spend one hour compiling to test the tridiagonal support in Julia's linear algebra. 😆

ViralBShah · 2024-11-27T20:13:48Z

I thought back in the day, doing this in deeply nested loops etc. helped discover issues in the compiler.

andreasnoack · 2024-11-27T22:22:23Z

I thought back in the day, doing this in deeply nested loops etc. helped discover issues in the compiler.

I actually don't think that many compiler issues surfaced as part of this testing but there were many issues related to promotion. There are just a lot of type combinations related to the triangular types. It's cool that Julia allow you to maintain structures such

UnitLowerTriangular*UnitLowerTriangular = UnitLowerTriangular
UnitLowerTriangular*LowerTriangular = LowerTriangular
LowerTriangular*UpperTriangular = (Some)Matrix

and that backsolve with UpperUnitTriangular doesn't promote much because it doesn't have any division, but it requires a lot of methods to be compiled.

There is absolutely no way you should be required to test a semirandom 5x5x5x5 type combination.

The full cartesian product here is probably not adding much value, though. It's just an easy. Let's see how much of a difference #1123 makes as a first step.

ViralBShah changed the title ~~Rationalize tests~~ Rationalize tests for CI vs. releases Nov 27, 2024

andreasnoack mentioned this issue Nov 27, 2024

Reduce number of test combinations in test/triangular.jl #1123

Draft

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Rationalize tests for CI vs. releases #1117

Rationalize tests for CI vs. releases #1117

ViralBShah commented Nov 27, 2024 •

edited

Loading

andreasnoack commented Nov 27, 2024

KristofferC commented Nov 27, 2024

dkarrasch commented Nov 27, 2024

andreasnoack commented Nov 27, 2024

ViralBShah commented Nov 27, 2024

andreasnoack commented Nov 27, 2024

ViralBShah commented Nov 27, 2024

jishnub commented Nov 27, 2024

andreasnoack commented Nov 27, 2024

KristofferC commented Nov 27, 2024

andreasnoack commented Nov 27, 2024

KristofferC commented Nov 27, 2024 •

edited

Loading

ViralBShah commented Nov 27, 2024

andreasnoack commented Nov 27, 2024

Rationalize tests for CI vs. releases #1117

Rationalize tests for CI vs. releases #1117

Comments

ViralBShah commented Nov 27, 2024 • edited Loading

andreasnoack commented Nov 27, 2024

KristofferC commented Nov 27, 2024

dkarrasch commented Nov 27, 2024

andreasnoack commented Nov 27, 2024

ViralBShah commented Nov 27, 2024

andreasnoack commented Nov 27, 2024

ViralBShah commented Nov 27, 2024

jishnub commented Nov 27, 2024

andreasnoack commented Nov 27, 2024

KristofferC commented Nov 27, 2024

andreasnoack commented Nov 27, 2024

KristofferC commented Nov 27, 2024 • edited Loading

ViralBShah commented Nov 27, 2024

andreasnoack commented Nov 27, 2024

ViralBShah commented Nov 27, 2024 •

edited

Loading

KristofferC commented Nov 27, 2024 •

edited

Loading