Improve testing with stats and errors per test section #498
Conversation
Force-pushed from e7214a2 to 7e2e13e
Thanks for the PR! FYI: I probably won't get around to reviewing this until after our 0.26 release is out in early June.
@bioball no worries! I actually submitted the PR too early :S I'm redoing it all because (1) I didn't realize I had a bunch of tests failing, and (2) the failures and other issues made me rethink the approach. I'm currently thinking about how to best approach things... I might close the PR and re-open a new one once I have something that 100% works. E.g. I thought I was going to be able to change some of the XML format, thinking it was a bit freeform, and I'm now learning about the different XML schemas for unit testing 😅 Still not sure which one Pkl is using and/or if I can change it to add more of the information I'd like to have, etc. Anyway, TL;DR: no rush :) Thanks!!
Force-pushed from 7e2e13e to 256e3de
OK, I think I finally got it done :) I've refactored a bunch of stuff and fixed my original commit; here's the summary:
Here are some outputs that show these new features:
Ah, so I don't forget... I've been trying to find out if there's an XSD to validate the XML produced by JUnitReport. I checked both, and Maven Surefire's is the closest to the XML JUnitReport produces; however, it does not validate against that XSD (many elements are missing required attributes). Is there a tool that loads the XML report that could serve as a validator of sorts? :-?
I've tried many times to look for a schema, but as far as I can tell, there's no real schema for JUnit reports. And the many tools that create JUnit reports all differ in minor ways. Tools like Jenkins that accept JUnit-style reports tend to make a best-effort attempt at parsing them, so we just do the best we can to conform to what's out there.
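For context, here is a rough sketch of the shape these JUnit-style reports tend to share. This is illustrative only: the element and attribute names below are typical of Surefire-style output, the module/test names are made up, and, as noted above, which attributes are required varies from tool to tool (which is part of the problem).

```xml
<testsuite name="myModule" tests="2" failures="1" errors="0" time="0.012">
  <testcase classname="myModule" name="math" time="0.001">
    <!-- a failed assertion is usually reported as a nested <failure> element -->
    <failure message="assertion failed">1 == 2</failure>
  </testcase>
  <testcase classname="myModule" name="double" time="0.001"/>
</testsuite>
```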
Force-pushed from 256e3de to 7b445bf
Arrrg, CI failed because of linting! I could have sworn I did a final …
Force-pushed from 7b445bf to 4a06e5a
OK, I've just pushed the resolved merge conflict... it passes all the same tests. @bioball is there any way to ensure the commit is not blocked in CI? I've seen it blocked ("on hold") for ~a week before :-/ Is there any reason why that happens? :-? Thanks!!
Tests failed for … EDIT: ah, I think the error is the following, but I'm not too sure: … I usually double-check the reports like …
Regarding JUnit XML reports: this …
As for the CI failure: …
Force-pushed from 4a06e5a to dacc050
I've rebased the branch and fixed the conflicts. @holzensp thanks for the trace! I'll have a look and see if I can spot what's going on in there 😄
Force-pushed from dacc050 to 8e0aa94
OK, I think I've fixed the tests on Windows. I've tested it in a Windows 11 VM; I think the issue was with the … Hopefully it'll all run fine now!
Force-pushed from 8432d38 to 1f53fe2
Wow, I don't know WTH is going on with the commits! I've been trying to update the branch and I think I've messed it up... 😬 gonna try & clean it up 😓 EDIT: ah, looks like things are fine now, those commits are in …
Force-pushed from 1f53fe2 to 55ba105
Will take a look this week!
Thanks for the PR! Sorry it took so long for me to take a look at this.
I see what you're doing here, but I'm concerned that this output overall adds too much noise and is less helpful.
Here is a sample test report from your changes:
```
module test ❌ 50.0% pass [1 passed, 1 failed] (/Users/danielchao/code/apple/pkl/.dan-scripts/test.pkl:1)
  facts ❌ 50.0% pass [1 passed, 1 failed]
    math ❌ 50.0% pass [1 passed, 1 failed]
      1 == 2 ❌ (/Users/danielchao/code/apple/pkl/.dan-scripts/test.pkl:5)
    double ✅ 100.0% pass [2 passed]
```
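For reference, a test module shaped roughly like the following could produce the report above (this is a reconstruction for illustration, not the actual file from the PR):

```pkl
amends "pkl:test"

facts {
  ["math"] {
    1 == 2      // fails: this is the assertion flagged in the report
    1 + 1 == 2  // passes
  }
  ["double"] {
    2 + 2 == 4  // passes
    3 + 3 == 6  // passes
  }
}
```

Here, `math` and `double` are the named facts entries (counted as tests), and each boolean expression inside them is an individual assertion, which is why the per-entry counts ("2 passed" under `double`) differ from the test-level counts on the module line.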
Some thoughts about this output:
- There are too many emojis; we don't need to show ❌ and ✅ next to the module name or the test name
- It's confusing that the numbers don't add up. How come the overall module tells me 1 passed and 1 failed, but at the bottom it tells me 2?
I'm thinking that this output would be much nicer:
```
module test (/Users/danielchao/code/apple/pkl/.dan-scripts/test.pkl:1)
  facts
    math ❌
      1 == 2 (/Users/danielchao/code/apple/pkl/.dan-scripts/test.pkl:5)
    double ✅
1 passed, 1 failed (50%)
```
What I'm thinking:
- The number of tests is the number of fact/example entries (the things that are named).
- We show a summary at the bottom of the test report (instead of next to each item).
- We only show ❌ or ✅ next to the test.
BTW, the ✅ emoji is kind of intense. I wonder if we should switch to the ✘ and ✔ characters instead, with ANSI color coding. But this will require the error coloring branch (https://github.com/apple/pkl/tree/error-coloring) to be merged.
Force-pushed from 7899903 to ffb364f
Thanks for the review!
Agreed, it can definitely be improved!
I was thinking that it would be good to know what parts pass and what parts fail, and to "get it all green", but yeah, it can be a bit too much.
Yeah, it's tests vs asserts in the tests. I agree, it's confusing.
That's much better :) I still think it would be nice to know how many asserts pass & fail as well. Maybe there could be a "verbose" flag that shows those... dunno, we'll see!
Agreed, that's a good plan :) I'll give it a try!
For sure 😅 Once that's merged, I agree it would probably be better not to "hardcode the colors via emojis" and to use the other UTF characters + ANSI coloring instead. Do you have an ETA for that branch being merged? Alright, I'll give this PR another spin and address your comments!
Force-pushed from ffb364f to 45738dd
@bioball I've pushed a bunch of changes that I think address all your comments. The only additional thing is that I've also added the assertion count to the summary line. Let me know what you think! Here's the output of a "real life project" I'm working on, with a bunch of "forced failures" to see how it all would look:
Thanks!
@bioball what do you think about the last output I pasted? Is it OK now (re. your comment about the output having noise and maybe not being helpful)? Thanks!
Apologies for dropping the ball here! It's not an excuse, but the LSP work has sucked up all my attention. I think your revised output looks pretty good! There are some merge conflicts to deal with; apologies for that. Can you address those? Once you do, I'll review this again and let's work towards getting this merged 😁
@bioball no worries :) I'll go over the merge conflicts and will update the PR. Thanks a ton!
Force-pushed from b8d27d9 to c74d92a
@bioball OK, I finally managed to rebase my branch against latest. Please review the branch when you have a minute and let me know if there's anything I should change!
Force-pushed from c74d92a to 14a5fb8
OK, I've added back an extra line at the end of the modules, but only if there's more than one module in the test. Now the output matches the previous output, but I don't have to modify the expected test outputs. Here's the "new look", like the last one I pasted, but I've removed an additional …
And this is how it looks with a module-level error:
Force-pushed from 14a5fb8 to 4484bef
* TestRunner: separated the running of each "section" of a test into its own try-catch, so that each section can be evaluated separately even if another section failed (e.g. evaluate all examples even if one test in "facts" throws an exception; see the sketch after this list). I've added tests that exercise this "new feature".
* TestResults: refactored it a bit so that TestResults now holds the three sections, and the old TestResults becomes TestSectionResults. Also, TestSectionResults now has only *one* error, since that's what we get when there's an exception evaluating the Pkl code.
* SimpleReport:
  * Separated the reporting of each test section so that it's easy to see which facts or which examples fail.
  * Added a "stats line" at the module level that reports how many tests and assertions pass/fail. Note that examples are equivalent to one test with an "all-or-nothing" assertion, so they are counted as 1 test with 1 assertion. Similarly, errors also count as "one test with one assertion failed" because, at the moment, individual errors are not surfaced individually; only the first error is surfaced, so it masks the rest of the tests and assertions in the test section where it happens.
  * For the examples converted to tests, when multiple examples share the same name I've also added a counter ("# 0" and so on) to disambiguate them, so it's easier to identify which one is failing.
* JUnitReport: fixed the reporting of failures, as it wasn't consistent with "tests": "tests" was the number of tests we ran, but "failures" was the number of assertions that failed.
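As a rough illustration of the per-section behavior described above, here's a hypothetical test module (the names and the forced failure are made up, not taken from the PR):

```pkl
amends "pkl:test"

facts {
  ["throws while evaluating"] {
    // An exception here is caught per section: the facts section reports a single
    // error (which, for now, masks the remaining facts entries in this section)...
    throw("boom")
  }
  ["another fact"] {
    1 + 1 == 2
  }
}

examples {
  // ...but the examples section is still evaluated and reported independently.
  ["sum"] {
    1 + 2
  }
}
```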
Force-pushed from 4484bef to 25fd96e
I've rebased onto the latest. @bioball there's "one change requested" re. the output of the tests, and I believe I've addressed it. Can you mark it as resolved so merging is not blocked? Also, @bioball or @holzensp, can you unblock CI? The job is on hold; I believe the CI jobs don't run unless authorized? :-? Thanks!
Great, thank you for the PR! These are definitely improvements.
FYI: I'm going to submit a follow-up PR to refine this a little bit more. But I think this is good enough to go in for now.
@bioball cool, thanks! 😄
Follow-up PR: #738
I've commented on the new PR; IMHO there's a big change that removes a feature I added in this PR. Let's discuss it there!
Add some basic test stats to the test runner output for both SimpleReport and JUnitReport: