Some (Engineering-Oriented) Thoughts On Testing
Chris likes to ask 3 questions about completed software:
- Does it work?
- How do you know?
- How do I know?
Those are great questions and a good start, but nowhere do they state what it means to "work".
Everyone has some notion in their head of what that means, but like any engineering activity, it's full of technical details. One useful exercise is to take a 20-line function and ask ChatGPT for a list of all the ways that function could be implemented wrong or with errors. Just the categories. You should get about 8 to 10. For example:
- The logic could be wrong.
- The math could be wrong.
- It might not handle bad input parameters properly.
- There could be edge cases outside of the happy path.
- It might not handle certain combinations of input parameters correctly.
- There could be memory leaks.
- It could work the first time it's called but not the second because of some accidental persistent state (see the sketch after this list).
- It might not handle errors effectively.
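To make the accidental-persistent-state category concrete, here is a minimal Python sketch (the function and scenario are hypothetical, not from any real codebase) of a function that works on the first call but not the second:

```python
def append_reading(value, readings=[]):
    """Append a sensor reading and return the running list.

    BUG: the default list is created once, when the function is
    defined, so it silently persists across calls.
    """
    readings.append(value)
    return readings

print(append_reading(1.0))  # [1.0] -- correct on the first call
print(append_reading(2.0))  # [1.0, 2.0] -- state leaked from the first call

# The fix: create a fresh list on every call.
def append_reading_fixed(value, readings=None):
    if readings is None:
        readings = []
    readings.append(value)
    return readings
```

A test that calls the function only once would never catch this; a good suite calls it at least twice with independent inputs.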
All of these should probably be tested, and most would need more than one test. And even once all of those tests exist, there is a difference between good, useful tests and "just tests":
- Any test that fails ought to clearly tell the user or test executor which test failed and why. A bare "test failed" message isn't very actionable. A lot of Python unittest tests use plain asserts, which can produce messages like "True is not False" if one isn't deliberate about it (see the sketch after this list).
- Tests should probably have good docstrings explaining their intention and design so they can be maintained long term.
- In theory, one should prove that each test can fail if the code is wrong (testing the tests!).
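Here is a hedged sketch of what those three points can look like together in Python's unittest (the `median_of` function is hypothetical, invented for illustration):

```python
import unittest

def median_of(values):
    """Return the median of a non-empty list of numbers."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

class TestMedianOf(unittest.TestCase):
    def test_even_length_input_averages_middle_pair(self):
        """Even-length input must yield the mean of the two middle
        values, not either one alone; this guards against an
        off-by-one in the midpoint index."""
        result = median_of([1, 2, 3, 4])
        # self.assertTrue(result == 2.5) would fail with only
        # "False is not true"; this message says exactly what broke.
        self.assertEqual(
            result, 2.5,
            msg=f"median_of([1, 2, 3, 4]) returned {result}, expected 2.5")

if __name__ == "__main__":
    unittest.main()
```

To "test the test", temporarily break `median_of` (say, always return `ordered[mid]`) and confirm this test actually fails; mutation-testing tools such as mutmut automate that idea.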
For any set of tests, someone coming to them ought to know not just that "tests" exist, but what extent of functionality, code coverage, and failure modes the tests cover at any moment in time. A suite can look very extensive yet contain a lot of duplication, or focus on only one area, and users might look and say "Oh look; there are tests" when really there is more of an absence of tests than a presence. Some might argue that no tests are actually better than 10% of the needed tests, because the latter creates a false sense of confidence.
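One way to turn "there are tests" into a measurable claim is to report coverage alongside the suite. A minimal sketch, assuming the third-party coverage.py package and a hypothetical `tests` directory:

```python
import unittest

import coverage  # third-party: pip install coverage

# Record which lines the suite actually executes, so reviewers see
# the extent of coverage rather than just the presence of tests.
cov = coverage.Coverage()
cov.start()

suite = unittest.defaultTestLoader.discover("tests")
unittest.TextTestRunner().run(suite)

cov.stop()
cov.save()
cov.report()  # prints per-file statement coverage to stdout
```

Even 100% line coverage says nothing by itself about duplication or about which of the failure-mode categories above are actually exercised, so the number is a floor, not proof.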
Also, tests need to not be over-specific or overly constrain the code. They should test that the code does what it needs to do, not just prove that it does whatever it was doing at the time the test was written. There is value in "change-detection tests", but there is a real cost to them too, especially as they accumulate. We had lots of experience with this in the EMOD "Regression Test" suite.
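A hedged illustration of the difference, using a hypothetical `summarize` function: the first test pins the exact output string (pure change detection), while the second pins the behavior that actually matters:

```python
def summarize(counts):
    """Return a human-readable summary of event counts (hypothetical)."""
    total = sum(counts.values())
    return f"{total} events across {len(counts)} categories"

# Change-detection test: pins the exact wording, so it breaks on any
# harmless rephrasing even though the behavior is still correct.
def test_summarize_exact_string():
    assert summarize({"a": 2, "b": 3}) == "5 events across 2 categories"

# Behavioral test: pins what the code needs to do (report the right
# totals) while leaving the phrasing free to change.
def test_summarize_reports_totals():
    text = summarize({"a": 2, "b": 3})
    assert "5" in text and "2" in text
```

The exact-string version is only worth its upkeep when the output format itself is the contract.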
This doesn't even get into the question of statistical feature testing, matching tests to requirements and desirements, or differences between unit tests and integration tests.