
Enhance documentation of requirements coverage #350

Open
nasajoey opened this issue Nov 16, 2023 · 5 comments
Labels
automated-testing (Related to automated testing tools), enhancement (New feature or request)

Comments

@nasajoey
Contributor

For the requirements coverage markdown doc here: https://github.com/interuss/monitoring/blob/main/monitoring/uss_qualifier/suites/astm/utm/f3548_21.md it would be valuable to see ALL requirements from the spec and see which are "Implemented" and which are "Not Implemented". Other values for that column might be "Won't test" or "Won't implement" or something, with some reasoning in the right column.

This may provide folks with a better sense of how "complete" the test suite is and maybe where they can apply effort to increase coverage by focusing on the "Not implemented" requirements.

We have been doing such an analysis separate from the repo just in a spreadsheet. However, getting everything in a public area, completely consolidated would be valuable.

@nasajoey nasajoey added the enhancement New feature or request label Nov 16, 2023
@BenjaminPelletier
Member

The reason we haven't done this already is that it's often/usually difficult to enumerate "ALL requirements". The F3548 suite is potentially an exception to this as there is a very clear set of requirements enumerated for F3548, but we probably don't want to imply that this test suite is intending to test all those requirements, and that's what would be implied by including literally all requirements defined in F3548. Perhaps more clearly, F3411 has requirements for broadcast RID and requirements for network RID. Our F3411 suite is only targeted at network RID, so we wouldn't want to have "ALL requirements" include broadcast RID requirements.

It certainly would be possible to explicitly enumerate the set of requirements the suite is...something..."attempting to cover"? "in scope"? The auto-generated test suite documentation could then report potential testing status of that explicitly-enumerated set of requirements. But, to do this, I think we'd want to clearly define what the "something" is that set of requirements represents. It seems like you wouldn't suggest the answer is "attempting to cover" since that would mean "won't test" wouldn't be a valid status. We would also want to pick a definition that allows additional requirements to be included in the documentation that are needed for the suite but not specifically targeted -- like the InterUSS NetRID telemetry injection requirements.

The challenge with "won't test" is that I don't really know a good definition. For instance, GEN0015 says USSs need to operate under a quality management system equivalent to ISO 9001. Today, that's probably a "won't test". But, if a queryable online certification registry became available, that requirement would suddenly become testable and we would probably want to test it. Also, this status would presumably be part of the repo, so InterUSS as an organization would effectively be saying "won't test". But, I don't think we'd ever want to say that -- if someone is interested in testing a requirement and identifies a feasible way to do so, it's pretty likely we'd want to incorporate that, so "won't test" isn't really accurate. It could be "won't test given our current level of resourcing and knowledge", but saying that would essentially require all potential InterUSS contributors to agree that they wouldn't implement, so it isn't really feasible for individuals like the ones on the InterUSS TSC to realistically make that statement.

So, the current approach we've taken in the test suite documentation is to only document positive test statuses: we indicate requirements that could be/are tested, and requirements we know how we'd likely test but haven't implemented yet ("TODO" status). Since it's hard to enumerate what we "should" be testing, we've omitted that from the current test suite documentation. However, that information is available; see next paragraphs :)

From a higher level, I do certainly agree there is a strong "are we done yet?" use case. Many different parties want to know what the current test coverage is, what is still left to do, and perhaps what won't be covered even when we're "done". Some of us (especially @Brendan-Hillis) have spent a fair amount of time trying to wrestle with this need and how to address it (and similar needs) -- our first attempt was "participant verifiable capabilities". Each capability defined in a test suite identified a set of criteria that needed to be satisfied in order to verify that capability, usually "pass checks for all these requirements" and "verify this sub-capability". We defined these capabilities extensively for U-space and NetRID (our initial focus was complete verification of U-space Network Identification Service requirements via ASTM F3411-22a) and a report browsing these capabilities evaluated for a CI test run can be found here. The problem with that approach is that it is really hard to determine how to "roll up" observations of a test run into whether that capability was "verified" or not. For instance, ASTM F3548-21 SCD0055 requires a USS to do certain things for conflicts between two operational intents of the same priority when regulations allow conflicts at that priority level. However, there are jurisdictions that do not allow conflicts between same-priority operational intents at any priority level, so InterUSS would be unable to produce any test checks for SCD0055 in such a jurisdiction. Presumably we would want the F3548 suite to verify the "strategic conflict detection" service both in jurisdictions with same-priority-conflicts-allowed and in jurisdictions without same-priority-conflicts-allowed, but it's not clear how one would define a single "strategic conflict detection service" capability that would work for both those use cases.
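As a rough illustration of the roll-up problem described above, here is a minimal sketch. It is not the uss_qualifier implementation: the `Capability` class, the `is_verified` function, and the `SCD_OTHER` requirement ID are all hypothetical names made up for this example; only SCD0055 comes from the discussion.

```python
from dataclasses import dataclass, field


@dataclass
class Capability:
    """Hypothetical capability: verified only when every criterion is satisfied."""
    name: str
    required_requirements: list[str] = field(default_factory=list)
    sub_capabilities: list["Capability"] = field(default_factory=list)


def is_verified(cap: Capability, passed: set[str]) -> bool:
    """Roll up a run: every listed requirement passed and every sub-capability is verified."""
    return all(r in passed for r in cap.required_requirements) and all(
        is_verified(sub, passed) for sub in cap.sub_capabilities
    )


# "SCD_OTHER" is a placeholder standing in for the remaining SCD requirements.
scd = Capability("strategic conflict detection", ["SCD_OTHER", "SCD0055"])

# Jurisdiction that allows same-priority conflicts: SCD0055 can be checked.
print(is_verified(scd, {"SCD_OTHER", "SCD0055"}))

# Jurisdiction that forbids them: no SCD0055 check is possible, so this single
# capability definition can never be verified there.
print(is_verified(scd, {"SCD_OTHER"}))
```

With one fixed definition, the second jurisdiction can never reach "verified" even if everything testable there passes, which is the difficulty with defining a single capability for both cases.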

The approach we're currently exploring removes InterUSS from the business of identifying "ALL requirements" entirely. Instead, the uss_qualifier user defines "ALL requirements" in the test configuration they use and uss_qualifier will produce a "tested requirements" artifact from a test run. A good example of this artifact is uss1 from the U-space test suite in the CI (there is a report for just F3548, but it intentionally skips part of the suite currently). This is what I expect to use to answer "are we done yet?".
[Image: Screenshot 2023-11-16 at 2 21 19 PM]
[Image: Screenshot 2023-11-16 at 2 22 14 PM]
"Done" will be defined by a set of requirements defined for a specific test configuration designed to cover them (eventually). One big advantage is that this allows there to be multiple definitions of "done" for different use cases (e.g., whether SCD0055 needs to be verified or not).

In terms of plans (e.g., "we intend to implement this"), this seems to be best tracked like other feature requests -- e.g., issue for F3411 and issue for F3548. This allows us to capture more nuance beyond "we intend to implement this" to, e.g., "the priority of this is P1 and person X is working on it as part of project Y which is probably targeted for circa 3 months".

We're certainly open to other suggestions, but that's a rather long-winded answer to why we don't have more coverage-like information for test suite documentation.

@nasajoey
Contributor Author

Thanks for the detailed thoughts, Ben.

I think this definitely will be a thing. NASA will have thoughts and an approach to share most likely. With holidays it may take time to get folks organized to discuss. The most important audience for this kind of information, I think, is a regulator who will want to know what InterUSS tests cover as well as what they do not, with respect to the standard. For example, if the standard ends up being a method/means of compliance for something, verification of the implementation against the standard is vital. We'll want to be sure that the regulator doesn't have to infer what is NOT covered in the tests.

I'm not trying to argue that every F3548 requirement is supposed to have a test in InterUSS, just that we should enable all stakeholders to fully understand its scope relative to the standard.

I'm sure we'll chat more about this. I propose leaving this issue open until we converge a bit more on the topic, but it may be a little laggy given the holidays. We may converge via some kind of meeting and not just through these messages here (it would be hard to do so that way, I think), and if that's the case we can summarize on this ticket. Sound reasonable?

@BenjaminPelletier
Member

Definitely agreed -- I'll look forward to future discussion in appropriate venues and summarization on this ticket as appropriate.

@BenjaminPelletier BenjaminPelletier added the automated-testing Related to automated testing tools label Nov 17, 2023
@BenjaminPelletier
Member

Just to note this following today's separate discussion: a distinct advantage of using test reports to evaluate requirements coverage versus the test suite documentation as originally mentioned in this thread is that the report is capable of differentiating which checks were actually made while the test suite is only capable of listing the checks that could be made.

When a check cannot be performed because the conditions during a test run do not make it possible (e.g., a USS uses a subscription strategy that results in it having a pre-existing subscription such that a GET request is not needed, so a check that the GET request was made cannot be performed and must be skipped), the test report will capture that outcome whereas the test suite documentation will not.
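The difference between "checks that could be made" and "checks that were actually made" can be sketched as follows. This is only an illustration: the dictionaries, the outcome strings, the check name, and the `REQ_X` requirement ID are hypothetical and do not reflect the actual uss_qualifier report schema.

```python
# Hypothetical: checks the suite documentation lists as possible, keyed by requirement.
documented_checks = {
    "REQ_X": ["GET request was made"],  # placeholder requirement ID and check name
}

# Hypothetical per-check outcomes recorded in one test report. Here the USS had a
# pre-existing subscription, so the GET check could not be performed.
report_outcomes = {"GET request was made": "skipped"}


def actually_checked(documented, outcomes):
    """For each requirement, keep only the checks that were really performed in this run."""
    performed = ("passed", "failed")
    return {
        req: [check for check in checks if outcomes.get(check) in performed]
        for req, checks in documented.items()
    }


print(actually_checked(documented_checks, report_outcomes))
```

The documentation alone would list `REQ_X` as covered by a check, while the report-based view shows that no check for it was performed in this particular run.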

@nasajoey
Contributor Author

Test reports are great. This thread is about how we provide confidence and clarity to stakeholders that requirements are appropriately covered by scenarios/cases/steps. The existence of a single test scenario for a given requirement does not, on its own, let everyone know that the requirement is sufficiently covered. So a report that says that single test scenario's cases/steps have passed (while great) is not sufficient on its own to let everyone know that the requirement is sufficiently covered.
