Support chained runs of phases, with progressive visibility of prior SVRL #16

Open
rjelliffe opened this issue Jul 14, 2021 · 0 comments
Labels: enhancement (Adds new capabilities), phases (Issues relating to phases)

rjelliffe commented Jul 14, 2021

Currently, moving from one phase to another requires external logic. A simple mechanism inside Schematron could help with this.

This proposal accompanies #14 and #15 but is independent of them.

1) Chained running of phases

The proposal is to add to sch:phase attributes that nominate a phase to be run after this one has finished. The SVRL results of the current and prior phases are available in a global variable for that next phase.

An attribute sch:phase/@and is added. Its value is an XPath expression returning a string (the phase id), or else an empty sequence, false(), the empty string, etc.

 <sch:phase id="p1"  and=" 'p2'"> ....</sch:phase>

 <sch:phase id="p2">...</sch:phase>

Once all the patterns in p1 are attempted, then the phase with id 'p2' is attempted.

This allows an orderly tree of phases to be run. Because the attribute is an XPath expression, you can use if...then...else to have multiple branches, based on information in the main document, global variables, input params, etc.

Note that no dependency is added on the order in which phases, patterns, etc. should be run: merely that when one phase is performed, other phases will also be active. This could be implemented by, e.g., first running all the patterns in one phase, then running all the patterns in the next phase that were not in the first phase; or by finding the closure of all phases to be activated, then running all of them in an undefined order (including simultaneous evaluation).

Consequently, if two phases have the same active pattern, then that pattern is only activated once. It is an error if two phases that activate the same pattern have variables with the same name, because the pattern is only activated once. (This is stricter than the minimum needed, but is readily doable.)
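The "closure" option above can be sketched roughly as follows. This is only an illustration under assumptions: the `Phase` class, the `phase_closure` and `active_patterns` helpers, and the use of a Python callable to stand in for the evaluated @and XPath are all hypothetical, not part of Schematron or of any implementation.

```python
from typing import Callable, Dict, List, Optional


class Phase:
    """Hypothetical in-memory model of a sch:phase."""

    def __init__(self, patterns: List[str],
                 and_expr: Optional[Callable[[], Optional[str]]] = None):
        self.patterns = patterns    # ids of the active patterns
        self.and_expr = and_expr    # stands in for the evaluated @and XPath


def phase_closure(phases: Dict[str, Phase], start: str) -> List[str]:
    """Follow @and links from `start`, returning each reached phase id once."""
    seen: List[str] = []
    current: Optional[str] = start
    # An empty string, None or False from @and ends the chain.
    while current and current not in seen:
        if current not in phases:
            raise KeyError(f"unknown phase id: {current}")
        seen.append(current)
        and_expr = phases[current].and_expr
        current = and_expr() if and_expr else None
    return seen


def active_patterns(phases: Dict[str, Phase], start: str) -> List[str]:
    """Union of the patterns of all chained phases, each pattern listed once."""
    result: List[str] = []
    for phase_id in phase_closure(phases, start):
        for pattern in phases[phase_id].patterns:
            if pattern not in result:
                result.append(pattern)
    return result


phases = {
    "p1": Phase(["a", "b"], lambda: "p2"),
    "p2": Phase(["b", "c"]),
}
print(active_patterns(phases, "p1"))  # ['a', 'b', 'c']
```

Pattern "b" is named by both phases but appears once in the result, which matches the "only activated once" rule. The list order here carries no meaning; a runner would be free to evaluate the patterns in any order.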

@and would name only a single phase, not multiple, for simplicity. (It could return a list of phase ids, I suppose, but then it gets away from the idea of a phase being a notionally discrete state. I am not convinced that there is a need to go beyond the Hidden Markov Model or finite-state-machine type of constraint, where each state "transition" requires only information about that state to make a single transition. I think it would complicate some implementations: if not the code, then at least the thought needed to implement it.)

2) Chained running of phases with order

This is an extension that requires 1). It uses the same markup as #14. It lets you run a sequence of phases, directed by the markup.

A phase can have a sch:phase/@do. This is a barrier. It means that all patterns nominated in prior phases must have completed (logically, not necessarily temporally, if there is some "lazy" or JIT implementation involved).

If a previous phase has already activated a pattern, there is no need to reevaluate it. The same error as in 1) about phase variables with the same name applies, I expect.

 <sch:phase id="p1" and=" 'p2'"> ....</sch:phase>

 <sch:phase id="p2"  do="next">...</sch:phase>

The simplest way to implement this is that, if there is any sch:phase with @do="next", the implementation evaluates each @and attribute only after the phase it belongs to has completed, then runs the nominated phase regardless of any @do on that next sch:phase.

A better way to implement it is to evaluate the @and at the start of the phase: if the nominated phase has no @do, its patterns can be merged into the current run, but if it does, they are trimmed (patterns already activated are dropped) and queued.
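The merge-or-queue idea can be sketched as a planner that splits the chained phases into stages separated by barriers. This is a rough illustration only: `Phase`, `plan_stages`, the `barrier` flag (standing in for do="next") and the pre-evaluated `and_next` string are hypothetical names, and the sketch assumes the @and expressions do not depend on results produced by earlier stages.

```python
from typing import Dict, List, NamedTuple, Optional


class Phase(NamedTuple):
    patterns: List[str]
    and_next: Optional[str] = None  # assumed pre-evaluated @and result
    barrier: bool = False           # True when the phase carries do="next"


def plan_stages(phases: Dict[str, Phase], start: str) -> List[List[str]]:
    """Group pattern ids into stages; each barrier opens a new stage."""
    stages: List[List[str]] = [[]]
    scheduled = set()   # patterns already placed in some stage
    visited = set()     # guards against cycles in the @and chain
    current: Optional[str] = start
    while current and current not in visited:
        visited.add(current)
        phase = phases[current]
        if phase.barrier and stages[-1]:
            stages.append([])           # queue behind the barrier
        for pattern in phase.patterns:
            if pattern not in scheduled:    # trim patterns already activated
                scheduled.add(pattern)
                stages[-1].append(pattern)  # merge into the open stage
        current = phase.and_next
    return [stage for stage in stages if stage]


phases = {
    "p1": Phase(["a", "b"], and_next="p2"),
    "p2": Phase(["b", "c"], and_next="p3", barrier=True),
    "p3": Phase(["d"]),
}
print(plan_stages(phases, "p1"))  # [['a', 'b'], ['c', 'd']]
```

Within a stage the patterns could run in any order or simultaneously; only the boundary between stages is ordered.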

For example, if your input may be in any of several dialects of your XML that require different treatment, your entry phase may be null and select a specific phase for each dialect:

<sch:phase id="start"  and="if (/xbrl:xbrl) then 'input-as-xbrl' else 'input-as-html'" />
<sch:phase id="input-as-html">....</sch:phase>
<sch:phase id="input-as-xbrl">....</sch:phase>

This draws out the potential of phases to support multi-revision schema lifecycles without external logic.

3) Chained phases with progressive visibility of SVRL

This is an extension of 2) and uses the same markup or concept as #15 but applying to phases.

So this is the case where @do is used, so there is a barrier. In this case, we make an automatic variable available, e.g. SVRL_PROGRESSIVE or whatever (similar to #15), which provides the cumulative SVRL of the previous phases.

This has two uses:

A) The asserts, reports, variables, rules, etc. of patterns in the chained phases after the barrier can see the cumulative SVRL results of the previous phases. So you can use one phase to mark the next.

B) The cumulative SVRL result (i.e. that automatic variable) is also available in the @and XPath, so you can decide which next phase to run based on the validation results of the current and previous phases.
So, for example:
<sch:phase id="p1"
and="if ($SVRL_PROGRESSIVE//svrl:flag-raised[@id='DISASTER']) then 'disaster-phase' else 'cool-phase' ">
<sch:active ...
</sch:phase>

<sch:phase id="disaster-phase">....</sch:phase>

<sch:phase id="cool-phase">...</sch:phase>
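To make use B) concrete, here is a rough sketch of how a runner might accumulate SVRL across phases and then pick the next phase, mirroring the @and expression in the example. The `accumulate` and `next_phase` helpers are hypothetical, and the `svrl:flag-raised` element name is taken from the example as written rather than checked against the SVRL schema.

```python
import xml.etree.ElementTree as ET

SVRL_NS = {"svrl": "http://purl.oclc.org/dsdl/svrl"}


def accumulate(cumulative: ET.Element, phase_output: ET.Element) -> None:
    """Append the entries of one phase's SVRL to the cumulative SVRL."""
    cumulative.extend(list(phase_output))


def next_phase(svrl_progressive: ET.Element) -> str:
    """Stand-in for evaluating the @and XPath against $SVRL_PROGRESSIVE."""
    hit = svrl_progressive.find(
        ".//svrl:flag-raised[@id='DISASTER']", SVRL_NS)
    return "disaster-phase" if hit is not None else "cool-phase"


# The cumulative SVRL starts empty and grows as each phase completes.
cumulative = ET.fromstring(
    '<svrl:schematron-output xmlns:svrl="http://purl.oclc.org/dsdl/svrl"/>')
p1_output = ET.fromstring(
    '<svrl:schematron-output xmlns:svrl="http://purl.oclc.org/dsdl/svrl">'
    '<svrl:flag-raised id="DISASTER"/>'
    '</svrl:schematron-output>')

print(next_phase(cumulative))   # cool-phase
accumulate(cumulative, p1_output)
print(next_phase(cumulative))   # disaster-phase
```

A real implementation would evaluate the XPath with its own engine; the Python lookup only shows the data flow from the prior phases' SVRL to the branch decision.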

If there is no sch:phase/@do, then there is no logical ordering constraint on the evaluation of phases, and therefore it would be an error for any XPath to contain the string $SVRL_PROGRESSIVE if there is no sch:phase/@do="next" (or sch:pattern/@do="next" from #14, for that matter).
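That error condition could be checked statically before validation starts. The following is a minimal sketch, assuming the proposed do="next" attribute and the variable name $SVRL_PROGRESSIVE from this issue; `check_progressive_usage` is a hypothetical helper, and it only scans attribute values as plain strings rather than parsing the XPath.

```python
import xml.etree.ElementTree as ET

SCH = "http://purl.oclc.org/dsdl/schematron"
VAR = "$SVRL_PROGRESSIVE"   # variable name assumed from this proposal


def check_progressive_usage(schema_xml: str) -> bool:
    """Return False if the variable is used but no barrier is declared."""
    root = ET.fromstring(schema_xml)
    barrier_tags = (f"{{{SCH}}}phase", f"{{{SCH}}}pattern")
    has_barrier = any(
        el.get("do") == "next"
        for el in root.iter()
        if el.tag in barrier_tags
    )
    uses_var = any(
        VAR in value
        for el in root.iter()
        for value in el.attrib.values()
    )
    return has_barrier or not uses_var


bad_schema = """
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron">
  <sch:phase id="p1" and="if ($SVRL_PROGRESSIVE//x) then 'a' else 'b'"/>
  <sch:phase id="a"/>
  <sch:phase id="b"/>
</sch:schema>
"""
good_schema = bad_schema.replace('id="a"', 'id="a" do="next"')

print(check_progressive_usage(bad_schema))   # False
print(check_progressive_usage(good_schema))  # True
```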

Option. Speed: extend the internal SVRL location with a generate-id() id for faster lookup.

The above suggestions allow external annotations of nodes using the SVRL. One use case might be where an assertion wants to test a node based on properties found for that node in the progressive SVRL made by logically previous phases and patterns. But this seems difficult, because it needs to do a complete search and match of the SVRL, perhaps by generating a canonical XPath and matching it against all the location XPaths in the SVRL.

Now a workaround would be to use the XPath locations in the SVRL as keys, but it is still work.

So what would be better is if every SVRL entry also had an attribute with the generate-id() value. Because we are not changing the document itself, this would allow faster lookup. (It would also be useful for performance if we could make a key into the SVRL using these values, for fast lookup, as long as this was generated lazily: so when we are at a node and use generate-id() as a key, the hash table is only populated with the SVRL keys up to that id, etc.)

This id should be stripped from the SVRL that goes out, as generate-id() is not stable between runs.
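The lookup-then-strip idea can be sketched as follows. This is an assumption-laden illustration: the attribute name `node-id`, the helpers `build_index` and `strip_ids`, and the sample id values are all hypothetical, and the index here is built eagerly rather than lazily as suggested above.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import DefaultDict, List

ID_ATTR = "node-id"   # hypothetical internal attribute holding generate-id()


def build_index(svrl_root: ET.Element) -> DefaultDict[str, List[ET.Element]]:
    """Map each generate-id() value to the SVRL entries recorded for it."""
    index: DefaultDict[str, List[ET.Element]] = defaultdict(list)
    for entry in svrl_root.iter():
        node_id = entry.get(ID_ATTR)
        if node_id is not None:
            index[node_id].append(entry)
    return index


def strip_ids(svrl_root: ET.Element) -> None:
    """Remove the internal ids before the SVRL is emitted."""
    for entry in svrl_root.iter():
        entry.attrib.pop(ID_ATTR, None)


svrl = ET.fromstring(
    '<svrl:schematron-output xmlns:svrl="http://purl.oclc.org/dsdl/svrl">'
    '<svrl:failed-assert location="/doc/a[1]" node-id="d1e5"/>'
    '<svrl:successful-report location="/doc/a[1]" node-id="d1e5"/>'
    '<svrl:failed-assert location="/doc/b[1]" node-id="d1e9"/>'
    '</svrl:schematron-output>')

index = build_index(svrl)
print(len(index["d1e5"]))   # 2
strip_ids(svrl)
print(any(ID_ATTR in el.attrib for el in svrl.iter()))  # False
```

A dictionary lookup by id replaces the search over every location XPath in the SVRL, which is the cost the workaround above still pays.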

It may be that the hash table should also be provided as an automatic variable (I guess, an automatic key?) so that it does not get recalculated too often. Anyway, I think there is scope there for something in this case...

(I am not working on an implementation of this idea.)


N.B. on QuickFix: there might also be some nice interaction with XML QuickFix. For example, one phase could check that the fixes of a previous phase went far enough, and fix the fixes if not. (Some smart looping that repeatedly runs the same validation/fix combo over a document until no more fix actions are performed might be useful too. I don't know that Schematron needs to be extended for this, though: it would be done by the runner.)
