Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix new range index field report #243

Open
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

amclark42
Copy link

Fixes #191. Tested on eXist v6.2.0 using previously-created range indexes on elements, attributes, and fields (also, Lucene indexes by node).

This PR ensures that Monex can report on fields in the new range index, and that the report is accurate across a collection.

Monex doesn't have a develop branch. Maintainers, please feel free to let me know if I should adjust this PR at all. Does the app need a changelog entry or a version increase, for instance?

The global variable is still useful without the context of
`$indexes:collection`, so it's left as a test of what eXist can do
with the new range index.
If the new range index is used, `indexes:show-index-keys()` checks the
output of `range:index-keys-for-field()` for a mismatch in the number of
terms, and more terms than asked for in `$indexes:max-number-returned`.
If the range function is acting on a document-by-document basis, Monex
cleans up the output for accuracy across the collection.

Also made it easier to tell what's going on when processing `$sorted-keys`.
@joewiz
Copy link
Member

joewiz commented Nov 23, 2023

@amclark42 Thank you for the PR! It is correct in targeting the "master" branch.

While the CI tests are failing, the issues causing those failures preceded your PR are entirely unrelated to it. I checked out your branch tonight and installed it, and in fact the tests intermittently pass; I hope to get to the bottom of that, separate from this PR.

On the topic of sample data to confirm that the Indexes panel is behaving as expected, I had forgotten that we already have a basis for this: the "indexes-test" subcollection, which can be browsed at http://localhost:8080/exist/apps/monex/collection.html?collection=/db/apps/monex/indexes-test and whose source is https://github.com/eXist-db/monex/blob/master/src/main/xar-resources/indexes-test. You'll see a sample index configurations with a range index for a few cases of elements and attributes with and without namespaces. Would you consider adding some sample data & associated index configurations that, when browsed in Monex's Indexes panel, demonstrates the functionality you added?

@amclark42
Copy link
Author

@joewiz Absolutely! Thanks for pointing to the test indexes; that'll help a good deal.

Thanks also for investigating the Cypress tests! I tried to get them running locally, but couldn't figure out how to get my environment set up for them. Is there any documentation you can share on how the eXist devs run the tests locally?

@amclark42
Copy link
Author

I'm still working on pulling together a good test dataset, but I figure I can lay out some of the criteria that I'm trying to capture.

When I tested Monex with my organization's data, here's what I looked for:

  • The range field report returns data.
    • This is the issue identified in my bug report.
    • This can be tested with a minimal sample, as the existing index tests have done.
  • The range field report does not repeat terms across rows.
    • This is an issue that came up as part of my fix. The range function returns one row per term per document, not compiled across the corpus.
    • To test this, we need more than one XML document.
    • The test corpus should have at least some values repeated across documents.
  • The range field report honors the number of items to be returned.
    • This also arose as part of my fix. Because the range function works document-by-document, it capped the number of terms per document, not across the corpus.
    • I suspect that, with my fix in place, Monex may report a lower-than-actual frequency for some terms, because the range function would not have reported all uses of the term for all documents. I was not able to test this suspicion, but I will be able to with a constrained corpus.
      • It's worth noting that, if my suspicions are correct, that's fuel for a new bug report on eXist itself and the implemented range function.
      • From Monex's point of view, it should be able to compile the per-document results and return no more table rows than were requested. Having an accurate frequency would be ideal but it was not as pressing as getting useful results at all.
    • Ideally, to test this, we'd have more than 100 terms in one or more documents.

@amclark42
Copy link
Author

With commit d7ae9ca, I've added three new files to Monex's collection of test data. These files record information about the XPath functions available in the functions specs (2.0, 3.0, 3.1), e.g.:

<function fn="idref" group="sequence">
  <return quantifier="*">node()</return>
  <property>deterministic</property>
  <args num="1">
    <property>context-dependent</property>
    <property>focus-dependent</property>
    <error>XPDY0002</error>
    <error>XPTY0004</error>
  </args>
  <args num="2">
    <property>context-independent</property>
    <property>focus-independent</property>
  </args>
  <error>FODC0001</error>
</function>

To demonstrate fields in the new range index, I've added the following to Monex's collection.xconf:

  <create qname="function">
      <field name="function-name" match="@fn" type="xs:string"/>
      <field name="function-group" match="@group" type="xs:string"/>
  </create>
  <create qname="return">
      <field name="return-type" type="xs:string"/>
      <field name="return-quantifier" match="@quantifier" type="xs:string"/>
  </create>
  <create qname="property" type="xs:string"/>
  <create qname="error" type="xs:string"/>

Of the new indexes, only function-name has more than 100 keys. I haven't yet done a thorough check to make sure that the numbers reported are accurate.

With the new data and indexes in place, I'm satisfied that all of the criteria I listed in my last comment have been met. It'd probably be useful to add new tests to the Cypress suite so that there's a backup to an eyeball test. Since I'm not able to run tests locally, I'll have to leave that for someone else.

@amclark42
Copy link
Author

I added language to each data file that releases it into the public domain. This way I and others can use the dataset for other purposes. Let me know if this is a problem!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Index for new range field doesn’t show keys [BUG]
2 participants