Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add WrappedCandidateMatcher for composing matchers #13109

Conversation

bjacobowitz
Copy link
Contributor

Description

Create a WrappedCandidateMatcher, along with a factory for creating these objects and a default implementation of the factory, to assist users in creating their own composite matchers.

Problem

The current access protections on CandidateMatcher make it infeasible for a user of the Lucene Monitor library to create their own composite CandidateMatcher (for example, an alternate version of ParallelMatcher) on top of existing CandidateMatcher instances if they are outside the package.

As an example, the existing implementation of ParallelMatcher makes use of delegate CandidateMatcher instances, but a user would not be able to write their own version of this in their own package, because matchQuery(), finish() and reportError() all have package-private access (see this gist, comments labeled "PROBLEM").

Solution

The simplest solution to this problem would be to increase the visibility of those functions in CandidateMatcher (either to protected or public), but it runs the risk of violating encapsulation, so I've tried to avoid that here.

The solution I've put together here introduces a WrappedCandidateMatcher, which can be used to increase visibility when creating a composite matcher. WrappedCandidateMatcher does NOT extend from CandidateMatcher but has nearly the same interface, with increased visibility on matchQuery(), finish() and reportError(). A user building a new CandidateMatcher by composing existing CandidateMatcher instances could make full use of those CandidateMatchers by wrapping them with WrappedCandidateMatcher, allowing them to be used internally to the composite matcher (see this gist, comments labeled "SOLUTION").

On a certain level, this solution also violates existing encapsulation, but it requires the user to jump through a few hoops to get there, so it would limit the potential for misuse of a CandidateMatcher. If we would instead prefer to increase the visibility on CandidateMatcher, let me know and I can do that instead.

Merge Request

If this PR gets merged, can you please use my [email protected] email address for the squash+merge. Thank you.

Copy link

github-actions bot commented Mar 1, 2024

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the [email protected] list. Thank you for your contribution!

@github-actions github-actions bot added the Stale label Mar 1, 2024
@cpoerschke
Copy link
Contributor

Thanks @bjacobowitz for a very detailed pull request here!

I'm not really familiar with this area of the code but intuitively would be curious what the alternative (you mentioned) of increasing the visibility on CandidateMatcher would look like.

@romseygeek
Copy link
Contributor

The protected visibility on matchQuery() should already be fine here - you can override or call protected methods from within subclasses. I think making reportError() and finish() protected final would be the neatest solution.

@github-actions github-actions bot removed the Stale label Mar 2, 2024
Copy link

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the [email protected] list. Thank you for your contribution!

@github-actions github-actions bot added the Stale label Mar 16, 2024
@bjacobowitz
Copy link
Contributor Author

The protected visibility on matchQuery() should already be fine here - you can override or call protected methods from within subclasses. I think making reportError() and finish() protected final would be the neatest solution.

@romseygeek On further reflection, I think protected visibility may not be sufficient for a practical solution here, even when using subclasses. I'll again use ParallelMatcher as an example, since I think it's a fairly representative use case.

ParallelMatcher takes in a MatcherFactory to create instances of CandidateMatcher, with the intent to take advantage of the matchQuery implementation in the factory's generated instances of CandidateMatcher. If we try to reimplement ParallelMatcher outside the Lucene Monitor package, we can't call matchQuery on the factory-built CandidateMatcher from outside (i.e. on an instance other than this) because of matchQuery's protected access -- the only reason the existing ParallelMatcher implementation can do this is because it is in the Lucene Monitor and therefore has package-local access to the protected functions on another object in the same package.

To access and invoke the implementation of matchQuery, we would need to create a subclass of the concrete CandidateMatcher implementation generated by the factory, which isn't directly possible, because the factory pattern creates a situation where the concrete class isn't known.

We could approach a solution using an alternate factory implementation that would return a subclass of a known concrete type with hidden functions exposed. For example, we could create a subclass of CollectingMatcher (let's call it ExposedCollectingMatcher) with a public function exposedMatchQuery which calls this.matchQuery(), and then we could create a factory to return that subclass (it might be called ExposedCollectingMatcherFactory).

The problem with that approach is that it breaks down in cases where the concrete implementation is not available to be subclassed, as is the case for several anonymous CandidateMatcher implementations (e.g. instances generated by MATCHER factory in ExplainingMatch).

Ideally we would like a factory that returns a more generic class that extends the functionality of a CandidateMatcher, which is what WrappedCandidateMatcher provides.

Of course another solution would be to simply make these functions public, but I understand the desire to avoid that if possible.

@github-actions github-actions bot removed the Stale label Mar 26, 2024
Copy link

github-actions bot commented Apr 9, 2024

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the [email protected] list. Thank you for your contribution!

@github-actions github-actions bot added the Stale label Apr 9, 2024
@cpoerschke
Copy link
Contributor

... Of course another solution would be to simply make these functions public, but I understand the desire to avoid that if possible.

Again, I'm not really familiar with this area of the code but would be curious what making the functions public would look like and if making them public is necessarily problematic.

And echo-ing @mkhludnev's apache/solr#2315 (comment) comment w.r.t. just-compile test to retain extensibility in future.

@github-actions github-actions bot removed the Stale label Apr 11, 2024
Copy link

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the [email protected] list. Thank you for your contribution!

@github-actions github-actions bot added the Stale label Apr 25, 2024
@bjacobowitz
Copy link
Contributor Author

@romseygeek I'm wondering if maybe we should make those functions protected final as you suggest, but also make some of the CandidateMatcher implementations public.

Right now, outside of the org.apache.lucene.monitor package, there is no basic concrete implementation of CandidateMatcher that is publicly availble to be subclassed. Today we have the following CandidateMatcher implementations, and none of the basic ones are public:

If we're thinking about making CandidateMatcher functions protected to give subclasses have access to them, I think we would need to make some of the basic implementations of CandidateMatcher public, so that they're available to be extended outside the org.apache.lucene.monitor package.

Again I'll use ParallelMatcher as an example. First we make the reportError and finish functions protected final in CandidateMatcher as you suggested. If we then expose a basic implementation of CandidateMatcher (I've extracted the CandidateMatcher returned by QueryMatch.SIMPLE_MATCHER, calling it SimpleCandidateMatcher), then outside the package we can create a subclass of that (I'm calling it CandidateMatcherExtension here) with public functions that call the now-protected functions of the superclass. That gives our ParallelMatcher implementation the ability to make full use of the basic candidate matcher through the subclass (see the comments labeled "SOLUTION").

I think this would solve this issue without requiring us to dramatically change function permissions. The only additional requirement would be to expose some implementing CandidateMatcher classes, but I think that might be okay (we've already exposed ParallelMatcher and PartitionMatcher). Should I go ahead with this approach?

@romseygeek
Copy link
Contributor

Hi @bjacobowitz, thanks for the detailed update! I think this would be easier to reason about if we had some concrete examples. Do you think you could post some code of composite matcher implementation that you would like to implement? Doesn't even need to compile, but just to give me an idea of precisely what you're trying to achieve and how the current library code gets in the way of that.

@github-actions github-actions bot removed the Stale label Jul 25, 2024
@bjacobowitz
Copy link
Contributor Author

Sure. It's roughly the same approach as ParallelMatcher, but with a reactive setup instead of an ExecutorService.

monitor.match(Document, MatcherFactory) runs a bunch of queries with a CandidateMatcher from MatcherFactory and returns MatchingQueries, but it won't return results until all the queries have run. I am trying to create a custom CandidateMatcher (and a factory to wrap it) that delegates to an existing matcher to do the matching work, but puts its results into a Flux, so that I can start processing the individual results as soon as they are available.

It's something sort of like this:

package com.my.custom.package;

public class MyReactiveMatcher<T extends QueryMatch> extends CandidateMatcher<T> {

  public static class MatchTask {
    final String queryId;
    final Query matchQuery;
    final Map<String, String> metadata;
    public MatchTask(String queryId, Query matchQuery, Map<String, String> metadata) {
      this.queryId = queryId;
      this.matchQuery = matchQuery;
      this.metadata = metadata;
    }
  }
  
  private final List<MatchTask> tasks = new ArrayList<>();
  
  // Factory of CandidateMatcher to which we will delegate matching
  private final MatcherFactory<T> factory;
  
  public MyReactiveMatcher(IndexSearcher searcher, MatcherFactory<T> factory) {
    super(searcher);
    this.factory = factory;
  }

  @Override
  public void matchQuery(String queryId, Query matchQuery, Map<String, String> metadata) {
    // Register a query to try to match later
    tasks.add(new MatchTask(queryId, matchQuery, metadata));
  }
  
  // Run matchQuery
  private static <T extends QueryMatch> MultiMatchingQueries<U> getResult(
      MatchTask task, CandidateMatcher<T> matcher) {
    try {
      // This call is not possible outside the lucene.monitor package:
      matcher.matchQuery(task.queryId, task.matchQuery, task.metadata);
    } catch (IOException e) {
      // This call is not possible outside the lucene.monitor package:
      matcher.reportError(task.queryId, e);
    }
    // This call is not possible outside the lucene.monitor package:
    return matcher.finish(0, 0);
  }

  @Override
  protected void doFinish() {
    // Kick off matching, with results coming back through the Flux as they are finished
    Flux.fromIterable(tasks).flatMap(t -> getResult(t, factory.createMatcher(searcher))).doOnNext(::aBunchOfStuff).subscribe();
  }

The getResult function is where the issue comes up. I can't really make use of the CandidateMatcher generated by the factory if I'm outside of the lucene.monitor package, due to the access protections in place.

To use your suggestion of a subclass, I could create a MyReactiveMatcher with a factory that returns my custom subclass of CandidateMatcher (then I could just cast to it and call extra its functions), but to be able to create that custom subclass in the first place I would need to extend a concrete implementation of CandidateMatcher, and there really aren't any available at the moment.

@romseygeek
Copy link
Contributor

Thanks, that's very helpful. Given that composite matchers need to call those methods for CandidateMatcher to be useful, I think the simplest way forward here is to make matchQuery(), reportError() and finish() public. Do you want to open another PR for that @bjacobowitz?

@bjacobowitz
Copy link
Contributor Author

Great, I'll open another PR shortly with those changes (matchQuery(), reportError() and finish() public), and will reference this one for context. Thanks!

@bjacobowitz
Copy link
Contributor Author

Closing, see PR #13632 with new approach

@bjacobowitz bjacobowitz closed this Aug 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants