Lookup #2698

Open
wants to merge 13 commits into
base: main

lukasz-soszynski-eliatra

Description

[Describe what this change achieves]
Implementation of the new lookup command.

Issues Resolved

[List any issues this PR will resolve]
#2651

Check List

  • New functionality includes testing.
    • All tests pass, including unit test, integration test and doctest
  • New functionality has been documented.
    • New functionality has javadoc added
    • New functionality has user manual doc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@rupal-bq
Contributor

rupal-bq commented Jun 7, 2024

Is this ready for review now? Can you please rebase with latest main to fix CI?

@salyh

salyh commented Jun 12, 2024

@rupal-bq I will rebase.

@rupal-bq
Contributor

Can you please add reference manual for lookup command? (e.g. https://github.com/opensearch-project/sql/blob/main/docs/user/ppl/cmd/dedup.rst)

@salyh salyh force-pushed the lookup branch 2 times, most recently from e7c8535 to d4f88c3 Compare June 13, 2024 09:33
@salyh

salyh commented Jun 13, 2024

I rebased and squashed the commits and fixed DCO

@lukasz-soszynski-eliatra lukasz-soszynski-eliatra marked this pull request as ready for review June 18, 2024 08:16
@salyh

salyh commented Jun 25, 2024

Can you please add reference manual for lookup command? (e.g. https://github.com/opensearch-project/sql/blob/main/docs/user/ppl/cmd/dedup.rst)

done

@rupal-bq
Contributor

rupal-bq commented Jul 1, 2024

@salyh some tests failed. Can you please take a look?

| 19 | Jack | finance | true |
+------------------+----------+------------+--------+

os> source=accounts | lookup hr employee_number AS account_number, dep AS department appendonly=true name AS given_name, active AS is_active ;
Contributor

Can you please also add the same example with appendonly=false? I think that will help clarify how appendonly works. source=accounts | lookup hr employee_number AS account_number, dep AS department appendonly=false name AS given_name, active AS is_active ;

Will add it

done

for (Map resultMap : inputMap) {
Expression origin = expressionAnalyzer.analyze(resultMap.getOrigin(), lookupTableContext);
if (resultMap.getTarget() instanceof Field) {
Expression targerExpression =
Contributor

typo- targerExpression

We will fix this tomorrow together with a bugfix for this code segment we just discovered.

for (Map.Entry<String, Object> f : matchMap.entrySet()) {
BoolQueryBuilder orQueryBuilder = new BoolQueryBuilder();

// Todo: Search with term and a match query? Or terms only?
Contributor

Not sure about this. Can you share an example query in which term and match both are required?

When the lookup field in the lookup index is a "keyword" field (i.e. not analyzed), we need a term query. If the field is an analyzed (full-text) field, we need a match query. Because we don't know the mapping of the field, we just do both. It's way cheaper than checking the mapping first.
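
The "do both" approach described in this reply could be sketched as a bool query with two should clauses, one term and one match, for the same field. This is only an illustration: it builds the query DSL as plain Java maps instead of using the plugin's `BoolQueryBuilder`, and the names `TermOrMatchSketch` and `termOrMatch` are hypothetical and not taken from the PR.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TermOrMatchSketch {

  /**
   * Builds {"bool": {"should": [term, match], "minimum_should_match": 1}}
   * for a single field/value pair (hypothetical helper, not the PR's code).
   */
  public static Map<String, Object> termOrMatch(String field, Object value) {
    // Exact comparison: intended for keyword (non-analyzed) fields.
    Map<String, Object> term = Map.of("term", Map.of(field, value));
    // Analyzed comparison: intended for full-text fields.
    Map<String, Object> match = Map.of("match", Map.of(field, value));

    Map<String, Object> bool = new LinkedHashMap<>();
    bool.put("should", List.of(term, match));
    // A document qualifies if either clause matches.
    bool.put("minimum_should_match", 1);
    return Map.of("bool", bool);
  }

  public static void main(String[] args) {
    // Field name borrowed from the documentation example in this PR.
    System.out.println(termOrMatch("employee_number", 19));
  }
}
```

Since the field mapping is never inspected, the cost is one extra clause per lookup field rather than an additional mapping request.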

public LogicalPlan visitLookup(Lookup node, AnalysisContext queryContext) {
LogicalPlan child = node.getChild().get(0).accept(this, queryContext);
List<Argument> options = node.getOptions();
// Todo, refactor the option.
Contributor

Why is this a TODO?

This is just copied from

and not added by us. There seems to be a general need to refactor the options, but that's not specific to this PR.

@salyh

salyh commented Jul 3, 2024

@rupal-bq We have issues with the CI which are not caused by the code changes, for example:

 | Exception in thread "main" java.lang.IllegalStateException: opensearch-security requires Java 21:, your system: 17.0.11+9
| 	at org.opensearch.bootstrap.JarHell.checkJavaVersion(JarHell.java:278)
| 	at org.opensearch.plugins.PluginsService.verifyCompatibility(PluginsService.java:403)
| 	at org.opensearch.plugins.InstallPluginCommand.loadPluginInfo(InstallPluginCommand.java:819)
| 	at org.opensearch.plugins.InstallPluginCommand.installPlugin(InstallPluginCommand.java:874)
| 	at org.opensearch.plugins.InstallPluginCommand.execute(InstallPluginCommand.java:276)
| 	at org.opensearch.plugins.InstallPluginCommand.execute(InstallPluginCommand.java:250)
| 	at org.opensearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:104)
| 	at org.opensearch.cli.Command.mainWithoutErrorHandling(Command.java:138)
| 	at org.opensearch.cli.MultiCommand.execute(MultiCommand.java:104)
| 	at org.opensearch.cli.Command.mainWithoutErrorHandling(Command.java:138)
| 	at org.opensearch.cli.Command.main(Command.java:101)
| 	at org.opensearch.plugins.PluginCli.main(PluginCli.java:66)

https://github.com/opensearch-project/sql/actions/runs/9764080193/job/26953624605?pr=2698

@rupal-bq
Contributor

rupal-bq commented Jul 9, 2024

Issue is fixed in main now. Can you please rebase?

salyh and others added 9 commits July 16, 2024 13:11
Signed-off-by: Hendrik Saly <[email protected]>
Signed-off-by: Hendrik Saly <[email protected]>
Signed-off-by: Hendrik Saly <[email protected]>
Signed-off-by: Lukasz Soszynski <[email protected]>
Signed-off-by: Hendrik Saly <[email protected]>

Fix typo

Signed-off-by: Hendrik Saly <[email protected]>
@@ -0,0 +1,153 @@
=============
Collaborator

To enable doc_test, please add lookup.rst to the ppl_cli section of category.json.

fixed

* lookup-index: mandatory. the name of the lookup index. If more than one is provided, all of them must match.
* lookup-field: mandatory. the name of the lookup field. Must be existing in the lookup-index. It is used to match to a local field (in the current search) to get the lookup document. When there is no lookup document matching it is a no-op. If there is more than one an exception is thrown.
* local-lookup-field: optional. the name of a field in the current search to match against the lookup-field. **Default:** value of lookup-field.
* appendonly: optional. indicates if the values to copy over to the search result from the lookup document should overwrite existing values. If true no existing values are overwritten. **Default:** false.
Collaborator

Why is it named appendonly? Wouldn't overwrite be more straightforward?

Maybe - I have no personal preference here
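
Whatever the option ends up being called, the documented semantics (with appendonly=true no existing values are overwritten; with the default false they are) could be sketched as below. This is a minimal illustration, not the PR's implementation: `AppendOnlySketch` and `merge` are hypothetical names, the sample values are made up, and the sketch assumes a field counts as "existing" when the row already holds a non-null value for it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class AppendOnlySketch {

  /**
   * Copies lookup values into a copy of the row (hypothetical helper).
   * With appendOnly=true, fields already present in the row keep their value.
   */
  public static Map<String, Object> merge(
      Map<String, Object> row, Map<String, Object> lookupValues, boolean appendOnly) {
    Map<String, Object> result = new LinkedHashMap<>(row);
    for (Map.Entry<String, Object> e : lookupValues.entrySet()) {
      if (appendOnly) {
        // Assumption of this sketch: putIfAbsent also fills keys mapped to null.
        result.putIfAbsent(e.getKey(), e.getValue());
      } else {
        // Default behaviour: lookup values overwrite existing ones.
        result.put(e.getKey(), e.getValue());
      }
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, Object> row = new LinkedHashMap<>();
    row.put("account_number", 19);
    row.put("given_name", "Jack");
    // Made-up lookup values for illustration only.
    Map<String, Object> lookupValues = Map.of("given_name", "John", "is_active", true);

    System.out.println("appendonly=true:  " + merge(row, lookupValues, true));
    System.out.println("appendonly=false: " + merge(row, lookupValues, false));
  }
}
```

Read this way, an `overwrite` flag would simply be the negation of `appendOnly`, which is why the naming question is mostly one of preference.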

* lookup-field: mandatory. the name of the lookup field. Must be existing in the lookup-index. It is used to match to a local field (in the current search) to get the lookup document. When there is no lookup document matching it is a no-op. If there is more than one an exception is thrown.
* local-lookup-field: optional. the name of a field in the current search to match against the lookup-field. **Default:** value of lookup-field.
* appendonly: optional. indicates if the values to copy over to the search result from the lookup document should overwrite existing values. If true no existing values are overwritten. **Default:** false.
* source-field: optional. the fields to copy over from the lookup document to the search result. If no such fields are given all fields are copied. **Default:** all fields
Collaborator

  1. Could you add a test on source-field?
  2. If appendonly is omitted, how does the parser differentiate lookup-field from source-field?

  1. https://github.com/eliatra/sql/blob/f2410e8698b064dc6dfb90cbb054349ab4a0c719/integ-test/src/test/java/org/opensearch/sql/ppl/LookupCommandIT.java#L114
  2. Whitespace (and keep in mind that lookup-field is mandatory). Can you provide a concrete example where we would probably run into ambiguity?


BiFunction<String, Map<String, Object>, Map<String, Object>> lookup() {

if (client.getNodeClient() == null) {
Collaborator

Why can't we use OpenSearchClient?

finalMap.put("_copy", copyMap.keySet());
}

Map<String, Object> lookupResult = lookup.apply(indexName, finalMap);
Collaborator

The complexity is O(N), where N is the number of rows in the input dataset? Should we rewrite the query as a LEFT JOIN?
Do we have performance test results?

No performance tests yet.
Running the lookup command as a join also has performance implications, such as triggering the circuit breaker when both indices contain a lot of documents and fields. If we have performance concerns, maybe keeping the current logic and adding a simple LRU cache is an option? And how would one then implement the "appendonly" flag with a join?
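
The "simple LRU cache" idea floated here could look roughly like the following. It is a sketch under assumptions, not a proposal for the actual code: `LruLookupCacheSketch` is a hypothetical name, the delegate stands in for the per-row lookup call, and the cache is not thread-safe and never invalidates entries when the lookup index changes.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

public class LruLookupCacheSketch<K, V> {

  private final Function<K, V> delegate;
  private final Map<K, V> cache;

  public LruLookupCacheSketch(int maxEntries, Function<K, V> delegate) {
    this.delegate = delegate;
    // accessOrder=true moves an entry to the end each time it is read,
    // so the eldest entry is always the least recently used one.
    this.cache =
        new LinkedHashMap<K, V>(16, 0.75f, true) {
          @Override
          protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > maxEntries;
          }
        };
  }

  /** Returns the cached result, or calls the delegate once on a miss. */
  public V get(K key) {
    if (cache.containsKey(key)) {
      return cache.get(key);
    }
    V value = delegate.apply(key);
    cache.put(key, value);
    return value;
  }

  public static void main(String[] args) {
    int[] calls = {0};
    // The lambda is a stand-in for the real lookup request.
    LruLookupCacheSketch<Integer, String> cached =
        new LruLookupCacheSketch<>(
            2,
            key -> {
              calls[0]++;
              return "doc-" + key;
            });
    cached.get(19);
    cached.get(19); // second call is served from the cache
    System.out.println("delegate calls: " + calls[0]);
  }
}
```

Such a cache only helps when many input rows share the same lookup key; with mostly distinct keys the number of lookup requests stays proportional to the number of rows.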

lukasz-soszynski-eliatra and others added 3 commits July 19, 2024 13:36
Signed-off-by: Lukasz Soszynski <[email protected]>
Signed-off-by: Hendrik Saly <[email protected]>
This was referenced Dec 2, 2024
4 participants