-
Notifications
You must be signed in to change notification settings - Fork 141
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Lookup #2698
base: main
Are you sure you want to change the base?
Lookup #2698
Conversation
Is this ready for review now? Can you please rebase with latest main to fix CI? |
@rupal-bq I will rebase. |
Can you please add reference manual for lookup command? (e.g. https://github.com/opensearch-project/sql/blob/main/docs/user/ppl/cmd/dedup.rst) |
e7c8535
to
d4f88c3
Compare
I rebased and squashed the commits and fixed DCO |
done |
@salyh some tests failed. Can you please take a look? |
docs/user/ppl/cmd/lookup.rst
Outdated
| 19 | Jack | finance | true | | ||
+------------------+----------+------------+--------+ | ||
|
||
os> source=accounts | lookup hr employee_number AS account_number, dep AS department appendonly=true name AS given_name, active AS is_active ; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please also add same example with appendonly=false ? I think that will help clarify how appendonly works. source=accounts | lookup hr employee_number AS account_number, dep AS department appendonly=false name AS given_name, active AS is_active ;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will add it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
for (Map resultMap : inputMap) { | ||
Expression origin = expressionAnalyzer.analyze(resultMap.getOrigin(), lookupTableContext); | ||
if (resultMap.getTarget() instanceof Field) { | ||
Expression targerExpression = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo- targerExpression
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We will fix this tomorrow together with a bugfix for this code segment we just discovered.
for (Map.Entry<String, Object> f : matchMap.entrySet()) { | ||
BoolQueryBuilder orQueryBuilder = new BoolQueryBuilder(); | ||
|
||
// Todo: Search with term and a match query? Or terms only? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure about this. Can you share an example query in which term and match both are required?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When the lookup field in the lookup index is a "keyword" field (e.g. not analyzed) we need a term query. If the field is an analyzed field (full text field) we need a match query. Because we don't know the mapping of the field we just do both. Its way cheaper than first check the mapping.
public LogicalPlan visitLookup(Lookup node, AnalysisContext queryContext) { | ||
LogicalPlan child = node.getChild().get(0).accept(this, queryContext); | ||
List<Argument> options = node.getOptions(); | ||
// Todo, refactor the option. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this a TODO?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just copied from
// Todo, refactor the option. |
@rupal-bq We have issues with the CI which are not caused by the code changes, for example:
https://github.com/opensearch-project/sql/actions/runs/9764080193/job/26953624605?pr=2698 |
Issue is fixed in main now. Can you please rebase? |
Signed-off-by: Hendrik Saly <[email protected]>
Signed-off-by: Lukasz Soszynski <[email protected]>
Signed-off-by: Hendrik Saly <[email protected]>
Signed-off-by: Hendrik Saly <[email protected]>
Signed-off-by: Hendrik Saly <[email protected]>
Signed-off-by: Lukasz Soszynski <[email protected]>
Signed-off-by: Hendrik Saly <[email protected]> Fix typo Signed-off-by: Hendrik Saly <[email protected]>
Signed-off-by: Lukasz Soszynski <[email protected]>
Signed-off-by: Hendrik Saly <[email protected]>
Signed-off-by: Hendrik Saly <[email protected]>
@@ -0,0 +1,153 @@ | |||
============= |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to enable doc_test, please add lookup.rst to catagory.json ppl_cli section,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
* lookup-index: mandatory. the name of the lookup index. If more than one is provided, all of them must match. | ||
* lookup-field: mandatory. the name of the lookup field. Must be existing in the lookup-index. It is used to match to a local field (in the current search) to get the lookup document. When there is no lookup document matching it is a no-op. If there is more than one an exception is thrown. | ||
* local-lookup-field: optional. the name of a field in the current search to match against the lookup-field. **Default:** value of lookup-field. | ||
* appendonly: optional. indicates if the values to copy over to the search result from the lookup document should overwrite existing values. If true no existing values are overwritten. **Default:** false. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why named it appendonly? does overwrite more straitforward?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe - I have no personal preference here
* lookup-field: mandatory. the name of the lookup field. Must be existing in the lookup-index. It is used to match to a local field (in the current search) to get the lookup document. When there is no lookup document matching it is a no-op. If there is more than one an exception is thrown. | ||
* local-lookup-field: optional. the name of a field in the current search to match against the lookup-field. **Default:** value of lookup-field. | ||
* appendonly: optional. indicates if the values to copy over to the search result from the lookup document should overwrite existing values. If true no existing values are overwritten. **Default:** false. | ||
* source-field: optional. the fields to copy over from the lookup document to the search result. If no such fields are given all fields are copied. **Default:** all fields |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- could you add test on source-field?
- if appendonly is ommit, how does parser idenitfy differenate lookup-field and source-field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- https://github.com/eliatra/sql/blob/f2410e8698b064dc6dfb90cbb054349ab4a0c719/integ-test/src/test/java/org/opensearch/sql/ppl/LookupCommandIT.java#L114
- whitespace (and keep in mind that lookup-field ist mandatory). Can you provide a concrete example where we probbaly run into ambiguity
|
||
BiFunction<String, Map<String, Object>, Map<String, Object>> lookup() { | ||
|
||
if (client.getNodeClient() == null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why we can not use OpenSearchClient?
finalMap.put("_copy", copyMap.keySet()); | ||
} | ||
|
||
Map<String, Object> lookupResult = lookup.apply(indexName, finalMap); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The complexity is O(N), N is rows in input dataset? should we rewrite query as LEFT JOIN?
Do we have performance test result?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No peformance tests yet.
Running the lookup command as a join has also perf related implications like triggering the circuit breaker when both indices contains lot of documents and fields. If we have perf related concerns maybe keeping the current logic and adding a simple LRU cache is an option? And how would one implement then "appendonly" flag with a join?
…hIndex Signed-off-by: Lukasz Soszynski <[email protected]>
Signed-off-by: Lukasz Soszynski <[email protected]>
Signed-off-by: Hendrik Saly <[email protected]>
Description
[Describe what this change achieves]
Implementation of the new lookup command.
Issues Resolved
[List any issues this PR will resolve]
#2651
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.