-
Notifications
You must be signed in to change notification settings - Fork 33
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PPL geoip function - old reference PR #781
Conversation
Signed-off-by: Hendrik Saly <[email protected]>
@YANG-DB @penghuo @lukasz-soszynski-eliatra @kt-eliatra @dr-lilienthal PPL geoip function syntax proposal Please review and comment if needed |
@salyh thanks
|
@LantaoJin @penghuo @dai-chen please add u'r comments here |
@YANG-DB @LantaoJin @penghuo @dai-chen datasource refers to the datasources provided by the ip2geo processor https://opensearch.org/docs/latest/ingest-pipelines/processors/ip2geo/ My idea was to leveage the already present capabilities of the ip2geo processor and extend it as necessary. Because it would avoid code duplication. When I read the docs of the processor correct, GeoLite2 is supported by now. I propose to add the above requested functionality to the processor. Then we try to use the code from the processor (as a library) to solve this issue. Otherwise we would reinvent the wheel. And there is also https://opensearch.org/docs/latest/data-prepper/pipelines/configuration/processors/geoip/ in data prepper which could be facilitated |
@salyh this sounds like a good idea ! |
- See https://opensearch.org/docs/latest/ingest-pipelines/processors/ip2geo/ | ||
|
||
|
||
### New syntax definition in ANTLR |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a user doc, could we remove the ANTLR related information, what here needs is just a user readable syntax definition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, the file is located ppl-lang/planning folder. I think we need an user doc instead of design doc. We can copy this design information to the PR description or issue description. And you can easily update your design without any code changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@YANG-DB I did unterstand that the ppl-lang/planning
folder is for design docs. When the design is approved it will be converted to a userdoc and moved out of the planning
sub-folder. Am I wrong with that assumption?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought it was for roadmap. I prefer to using GitHub PR description and comments. It's fine to add design docs in code if you prefer to. But these docs are not for users, we need to write a new one later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@salyh yes - once design doc is approved we will move it with necessary updates to the regular commands folder
geoip function to add information about the geographical location of an IPv4 or IPv6 address | ||
|
||
1. **Proposed syntax** | ||
- `... | eval geoinfo = geoip([datasource,] ipAddress [,properties])` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what does datasource
mean? Could you use other meaningful usage rather that abc
? abc
is not straightforward user case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
datasource refers to the datasources provided by the ip2geo processor, see #781 (comment)
@@ -781,6 +782,10 @@ coalesceFunctionName | |||
: COALESCE | |||
; | |||
|
|||
geoipFunction | |||
: GEOIP LT_PRTHS (datasource = functionArg COMMA)? ipAddress = functionArg (COMMA properties = stringLiteral)? RT_PRTHS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
datasource = functionArg
Any case for supporting functionArg
instead of stringLiteral
. Again, what does datasource
mean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no particular case for functionArg
. We can change it to stringLiteral
for sure
datasource refers to the datasources provided by the ip2geo processor, see #781 (comment)
HI @salyh, Currently looking into implementation if this feature. I saw above conversation on planning to leverage ip2geo functionality within geospatial plugin. Did you have a clear idea on how this can be done? From my understanding to leverage ip2geo, one must first create a ip2geo processor then attach it to a opensearch pipeline. I am not too sure how the Spark plugin could access this functionality. Some pointers would be much appreciated :) Thanks! |
The idea is not to leverage the processor as a processor but to reuse its code (either by copying it - not ideal - , or by separating them into a kind of shared commons library or by adding them as a dependency on code level). |
Sounds good my current idea is to create an spi for the plugin, as was done for job-scheduler (allowing the geospatial plugin to implement ExtensiblePlugin) am still working on a design. This is to allow the sql plugin to access the ip2geo functionality (am also working on the iplocation functionality there). Would you know if the Spark project would be able to leverage the SPI since it is not a plugin? If we are to expose the ip2geo functionality I would rather create something that works both for the sql-plugin and Spark. |
Hi @salyh, spent a bit more time investigating this ticket. currently plan is to rewrite the geo2ip functionality within the spark project. The reason for this is that currently the geo2ip processor with the geospatial plugin is intwined with using the job scheduler extension. From my understanding there is no way to leverage usage of this extension in spark so a lot of the functionality would not be able to be migrated. The rewritten geo2ip within spark will:
The current design question is if it would be better to use spark UDF or reflection to call java_method to call the rewritten geo2ip functionality? |
Will follow CIDR https://github.com/opensearch-project/opensearch-spark/pull/706/files approach and use UDF |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only a reference PR - DO NO MERGE !
Description
PPL geoip function
Issues Resolved
#672
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.