
Add a script for importing GeoIP data #954

Merged (3 commits, Dec 3, 2024)

Conversation

normanj-bitquill (Contributor)

Description

  • Splits subnets as necessary to remove any overlap
  • Is a Scala script to run in Spark CLI
  • Can optionally import the data into a Spark table
  • Adds 3 extra columns that will be used to find the subnet for an IP address
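The integer-range idea behind those extra columns can be sketched as follows. This is a minimal illustration, not the script itself; `ipToLong` and `cidrToRange` are hypothetical helper names.

```scala
// A minimal sketch (not the script's actual code) of how a CIDR block maps to
// the integer range that the extra columns encode.

// Convert a dotted-quad IPv4 address to its unsigned 32-bit value in a Long.
def ipToLong(ip: String): Long =
  ip.split('.').foldLeft(0L)((acc, octet) => (acc << 8) | octet.toLong)

// Convert a CIDR block like "10.0.0.0/8" to its inclusive [start, end] range.
def cidrToRange(cidr: String): (Long, Long) = {
  val Array(addr, prefix) = cidr.split('/')
  val size = 1L << (32 - prefix.toInt)
  val start = ipToLong(addr)
  (start, start + size - 1)
}
```

Storing the range as plain integers lets a lookup become a simple comparison instead of CIDR math at query time.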

Related Issues

N/A

Check List

  • Updated documentation (docs/ppl-lang/README.md)
  • Implemented unit tests
  • Implemented tests for combination with other commands
  • New added source code should include a copyright header
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following the Developer Certificate of Origin and signing off your commits, please check here.

@YANG-DB (Member) left a comment:

@normanj-bitquill
Nice. Can you please add a CSV file as an example of usage?
And please document the end-to-end process that enables queries based on this table.

to use these functions, a table needs to be created containing the geographic location
information.

## How to Create Geographic Location Index
Contributor:

Might be good to provide an example of downloading the CSV from https://geoip.maps.opensearch.org/v1/geolite2-city/manifest.json.

Member:

@normanj-bitquill @kenrickyap yes, let's please add a complete end-to-end section in the doc.

Contributor (Author):

Added an End-to-End section.

| Column Name | Description |
|-------------|--------------------------------------------------------------------|
| start | An integer value used to determine if an IP address is in a subnet |
| end | An integer value used to determine if an IP address is in a subnet |
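The lookup these columns enable can be sketched with an in-memory stand-in for the table. This is illustrative only: the real data lives in a Spark table, and the row values below are made up for the example.

```scala
// Illustrative in-memory version of the subnet lookup that the start/end
// columns enable. Row values are fabricated for the example.
case class GeoRow(cidr: String, start: Long, end: Long, country: String)

def ipToLong(ip: String): Long =
  ip.split('.').foldLeft(0L)((acc, octet) => (acc << 8) | octet.toLong)

val rows = Seq(
  GeoRow("10.0.0.0/8", 167772160L, 184549375L, "Example-A"),
  GeoRow("192.168.0.0/16", 3232235520L, 3232301055L, "Example-B")
)

// Find the subnet containing an IP: the row whose [start, end] brackets it.
def lookup(ip: String): Option[GeoRow] = {
  val n = ipToLong(ip)
  rows.find(r => r.start <= n && n <= r.end)
}
```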
Contributor:

Would it be better to name the columns ip_range_start and ip_range_end, for clarity on what each column represents?

Contributor (Author):

Renamed.

@@ -0,0 +1,402 @@
import java.io.BufferedReader
kenrickyap (Contributor) commented Nov 28, 2024:

nit: probably need to add Apache header.

Suggested change, from:

    import java.io.BufferedReader

to:

    /*
     * Copyright OpenSearch Contributors
     * SPDX-License-Identifier: Apache-2.0
     */
    import java.io.BufferedReader

Contributor (Author):

Added the Apache header.

println("Done")
}

var INPUT_FILE: String = "/Users/normanj/Documents/geoip/geolite2-City.csv"
Contributor:

Might be better to replace the local paths with `<file_path_to_input_csv>` and `<file_path_to_output_csv>`.

Contributor (Author):

Changed the variable names.

kenrickyap (Contributor) left a comment:

Some comments. Also, were you able to test the IPv6 and address overlap logic?

ipv4Root.fixTree()
ipv6Root.fixTree()

println("Writing data to file")
Contributor:

It might be a better user experience to add a loading bar.

Member:

@kenrickyap love this idea!!

Contributor (Author):

When writing out the file, it will print the percentage every 10%. This is done for IPv4 and then again for IPv6.
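That kind of coarse reporting can be sketched as below. This is an illustration of the idea, not the script's actual code; it returns the milestones so they can be inspected, whereas the script prints them.

```scala
// Sketch of coarse progress reporting: report each 10% milestone crossed
// while processing `total` records.
def progressMilestones(total: Int): Seq[Int] =
  (1 to total).flatMap { i =>
    val pct = i * 100 / total
    val prev = (i - 1) * 100 / total
    // Emit a milestone only when a new 10% boundary is crossed.
    if (pct / 10 > prev / 10) Some(pct / 10 * 10) else None
  }
```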


1. Create a copy of the scala file `load_geoip_data.scala`
2. Edit the file
3. There are three variables that need to be updated.
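Based on the review thread, those variables presumably cover the input CSV path, the output path, and the optional Spark table name. Edited, the top of the copy might look like the following; the variable names here are placeholders, not necessarily the script's exact ones.

```scala
// Placeholder variable names; the actual names in load_geoip_data.scala may differ.
var FILE_PATH_TO_INPUT_CSV: String = "/path/to/geolite2-City.csv"
var FILE_PATH_TO_OUTPUT_CSV: String = "/path/to/geolite2-City-processed.csv"
var TABLE_NAME: String = "geoip" // optional Spark table to import the data into
```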
YANG-DB (Member) commented Nov 28, 2024:

@normanj-bitquill
Is this the "Edit the file" part from step 2? If so, it should be 2.1 / 2.2.
And can we also pass these inputs as parameters?

Contributor (Author):

The `:load` command in Spark only accepts one argument (the filename of the script to load).

I have fixed the formatting here.

@YANG-DB YANG-DB added Lang:PPL Pipe Processing Language support 0.7 labels Nov 30, 2024
@normanj-bitquill (Contributor, Author):

some comments. Also were you able to test the ipv6 and address overlap logic?

@kenrickyap Yes, I did test with overlaps. I added entries for both IPv4 and IPv6 to verify that it would split up the subnet properly so that there is no overlap.
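The splitting being discussed can be sketched in range terms: carving a more specific subnet's integer range out of a broader one leaves non-overlapping remainder ranges. The script itself uses a tree over the subnets; `subtractRange` below is a simplified, hypothetical illustration of the idea.

```scala
// Sketch of the overlap-removal idea: subtract a more specific subnet's
// integer range from a broader one, leaving non-overlapping remainders.
def subtractRange(outer: (Long, Long), inner: (Long, Long)): Seq[(Long, Long)] = {
  val (outerStart, outerEnd) = outer
  val (innerStart, innerEnd) = inner
  // Keep whatever lies before and after the carved-out inner range.
  val before = if (innerStart > outerStart) Seq((outerStart, innerStart - 1)) else Seq.empty
  val after  = if (innerEnd < outerEnd) Seq((innerEnd + 1, outerEnd)) else Seq.empty
  before ++ after
}
```

For example, carving 10.1.0.0/16 out of 10.0.0.0/8 leaves one range below it and one above it, so no row's range overlaps another.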

@YANG-DB (Member) commented Dec 2, 2024:

@normanj-bitquill LGTM 👍
I'll merge it later today

@YANG-DB YANG-DB merged commit d35d66b into opensearch-project:main Dec 3, 2024
4 checks passed
kenrickyap pushed a commit to Bit-Quill/opensearch-spark that referenced this pull request Dec 11, 2024
* Add a script for importing GeoIP data
* Updated based on PR feedback
* Fixed formatting

Signed-off-by: Norman Jordan <[email protected]>
Labels
0.7 Lang:PPL Pipe Processing Language support

3 participants