[FEATURE] Hadoop Client #134
Comments
As part of this feature, it would be great to provide a signing mechanism, or some sort of add-on extension, that signs requests using SigV4 so that the client can be used on AWS. Additionally, the authentication mechanisms should account for other cloud providers: when installing and setting up the connector, much like you would with Fluentd, you would pull in provider-specific packages for this functionality, which would ease adoption of other cloud providers' signing / auth mechanisms. |
+1, I am looking for an equivalent of the |
Generally we've been forking these and making them work for OpenSearch. @CEHENKLE can speak to potential schedules, but don't hold back if you have cycles to do it, we'll gladly adopt the fork in the organization, or just contribute things like Sigv4 signing into someone else's fork. |
@dblock @CEHENKLE previously I was able to use the older versions of |
@mattweber There's some reverting going on (e.g. opensearch-project/OpenSearch#3484) around types to make some of it backward compatible again and prevent exactly these types of issues. Try against |
Will do, thanks for the info @dblock |
@mattweber Did you have any luck? I can answer on behalf of @CEHENKLE that we don't have anyone at the moment who can focus on Hadoop, but we would love assistance if you are offering @mattweber. |
@brijos No luck. The client assumes we have types when it sees what it thinks is Elasticsearch version 2.0.0. I forked the client, removed that check, manually built the Spark driver, and have been using that fork. I only use it for writing data to OpenSearch, so the change was minimal. I am not sure what would be involved to support everything. Are there any docs on how to add a new client, or a stub repository I can open a pull request against? I am not committing to anything, but if I get some time I would gladly look into adding it. |
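To illustrate the failure described above, here is a minimal sketch of how a version gate can misfire. It is not the connector's actual code: the function names and the "major version below 7 means typed mappings" threshold are assumptions for illustration. The only external fact relied on is that an OpenSearch cluster's root endpoint reports `"distribution": "opensearch"` in its `version` object.

```python
def parse_major(version_number: str) -> int:
    """Return the major component of a dotted version string."""
    return int(version_number.split(".")[0])


def naive_uses_types(root_info: dict) -> bool:
    """Hypothetical gate: any engine reporting a major version below 7
    is assumed to be an old Elasticsearch that requires mapping types."""
    return parse_major(root_info["version"]["number"]) < 7


def distribution_aware_uses_types(root_info: dict) -> bool:
    """Hypothetical fix: OpenSearch numbers its releases independently,
    so its version is never compared against Elasticsearch thresholds.
    Treating every OpenSearch cluster as typeless is an assumption."""
    version = root_info["version"]
    if version.get("distribution") == "opensearch":
        return False
    return parse_major(version["number"]) < 7


# Trimmed-down root response of an OpenSearch 2.0.0 cluster.
opensearch_2 = {"version": {"distribution": "opensearch", "number": "2.0.0"}}

print(naive_uses_types(opensearch_2))               # True  (misread as Elasticsearch 2.0.0)
print(distribution_aware_uses_types(opensearch_2))  # False
```

Removing the check outright, as in the fork described above, has the same effect for write-only use; the distribution-aware variant only shows one way the check could be kept for Elasticsearch clusters.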
The needs I have for this are twofold: 1) support basic auth / certs for open source implementations of the engine, and 2) support various additional auth protocols like SigV4 on AWS. If someone has done some initial work that addresses OpenSearch 1.x and can reach back to a minimum of Elasticsearch version 7.1, that would get things started. I can hook in the SigV4 signing if this is in Java or Python. I just need to know the entry points for whichever HTTP client library is being used, and hopefully that package supports an auth interface. |
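For readers unfamiliar with what "hooking in SigV4" involves, the following is a rough, standard-library-only sketch of the signing steps (canonical request, string to sign, derived signing key, `Authorization` header). It is not code from any Hadoop client. The function names and the example endpoint are hypothetical, the service name `es` is assumed for Amazon OpenSearch Service, and the sketch signs only the `host` and `x-amz-date` headers and omits session tokens. A real integration should prefer the AWS SDK's signer.

```python
import hashlib
import hmac
from datetime import datetime, timezone
from urllib.parse import parse_qsl, quote, urlsplit


def _hmac_sha256(key: bytes, message: str) -> bytes:
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).digest()


def derive_signing_key(secret_key: str, date_stamp: str, region: str, service: str) -> bytes:
    """Derive the SigV4 signing key by chaining HMACs over date, region, and service."""
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


def sigv4_headers(method, url, body, access_key, secret_key, region, service="es", now=None):
    """Return the extra headers for a SigV4-signed request (simplified sketch)."""
    now = now or datetime.now(timezone.utc)
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    date_stamp = now.strftime("%Y%m%d")
    parts = urlsplit(url)

    # Canonical query string: URI-encode names and values, then sort.
    encoded = sorted(
        (quote(k, safe="-_.~"), quote(v, safe="-_.~"))
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
    )
    canonical_query = "&".join(f"{k}={v}" for k, v in encoded)

    # Only host and x-amz-date are signed in this sketch.
    canonical_headers = f"host:{parts.netloc}\nx-amz-date:{amz_date}\n"
    signed_headers = "host;x-amz-date"
    canonical_request = "\n".join([
        method.upper(),
        quote(parts.path or "/", safe="/-_.~"),
        canonical_query,
        canonical_headers,
        signed_headers,
        hashlib.sha256(body).hexdigest(),
    ])

    scope = f"{date_stamp}/{region}/{service}/aws4_request"
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256",
        amz_date,
        scope,
        hashlib.sha256(canonical_request.encode("utf-8")).hexdigest(),
    ])
    signing_key = derive_signing_key(secret_key, date_stamp, region, service)
    signature = hmac.new(signing_key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()

    return {
        "x-amz-date": amz_date,
        "Authorization": (
            f"AWS4-HMAC-SHA256 Credential={access_key}/{scope}, "
            f"SignedHeaders={signed_headers}, Signature={signature}"
        ),
    }
```

The entry point asked about above would be wherever the HTTP client lets a caller add headers just before a request is sent, since the signature covers the method, path, query, and body: for example, `sigv4_headers("GET", "https://search-example.us-east-1.es.amazonaws.com/_cat/indices", b"", access_key, secret_key, "us-east-1")`, with a placeholder endpoint.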
Quick update, I'm going to start the process of creating a new repo for a Hadoop client with the intent of @mattweber contributing what he has. I'll keep this thread updated as we progress. |
@mattweber does your fork include support for Hive3 and Spark3 as well? |
@brijos I do not have a fork, I literally just took elastic's code and made a single line change. This was off a |
Spark 3 support didn't land until 7.12.0, so it would not be in any fork without us needing to do additional work. |
@mattweber You can create your own public repo and add an issue in the repo proposing to be moved to the OpenSearch project. Tag @wbeckler in that issue. There might be some checkpoints that I need to clear, and then we'll move it over. Then we would want to make you a maintainer once the repo is in the OpenSearch project. |
@wbeckler FYI, we do have precedent for keeping a repo admin (vs. just a maintainer). The owner of https://github.com/opensearch-project/opensearch-plugin-template-java retained admin rights on the repo when it was moved into this org. |
This is now in progress: https://github.com/opensearch-project/opensearch-hadoop/pulls |
Will this client be made available via Maven? We are using OpenSearch 1.2.4 in our organization and have been hunting for a compatible Spark client. |
This is coming to Maven once the application passes a security review. These reviews usually take 1-2 months, but no date has been set yet. |
Thanks @harshavamsi and @wbeckler. Has there been any progress on an ETA for opensearch-hadoop in the last two weeks? |
@prashantsc The client is undergoing security review inside Amazon. We are currently targeting 4/30/23, but that could change depending on the outcome. |
Thanks @harshavamsi for the information. |
@harshavamsi Do you have updates on the security review? We have passed 4/30/23, so I am wondering whether a new date will be set. |
@junhl Thanks for checking in. Happy to report that the security review is complete and we're now working towards a release. There is some ongoing discussion over at opensearch-project/opensearch-build#3385 about the nuances of releasing this client. That is also where we will track the release. Stay tuned: a release will most likely happen this week. Thanks for being patient! |
@harshavamsi Spark 3.x clusters can't connect to OpenSearch using the elasticsearch-spark libraries. Somehow the compatibility is broken and the connection fails with
However, Spark 2.x clusters don't have this problem. This is a blocker for us because we only have Spark 3.x clusters :( Is this client coming anytime soon? |
Hi @chaitujil, the OpenSearch build team is working on getting a release out as soon as possible. In the meantime, could you let me know whether you're using managed OpenSearch on AWS or hosting your own service? Also, which versions of OpenSearch are you targeting? Thanks. |
@harshavamsi We are targeting an OpenSearch 1.1 cluster on managed OpenSearch. |
Hi folks, we published Snapshots here -- https://aws.oss.sonatype.org/content/repositories/snapshots/org/opensearch/client/. Do give it a try and let us know. We're going to publish an actual release early this week. Thanks! |
Thanks @harshavamsi for the update. |
@harshavamsi It looks like the opensearch-spark libs shared at the link above (https://aws.oss.sonatype.org/content/repositories/snapshots/org/opensearch/client/) work with Amazon OpenSearch Service running OpenSearch engine v2.3. We want to connect to Amazon OpenSearch Service running Elasticsearch engine v7.10 from an AWS Glue job (Type: Spark, Glue version 3.0 or 4.0). Can you please share which versions of the libraries/jar files we need to use for that? |
Hi, we have a workaround that supports ES 7.x clusters. Have you given the connector a try? Does it fail? |
Is this release compatible with OpenSearch 1.2.x and Spark 3.2.x / Scala 2.12? |
@harshavamsi On referencing the jar file opensearch-spark-30_2.13-1.0.0-20230513.002822-1.jar in our AWS Glue job (Type: Spark, Glue version 3.0 or 4.0), we get the following error: "Scala signature package has wrong version expected: 5.0 found: 5.2 in package.class". Does this jar have to be compiled with Java 8? Can you please take a look at the issue? |
Hi @harshavamsi, is there a place where I can find all the configuration options that OpenSearch supports through Spark? I am using OpenSearch 1.3.10 with Spark 3.3.1 and Scala 2.12. |
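A "Scala signature ... wrong version" error of this kind typically points to a Scala binary-version mismatch rather than a Java version problem: the `_2.13` suffix in the jar name is the conventional Scala cross-build marker, and Spark 3.x runtimes are most commonly built against Scala 2.12 (whether that holds for a given Glue version is an assumption to verify). The sketch below is a hypothetical pre-flight check, not part of any client. The helper names are made up, and the runtime Scala version must be supplied by the caller.

```python
import re
from typing import Optional


def scala_binary_suffix(jar_name: str) -> Optional[str]:
    """Extract the Scala binary version (e.g. '2.13') from a cross-built
    artifact name such as 'opensearch-spark-30_2.13-1.0.0-....jar'."""
    match = re.search(r"_(\d+\.\d+)-", jar_name)
    return match.group(1) if match else None


def matches_runtime(jar_name: str, runtime_scala_version: str) -> bool:
    """Compare the jar's Scala suffix with the runtime's major.minor version."""
    runtime_binary = ".".join(runtime_scala_version.split(".")[:2])
    return scala_binary_suffix(jar_name) == runtime_binary


jar = "opensearch-spark-30_2.13-1.0.0-20230513.002822-1.jar"
print(scala_binary_suffix(jar))         # 2.13
print(matches_runtime(jar, "2.12.15"))  # False
```

In a running Spark Scala shell, `scala.util.Properties.versionNumberString` reports the runtime's Scala version, which can be passed as `runtime_scala_version` here.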
@akshayjain3450 please refer to this file -- https://github.com/opensearch-project/opensearch-hadoop/blob/main/mr/src/main/java/org/opensearch/hadoop/cfg/ConfigurationOptions.java. It has all the configuration options that you can set within the client. Thanks! |
Closing this issue as completed via #227 |
@harshavamsi I don't see a workflow for the Scala 2.12 version. Are you not publishing any jars compatible with Spark 3.x and Scala 2.12? |
@venkatbrr The artifact you're looking for is published as |
@harshavamsi @Xtansia Can you please update the compatibility section for version 1.0.1? We are trying to use the Hadoop client with an AWS-managed OpenSearch cluster on version 2.5, so we want to check the compatibility. |
Is your feature request related to a problem?
When using Elasticsearch in the past, some in the community were used to connecting through a Hadoop client. The ask is to create a Hadoop client that connects to OpenSearch.