Write spark dataframe to opensearch #3944

Closed
wl02302677 opened this issue Jul 19, 2022 · 6 comments
Labels
question Questions about how things work, requests for help

Comments

@wl02302677

Is your feature request related to a problem? Please describe.

I am evaluating a migration from ES to OpenSearch, and my team uses Spark to process data.
Can I write a Spark DataFrame directly to OpenSearch, just as we did with Elasticsearch?
I just started following this project and didn't find a similar case on Google or Stack Overflow. Thanks for all your help!

Describe the solution you'd like
Write a Spark DataFrame to OpenSearch using Spark resources.


@wl02302677 wl02302677 added enhancement Enhancement or improvement to existing feature or request untriaged labels Jul 19, 2022
@andrross
Member

Generally speaking the OpenSearch indexing APIs are very similar to recent versions of ES. How do you plan on writing from Spark to OpenSearch?
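
To illustrate the kind of indexing request being compared here, below is a minimal sketch of building a `_bulk` request body, which uses the same newline-delimited JSON layout in both Elasticsearch and OpenSearch. The helper name `build_bulk_body`, the index name `sales`, and the `id` field are hypothetical and chosen only for illustration; sending the body over HTTP is left out.

```python
import json
from typing import Any, Dict, Iterable, Optional


def build_bulk_body(
    index: str,
    docs: Iterable[Dict[str, Any]],
    id_field: Optional[str] = None,
) -> str:
    """Serialize documents into newline-delimited JSON for a _bulk request.

    Each document becomes two lines: an action line naming the target
    index (and optionally the document id), then the document source.
    """
    lines = []
    for doc in docs:
        action: Dict[str, Any] = {"index": {"_index": index}}
        # Hypothetical convention: take the document id from one field.
        if id_field is not None and id_field in doc:
            action["index"]["_id"] = str(doc[id_field])
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    if not lines:
        return ""
    # The bulk body must be terminated by a final newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_bulk_body("sales", [{"id": 1, "total": 9.5}], id_field="id"))
```

A body built this way would be posted to the cluster's `_bulk` endpoint with the `application/x-ndjson` content type. A Spark connector does essentially this per partition, which is why API compatibility matters for the question above.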

@andrross andrross added question Questions about how things work, requests for help and removed enhancement Enhancement or improvement to existing feature or request untriaged labels Jul 19, 2022
@wl02302677
Author

Hi andrross, thanks for your assistance! We are using Databricks and Spark to join and aggregate some tables.
Before inserting into ES, we repartition the DataFrame to match our ES shards. Spark has full support on index ES document by dataframe. I am not sure whether inserting into OpenSearch from a Spark DataFrame is feasible.
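
For readers unfamiliar with this workflow, here is a rough sketch of the repartition-then-write pattern described above, using the elasticsearch-hadoop Spark SQL data source (`org.elasticsearch.spark.sql`) and its `es.nodes`, `es.port`, and `es.nodes.wan.only` settings. The helper names, the host `search.example.internal`, and the index name are hypothetical. Whether this connector accepts an OpenSearch cluster depends on the versions involved, and the sketch does not address that.

```python
from typing import Dict


def connector_options(host: str, port: int = 9200, wan_only: bool = True) -> Dict[str, str]:
    """Build elasticsearch-hadoop settings as the string map Spark expects."""
    return {
        "es.nodes": host,
        "es.port": str(port),
        # Commonly enabled for managed or proxied clusters, where the
        # connector should not try to reach individual data nodes.
        "es.nodes.wan.only": str(wan_only).lower(),
    }


def write_dataframe(df, index: str, num_shards: int, options: Dict[str, str]) -> None:
    """Repartition a Spark DataFrame to the shard count, then append to the index.

    `df` is expected to be a pyspark.sql.DataFrame; the connector jar must
    already be on the Spark classpath for the format name to resolve.
    """
    (
        df.repartition(num_shards)
        .write.format("org.elasticsearch.spark.sql")
        .options(**options)
        .mode("append")
        .save(index)
    )
```

On a cluster with the connector installed, usage would look like `write_dataframe(joined_df, "sales", 5, connector_options("search.example.internal"))`, where `joined_df` and the shard count of 5 are placeholders. Matching partitions to shards is the poster's own tuning choice, not a connector requirement.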

@andrross
Member

Spark has full support on index ES document by dataframe

What exactly does this mean? If Spark has some tooling or connector that interfaces with ES then an OpenSearch equivalent would need to be implemented in order to guarantee compatibility long term.

@wl02302677
Author

This document is what I actually mean for ES:
https://www.elastic.co/guide/en/elasticsearch/hadoop/current/spark.html

I also found that a discussion about this functionality is open in issue 23:
opensearch-project/opensearch-hadoop#134

@mattweber
Contributor

@wl02302677 I was just going to mention the client issue. It links to the single-line code change required to get the Elastic client working with OpenSearch 2.x. I'm using it just fine to do exactly what you are asking.

@wl02302677
Author

@mattweber Thanks for your response! I will try this method to connect Spark to Elasticsearch. I think that is the solution I am looking for.
