Create dataframe from list instead of RDD #363

Draft
wants to merge 1 commit into
base: spark-3.5.1

Collaborator

@penghuo penghuo commented May 30, 2024

Description

Create the DataFrame from a list instead of an RDD, since EMR-S does not support creating a DataFrame from an RDD.

Issues Resolved

n/a

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@penghuo penghuo marked this pull request as ready for review May 30, 2024 22:43
  val resultSchema = spark.createDataFrame(
-   spark.sparkContext.parallelize(schemaRows),
+   schemaRows,
Collaborator

Will this process all data on the driver node?

Collaborator Author

Yes, but only the schema rows are processed there.
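
For readers following the thread, here is a minimal sketch of the list-based call being discussed. It is not the PR's actual code: the object name, column names, and row values are hypothetical, and it assumes `schemaRows` is held as a `java.util.List[Row]` so that the `createDataFrame(java.util.List[Row], StructType)` overload applies. The local master is set only so the sketch runs standalone.

```scala
import java.util.{Arrays => JArrays, List => JList}

import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

object SchemaFrameSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("schema-frame-sketch")
      .master("local[1]") // assumption: local run for illustration only
      .getOrCreate()

    // Hypothetical schema description; the PR's real columns are not shown.
    val schema = StructType(Seq(
      StructField("column_name", StringType, nullable = false),
      StructField("data_type", StringType, nullable = false)))

    // Hypothetical rows, kept as a Java list that lives in driver memory.
    val schemaRows: JList[Row] = JArrays.asList(
      Row("id", "integer"),
      Row("name", "string"))

    // RDD-based form that the PR moves away from (needs sparkContext):
    //   spark.createDataFrame(spark.sparkContext.parallelize(rows), schema)

    // List-based form: no explicit RDD and no sparkContext access by the caller.
    val resultSchema: DataFrame = spark.createDataFrame(schemaRows, schema)
    resultSchema.show()

    spark.stop()
  }
}
```

Because the list holds only a handful of schema rows, keeping it on the driver should be cheap; the same pattern would be a poor fit for the query result data itself.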

Collaborator Author

penghuo commented May 31, 2024

To maintainers: please don't merge this PR until we get confirmation that the EMR-S upgrade is required.

@penghuo penghuo marked this pull request as draft June 21, 2024 19:54
4 participants