
Error when saving a dataframe to Redshift (java.lang.ArrayStoreException: java.lang.invoke.SerializedLambda) #459

Open
marek-babic opened this issue Apr 7, 2021 · 1 comment

marek-babic commented Apr 7, 2021

Hi there

I'm using the package io.github.spark-redshift-community:spark-redshift_2.12:4.2.0 as a dependency in an AWS EMR job, trying to save a dataframe to Redshift.

Sadly, the attempt fails with the following stacktrace:
https://gist.github.com/marek-babic/0110160bdd0ba11533b6f425559d2f1c

I know that the dataframe is in a healthy state, as show() and printSchema() output what I expect and the schema matches the one from the Redshift table.

The code looks like this (where the capital-letter variables are set appropriately):

df.write \
  .format("io.github.spark_redshift_community.spark.redshift") \
  .option("url", "jdbc:redshift://" + HOST_URL + ":5439/" + DATABASE_NAME) \
  .option("user", USERNAME) \
  .option("password", PASSWORD) \
  .option("dbtable", TABLE_NAME) \
  .option("aws_region", REGION) \
  .option("aws_iam_role", IAM_ROLE) \
  .option("tempdir", TMP_PATH) \
  .option("tempformat", "CSV") \
  .mode("overwrite") \
  .save()

I tried saving the dataframe directly to S3 by running:

df.write.format("csv").save(TMP_PATH + "/test1")

which worked, so the AWS permissions are correct.

Any ideas why this could be happening?
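For context: java.lang.ArrayStoreException: java.lang.invoke.SerializedLambda is commonly a symptom of a Scala binary-version mismatch, i.e. loading a _2.12 artifact on a Spark build that itself runs on Scala 2.11 (older EMR releases shipped Spark on Scala 2.11; EMR 6.x moved to 2.12). A minimal sketch of a check, assuming a live SparkSession named `spark` (the `_jvm` access is a PySpark internal, but works in practice):

```python
import re

def scala_binary_version(version_string):
    """Extract the Scala binary version ('major.minor') from a full
    version string such as 'version 2.11.12' or '2.12.10'."""
    m = re.search(r"(\d+\.\d+)\.\d+", version_string)
    return m.group(1) if m else None

# On the cluster (assumes `spark` is an active SparkSession):
# raw = spark.sparkContext._jvm.scala.util.Properties.versionString()
# assert scala_binary_version(raw) == "2.12", \
#     "spark-redshift_2.12 needs a Scala 2.12 Spark build"
```

If the versions disagree, switching to the matching artifact suffix (or a matching EMR release) is the usual fix.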
Thanks
Marek

@SaravShah
Any solutions on this?
