Create sample command fails in Google Dataproc Spark 2.11.8 #163
I even tried the same configuration given in the documentation, with the Dataproc 1.0 image and verdict-core-0.3.0-jar-with-dependencies.jar. When I run the create sample command, I get this error.
I tried building the master branch (verdict-spark-lib-0.4.11.jar) and ran it on a fresh instance of the Google Dataproc 1.2 image. Even on that instance, when I run import edu.umich.verdict.VerdictSpark2Context and the commands quoted below, I get the following error. What does this error mean?
This seems to be an HDFS (or Hive) permission issue. When I have observed similar errors, they were due to a lack of write permission on the spark-warehouse directory.
Can you check whether the regular SparkSession.sql("create schema myschema") works? If you are using the Spark interactive shell, the command is spark.sql("create schema myschema"); otherwise, replace the variable "spark" with your SparkSession instance.
Depending on the result of that command, our investigation will take a different direction.
Thanks,
Yongjoo
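For reference, a minimal sketch of the suggested check, assuming the Spark interactive shell (myschema is only a placeholder name):

// Run in the Spark shell (the variable `spark` is the active SparkSession).
println(spark.conf.get("spark.sql.warehouse.dir"))   // directory where the new schema would be created
spark.sql("create schema myschema")                  // should fail with a similar AnalysisException if that directory is not writable
spark.sql("show databases").show(false)              // myschema appears here if the create succeeded
spark.sql("drop schema myschema")                    // clean up the test schema

If this plain create schema fails with the same error, the problem lies in the warehouse directory permissions rather than in Verdict itself.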
On Tue, Jul 17, 2018 at 3:55 AM Sanjay Kumar wrote:
I tried building the master branch and ran it on a fresh instance of the Google Dataproc 1.2 image. Even on that instance, when I run
import edu.umich.verdict.VerdictSpark2Context
scala> val vc = new VerdictSpark2Context(sc)
scala> vc.sql("show databases").show(false)
scala> vc.sql("create sample of database_name.table_name").show(false)
I am getting the following error:
org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Unable to create database path file:/home/sanjay/spark-warehouse/default_verdict.db, failed to create database default_verdict);
  at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106)
  at org.apache.spark.sql.hive.HiveExternalCatalog.doCreateDatabase(HiveExternalCatalog.scala:163)
  at org.apache.spark.sql.catalyst.catalog.ExternalCatalog.createDatabase(ExternalCatalog.scala:69)
  at org.apache.spark.sql.catalyst.catalog.SessionCatalog.createDatabase(SessionCatalog.scala:219)
  at org.apache.spark.sql.execution.command.CreateDatabaseCommand.run(ddl.scala:66)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
  at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:67)
  at org.apache.spark.sql.Dataset.<init>(Dataset.scala:183)
  at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:68)
  at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:632)
  at edu.umich.verdict.dbms.DbmsSpark2.execute(DbmsSpark2.java:84)
  at edu.umich.verdict.dbms.DbmsSpark2.executeUpdate(DbmsSpark2.java:91)
  at edu.umich.verdict.dbms.Dbms.createCatalog(Dbms.java:192)
  at edu.umich.verdict.dbms.Dbms.createDatabase(Dbms.java:183)
  at edu.umich.verdict.query.CreateSampleQuery.buildSamples(CreateSampleQuery.java:93)
  at edu.umich.verdict.query.CreateSampleQuery.compute(CreateSampleQuery.java:64)
  at edu.umich.verdict.query.Query.computeDataset(Query.java:192)
  at edu.umich.verdict.VerdictSpark2Context.execute(VerdictSpark2Context.java:61)
  at edu.umich.verdict.VerdictContext.executeSpark2Query(VerdictContext.java:160)
  at edu.umich.verdict.VerdictSpark2Context.sql(VerdictSpark2Context.java:81)
@pyongjoo I think you are right. I am not able to create the schema either, and I get the same error when I try. How can I resolve this issue?
In my case, I used the regular hdfs command, for example "hdfs dfs -chmod 777 /.../spark-warehouse". This is usually possible when you have a separate installation of HDFS and Spark is using that HDFS installation. Here is a link with more hdfs commands:
https://hadoop.apache.org/docs/r2.4.1/hadoop-project-dist/hadoop-common/FileSystemShell.html
I am pretty sure there is plenty of other documentation for Google Dataproc, but I cannot test it right now.
FYI, we plan to update VerdictDB soon. In that version, you will be able to configure the schema used by Verdict directly.
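As an illustration only (not part of the original reply), the same inspection and permission change can also be done from the Spark shell through Hadoop's FileSystem API. This sketch assumes spark.sql.warehouse.dir points at the directory from the error message and that the directory already exists; the chmod 777 mirrors the hdfs command above:

import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.fs.permission.FsPermission

// Resolve the warehouse directory that Spark (and hence Verdict) writes to.
val warehouse = new Path(spark.conf.get("spark.sql.warehouse.dir"))
val fs = warehouse.getFileSystem(spark.sparkContext.hadoopConfiguration)

// Inspect the current permission, then open it up (equivalent of hdfs dfs -chmod 777).
println(fs.getFileStatus(warehouse).getPermission)
fs.setPermission(warehouse, new FsPermission(Integer.parseInt("777", 8).toShort))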
I am getting the error
java.io.IOException: Mkdirs failed to create file:/home/sanjay/spark-warehouse/default_verdict.db/vt23_1/.hive-staging_hive_2018-07-17_03-03-28_842_6156432897141230125-1/-ext-10000/_temporary/0/_temporary/attempt_20180717030333_0002_m_000016_3
when I run the command vc.sql("create sample of default.advertiser_apr_orc").show(false). I am running on Dataproc image 1.2, with Spark 2.11.8 and verdict-spark-lib-0.4.8.jar. I am running this command as the root user and have done chmod 755 on the directory /home/sanjay/.
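One hint in the message above is the file: scheme: Spark is writing its warehouse to the local file system, not to HDFS, so an hdfs dfs -chmod would not affect that path. A small sketch, assuming the Spark shell, of how to confirm which directory is actually in use and whether it is writable by the Spark process (the printed path is only an example):

// Where is the warehouse, and is it on the local file system?
val warehouse = spark.conf.get("spark.sql.warehouse.dir")
println(warehouse)                                   // e.g. file:/home/sanjay/spark-warehouse

// For a file: URI, check that the local directory is writable by the user running Spark.
val dir = new java.io.File(new java.net.URI(warehouse).getPath)
println(s"exists=${dir.exists}, canWrite=${dir.canWrite}")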