
facing error when parsing yaml using scala #47

Open · VIKCT001 opened this issue Jul 31, 2020 · 2 comments

VIKCT001 commented Jul 31, 2020

I am facing the issue below while running Scala code on a Dataproc cluster. The code runs fine locally.

```
Exception in thread "main" java.lang.NoSuchMethodError: org.yaml.snakeyaml.Yaml.<init>(Lorg/yaml/snakeyaml/LoaderOptions;)V
```

```scala
object mytestmain {

  def main(args: Array[String]): Unit = {
    println("In main function")
    println("reading from gcs bucket")

    // val storage = StorageOptions.getDefaultInstance.getService
    // val my_blob = storage.get(BlobId.of("test-bucket", "job-configs/test.yml"))
    // val filecontent = new String(my_blob.getContent(), StandardCharsets.UTF_8)

    val config = """file_location: test-file
                   |big_query_dataset: test-dataset
                   |big_query_tablename: test-table""".stripMargin

    val classobj = new IngestionData()
    classobj.printYamlfiledata(config)
  }
}
```

```scala
package com.test.processing.jobs

import net.jcazevedo.moultingyaml._
import com.test.processing.conf.DatasetConfiguration

object ReadYamlConfiguration extends DefaultYamlProtocol {

  implicit object datasetConfFormat extends YamlFormat[DatasetConfiguration] {

    def write(obj: DatasetConfiguration) = YamlObject(
      YamlString("file_location") -> YamlString(obj.file_location),
      YamlString("big_query_dataset") -> YamlString(obj.big_query_dataset),
      YamlString("big_query_tablename") -> YamlString(obj.big_query_tablename)
    )

    println("I am in read datasetConfFormat object ")

    def read(value: YamlValue) = {
      value.asYamlObject.getFields(
        YamlString("file_location"),
        YamlString("big_query_dataset"),
        YamlString("big_query_tablename")) match {
        case Seq(
          YamlString(file_location),
          YamlString(big_query_dataset),
          YamlString(big_query_tablename)) =>
          new DatasetConfiguration(file_location, big_query_dataset, big_query_tablename)
        case _ => deserializationError("Data configs expected")
      }
    }

    implicit val YamlDatasetConfigurationfFormat = yamlFormat3(DatasetConfiguration)
  }
}
```

```scala
import net.jcazevedo.moultingyaml._
import com.test.processing.conf.DatasetConfiguration
import com.test.processing.jobs.ReadYamlConfiguration._

class IngestionData {

  def printYamlfiledata(filedata: String) = {
    println("I am in readYamlfiledata method")

    val myObj = filedata.parseYaml.convertTo[DatasetConfiguration]
    println("file name is: " + myObj.file_location)
    println("dataset name is: " + myObj.big_query_dataset)
    println("Table name is: " + myObj.big_query_tablename)
  }
}
```

```scala
// In com.test.processing.conf:
case class DatasetConfiguration(file_location: String, big_query_dataset: String, big_query_tablename: String)
```

It fails both when I read the YAML file from the bucket and when I hardcode the content as input, as above. It runs fine locally.
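For completeness, here is a minimal local check (a sketch; `LocalParseCheck` is a throwaway object run via `sbt run`, not part of the project) that parses the same hardcoded YAML through the protocol above:

```scala
import net.jcazevedo.moultingyaml._
import com.test.processing.conf.DatasetConfiguration
import com.test.processing.jobs.ReadYamlConfiguration._

// Throwaway local check: parses the same hardcoded YAML through the
// moultingyaml protocol above; succeeds in a plain JVM run.
object LocalParseCheck extends App {
  val config = """file_location: test-file
                 |big_query_dataset: test-dataset
                 |big_query_tablename: test-table""".stripMargin

  val parsed = config.parseYaml.convertTo[DatasetConfiguration]
  println(parsed) // DatasetConfiguration(test-file,test-dataset,test-table)
}
```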

dankolesnikov commented

@VIKCT001 I am facing the exact same error. I have a project with Scala 2.11 and Spark 2.4; after implementing YAML parsing, everything worked flawlessly via `sbt test`, but after building a fat jar with `sbt assembly` and running `spark-submit` locally I get the method-not-found exception. After unpacking the jar, I can confirm that `org.yaml.snakeyaml.Yaml.<init>(Lorg/yaml/snakeyaml/LoaderOptions;)V` is there.

My hunch is that something bad is happening in `assemblyMergeStrategy`, since it excludes SnakeYAML's `pom.properties` and `pom.xml` because they sit under the `META-INF` folder, which is expected behavior. @VIKCT001 could you share your build.sbt?
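For reference, a typical merge strategy of the kind I mean looks like the sketch below (illustrative only, not this project's actual build.sbt):

```scala
// Illustrative sbt-assembly merge strategy, not this project's actual build.sbt.
// Discarding META-INF is normally harmless: those entries are Maven metadata,
// not classes, so they should not cause a NoSuchMethodError by themselves.
assemblyMergeStrategy in assembly := {
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case _                             => MergeStrategy.first
}
```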

@jcazevedo Do you have any hunches? Have you encountered such a scenario?

dankolesnikov commented Oct 31, 2020

The reason this issue occurs is that Apache Spark (2.4 in my case) ships with SnakeYAML 1.15, which gets picked up first by the class loader when the project runs via spark-submit, so the SnakeYAML 1.26 that moultingyaml depends on is ignored.
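A quick way to confirm this (a sketch; `SnakeYamlDiagnostic` is a hypothetical helper, not part of either project) is to print where the class loader actually found the `Yaml` class and then call the constructor that moultingyaml needs:

```scala
import org.yaml.snakeyaml.{LoaderOptions, Yaml}

object SnakeYamlDiagnostic {
  def main(args: Array[String]): Unit = {
    // Where was the Yaml class loaded from? Under spark-submit this typically
    // points into Spark's jars directory (SnakeYAML 1.15), not the fat jar.
    val location = classOf[Yaml].getProtectionDomain.getCodeSource.getLocation
    println(s"SnakeYAML loaded from: $location")

    // The constructor overload moultingyaml was compiled against; it does not
    // exist in 1.15, so this line throws the same NoSuchMethodError there.
    val yaml = new Yaml(new LoaderOptions())
    println("SnakeYAML constructor resolved: " + yaml)
  }
}
```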

The solution is to shade SnakeYAML in your build.sbt like this:

```scala
assemblyShadeRules in assembly := Seq(
  // fixes the problem: when running via spark-submit, an older version of SnakeYAML is picked up
  ShadeRule.rename("org.yaml.snakeyaml.**" -> "org.yaml.snakeyamlShaded@1").inAll
)
```
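An alternative worth noting: Spark's experimental `spark.driver.userClassPathFirst=true` and `spark.executor.userClassPathFirst=true` settings make the application jar win classpath resolution, but shading is the more targeted fix, since those flags can affect unrelated dependencies.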
