spark-submit dependency resolution for spark-csv
I'm writing a small Scala program that converts CSV to Parquet, using the Databricks spark-csv package. Here's my build.sbt:
name: = "tst" version: = "1.0" scalaversion: = "2.10.5" librarydependencies++ = seq( "org.apache.spark" % % "spark-core" % "1.6.1" % "provided", "org.apache.spark" % % "spark-sql" % "1.6.1", "com.databricks" % "spark-csv_2.10" % "1.5.0", "org.apache.spark" % % "spark-hive" % "1.6.1", "org.apache.commons" % "commons-csv" % "1.1", "com.univocity" % "univocity-parsers" % "1.5.1", "org.slf4j" % "slf4j-api" % "1.7.5" % "provided", "org.scalatest" % % "scalatest" % "2.2.1" % "test", "com.novocode" % "junit-interface" % "0.9" % "test", "com.typesafe.akka" % "akka-actor_2.10" % "2.3.11", "org.scalatest" % % "scalatest" % "2.2.1", "com.holdenkarau" % % "spark-testing-base" % "1.6.1_0.3.3", "com.databricks" % "spark-csv_2.10" % "1.5.0", "org.joda" % "joda-convert" % "1.8.1" )
After sbt package, when I run the command

spark-submit --master local[*] target/scala-2.10/tst_2.10-1.0.jar

I get the following error:

Exception in thread "main" java.lang.ClassNotFoundException: Failed to find data source: com.databricks.spark.csv. Please find packages at http://spark-packages.org
I can see the com.databricks_spark-csv_2.10-1.5.0.jar file in ~/.ivy2/jars/, downloaded by the sbt package command.
The source code of DataConversion.scala:
import org.apache.spark.sql.SQLContext
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

object DataConversion {
  def main(args: Array[String]) {
    val conf = new SparkConf()
      .setAppName("ClusterScore")
      .set("spark.storage.memoryFraction", "1")
    val sc = new SparkContext(conf)
    val sqlc = new SQLContext(sc)

    val df = sqlc.read
      .format("com.databricks.spark.csv")
      .option("header", "true")      // use first line of files as header
      .option("inferSchema", "true") // automatically infer data types
      .load("/tmp/cars.csv")

    println(df.printSchema)
  }
}
I can spark-submit without error if I specify the --jars option with the explicit jar path, but that's not ideal. Please suggest a better approach.
Use the sbt-assembly plugin to build a "fat jar" containing all of your dependencies with sbt assembly, and call spark-submit on that.
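A minimal sketch of the setup, assuming an sbt 0.13-era project (the plugin version and merge strategy below are illustrative choices, not the only correct ones):

// project/plugins.sbt
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.3")

// build.sbt additions: discard duplicate META-INF entries that
// otherwise make `sbt assembly` fail with deduplication errors
assemblyMergeStrategy in assembly := {
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case x                             => MergeStrategy.first
}

It also helps to mark the Spark artifacts (spark-core, spark-sql, spark-hive) as "provided", since spark-submit already puts them on the classpath; that keeps the fat jar small. Then:

sbt assembly
spark-submit --master local[*] target/scala-2.10/tst-assembly-1.0.jar

(tst-assembly-1.0.jar is sbt-assembly's default output name, name-assembly-version.jar.)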
In general, when you hit a ClassNotFoundException, try listing the contents of the jar you created to see what's actually in it:

jar tvf target/scala-2.10/tst_2.10-1.0.jar

Checking what's in the Ivy cache is meaningless; that only tells you sbt found the dependency. As mathematicians say, that's necessary but not sufficient.
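For example, to check whether the spark-csv classes made it into your jar:

jar tvf target/scala-2.10/tst_2.10-1.0.jar | grep -i databricks

This will come back empty, because sbt package bundles only your own compiled classes, not your dependencies. That is exactly why the data source can't be found at runtime unless you pass --jars or build a fat jar.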