Tuesday, February 23, 2016

Debugging a Spark job: incorrect JVM jar file loaded

I recently hit an issue where an incorrect version of a JVM jar file was being loaded in a Spark job on Hadoop.

This Scala snippet helped debug the issue by printing the path of the jar file a class was actually loaded from:

val jarPath = classOf[MyObject].getProtectionDomain().getCodeSource().getLocation().getPath()

In my case this pointed to an old version of guava:
/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/jars/guava-11.0.jar
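Expanded into a small self-contained sketch (the object name, helper name, and the classes inspected below are my own illustration, not from the original job):

```scala
object JarLocator {
  // Path of the jar (or class directory) a class was loaded from.
  // Classes loaded by the JVM bootstrap loader (e.g. java.lang.String)
  // typically have no CodeSource, hence the Option wrapper.
  def jarOf(cls: Class[_]): String =
    Option(cls.getProtectionDomain.getCodeSource)
      .map(_.getLocation.getPath)
      .getOrElse("<bootstrap or unknown>")

  def main(args: Array[String]): Unit = {
    // A scala-library class: prints something like .../scala-library.jar
    println(jarOf(classOf[scala.util.Random]))
    // A JDK bootstrap class: usually has no CodeSource
    println(jarOf(classOf[String]))
  }
}
```

Calling jarOf on the conflicting class (Guava, in my case) from inside the driver or an executor shows exactly which jar won the classpath race.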

Then, by setting the spark.driver.extraClassPath and spark.executor.extraClassPath options on spark-submit, the correct version of the jar file was loaded:

spark-submit --class com.MyClass <other_spark_args> --conf "spark.driver.extraClassPath=/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/jars/guava-15.0.jar" --conf "spark.executor.extraClassPath=/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/jars/guava-15.0.jar" /home/mypath/myjarfile.jar <my_job_params>
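Before overriding the classpath, it can also help to confirm which jars in the parcel directory actually bundle the conflicting class. A minimal sketch, assuming the CDH parcel path from above and using Guava's Stopwatch (a common conflict point with Hadoop's bundled Guava) as the example class; the object name and class choice are my illustration:

```scala
import java.io.File
import java.util.zip.ZipFile

object JarScan {
  // True if the jar at `path` bundles the given class-file entry.
  def containsClass(path: String, classEntry: String): Boolean = {
    val zf = new ZipFile(path)
    try zf.getEntry(classEntry) != null
    finally zf.close()
  }

  def main(args: Array[String]): Unit = {
    // Scan the parcel's jar directory for copies of the conflicting class.
    // Directory and class name are from my setup; adjust to yours.
    val jarDir = new File("/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/jars")
    Option(jarDir.listFiles).getOrElse(Array.empty[File])
      .filter(_.getName.endsWith(".jar"))
      .foreach { jar =>
        if (containsClass(jar.getPath, "com/google/common/base/Stopwatch.class"))
          println(s"bundles Guava Stopwatch: ${jar.getPath}")
      }
  }
}
```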

For more info on extraClassPath, see: http://spark.apache.org/docs/latest/configuration.html