How to access HBase from spark-shell using YARN as the master on CDH 5.3 and Spark 1.2
From a terminal:
# export SPARK_CLASSPATH=/opt/cloudera/parcels/CDH/lib/hbase/hbase-protocol.jar:/opt/cloudera/parcels/CDH/lib/hbase/hbase-common.jar:/opt/cloudera/parcels/CDH/lib/hbase/hbase-client.jar:/opt/cloudera/parcels/CDH/lib/hbase/lib/htrace-core.jar:/opt/cloudera/parcels/CDH/lib/hbase/hbase-server.jar:/etc/hbase/conf/hbase-site.xml
# spark-shell --master yarn-client
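A quick optional sanity check once the shell is up: if the HBase jars did not make it onto the driver classpath, the following throws ClassNotFoundException immediately, which is easier to debug than a failure later on:
// Resolves only if the HBase MapReduce jar is on the driver classpath.
Class.forName("org.apache.hadoop.hbase.mapreduce.TableInputFormat")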
Now you can access HBase from the Spark shell prompt:
import org.apache.hadoop.hbase.client.HBaseAdmin
import org.apache.hadoop.hbase.{HBaseConfiguration, HColumnDescriptor, HTableDescriptor}
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
val tableName = "My_HBase_Table_Name"
val hconf = HBaseConfiguration.create()
hconf.set(TableInputFormat.INPUT_TABLE, tableName)
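Optionally, the scan can be narrowed before the RDD is created. The TableInputFormat keys below are part of the standard HBase MapReduce API, but the family name and row range are placeholder values for illustration:
// Optional: restrict the scan to one column family and a row-key range (placeholder values).
hconf.set(TableInputFormat.SCAN_COLUMN_FAMILY, "cf")
hconf.set(TableInputFormat.SCAN_ROW_START, "row-000")
hconf.set(TableInputFormat.SCAN_ROW_STOP, "row-999")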
val admin = new HBaseAdmin(hconf)
if (!admin.isTableAvailable(tableName)) {
  val tableDesc = new HTableDescriptor(tableName)
  // HBase requires at least one column family; "cf" is just a placeholder name.
  tableDesc.addFamily(new HColumnDescriptor("cf"))
  admin.createTable(tableDesc)
}
val hBaseRDD = sc.newAPIHadoopRDD(hconf, classOf[TableInputFormat],
  classOf[org.apache.hadoop.hbase.io.ImmutableBytesWritable],
  classOf[org.apache.hadoop.hbase.client.Result])
val result = hBaseRDD.count()
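To read actual data rather than just counting rows, map over the (ImmutableBytesWritable, Result) pairs. A minimal sketch, assuming a column family "cf" with a qualifier "col" (both placeholder names, substitute your own):
import org.apache.hadoop.hbase.util.Bytes
// Print the first five row keys; toStringBinary renders non-printable bytes safely.
hBaseRDD.map { case (key, _) => Bytes.toStringBinary(key.get()) }.take(5).foreach(println)
// Extract one cell per row; getValue returns null if the cell is absent.
val cellValues = hBaseRDD.map { case (_, result) =>
  Bytes.toString(result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("col")))
}
cellValues.take(5).foreach(println)
Note that Hadoop RDDs reuse the same record objects between rows, so convert each row to plain Scala types (as above) before caching or collecting.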
Thanks to these refs for pointers:
http://comments.gmane.org/gmane.comp.java.hadoop.hbase.user/44744
http://apache-spark-user-list.1001560.n3.nabble.com/HBase-and-non-existent-TableInputFormat-td14370.html
Hi Dylan,
thanks for the guide, really helpful. Just a note for future developers: since CDH 5.4.0 it is necessary to use the incubating version of htrace-core (at least 3.1.0) instead of the symlinked 3.0.4, because HTrace moved to Apache (http://htrace.incubator.apache.org/) and the class org.htrace.Trace is now org.apache.htrace.Trace (you get a NoClassDefFoundError otherwise). So instead of /opt/cloudera/parcels/CDH/lib/hbase/lib/htrace-core.jar we need to take /opt/cloudera/parcels/CDH/lib/hbase/lib/htrace-core-3.1.0-incubating.jar. Probably a future version of CDH will fix this and everything will work with the "standard" symlinked jar; in the meantime, use the versioned jar.
Bye,
Michele
Thanks for the update, Michele.
Hi Dylan,
I used the spark-shell --jars option and it still fails.
Following your suggestion, using export SPARK_CLASSPATH works successfully.
Do you know the difference between the spark-shell --jars option and SPARK_CLASSPATH?
Thank you!
Thanks for the post. I used it as a skeleton to do the same on Hortonworks.