Wednesday, March 12, 2014

Setting up Hive with HBase external storage on CDH

If you have existing HBase tables, it can be very handy to create Hive external tables that wrap them so that you can run HiveQL queries against the data.

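The following HiveQL creates the metastore table schema on top of the existing HBase table MyTableName. The entries in hbase.columns.mapping pair up positionally with the Hive columns: :key is the HBase row key, and cf:Column1 and cf:Column2 are columns in the column family cf: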

CREATE EXTERNAL TABLE MyTableName(key string, Column1 string, Column2 string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:Column1,cf:Column2")
TBLPROPERTIES ('hbase.table.name' = 'MyTableName');
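
Once the table exists, it can be queried like any other Hive table. The following is a minimal sketch that assumes the table definition above; the row key 'row1' is a made-up value for illustration, not something from a real table:

```sql
-- Show the Hive-side schema that the column mapping produced.
DESCRIBE MyTableName;

-- Read a few rows from the underlying HBase table.
SELECT * FROM MyTableName LIMIT 10;

-- Look up one row by its HBase row key ('row1' is a hypothetical value).
SELECT Column1, Column2 FROM MyTableName WHERE key = 'row1';
```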



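The next step is to add the HBase auxiliary jars to hive-site.xml via the hive.aux.jars.path property. Without them, certain HiveQL queries will not run (e.g. select count(1) from MyTableName). The jar file names below are from CDH 4.6.0; adjust the version numbers to match your installation: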
<property>
  <name>hive.aux.jars.path</name>
  <value>file:///opt/cloudera/parcels/CDH/lib/hive/lib/zookeeper.jar,file:///opt/cloudera/parcels/CDH/lib/hive/lib/hbase.jar,file:///opt/cloudera/parcels/CDH/lib/hive/lib/hive-hbase-handler-0.10.0-cdh4.6.0.jar,file:///opt/cloudera/parcels/CDH/lib/hive/lib/guava-11.0.2.jar</value>
</property>


To execute these HiveQL commands from the host command line, add this property in Cloudera Manager under hive1 service > Config > Service-Wide > Advanced > Hive Service Configuration Safety Valve for hive-site.xml.

To execute the same HiveQL from the Hue Hive UI, also add it under hue1 service > Config > Beeswax Server (Default) > Advanced > Hive Configuration Safety Valve.
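
Once the new configuration is in effect, one way to check it is from a Hive session. This is a sketch rather than a required step; it relies on the Hive behaviour that SET followed by a property name prints that property's current value, and then reruns the count query mentioned earlier:

```sql
-- Print the current value; it should list the four jars configured above.
SET hive.aux.jars.path;

-- The example query that depends on the HBase jars being available.
SELECT count(1) FROM MyTableName;
```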


Refs:
https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration
http://www.confusedcoders.com/bigdata/hive/hbase-hive-integration-querying-hbase-via-hive
http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/latest/CDH4-Installation-Guide/cdh4ig_topic_18_10.html
https://groups.google.com/a/cloudera.org/forum/#!topic/cdh-user/E8GfiwMOIPw
