Friday, April 6, 2012

Installing Cloudera Hadoop 0.20 and Mahout on Ubuntu 11

Notes for installing Cloudera Hadoop 0.20 and Mahout on Ubuntu 11.10
April 2012

UPDATE SEPT 2014: OK, this is very old now. A better way is to just use the Cloudera Manager installer, which sets up CDH for you:
$ wget http://archive.cloudera.com/cm5/installer/latest/cloudera-manager-installer.bin
$ chmod u+x cloudera-manager-installer.bin
$ sudo ./cloudera-manager-installer.bin
Then follow the local installer steps before navigating to the web installer when prompted.

Original April 2012 Instructions:

0. Ubuntu
Fresh install of Ubuntu from ubuntu-11.10-desktop-amd64.iso in VirtualBox, with all updates and the Guest Additions installed.
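Also mount a shared folder created in VirtualBox (create the ~/_Shared mount point first):
$ sudo mount -t vboxsf _Shared ~/_Shared
or, to mount it at every boot, add the same line to /etc/rc.local without sudo, above the final "exit 0" line. Use the full path to your home directory instead of ~ there, because rc.local runs as root.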

1. Java and ssh prerequisites:
http://cloudblog.8kmiles.com/2011/12/02/hadoop-prerequisite-for-hadoop-setup-in-ubuntu/
http://softwareinabottle.wordpress.com/2011/11/17/install-sun-jdk-6-on-ubuntu-11-10/

2. Set up the Java environment by adding the following to /etc/bash.bashrc:
http://blog.sanaulla.info/2009/04/02/installing-jdk-setting-java_home-in-ubuntu/
JAVA_HOME=/usr/lib/jvm/java-6-sun
export JAVA_HOME
PATH=$PATH:$JAVA_HOME/bin
export PATH
CLASSPATH=$JAVA_HOME/lib/:.
export CLASSPATH
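If you rebuild the VM often, the step above can be scripted. This is a minimal sketch, not part of the original steps: `add_java_env` is a hypothetical helper name, and it assumes the same /usr/lib/jvm/java-6-sun path used above. It writes the equivalent `export VAR=value` form and appends only if the file has no line starting with `export JAVA_HOME=`, so running it twice is harmless.

```shell
# Sketch (hypothetical helper): append the step 2 Java settings to a
# startup file, skipping the append if JAVA_HOME is already exported there.
add_java_env() {
  local rc="${1:-/etc/bash.bashrc}"
  local java_home="${2:-/usr/lib/jvm/java-6-sun}"
  # Only checks for a line starting with "export JAVA_HOME="
  if grep -q '^export JAVA_HOME=' "$rc" 2>/dev/null; then
    return 0
  fi
  # \$ keeps PATH and JAVA_HOME unexpanded so they resolve at login time
  cat >> "$rc" <<EOF
export JAVA_HOME=$java_home
export PATH=\$PATH:\$JAVA_HOME/bin
export CLASSPATH=\$JAVA_HOME/lib/:.
EOF
}
```

Paste it into a root shell (for example one opened with `sudo -i`), run `add_java_env`, then `source /etc/bash.bashrc` and `echo $JAVA_HOME` to check.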

3. Install the CDH3 Hadoop core:
http://cloudblog.8kmiles.com/2011/12/06/hadoop-cdh3-setup-in-ubuntu/
3.1 Create cloudera.list:
$ sudo pico /etc/apt/sources.list.d/cloudera.list
with content:
deb http://archive.cloudera.com/debian maverick-cdh3 contrib
deb-src http://archive.cloudera.com/debian maverick-cdh3 contrib
3.2 Add the Cloudera repository key, then update and install:
$ curl -s http://archive.cloudera.com/debian/archive.key | sudo apt-key add -
$ sudo apt-get update
$ sudo apt-get install hadoop-0.20
$ hadoop version
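Instead of editing cloudera.list in pico (step 3.1), the two repository lines can be written in one go. A sketch under assumptions: `write_cloudera_list` is a hypothetical helper, the release defaults to the maverick-cdh3 name used above, and it must run as root to write the default path.

```shell
# Sketch (hypothetical helper): write the two step 3.1 repository lines
# without opening an editor. Overwrites the destination file.
write_cloudera_list() {
  local dest="${1:-/etc/apt/sources.list.d/cloudera.list}"
  local release="${2:-maverick-cdh3}"
  printf 'deb http://archive.cloudera.com/debian %s contrib\ndeb-src http://archive.cloudera.com/debian %s contrib\n' \
    "$release" "$release" > "$dest"
}
```

From a root shell, run `write_cloudera_list` and then continue with the curl and apt-get commands.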

4. Install the CDH3 daemons for pseudo-distributed mode:
http://cloudblog.8kmiles.com/2011/12/07/hadoop-cdh3-pseudo-distributed-mode-setup/
To set up standalone mode instead, see: http://cloudblog.8kmiles.com/2011/12/07/hadoop-cdh3-standalone-mode-setup/

$ sudo apt-get install hadoop-0.20-namenode
$ sudo apt-get install hadoop-0.20-secondarynamenode
$ sudo apt-get install hadoop-0.20-jobtracker
$ sudo apt-get install hadoop-0.20-datanode
$ sudo apt-get install hadoop-0.20-tasktracker

$ sudo apt-get install hadoop-0.20-conf-pseudo
$ sudo update-alternatives --display hadoop-0.20-conf
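Step 5 loops over /etc/init.d/hadoop-0.20-*, so before moving on it can help to confirm that every daemon package above left an init script behind. A minimal sketch: `check_hadoop_init_scripts` is a hypothetical helper, and it assumes the scripts are named after the packages (hadoop-0.20-namenode and so on).

```shell
# Sketch (hypothetical helper): report any step 4 daemon without an
# executable init script. Optional first argument overrides /etc/init.d.
check_hadoop_init_scripts() {
  local dir="${1:-/etc/init.d}" missing=0 d
  for d in namenode secondarynamenode jobtracker datanode tasktracker; do
    if [ ! -x "$dir/hadoop-0.20-$d" ]; then
      echo "missing: $dir/hadoop-0.20-$d" >&2
      missing=1
    fi
  done
  # Non-zero exit status if anything was missing
  return "$missing"
}
```

Run `check_hadoop_init_scripts && echo "all daemons installed"`; any missing script is named on stderr.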

5. Start and test the daemons:
$ for service in /etc/init.d/hadoop-0.20-*; do sudo $service start; done
$ sudo jps
$ hadoop jar /usr/lib/hadoop/hadoop-*-examples.jar pi 2 100000

Navigate to http://localhost:50070/ for NameNode details.
Navigate to http://localhost:50030/ for JobTracker (job) details.

To stop the daemons:
$ for service in /etc/init.d/hadoop-0.20-*; do sudo $service stop; done
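The start and stop loops above can be wrapped in one helper. This sketch is not from the Cloudera docs: `hadoop_services` and the `SUDO` override are hypothetical names. Because sudo cannot run a shell function, the sudo call sits inside the loop, just as in the original one-liners. Unlike the bare loop, it skips anything that is not an executable file, so it does nothing if no hadoop-0.20-* scripts are installed.

```shell
# Sketch (hypothetical helper): run an init-script action on every
# hadoop-0.20-* service. Set SUDO="" if you are already root.
hadoop_services() {
  local action="$1" dir="${2:-/etc/init.d}" svc
  for svc in "$dir"/hadoop-0.20-*; do
    # Skip the literal pattern when the glob matches nothing
    [ -x "$svc" ] || continue
    ${SUDO-sudo} "$svc" "$action"
  done
}
```

Usage: `hadoop_services start`, check with `sudo jps`, and finish with `hadoop_services stop`.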

6. Install mahout:
http://cloudblog.8kmiles.com/2012/02/16/cdh3-mahout-setup/
$ sudo apt-get install mahout


7. Install the Hue web interface:
https://ccp.cloudera.com/display/CDHDOC/Hue+Installation#HueInstallation-Installing%2CConfiguring%2CandStartingHueonOneMachine
$ sudo apt-get install hue

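Open the /etc/hue/hue.ini configuration file and add the following to the [desktop] section (use your own random secret_key rather than the example below):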
[desktop]
secret_key=jFE93j;2[280-eiw.tyXrN2s3['d:/.q[eI^^y#e=+Iei*@Mn<qW5o
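The secret_key above is published in this post, so it should not be reused. One way to generate your own is sketched below; `gen_hue_secret` is a hypothetical helper, and 45 random bytes (60 base64 characters) is an arbitrary choice of length. Base64 output also avoids characters such as `#` and `;`, which INI-style parsers may treat as comment markers.

```shell
# Sketch (hypothetical helper): print a random 60-character key.
# 45 bytes encode to exactly 60 base64 characters with no "=" padding.
gen_hue_secret() {
  head -c 45 /dev/urandom | base64 | tr -d '\n'
}
```

Run `echo "secret_key=$(gen_hue_secret)"` and paste the result under [desktop] in /etc/hue/hue.ini.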

$ for service in /etc/init.d/hadoop-0.20-*; do sudo $service start; done
$ sudo /etc/init.d/hue start
Navigate to: http://localhost:8088/ for Hue
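The Hadoop daemons and Hue can take a little while to start listening after their init scripts return, so opening the web pages straight away may fail. A small polling sketch, assuming curl (already used in step 3) is installed; `wait_for_url` is a hypothetical helper, the 60-second default timeout is arbitrary, and any HTTP response, even an error page, counts as "up".

```shell
# Sketch (hypothetical helper): poll a URL once per second until something
# answers, or give up after the timeout (in seconds) and return 1.
wait_for_url() {
  local url="$1" timeout="${2:-60}" waited=0
  until curl -s -o /dev/null --max-time 5 "$url"; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out waiting for $url" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
}
```

For example: `wait_for_url http://localhost:8088/ && echo "Hue is up"`, or the same with http://localhost:50070/ for the NameNode page.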

I hope these notes and links help someone out. Please leave a comment if you have any problems or additions.
