Running Cloudera in Standalone Mode

This section explains how to install Cloudera Distribution for Hadoop (CDH3) on Ubuntu. It is a quick-start tutorial for setting up CDH3 on Debian-based systems: it lists every command, with a short description, needed to install Cloudera in standalone mode (a single-node cluster).


Deploy Cloudera (CDH3) in Standalone mode:
COMMAND
    DESCRIPTION

sudo add-apt-repository "deb http://archive.canonical.com/ lucid partner"
    If you are using Ubuntu 10.04 LTS, run this command to enable the partner repository, which provides the Sun Java packages. Run sudo apt-get update afterwards so that the new repository is picked up.

sudo apt-get install sun-java6-jdk
    Install Java.

lsb_release -c
    Show the codename of your distribution (for example hardy, jaunty, or lucid). This codename is called DISTRO below.

sudo vi /etc/apt/sources.list.d/cloudera.list
Then type:
deb http://archive.cloudera.com/debian DISTRO-cdh3 contrib
deb-src http://archive.cloudera.com/debian DISTRO-cdh3 contrib
    Create the Cloudera repository file, which enables your package manager to install Cloudera. Replace DISTRO with the codename of your distribution.

sudo apt-get -y install curl
    Install curl.

curl -s http://archive.cloudera.com/debian/archive.key | sudo apt-key add -
    Add the Cloudera public GPG key to your repository.

sudo apt-get update
    Update the APT package index.

apt-cache search hadoop
    List the Hadoop packages available on Debian systems.

sudo apt-get -y install hadoop-0.20
    Install Hadoop.

dpkg -L hadoop-0.20
    List the installed files.

man hier
    Read the description of the standard filesystem hierarchy. Comparing it with the file list above shows that the Hadoop package has been configured to follow it.
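
The repository step above can also be scripted instead of typed into an editor. The following is a minimal sketch, not an official Cloudera tool: cloudera_repo_lines is a hypothetical helper name chosen for this example, and it only prints the two repository lines for whatever codename you pass in.

```shell
#!/bin/sh
# Sketch: print the two Cloudera repository lines for a distribution codename.
# cloudera_repo_lines is a hypothetical helper, not part of CDH or APT.
cloudera_repo_lines() {
    distro="$1"
    if [ -z "$distro" ]; then
        echo "usage: cloudera_repo_lines DISTRO" >&2
        return 1
    fi
    printf 'deb http://archive.cloudera.com/debian %s-cdh3 contrib\n' "$distro"
    printf 'deb-src http://archive.cloudera.com/debian %s-cdh3 contrib\n' "$distro"
}

# Preview the lines for the "lucid" codename without touching any system file.
cloudera_repo_lines lucid

# To write the file for the running system (needs root), something like this
# should work, assuming lsb_release is installed:
#   cloudera_repo_lines "$(lsb_release -cs)" | sudo tee /etc/apt/sources.list.d/cloudera.list
```

Printing to standard output first makes it easy to check the lines before they are written to /etc/apt/sources.list.d/cloudera.list.
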
Congratulations, the Cloudera setup is complete. Now let's run some examples.
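hadoop jar /usr/lib/hadoop-0.20/hadoop-*-examples.jar pi 10 100
    Run the pi example with 10 map tasks and 100 samples per map.

cd /tmp
mkdir input
cp /etc/hadoop/conf/*.xml input
hadoop jar /usr/lib/hadoop-0.20/hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
cat output/*
    Run the grep example: copy the Hadoop configuration files into an input directory, count the strings in them that match the regular expression, and print the result.

cd /tmp
mkdir inputwords
cp /etc/hadoop/conf/*.xml inputwords
hadoop jar /usr/lib/hadoop-0.20/hadoop-*-examples.jar wordcount inputwords outputwords
    Run the word count example. The results are written to /tmp/outputwords.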
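
To get a feel for what the grep and word count examples compute, the same kind of result can be approximated with ordinary command-line tools. This is only a rough sketch under assumptions: local_grep_counts and local_word_counts are hypothetical helper names, the sample file is made up, the -o and -h options assume GNU or BSD grep, and the output layout will not match Hadoop's exactly.

```shell
#!/bin/sh
# Sketch: local approximations of the Hadoop grep and wordcount examples.
# Both function names are hypothetical and exist only for this illustration.

# Count every string matching an extended regular expression, most frequent first.
local_grep_counts() {
    pattern="$1"
    shift
    grep -ohE "$pattern" "$@" | sort | uniq -c | sort -rn
}

# Count every whitespace-separated token in the given files.
local_word_counts() {
    cat "$@" | tr -s '[:space:]' '\n' | grep -v '^$' | sort | uniq -c
}

# Build a small made-up sample so the sketch runs without a Hadoop install.
sample_dir=$(mktemp -d)
printf 'dfs.replication dfs.name.dir\ndfs.replication\n' > "$sample_dir/sample.txt"

local_grep_counts 'dfs[a-z.]+' "$sample_dir/sample.txt"
local_word_counts "$sample_dir/sample.txt"

# Remove the temporary sample directory.
rm -rf "$sample_dir"
```

Running the helpers on /tmp/input/*.xml or /tmp/inputwords/*.xml gives numbers to compare against the Hadoop output, which is a quick sanity check that the jobs ran as expected.
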
