Cloudera Distribution for Hadoop (CDH): Running Cloudera in Pseudo Distributed Mode

This section explains how to install the Cloudera Distribution for Hadoop (CDH3) on Ubuntu. It is a quick-start tutorial for setting up CDH3 on Debian-based systems: it lists every command, with a description, needed to install Cloudera in pseudo-distributed mode (a single-node cluster).

Deploy Cloudera (CDH3) in Pseudo Distributed mode:

COMMAND	DESCRIPTION
sudo add-apt-repository "deb http://archive.canonical.com/ lucid partner"	On Ubuntu 10.04 LTS, run this command first, followed by sudo apt-get update
sudo apt-get install sun-java6-jdk	Install Java
lsb_release -c	Print the codename of your distribution (for example, hardy or jaunty); it is referred to as DISTRO below
sudo vi /etc/apt/sources.list.d/cloudera.list, then add these two lines: (1) deb http://archive.cloudera.com/debian DISTRO-cdh3 contrib (2) deb-src http://archive.cloudera.com/debian DISTRO-cdh3 contrib	Create the repository file that lets your package manager install Cloudera; replace DISTRO with the codename of your distribution
sudo apt-get -y install curl	Install curl
curl -s http://archive.cloudera.com/debian/archive.key | sudo apt-key add -	Add the Cloudera public GPG key to your APT keyring
sudo apt-get update	Update APT package index
sudo apt-get -y install hadoop-0.20-conf-pseudo	Install Hadoop in pseudo-distributed mode: A pseudo-distributed Hadoop installation is composed of one node running all five Hadoop daemons: namenode, jobtracker, secondarynamenode, datanode, and tasktracker
for service in /etc/init.d/hadoop-0.20-*; do sudo $service start; done	Start the Cloudera Daemons
dpkg -L hadoop-0.20-conf-pseudo	List the files installed by the package (Debian systems)
jps	List the running Java daemons (run it as sudo jps if only Jps itself is listed). The output should resemble: 14799 NameNode 14977 SecondaryNameNode 15183 DataNode 15596 JobTracker 15897 TaskTracker
Congratulations, the Cloudera setup is complete. Now let's run some examples.
hadoop jar /usr/lib/hadoop/hadoop-*-examples.jar pi 10 100	Run the pi example
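hadoop fs -mkdir input	Create an input directory in HDFS for the grep example
hadoop fs -put /etc/hadoop-0.20/conf/*.xml input	Copy the Hadoop configuration XML files into it
hadoop-0.20 fs -ls input	Check that the files were copied
hadoop-0.20 jar /usr/lib/hadoop-0.20/hadoop-*-examples.jar grep input output 'dfs[a-z.]+'	Run the grep example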
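hadoop-0.20 fs -mkdir inputwords	Create an input directory in HDFS for the word count example
hadoop-0.20 fs -put /etc/hadoop-0.20/conf/*.xml inputwords	Copy the Hadoop configuration XML files into it
hadoop-0.20 fs -ls inputwords	Check that the files were copied
hadoop-0.20 jar /usr/lib/hadoop-0.20/hadoop-*-examples.jar wordcount inputwords outputwords	Run the word count example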
http://localhost:50070/	Web-based interface for the NameNode
http://localhost:50030/	Web-based interface for the JobTracker
for x in /etc/init.d/hadoop-* ; do sudo $x stop ; done	Shut down the CDH3 Hadoop services
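
The repository step in the table can also be scripted instead of typed into vi. The following is a minimal sketch, not part of the original tutorial: the function name cloudera_sources is made up for illustration, and it only prints the two source lines for a given codename. The commented-out usage line assumes lsb_release -cs returns the codename that Cloudera's archive expects.

```shell
#!/bin/sh
# Hypothetical helper: print the two APT source lines for a distribution
# codename (the DISTRO placeholder from the table above).
cloudera_sources() {
    distro="$1"
    printf 'deb http://archive.cloudera.com/debian %s-cdh3 contrib\n' "$distro"
    printf 'deb-src http://archive.cloudera.com/debian %s-cdh3 contrib\n' "$distro"
}

# Example usage (needs root, so it is left commented out):
# cloudera_sources "$(lsb_release -cs)" | sudo tee /etc/apt/sources.list.d/cloudera.list
```

Keeping the text generation separate from the write makes it easy to check the lines before anything under /etc is touched.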

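The jps check can be automated to report which of the five daemons is not running. This is a sketch under stated assumptions: the function name missing_daemons is hypothetical, and it expects jps-style lines ("PID ClassName") on standard input. Run without sudo, jps may list only the current user's Java processes, so the usage line pipes in sudo jps.

```shell
#!/bin/sh
# Hypothetical helper: read jps-style output on stdin and print the name of
# each expected Hadoop daemon that does not appear in it (one per line).
# Prints nothing when all five daemons are present.
missing_daemons() {
    listing="$(cat)"
    for daemon in NameNode SecondaryNameNode DataNode JobTracker TaskTracker; do
        # Compare the whole second field so NameNode does not match SecondaryNameNode.
        if ! printf '%s\n' "$listing" |
            awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
            printf '%s\n' "$daemon"
        fi
    done
}

# Example usage (left commented out because it needs a running installation):
# sudo jps | missing_daemons
```

If a daemon is reported missing, its log file is the next place to look.
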
22 comments:

Unknown, December 5, 2010 at 5:50 PM
Hi, I'm very new to linux and would need some additional help to install hadoop. When I enter the command: "vi /etc/apt/sources.list.d", my screen fills up with "~". Can you please help?
Thanks!
Ghislain
Rahul, December 7, 2010 at 10:23 PM
Hi Ghis, you are trying to open a directory in vi. Try this command instead:
vi /etc/apt/sources.list.d/cloudera.list
it will open a file named cloudera.list,
then you go in insert mode and type as specified in above tutorial.
Rahul, January 12, 2011 at 11:26 AM
Hi all,
please run this command before starting on ubuntu 10.04 LTS:
$ sudo add-apt-repository "deb http://archive.canonical.com/ lucid partner"

$ sudo apt-get update
Vishwesh, January 16, 2011 at 11:46 PM
Thanks a lot.

I was facing problem on Ubuntu 'Natty'. Now I tried this for Lucid and it worked perfectly for me. :)

Thanks,
Vishwesh
Vishwesh, January 17, 2011 at 1:23 AM
Rahul, on my system the 'jps' command does not show the output you have shown above. It shows this instead:

$jps
28217 Jps

But the important thing is that everything is working fine. I can run the examples and all.

Any thoughts?

Thanks,
Vishwesh
Vishwesh, January 17, 2011 at 1:39 AM
Ohhh.. the above is a 'sudo' issue.

$sudo jps
28281 Jps
4286 NameNode
4038 DataNode
4417 SecondaryNameNode
3757 RunJar
4593 TaskTracker

is the proper output as expected.

Thanks,
Vishwesh
Rahul, January 17, 2011 at 7:16 AM
Hi Vishwesh,
sorry for the late reply.
Your jps output is not showing JobTracker;
please check your logs.
Sanjay Acharya, January 20, 2011 at 7:03 PM
Nice post Rahul, helped me quick start really easily. Thanks.
Rahul, April 9, 2011 at 11:59 AM
Thanks, Sanjay.
Ratan, June 30, 2011 at 2:56 AM
Nice one. Thanks
vaddi, April 10, 2012 at 9:30 PM
Hi Rahul, I am getting an error when installing Java on Ubuntu 10.04:
sudo apt-get install sun-java6-jdk
Reading package lists... Done
Building dependency tree
Reading state information... Done
Package sun-java6-jdk is not available, but is referred to by another package.
This may mean that the package is missing, has been obsoleted, or
is only available from another source
E: Package sun-java6-jdk has no installation candidate
vaddi, April 10, 2012 at 11:00 PM
Hi Rahul, I want to install Cloudera Manager on CentOS.
Please tell me whether we also need to install CentOS on the slaves,
or whether we can use any OS on a slave.

thanks in advance
vaddi, April 11, 2012 at 12:29 AM
Thanks, Rahul,
but my team lead told me to do it using Cloudera Manager.
Is that a problem?

Which is the best and easiest way?

Must we install Cloudera Manager on all the slaves as well, or
is it enough to install it on the master only?

Your suggestions are helping me a lot.
Thanks once again
vaddi, April 11, 2012 at 12:34 AM
Please send me instructions on how to use clusters in Cloudera Manager on CentOS, and also mail me how to retrieve data from HBase using clusters in Cloudera.
vaddi, April 12, 2012 at 12:41 AM
Hi

OS: CentOS

Which is the best database for Hadoop,
HBase or MongoDB?

Which one will work with Cloudera Manager?
Anonymous, April 16, 2012 at 2:23 AM
Hi
Why does HBase include a timestamp by default? How can we display only the values,
like this: name sai