Deploy Cloudera (CDH3) in pseudo-distributed mode:
COMMAND | DESCRIPTION |
---|---|
sudo add-apt-repository "deb http://archive.canonical.com/ lucid partner" | If you are using Ubuntu 10.04 LTS, run this command first (followed by sudo apt-get update) so that the Java package can be found |
sudo apt-get install sun-java6-jdk | Install Java |
lsb_release -c | Print the codename of your distribution (called DISTRO below; e.g. lucid, hardy or jaunty) |
sudo vi /etc/apt/sources.list.d/cloudera.list, then type these two lines: (1) deb http://archive.cloudera.com/debian DISTRO-cdh3 contrib (2) deb-src http://archive.cloudera.com/debian DISTRO-cdh3 contrib | Add the Cloudera repository so that your package manager can install CDH3. Replace DISTRO with the codename of your distribution |
sudo apt-get -y install curl | Install curl |
curl -s http://archive.cloudera.com/debian/archive.key | sudo apt-key add - | Add the Cloudera public GPG key to APT's trusted keys |
sudo apt-get update | Update APT package index |
sudo apt-get -y install hadoop-0.20-conf-pseudo | Install Hadoop in pseudo-distributed mode: A pseudo-distributed Hadoop installation is composed of one node running all five Hadoop daemons: namenode, jobtracker, secondarynamenode, datanode, and tasktracker |
for service in /etc/init.d/hadoop-0.20-*; do sudo $service start; done | Start the Cloudera Daemons |
dpkg -L hadoop-0.20-conf-pseudo | List the files installed by the package (Debian-based systems) |
jps | List the running Java processes. The output should look like this (run it with sudo if the daemons are not listed): 14799 NameNode 14977 SecondaryNameNode 15183 DataNode 15596 JobTracker 15897 TaskTracker |
Congratulations, the Cloudera setup is complete. Now let's run some examples. | |
hadoop jar /usr/lib/hadoop/hadoop-*-examples.jar pi 10 100 | Run the pi example |
hadoop fs -mkdir input; hadoop fs -put /etc/hadoop-0.20/conf/*.xml input; hadoop-0.20 fs -ls input; hadoop-0.20 jar /usr/lib/hadoop-0.20/hadoop-*-examples.jar grep input output 'dfs[a-z.]+' | Run the grep example |
hadoop-0.20 fs -mkdir inputwords; hadoop-0.20 fs -put /etc/hadoop-0.20/conf/*.xml inputwords; hadoop-0.20 fs -ls inputwords; hadoop-0.20 jar /usr/lib/hadoop-0.20/hadoop-*-examples.jar wordcount inputwords outputwords | Run the word count example |
http://localhost:50070/ | Web-based interface for the NameNode |
http://localhost:50030/ | Web-based interface for the JobTracker |
for x in /etc/init.d/hadoop-* ; do sudo $x stop ; done | Shut down the CDH3 Hadoop services |
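
If you would rather not type the repository lines by hand in vi, the step can be sketched as a small shell function. This is only a sketch: the function name `cloudera_repo_lines` is made up for illustration, and it assumes the repository URL and the DISTRO-cdh3 naming shown in the table above.

```shell
#!/bin/sh
# Print the two APT source lines for the Cloudera CDH3 repository.
# Argument: distribution codename (e.g. lucid). If omitted, the codename
# is taken from `lsb_release -cs`.
# NOTE: cloudera_repo_lines is a hypothetical helper name, not part of CDH.
cloudera_repo_lines() {
    distro="${1:-$(lsb_release -cs)}"
    printf 'deb http://archive.cloudera.com/debian %s-cdh3 contrib\n' "$distro"
    printf 'deb-src http://archive.cloudera.com/debian %s-cdh3 contrib\n' "$distro"
}

# Preview the lines for Ubuntu 10.04 (lucid) without touching the system.
cloudera_repo_lines lucid

# To actually write the file (needs root), one could run:
#   cloudera_repo_lines | sudo tee /etc/apt/sources.list.d/cloudera.list
```

Previewing first and writing with `sudo tee` afterwards keeps the part that needs root down to a single command.
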
Hi, I'm very new to linux and would need some additional help to install hadoop. When I enter the command: "vi /etc/apt/sources.list.d", my screen fills up with "~". Can you please help?
Thanks!
Ghislain
Ghis, try learning some Linux.
Hello?
Hi Ghis, you are trying to open a directory in vi. Try this command:
vi /etc/apt/sources.list.d/cloudera.list
It will open a file named cloudera.list.
Then switch to insert mode and type the lines given in the tutorial above.
Hi all,
please run these commands before starting on Ubuntu 10.04 LTS:
$ sudo add-apt-repository "deb http://archive.canonical.com/ lucid partner"
$ sudo apt-get update
Thanks a lot.
I was facing a problem on Ubuntu 'Natty'. Now I tried this on Lucid and it worked perfectly for me. :)
Thanks,
Vishwesh
Rahul, on my system the 'jps' command is not showing the output you have shown above. It shows only this:
$ jps
28217 Jps
But the important thing is that everything is working fine. I can run the examples and all.
Any thoughts?
Thanks,
Vishwesh
Ohhh.. the above is a 'sudo' issue.
$ sudo jps
28281 Jps
4286 NameNode
4038 DataNode
4417 SecondaryNameNode
3757 RunJar
4593 TaskTracker
is the proper output as expected.
Thanks,
Vishwesh
Hi Vishwesh,
sorry for the late reply.
Your jps output is not showing the JobTracker,
please check your logs.
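
A quick way to spot a missing daemon like this is to compare the jps output against the five daemons listed in the tutorial. The following is a minimal sketch; `missing_daemons` is a made-up helper name, and it assumes jps prints one "PID Name" pair per line as in the outputs above.

```shell
#!/bin/sh
# Report which of the five expected Hadoop daemons are absent from jps output.
# Reads jps-style lines ("<pid> <name>") on stdin and prints the missing names.
# NOTE: missing_daemons is a hypothetical helper, not a Hadoop or CDH command.
missing_daemons() {
    jps_output="$(cat)"
    for daemon in NameNode SecondaryNameNode DataNode JobTracker TaskTracker; do
        # Compare against the whole second field, so that NameNode
        # does not match SecondaryNameNode.
        if ! printf '%s\n' "$jps_output" |
            awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
            echo "$daemon"
        fi
    done
}

# Example with the output posted above (no Hadoop needed to try it):
printf '%s\n' '28281 Jps' '4286 NameNode' '4038 DataNode' \
    '4417 SecondaryNameNode' '3757 RunJar' '4593 TaskTracker' | missing_daemons
# prints: JobTracker

# On a live system: sudo jps | missing_daemons
```

An empty result means all five daemons are up. Any name it prints is a daemon whose log is worth checking.
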
Nice post Rahul, helped me quick start really easily. Thanks.
Thanx Sanjay
Nice one. Thanks
Hi Rahul, I'm getting an error when installing Java on Ubuntu 10.04:
sudo apt-get install sun-java6-jdk
Reading package lists... Done
Building dependency tree
Reading state information... Done
Package sun-java6-jdk is not available, but is referred to by another package.
This may mean that the package is missing, has been obsoleted, or
is only available from another source
E: Package sun-java6-jdk has no installation candidate
You need to add the repository by running this command:
$ sudo add-apt-repository "deb http://archive.canonical.com/ lucid partner"
and then update:
$ sudo apt-get update
then install java:
sudo apt-get install sun-java6-jdk
Hi Rahul, I want to install Cloudera Manager on CentOS.
Please tell me whether we also need to install CentOS on the slaves,
or whether we can use any OS on a slave.
Thanks in advance.
Yes, as far as I know, all the machines in your cluster should have the same OS.
You can do all the operations and installation without Cloudera Manager by manually installing all the components.
If you install all the components manually, everything will become clear to you.
Thanks Rahul,
but my team lead told me to do it using Cloudera Manager.
Is there any problem with that?
Which is the best and easiest way?
Must we install Cloudera Manager on all the slaves as well,
or is it enough to install it on the master only?
Your suggestions are helping me a lot.
Thanks once again.
Yes, using Cloudera Manager is a very easy way to install Hadoop.
Installation of Cloudera Manager is required only on the master.
Please send me how to use clusters in Cloudera Manager on CentOS, and also send to my mail how to retrieve data from HBase using clusters in Cloudera.
It's very easy after installing Cloudera Manager, since it then provides a UI.
To install it, run this command:
./scm-installer.bin
Hi
OS: CentOS
Which is the best database for Hadoop,
HBase or MongoDB?
Which will work with Cloudera Manager?
Choosing between HBase and MongoDB depends on your requirements.
Using Cloudera Manager you can install only HBase.
Hi
Why does HBase store a timestamp by default? How can we display only the values,
like this:
name sai