Running Cloudera in Distributed Mode

This section contains instructions for installing Cloudera's Distribution for Hadoop (CDH3) on Ubuntu. It is a CDH quickstart tutorial for setting up CDH3 on Debian-based systems, and a short one: here you will find all the commands, with descriptions, required to install Cloudera in distributed mode (a multi-node cluster).

Prerequisite: Before starting Cloudera in distributed mode you must first set up Cloudera in pseudo-distributed mode, and you need at least two machines, one for the master and another for the slave (you can also create more than one virtual machine on a single physical machine to form the cluster).


Deploy Cloudera (CDH3) on a Cluster:
Each step below gives the command to run, followed by a description of what it does.

for x in /etc/init.d/hadoop-* ; do sudo $x stop ; done
    Before starting Cloudera in distributed mode, first stop the Hadoop daemons on every node.

update-alternatives --display hadoop-0.20-conf
    List the alternative Hadoop configurations on your system.

cp -r /etc/hadoop-0.20/conf.empty /etc/hadoop-0.20/conf.cluster
    Copy the default configuration to your custom directory.

update-alternatives --install /etc/hadoop-0.20/conf hadoop-0.20-conf /etc/hadoop-0.20/conf.cluster 50
    Activate the new configuration on your system.

update-alternatives --display hadoop-0.20-conf
    Check the new configuration on your system.
or
update-alternatives --set hadoop-0.20-conf /etc/hadoop-0.20/conf.cluster
    Manually set the configuration.

vi /etc/hosts
    Add one line per node, IP address followed by hostname, e.g.:
    192.168.0.1 master
    192.168.0.2 slave

sudo apt-get install openssh-server openssh-client
    Install the SSH server and client.

ssh-keygen -t rsa -P ""
    Generate an RSA key pair for passwordless SSH.

ssh-copy-id -i $HOME/.ssh/id_rsa.pub slave
    Copy the public key to the slave to set up passwordless SSH.
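If the key was copied correctly, you should now be able to log in to the slave without a password. A quick check (a sketch, assuming the slave hostname defined in /etc/hosts above):

ssh slave
    This should open a shell on the slave with no password prompt; type exit to return to the master.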
Now go to your custom configuration directory (/etc/hadoop-0.20/conf.cluster) and edit the configuration files there.

vi masters
    Erase the old contents and type: master
    The masters file defines the master node (namenode) of our multi-node cluster.

vi slaves
    Erase the old contents and type: slave
    The slaves file lists the hosts, one per line, where the Hadoop slave daemons (datanodes and tasktrackers) will run.
vi core-site.xml
    Edit the configuration file core-site.xml and add:
    <property>
      <name>fs.default.name</name>
      <value>hdfs://master:54310</value>
    </property>

vi mapred-site.xml
    Edit the configuration file mapred-site.xml and add:
    <property>
      <name>mapred.job.tracker</name>
      <value>master:54311</value>
    </property>

vi hdfs-site.xml
    Edit the configuration file hdfs-site.xml and add:
    <property>
      <name>dfs.replication</name>
      <value>1</value>
    </property>
    (set the value to the number of slaves)
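Note that in each of these files the <property> blocks must sit inside the top-level <configuration> element. As a sketch, the complete core-site.xml in conf.cluster would then look roughly like this (hostname and port as used above):

<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:54310</value>
  </property>
</configuration>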
Now copy the /etc/hadoop-0.20/conf.cluster directory to all nodes in your cluster.
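One way to do this (a sketch, assuming the passwordless SSH set up earlier and sudo rights on the slave) is to copy the directory to a temporary location on each node and then move it into place:

scp -r /etc/hadoop-0.20/conf.cluster slave:/tmp/
ssh -t slave "sudo cp -r /tmp/conf.cluster /etc/hadoop-0.20/"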
update-alternatives --install /etc/hadoop-0.20/conf hadoop-0.20-conf /etc/hadoop-0.20/conf.cluster 50
    Set the alternatives rules on all nodes to activate your configuration.

for service in /etc/init.d/hadoop-0.20-*; do sudo $service start; done
for x in /etc/init.d/hadoop-* ; do sudo $x stop ; done
    Restart the daemons on all nodes in your cluster using the service scripts, so that the new configuration files are read, and then stop them again.

su -s /bin/bash - hdfs -c 'hadoop namenode -format'
    Format the namenode manually (before starting the namenode).

You must run the following commands on the correct server, according to each node's role:
/etc/init.d/hadoop-0.20-namenode start
/etc/init.d/hadoop-0.20-secondarynamenode start
/etc/init.d/hadoop-0.20-jobtracker start
    Start the daemons on the namenode (run these on the master).

/etc/init.d/hadoop-0.20-datanode start
/etc/init.d/hadoop-0.20-tasktracker start
    Start the daemons on the datanode (run these on each slave).
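As a quick sanity check (a sketch; exact output will vary), you can list the running Java daemons on each node with the JDK's jps tool and open the Hadoop web interfaces:

sudo jps
    On the master you would expect NameNode, SecondaryNameNode and JobTracker; on a slave, DataNode and TaskTracker.
http://master:50070
    Namenode (HDFS) web interface.
http://master:50030
    Jobtracker (MapReduce) web interface.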
Congratulations, your Cloudera CDH cluster setup is complete.

54 comments:

  1. Any Feedback and suggestions are invited

  2. Hi Rahul,

    I have configured my cluster with the above specifications, but I am not able to test it with any of the examples given in the pseudo-distributed tutorial.
    It is throwing some ACL errors and exceptions.
    Any idea?

    Thanks,
    Vishwesh

  3. Hi Vishwesh,
    did you enable ACLs?
    When ACLs are enabled on the jobtracker using the property mapred.acls.enabled, and a job is submitted to a queue name that does not exist in the mapred.queue.names property, the following exception is thrown:
    org.apache.hadoop.security.AccessControlException: org.apache.hadoop.security.AccessControlException

  4. Hi Rahul,

    Thanks for your response.
    But how do I disable ACLs on a CDH cluster?

  5. Hi Vishwesh,
    I think the problem you are getting is not a CDH issue.
    In CDH you can specify whether ACLs are enabled (and should be checked for various operations) with the property:

    <property>
      <name>mapred.acls.enabled</name>
      <value>false</value>
    </property>

    and on Ubuntu itself you can use the setfacl command

  6. I did all the settings and got some permission-denied problems when I start the jobtracker.

    I got the following error when I tried to start the jobtracker.

    2011-05-08 10:11:36,200 WARN org.apache.hadoop.mapred.JobTracker: Failed to operate on mapred.system.dir (hdfs://master:54310/tmp/hadoop-mapred/mapred/system) because of permissions.
    2011-05-08 10:11:36,200 WARN org.apache.hadoop.mapred.JobTracker: This directory should be owned by the user 'mapred'
    2011-05-08 10:11:36,201 WARN org.apache.hadoop.mapred.JobTracker: Bailing out ...
    org.apache.hadoop.security.AccessControlException: org.apache.hadoop.security.AccessControlException: Permission denied: user=mapred, access=WRITE, inode="":hdfs:supergroup:rwxr-xr-x

    I think I need to set up the following properties in mapred-site.xml.
    Can you give us an example of the following?
    mapred.local.dir
    Determines where temporary MapReduce data is written. It also may be a list of directories.
    mapred.map.tasks
    As a rule of thumb, use 10x the number of slaves (i.e., number of tasktrackers).
    mapred.reduce.tasks
    As a rule of thumb, use 2x the number of slave processors (i.e., number of tasktrackers).
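    (For illustration only, a mapred-site.xml fragment with these three properties might look like the snippet below; the local directory path and the task counts are just placeholders for a one-slave cluster:)

    <property>
      <name>mapred.local.dir</name>
      <value>/var/lib/hadoop-0.20/cache/mapred/local</value>
    </property>
    <property>
      <name>mapred.map.tasks</name>
      <value>10</value>
    </property>
    <property>
      <name>mapred.reduce.tasks</name>
      <value>2</value>
    </property>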

  7. Thanks for posting it.
    Why don't you use the master machine as a datanode and tasktracker too? Of course, that is only reasonable if you have fewer than roughly 10 machines in the cluster.

  8. You can use the master as a slave by starting the datanode and tasktracker daemons on the master and putting an entry for it in the slaves file.

    Whether to add the master as a slave depends on your requirements.
    If you have a small cluster you can do this, but if the cluster is large you should not add the master as a slave.

  9. Sorry, but could you be more specific about which configuration steps must be done on the master only, the slave only, or on both master and slave?
    Sorry, I'm still new to Hadoop :D

  10. Your blog has helped me a lot. Thank you very much.

    One addition:
    1. you need to copy the hosts file to your slave if your network does not identify the machines "master" and "slave" by name.

    1. Also, the dfs.replication property means how many copies you want to have of your data. If you have 10 GB of data to upload to DFS, then you would need 30 GB of DFS space in total across all your datanodes.

    2. You can also use the IP address in place of "master" and "slave".

    3. yes 30 GB of dfs space is required for 10 GB of data, if replication factor is 3

  11. Hi, I am new to Hadoop. When I installed Cloudera Manager on Ubuntu 10.04, I got an error:

    1st step--chmod a+x cloudera-manager-installer.bin

    2nd step--sudo ./cloudera-manager-installer.bin

    Then I got an error like this:

    ./cloudera-manager-installer.bin:1.Syntax error ")" unexpected

    Please give me your suggestion.

    Thanking you

    1. Hi,
      Cloudera Manager does not work on Ubuntu;
      it can work on CentOS.

      But if you want to install Hadoop on Ubuntu, you can install it without Cloudera Manager by following the steps above.

    2. Thanks Rahul,
      Can you please tell me how we access HBase using Hadoop clusters in Cloudera Manager on Red Hat Linux?
      Please send me materials or any other related stuff.
      My mail id is : vaddi.ramu33@gmail.com
      Please send suggestions for me as well, as I am new to Hadoop.

    3. Hi,
      for hbase you can refer
      http://ankitasblogger.blogspot.in/2011/01/installing-hbase-in-cluster-complete.html

  12. Hi Rahul,

    I am unable to set up Cloudera Manager, so can you suggest any tutorials about Cloudera Manager other than cloudera.com?

  13. I installed successfully.
    After that I did the client configuration on the Cloudera agents
    and on the Cloudera server host.
    When I enter localhost:50070
    it shows the namenode,
    but when I enter localhost:60010
    it shows a "Page Not Found" error.
    Please can you give some suggestions to clear this error?
    Also, what version of Eclipse can I install on CentOS 5.3 to do projects using Python?

    1. As far as I know, no service runs on localhost:60010 in this setup.

      The Hadoop web interfaces mainly run on:
      50070 (namenode)
      50060 (tasktracker)
      50030 (jobtracker)

  14. Hi, how can we set the client configuration? Please explain briefly with an example.

    1. I am not able to tell which configuration you are asking about.

  15. Hi Rahul,
    I am working on Hadoop and my team members are all new to Hadoop.
    Can you suggest some books which will make it easy to start coding,
    and also some sites and books related to HBase commands?

    1. I think "Hadoop_The_Definitive_Guide_Cr.pdf" is a good book to start with;
      there is also a lot of content available on the net.

  16. Hi Rahul,
    I installed Cloudera Manager successfully.

    After that, when I check the status of the HBase master, I get the following error:


    Traceback (most recent call last):
    File "/usr/lib64/cmf/agent/src/cmf/monitor/master/__init__.py", line 87, in collect
    json = simplejson.load(urllib2.urlopen(self._metrics_url))
    File "/usr/lib64/python2.4/urllib2.py", line 130, in urlopen
    return _opener.open(url, data)
    File "/usr/lib64/python2.4/urllib2.py", line 358, in open
    response = self._open(req, data)
    File "/usr/lib64/python2.4/urllib2.py", line 376, in _open
    '_open', req)
    File "/usr/lib64/python2.4/urllib2.py", line 337, in _call_chain
    result = func(*args)
    File "/usr/lib64/python2.4/urllib2.py", line 1032, in http_open
    return self.do_open(httplib.HTTPConnection, req)
    File "/usr/lib64/python2.4/urllib2.py", line 1006, in do_open
    raise URLError(err)
    URLError:

  17. Hi Rahul
    when I check localhost:50070

    it shows the nodes as dead nodes;
    Live Nodes shows 0.
    Please help me resolve this problem.

  18. I think your datanode daemons are not running; please check the logs.

    If the datanode is running, then please run the following commands:

    $ bin/hadoop dfsadmin -refreshNodes
    $ bin/hadoop fsck /

    1. Hi Rahul,

      I want to chat with you; please come to Gmail and accept my chat request.
      I have so many doubts. It's very urgent and we are struggling a lot.
      Please help me out.

  19. Hi Rahul,

    I started Cloudera Manager. It shows all nodes in a good state,
    but when I try to view the status of a particular datanode or HBase region server,

    I get an error
    and it shows all nodes as dead nodes.

    Please tell me how to configure the client configuration.

    1. I don't use Cloudera Manager that much; I feel more comfortable with manual installation, so I cannot say why Cloudera Manager is giving a different result.
      Whatever result the Hadoop web interface gives, I feel that is correct.

      Also, please scan your log files to find the exact problem.

  20. Hi Rahul,
    how do I set the /etc/hosts file?

    I configured as
    In Master System:

    127.0.0.1 localhost
    192.168.1.13 hadoop1.com
    192.168.1.12 hadoop2.com
    192.168.1.16 hadoop4.com
    192.168.1.49 hadoop3.com

    In Slave System (hadoop2.com):

    127.0.0.1 localhost
    192.168.1.12 hadoop2.com
    192.168.1.13 hadoop1.com


    When I try the following command:

    host -v -t A 'hadoop1.com'

    it returns the global IP instead of the local IP.

    Please resolve this.

  21. It is not compulsory to put an entry in /etc/hosts;
    it is just for your convenience.

    If you put 192.168.1.13 hadoop1.com on a node, then run the following commands to check it:
    ping hadoop1.com
    or ssh hadoop1.com

  22. Hi rahul

    Thanks a lot

    When HBase is connected to one slave and at the same time I try to connect HBase to another slave, the second slave shows the error:

    INFO ipc:HbaceRPC: server at phxl-ss-2-lb.cnet.com/64.30.224.1
    could not be reached

    But it works on slave 1.

    We do not have the IP 64.30.224.1 on any of our systems.

  23. Hi Rahul

    I am getting a FATAL error in the HBase service of one client.

    How can I check whether the data in HBase is distributed or not?

    Please help me out.

    1. For HBase-related queries please post comments at

      http://ankitasblogger.blogspot.in/2011/01/installing-hbase-in-cluster-complete.html

  24. Hi Rahul

    Thanks for helping me with the cluster setup.

    I am thinking of writing MapReduce programs in Python.

    How do I do this, and what resources should I use?

    Please send me any materials regarding MapReduce in Python.

  25. Hi,

    I notice that there are some steps in

    https://ccp.cloudera.com/display/CDHDOC/CDH3+Deployment+on+a+Cluster#CDH3DeploymentonaCluster-ConfigurationFilesandProperties

    that you do not include in your tutorial. (An example is the configuration of local storage directories. Another difference is that the Cloudera info indicates that one must use fully-qualified domain names which your tutorial does not seem to require.)

    Can you comment upon these steps? Are they not required but nice to have?

  26. Hi,
    The configuration specified above is the minimum required.
    But yes, some of the configuration parameters on the link you gave should be considered; they are good practice.

    Thanks for pointing this out, I will update the tutorial.

  27. Hi Rahul,
    I am facing a problem starting hbase-master on CDH4 (YARN); the web page localhost:60010 is not opening. I followed the installation procedure from Cloudera's CDH4 installation guide for standalone/pseudo-distributed mode. Once I purged HBase and reinstalled it (maybe with some configuration changes), it was working well.
