This section contains instructions for installing Cloudera's Distribution for Hadoop (CDH3) on Ubuntu. It is a CDH quickstart tutorial for setting up CDH3 on Debian-based systems. It is a short Cloudera installation walkthrough: you will find all the commands, with descriptions, required to install Cloudera in distributed mode (a multi-node cluster).
Prerequisite: Before starting Cloudera in distributed mode you must set up Cloudera in pseudo-distributed mode, and you need at least two machines, one for the master and another for the slave (you can create more than one virtual machine on a single physical machine to form the cluster).
Deploy Cloudera (CDH3) on Cluster:
COMMAND | DESCRIPTION |
---|---|
for x in /etc/init.d/hadoop-* ; do sudo $x stop ; done | Before starting Cloudera in distributed mode, first stop the Hadoop daemons on every node |
vi /etc/hosts | Add the IP address and hostname of each node, e.g. 192.168.0.1 master and 192.168.0.2 slave |
sudo apt-get install openssh-server openssh-client | Install the ssh server and client |
ssh-keygen -t rsa -P "" | Generate an RSA key pair for passwordless ssh |
ssh-copy-id -i $HOME/.ssh/id_rsa.pub slave | Copy the public key to the slave to set up passwordless ssh |
Now go to your custom configuration directory (conf.cluster) and edit the configuration files | |
vi masters then erase the old contents and type master | The masters file defines the master node of our multi-node cluster |
vi slaves then erase the old contents and type slave | The slaves file lists the hosts, one per line, where the Hadoop slave daemons (datanodes and tasktrackers) will run |
vi core-site.xml then add: <property> <name>fs.default.name</name> <value>hdfs://master:54310</value> </property> | Edit configuration file core-site.xml |
vi mapred-site.xml then add: <property> <name>mapred.job.tracker</name> <value>master:54311</value> </property> | Edit configuration file mapred-site.xml |
vi hdfs-site.xml then add: <property> <name>dfs.replication</name> <value>1</value> </property> | Edit configuration file hdfs-site.xml (the value is the replication factor; it cannot usefully exceed the number of datanodes) |
Now copy the /etc/hadoop-0.20/conf.cluster directory to all nodes in your cluster | |
sudo update-alternatives --install /etc/hadoop-0.20/conf hadoop-0.20-conf /etc/hadoop-0.20/conf.cluster 50 | Set alternatives rules on all nodes to activate your new configuration |
for x in /etc/init.d/hadoop-0.20-* ; do sudo $x start ; done then for x in /etc/init.d/hadoop-0.20-* ; do sudo $x stop ; done | Restart the daemons on all nodes in your cluster using the service scripts so that the new configuration files are read, then stop them again |
You must run the following commands on the correct server, according to each node's role | |
/etc/init.d/hadoop-0.20-namenode start ; /etc/init.d/hadoop-0.20-jobtracker start | To start the namenode and jobtracker daemons on the master |
/etc/init.d/hadoop-0.20-datanode start ; /etc/init.d/hadoop-0.20-tasktracker start | To start the datanode and tasktracker daemons on each slave |
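Once the daemons are up, you can sanity-check the cluster. A minimal sketch, assuming the CDH3 package layout (the examples jar name varies by CDH3 update, so treat the path as an assumption):

```
# list the datanodes that have registered with the namenode
hadoop dfsadmin -report

# run a small test job end-to-end; check /usr/lib/hadoop-0.20/
# for the exact examples jar name on your system
hadoop jar /usr/lib/hadoop-0.20/hadoop-examples-*.jar pi 2 100
```

If dfsadmin -report shows your slave(s) and the pi job completes, the cluster is wired up correctly.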
Congratulations, your Cloudera CDH cluster setup is complete!
Any feedback and suggestions are invited.
Hi Rahul,
I have configured my cluster with the above specifications, but I am not able to test it with any of the examples given in the pseudo-distributed tutorial.
It is throwing some ACL errors and exceptions.
Any idea?
Thanks,
Vishwesh
Hi Vishwesh,
Did you enable ACLs?
When ACLs are enabled on the jobtracker using the property mapred.acls.enabled, and a job is submitted to a queue name that does not exist in the mapred.queue.names property, the following exception is thrown:
org.apache.hadoop.security.AccessControlException: org.apache.hadoop.security.AccessControlException
Hi Rahul,
Thanks for your response.
But how do I disable ACLs on a CDH cluster?
Hi Vishwesh,
I think the problem you are getting is not a CDH issue.
In CDH you can specify whether ACLs are enabled, and should be checked for various operations, with:
<property> <name>mapred.acls.enabled</name> <value>false</value> </property>
and on Ubuntu (for filesystem ACLs) you can use the setfacl command.
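After changing that property, the jobtracker has to re-read its configuration. A minimal sketch, assuming the conf.cluster directory and CDH3 init scripts used in the tutorial above:

```
# the property goes inside <configuration>...</configuration> in
# /etc/hadoop-0.20/conf.cluster/mapred-site.xml, then:
sudo /etc/init.d/hadoop-0.20-jobtracker stop
sudo /etc/init.d/hadoop-0.20-jobtracker start
```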
I did all the settings and got some permission-denied problems when I start the jobtracker.
I got the following error when I try to start the jobtracker:
2011-05-08 10:11:36,200 WARN org.apache.hadoop.mapred.JobTracker: Failed to operate on mapred.system.dir (hdfs://master:54310/tmp/hadoop-mapred/mapred/system) because of permissions.
2011-05-08 10:11:36,200 WARN org.apache.hadoop.mapred.JobTracker: This directory should be owned by the user 'mapred'
2011-05-08 10:11:36,201 WARN org.apache.hadoop.mapred.JobTracker: Bailing out ...
org.apache.hadoop.security.AccessControlException: org.apache.hadoop.security.AccessControlException: Permission denied: user=mapred, access=WRITE, inode="":hdfs:supergroup:rwxr-xr-x
I think I need to set up the following properties in mapred-site.xml.
Can you give us an example of the following?
mapred.local.dir
Determines where temporary MapReduce data is written. It may also be a list of directories.
mapred.map.tasks
As a rule of thumb, use 10x the number of slaves (i.e., number of tasktrackers).
mapred.reduce.tasks
As a rule of thumb, use 2x the number of slave processors (i.e., number of tasktrackers).
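For the permission error above, a commonly suggested fix, sketched here assuming the default CDH3 hdfs/mapred users and the mapred.system.dir path shown in the log, is to create the directory as the HDFS superuser and hand it over to mapred:

```
# run on the master; 'hdfs' is the HDFS superuser in CDH3
sudo -u hdfs hadoop fs -mkdir /tmp/hadoop-mapred/mapred/system
sudo -u hdfs hadoop fs -chown mapred /tmp/hadoop-mapred/mapred/system
```

And a purely illustrative mapred-site.xml fragment for the three properties asked about, following the rules of thumb quoted above for a hypothetical 3-slave cluster (the local path is an assumption; pick directories on your own disks):

```
<!-- example values only; adjust for your hardware -->
<property>
  <name>mapred.local.dir</name>
  <value>/var/lib/hadoop-0.20/cache/mapred/local</value>
</property>
<property>
  <name>mapred.map.tasks</name>
  <value>30</value>
</property>
<property>
  <name>mapred.reduce.tasks</name>
  <value>6</value>
</property>
```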
Thanks for posting it.
Why don't you use the master machine as a datanode and tasktracker too? Of course, that's only reasonable if you have fewer than (roughly) 10 machines in the cluster.
You can use the master as a slave by starting the datanode and tasktracker daemons on the master and putting an entry for it in the slaves file.
It depends on your requirements whether to add the master as a slave or not.
If you have a small cluster you can do this, but if the cluster is large you should not add the master as a slave.
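A minimal sketch of doing that, assuming the conf.cluster layout from the tutorial above:

```
# on the master: list it as a worker too
echo master | sudo tee -a /etc/hadoop-0.20/conf.cluster/slaves

# then start the worker daemons on the master as well
sudo /etc/init.d/hadoop-0.20-datanode start
sudo /etc/init.d/hadoop-0.20-tasktracker start
```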
Sorry, but could you be more specific about which configuration must be done on the master only, on the slave only, or on both master and slave?
Sorry, I'm still new to Hadoop :D
Your blog has helped me a lot. Thank you very much.
One addition:
1. You need to copy the hosts file to your slave if your network doesn't identify the machines "master" or "slave".
Also, the dfs.replication property means how many copies you want to keep of your data: if you have 10 GB of data to upload to DFS, you would need 30 GB of DFS space in total across all your datanodes (with a replication factor of 3).
You can also use IP addresses in place of "master" and "slave".
Yes, 30 GB of DFS space is required for 10 GB of data if the replication factor is 3.
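For example, a sketch of matching /etc/hosts entries kept identical on every node (the addresses are illustrative):

```
# /etc/hosts (same entries on the master and all slaves)
192.168.0.1   master
192.168.0.2   slave
```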
DeleteHi, i am new to hadoop. when i install cloudera manager on ubuntu 10.4,i got an error ;
ReplyDelete1st step--chmod a+x cloudera-manager-installer.bin
2nd step--sudo ./cloudera-manager-installer.bin
Then i got an error like this
./cloudera-manager-installer.bin:1.Syntax error ")" unexpected
please give me your suggestion..
Thanking you
Hi,
Cloudera Manager does not work on Ubuntu; it works on CentOS.
But if you want to install Hadoop on Ubuntu, you can install it without Cloudera Manager by following the steps above.
Thanks Rahul,
Can you please tell me how to access HBase using Hadoop clusters in Cloudera Manager on Red Hat Linux?
Please send me materials or any stuff related to the above.
My mail id is: vaddi.ramu33@gmail.com
Please send suggestions too, as I am new to Hadoop.
Hi,
For HBase you can refer to
http://ankitasblogger.blogspot.in/2011/01/installing-hbase-in-cluster-complete.html
Hi Rahul,
I am unable to set up Cloudera Manager; can you suggest any tutorials about Cloudera Manager other than cloudera.com?
What problems are you getting?
Deletei installed successfully.
ReplyDeleteafter that i done clientconfiguration in clouderaagents
and in clouderaserver host
when i enterd localhost:50070
it shows name node
after that when i enterd localhost:60010
it showing "page not Found Error"
please can give some sought of suggestions to clear this error.
what version of eclipse can i install on centOS5.3 to do projects using phython
As far as I know no services run on localhost:60010 in this setup.
Services mainly run on:
50070 (namenode web UI)
50060 (tasktracker web UI)
50030 (jobtracker web UI)
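A quick sketch for checking which of these web UIs are actually listening on a host (ports as listed above):

```
# probe the common Hadoop web UI ports on this host
for port in 50070 50030 50060 60010; do
  if curl -s -o /dev/null "http://localhost:$port/"; then
    echo "port $port: web UI responding"
  else
    echo "port $port: nothing listening"
  fi
done
```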
Hi, how can we set the client configuration? Please explain briefly with an example.
I am not able to tell which configuration you are asking about.
Hi Rahul,
I am working on Hadoop and my team members are all new to Hadoop.
Can you suggest some books that make it easy to start coding,
and also some sites and books related to HBase commands?
I think "Hadoop_The_Definitive_Guide_Cr.pdf" (Hadoop: The Definitive Guide) is a good book to start with;
there is also lots of content available on the net.
Hi Rahul,
I installed Cloudera Manager successfully.
After that, when I check the status of the HBase master, I get the following error:
Traceback (most recent call last):
File "/usr/lib64/cmf/agent/src/cmf/monitor/master/__init__.py", line 87, in collect
json = simplejson.load(urllib2.urlopen(self._metrics_url))
File "/usr/lib64/python2.4/urllib2.py", line 130, in urlopen
return _opener.open(url, data)
File "/usr/lib64/python2.4/urllib2.py", line 358, in open
response = self._open(req, data)
File "/usr/lib64/python2.4/urllib2.py", line 376, in _open
'_open', req)
File "/usr/lib64/python2.4/urllib2.py", line 337, in _call_chain
result = func(*args)
File "/usr/lib64/python2.4/urllib2.py", line 1032, in http_open
return self.do_open(httplib.HTTPConnection, req)
File "/usr/lib64/python2.4/urllib2.py", line 1006, in do_open
raise URLError(err)
URLError:
Hi Rahul,
When I check localhost:50070
it shows the nodes as dead nodes;
Live Nodes shows 0.
Please help me resolve this problem.
I think your datanode daemons are not running; please check the logs.
If the datanodes are running, then please run the following commands:
$ bin/hadoop dfsadmin -refreshNodes
$ bin/hadoop fsck /
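A sketch of checking whether the daemons are up and reading their logs on a slave (the log directory is an assumption; CDH packages typically log under /var/log/hadoop-0.20 or similar):

```
# check that the daemon JVMs are running
ps aux | grep -E 'DataNode|TaskTracker' | grep -v grep

# inspect the most recent datanode log for errors
less /var/log/hadoop-0.20/*datanode*.log
```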
Hi Rahul,
I want to chat with you; please come to Gmail and accept my chat request.
I have so many doubts. It's very urgent, we are struggling a lot.
Please help me out.
OK, mail me your Gmail ID.
Hi Rahul,
I started Cloudera Manager. It shows all nodes in a good state,
but when I try to view the status of a particular datanode or HBase region server
I get an error
and it shows all nodes as dead nodes.
Please tell me how to configure the client configuration.
I don't use Cloudera Manager much; I am more comfortable with manual installation, so I cannot say why Cloudera Manager is giving a different result.
Whatever result the Hadoop web interface gives, I believe that is correct.
Also, please scan your log files to find the exact problem.
Hi Rahul,
How should I set the /etc/hosts file?
I configured it as follows.
On the master system:
127.0.0.1 localhost
192.168.1.13 hadoop1.com
192.168.1.12 hadoop2.com
192.168.1.16 hadoop4.com
192.168.1.49 hadoop3.com
On the slave system (hadoop2.com):
127.0.0.1 localhost
192.168.1.12 hadoop2.com
192.168.1.13 hadoop1.com
When I try the following command:
host -v -t A 'hadoop1.com'
it returns the global IP instead of the local IP.
Please help me resolve this.
It's not compulsory to put entries in /etc/hosts;
it's just for your convenience.
(Note that the host command queries DNS directly and ignores /etc/hosts, which is why it returns the global IP for hadoop1.com.)
If you put 192.168.1.13 hadoop1.com on a node, then run the following commands to check it:
ping hadoop1.com
or ssh hadoop1.com
Hi Rahul,
Thanks a lot.
When I have HBase connected to one slave and at the same time try to connect HBase from another slave, the second slave shows the error:
INFO ipc:HbaceRPC: server at phxl-ss-2-lb.cnet.com/64.30.224.1
could not be reached
But it works on slave 1.
We do not have the IP 64.30.224.1 on any of our systems.
Hi Rahul,
I am getting a FATAL error in the HBase service of one client.
How can I check whether the data in HBase is distributed or not?
Please help me out.
For HBase-related queries please post comments at
http://ankitasblogger.blogspot.in/2011/01/installing-hbase-in-cluster-complete.html
Hi Rahul,
Thanks for helping me with the cluster setup.
I am thinking of writing MapReduce programs in Python.
How do I do that, and what resources should I use?
Please send me any materials regarding MapReduce in Python.
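(Python MapReduce jobs are normally run with Hadoop Streaming. A minimal sketch, where the jar path follows the CDH3 layout and mapper.py/reducer.py are hypothetical scripts of your own that read stdin and write stdout:)

```
hadoop jar /usr/lib/hadoop-0.20/contrib/streaming/hadoop-streaming-*.jar \
  -input  /user/hadoop/input \
  -output /user/hadoop/output \
  -mapper mapper.py \
  -reducer reducer.py \
  -file mapper.py -file reducer.py   # ship the scripts to the cluster
```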
Hi,
I notice that there are some steps in
https://ccp.cloudera.com/display/CDHDOC/CDH3+Deployment+on+a+Cluster#CDH3DeploymentonaCluster-ConfigurationFilesandProperties
that you do not include in your tutorial. (An example is the configuration of local storage directories. Another difference is that the Cloudera documentation indicates that one must use fully qualified domain names, which your tutorial does not seem to require.)
Can you comment on these steps? Are they not required, but nice to have?
Hi,
The configuration specified above is the minimum required,
but yes, some of the configuration parameters specified at the link you gave should be considered; they are good practice.
Thanks for pointing this out, I will update the tutorial.
Hi Rahul,
I am facing a problem starting the hbase-master in CDH4 YARN: the webpage localhost:60010 is not opening. I followed the installation procedure in Cloudera's CDH4 installation guide for standalone pseudo mode. Once I purged HBase and reinstalled it (maybe some configuration changed in the process), it was working well.
ReplyDeleteCongenital Diaphragmatic Hernia
ReplyDeleteThe diaphragm typically forms during the first eight weeks of pregnancy. In CDH patients, the size of the hole in the diaphragm will determine how much a baby’s lungs, heart, and other internal organs will be affected.
perde modelleri
ReplyDeletesms onay
Vodafone Mobil Ödeme Bozdurma
NFTNASİLALİNİR.COM
ANKARA EVDEN EVE NAKLİYAT
trafik sigortası
DEDEKTOR
WEBSİTESİ KURMA
Aşk romanları
Smm panel
ReplyDeleteSmm Panel
iş ilanları
instagram takipçi satın al
Hirdavatci
Https://www.beyazesyateknikservisi.com.tr/
servis
Tiktok Hile
çekmeköy toshiba klima servisi
ReplyDeleteüsküdar lg klima servisi
beykoz alarko carrier klima servisi
ataşehir toshiba klima servisi
çekmeköy beko klima servisi
ataşehir beko klima servisi
maltepe lg klima servisi
kadıköy lg klima servisi
kartal toshiba klima servisi
en son çıkan perde modelleri
ReplyDeletelisans satın al
yurtdışı kargo
özel ambulans
en son çıkan perde modelleri
minecraft premium
uc satın al
nft nasıl alınır