This section contains instructions for installing the Cloudera Distribution for Hadoop (CDH3) on Ubuntu. It is a CDH quick-start tutorial for setting up CDH3 on Debian-based systems, and a short guide that lists every command, with a description, needed to install Cloudera in fully distributed mode (a multi-node cluster).
Prerequisite: before running Cloudera in distributed mode you must first set it up in pseudo-distributed mode, and you need at least two machines, one master and one slave (you can also create more than one virtual machine on a single physical machine to form the cluster).
Deploy Cloudera (CDH3) on Cluster:
COMMAND | DESCRIPTION |
---|---|
for x in /etc/init.d/hadoop-* ; do sudo $x stop ; done | Before switching to distributed mode, first stop all running Hadoop daemons on each node |
vi /etc/hosts | Then add a hostname mapping for every node, e.g. 192.168.0.1 master and 192.168.0.2 slave, on each machine |
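For reference, the resulting /etc/hosts entries would look like the fragment below (the addresses are the example values from this tutorial; substitute your nodes' real IPs, and make the same entries on every node):

```shell
# Appended to /etc/hosts on both master and slave
# (example addresses; replace with your own):
192.168.0.1    master
192.168.0.2    slave
```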
sudo apt-get install openssh-server openssh-client | Install the SSH server and client |
ssh-keygen -t rsa -P "" | Generate an RSA key pair with an empty passphrase, for passwordless SSH |
ssh-copy-id -i $HOME/.ssh/id_rsa.pub slave | Copy the public key to the slave to set up passwordless SSH |
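As a sketch of what the key-generation step produces (run anywhere ssh-keygen is installed; the temporary directory here is only for illustration, so nothing in ~/.ssh changes):

```shell
# Generate a throwaway RSA key pair with an empty passphrase (-P ""),
# as in the step above, but into a temp dir instead of ~/.ssh.
tmp=$(mktemp -d)
ssh-keygen -t rsa -P "" -f "$tmp/id_rsa" -q
ls "$tmp"    # id_rsa (private key) and id_rsa.pub (public key)
rm -rf "$tmp"
```

ssh-copy-id then appends id_rsa.pub to ~/.ssh/authorized_keys on the slave, which is what makes the subsequent ssh logins passwordless.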
Now go to your cluster configuration directory (/etc/hadoop-0.20/conf.cluster) and edit the configuration files | |
vi masters then erase the old contents and type master | the masters file lists the host(s) acting as master (NameNode) in our multi-node cluster |
vi slaves then erase the old contents and type slave | the slaves file lists the hosts, one per line, where the Hadoop slave daemons (DataNodes and TaskTrackers) will run |
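After editing, the two files in conf.cluster simply contain hostnames, one per line; for this tutorial's two-node layout they would read:

```shell
# conf.cluster/masters
master

# conf.cluster/slaves
slave
```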
vi core-site.xml then type: <property> <name>fs.default.name</name> <value>hdfs://master:54310</value> </property> | Edit configuration file core-site.xml |
vi mapred-site.xml then type: <property> <name>mapred.job.tracker</name> <value>master:54311</value> </property> | Edit configuration file mapred-site.xml |
vi hdfs-site.xml then type: <property> <name>dfs.replication</name> <value>1</value> </property> | Edit configuration file hdfs-site.xml (value = number of slaves) |
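Note that each of these site files must wrap its properties in a single configuration root element, which the snippets above assume; a complete hdfs-site.xml for this setup would look like:

```xml
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
```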
Now copy the /etc/hadoop-0.20/conf.cluster directory to all nodes in your cluster | |
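On a real cluster this copy is typically done with scp, e.g. scp -r /etc/hadoop-0.20/conf.cluster slave:/etc/hadoop-0.20/ for each node. As a self-contained sketch of the step (local temp directories stand in for the remote nodes, since the hostnames are specific to your cluster):

```shell
# $src stands in for the master's conf.cluster directory;
# $node stands in for a slave's /etc/hadoop-0.20 directory.
src=$(mktemp -d)/conf.cluster
mkdir -p "$src"
echo '<configuration/>' > "$src/core-site.xml"   # placeholder config file
node=$(mktemp -d)
cp -r "$src" "$node/"        # on a real cluster: scp -r ... slave:/etc/hadoop-0.20/
ls "$node/conf.cluster"      # prints: core-site.xml
rm -rf "$src" "$node"
```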
sudo update-alternatives --install /etc/hadoop-0.20/conf hadoop-0.20-conf /etc/hadoop-0.20/conf.cluster 50 | Set alternatives rules on all nodes to activate your configuration |
for service in /etc/init.d/hadoop-0.20-*; do sudo $service start; done ; for x in /etc/init.d/hadoop-0.20-* ; do sudo $x stop ; done | Restart the daemons on all nodes using the service scripts so the new configuration files are read, then stop them again before starting only the role-specific daemons below |
You must run the following commands on the correct server, according to each node's role | |
sudo /etc/init.d/hadoop-0.20-namenode start ; sudo /etc/init.d/hadoop-0.20-jobtracker start | To start the daemons (NameNode and JobTracker) on the master |
sudo /etc/init.d/hadoop-0.20-datanode start ; sudo /etc/init.d/hadoop-0.20-tasktracker start | To start the daemons (DataNode and TaskTracker) on the slave |
Congratulations, the Cloudera CDH setup is complete | |