Running Cloudera in Pseudo Distributed Mode

This section explains how to install Cloudera Distribution for Hadoop (CDH3) on Ubuntu. It is a quick-start tutorial for setting up CDH3 on Debian-based systems in as few steps as possible: it lists every command, each with a short description, that you need to install Cloudera in pseudo-distributed mode (a single-node cluster).


Deploy Cloudera (CDH3) in pseudo-distributed mode:
Each command below is followed by a short description of what it does.
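$ sudo add-apt-repository "deb http://archive.canonical.com/ lucid partner"
If you are using Ubuntu 10.04 LTS, run this command first, followed by sudo apt-get update. It enables Canonical's partner repository, which provides the sun-java6-jdk package.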
sudo apt-get install sun-java6-jdk
Install Java (Sun JDK 6).
lsb_release -c
Show the codename of your distribution. Call it DISTRO (e.g. hardy, jaunty or lucid).
sudo vi /etc/apt/sources.list.d/cloudera.list
Then add these two lines:
deb http://archive.cloudera.com/debian DISTRO-cdh3 contrib
deb-src http://archive.cloudera.com/debian DISTRO-cdh3 contrib
Replace DISTRO with the codename of your distribution. This repository lets your package manager install the Cloudera packages.
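If you would rather not edit the file in vi, you can write the same two lines non-interactively. Below is a minimal sketch for a POSIX shell. The function name cloudera_list_lines is my own (hypothetical). The sketch assumes that lsb_release -cs prints only the codename you would otherwise substitute for DISTRO.

```shell
# Print the two APT source lines for the CDH3 repository.
# $1 is the distribution codename (the DISTRO value), e.g. "lucid".
# cloudera_list_lines is a hypothetical helper name.
cloudera_list_lines() {
    distro="$1"
    printf 'deb http://archive.cloudera.com/debian %s-cdh3 contrib\n' "$distro"
    printf 'deb-src http://archive.cloudera.com/debian %s-cdh3 contrib\n' "$distro"
}

# Usage (run by hand; writing under /etc needs root):
#   cloudera_list_lines "$(lsb_release -cs)" | sudo tee /etc/apt/sources.list.d/cloudera.list
```

The output is piped through sudo tee because a plain redirection such as sudo echo ... > /etc/... would be performed by your own unprivileged shell, not by sudo.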
sudo apt-get -y install curl
Install curl.
curl -s http://archive.cloudera.com/debian/archive.key | sudo apt-key add -
Add the Cloudera public GPG key to APT's trusted keys so that packages from the Cloudera repository can be verified.
sudo apt-get update
Update the APT package index.
sudo apt-get -y install hadoop-0.20-conf-pseudo
Install Hadoop in pseudo-distributed mode. A pseudo-distributed Hadoop installation is a single node running all five Hadoop daemons: NameNode, JobTracker, SecondaryNameNode, DataNode and TaskTracker.
for service in /etc/init.d/hadoop-0.20-*; do sudo $service start; done
Start the Hadoop daemons.
dpkg -L hadoop-0.20-conf-pseudo
List the files installed by the hadoop-0.20-conf-pseudo package (on Debian systems).
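sudo jps
List the running Java processes. Use sudo, otherwise jps may show only itself (see the comments below). It should give output like this (process IDs will differ):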
14799 NameNode
14977 SecondaryNameNode
15183 DataNode
15596 JobTracker
15897 TaskTracker
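To check the jps output automatically, the sketch below reads it on standard input and reports any of the five daemons listed above that is missing. The function name check_hadoop_daemons is hypothetical. As far as I can tell, CDH3 starts the daemons under separate service accounts, and jps lists only the Java processes it has access to. That is why the comments below run it with sudo.

```shell
# Read `jps` output on stdin and print "missing: <Daemon>" for each of the
# five pseudo-distributed daemons not found. Returns 0 only if all are present.
# check_hadoop_daemons is a hypothetical helper name.
check_hadoop_daemons() {
    jps_out=$(cat)
    missing=0
    for d in NameNode SecondaryNameNode DataNode JobTracker TaskTracker; do
        # jps prints "<pid> <MainClass>"; require an exact match on the class name
        if ! printf '%s\n' "$jps_out" | awk -v d="$d" '$2 == d { found = 1 } END { exit !found }'; then
            echo "missing: $d"
            missing=1
        fi
    done
    return "$missing"
}

# Usage:
#   sudo jps | check_hadoop_daemons
```

For the output posted in comment 6 below, this prints missing: JobTracker.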
Congratulations, the Cloudera setup is complete. Now let's run some examples.
hadoop jar /usr/lib/hadoop/hadoop-*-examples.jar pi 10 100
Run the pi example (10 map tasks, 100 samples per map).
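The two pi arguments are the number of map tasks (10) and the number of samples per map (100). As I understand it, the example is a quasi-Monte Carlo estimate:

- points from a Halton sequence are spread over the unit square;
- the maps count how many points land inside the inscribed circle;
- 4 * inside / total then approximates pi.

The awk sketch below reproduces that arithmetic on one machine. It is an illustration, not Hadoop's code, and estimate_pi is a hypothetical name.

```shell
# estimate_pi MAPS SAMPLES (hypothetical helper): place MAPS*SAMPLES Halton
# points (bases 2 and 3) in the unit square and count those inside the circle
# of radius 0.5 centred at (0.5, 0.5); pi ~= 4 * inside / total.
estimate_pi() {
    awk -v maps="$1" -v samples="$2" '
    # radical inverse of i in the given base = i-th Halton coordinate
    function halton(i, base,    f, r) {
        f = 1; r = 0
        while (i > 0) {
            f = f / base
            r = r + f * (i % base)
            i = int(i / base)
        }
        return r
    }
    BEGIN {
        total = maps * samples
        inside = 0
        for (i = 1; i <= total; i++) {
            x = halton(i, 2) - 0.5
            y = halton(i, 3) - 0.5
            if (x * x + y * y <= 0.25) inside++
        }
        printf "%.6f\n", 4 * inside / total
    }'
}

# Usage:
#   estimate_pi 10 100
```

This prints an estimate close to pi. Hadoop's own printed value may differ slightly.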
hadoop fs -mkdir input
hadoop fs -put /etc/hadoop-0.20/conf/*.xml input
hadoop-0.20 fs -ls input
hadoop-0.20 jar /usr/lib/hadoop-0.20/hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
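Run the grep example: copy the configuration files into HDFS, then count the matches of the regular expression dfs[a-z.]+ (the results are written to the output directory).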
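To see what the grep job computes, here is a single-machine sketch of the same idea using standard tools. It extracts every substring matching the regular expression, counts each distinct match, and sorts by count, highest first. The function name local_grep_count is hypothetical. The tab-separated count/match layout is my assumption about how the job formats its output.

```shell
# local_grep_count REGEX FILE... (hypothetical helper): count every distinct
# match of REGEX across the files and print "count<TAB>match", most frequent first.
local_grep_count() {
    regex="$1"; shift
    # -o: print only the matching part, -h: omit file names, -E: extended regex
    grep -ohE "$regex" "$@" | sort | uniq -c | sort -rn | awk '{ print $1 "\t" $2 }'
}

# Usage:
#   local_grep_count 'dfs[a-z.]+' /etc/hadoop-0.20/conf/*.xml
# Compare with the job's result:
#   hadoop-0.20 fs -cat 'output/part-*'
```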
hadoop-0.20 fs -mkdir inputwords
hadoop-0.20 fs -put /etc/hadoop-0.20/conf/*.xml inputwords
hadoop-0.20 fs -ls inputwords
hadoop-0.20 jar /usr/lib/hadoop-0.20/hadoop-*-examples.jar wordcount inputwords outputwords
Run the word count example: count how often each word occurs in the same configuration files.
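Hadoop's wordcount example splits its input on whitespace, counts each word, and writes word/count pairs sorted by word. A local sketch with standard tools follows. The name local_word_count is hypothetical. Splitting on space, tab, newline, carriage return and form feed is my approximation of the tokenizer the example uses (Java's StringTokenizer).

```shell
# local_word_count [FILE...] (hypothetical helper): print "word<TAB>count"
# for every whitespace-separated token, sorted by word (byte order).
local_word_count() {
    cat "$@" \
        | tr ' \t\r\f' '\n\n\n\n' \
        | tr -s '\n' \
        | grep -v '^$' \
        | LC_ALL=C sort \
        | uniq -c \
        | awk '{ print $2 "\t" $1 }'
}

# Usage:
#   local_word_count /etc/hadoop-0.20/conf/*.xml
```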
http://localhost:50070/
Web-based interface for the NameNode.
http://localhost:50030/
Web-based interface for the JobTracker.
for x in /etc/init.d/hadoop-* ; do sudo $x stop ; done
Shut down the CDH3 Hadoop services.

22 comments:

  1. Hi, I'm very new to Linux and need some additional help to install Hadoop. When I enter the command "vi /etc/apt/sources.list.d", my screen fills up with "~". Can you please help?
    Thanks!
    Ghislain

  2. Hi Ghis, you are trying to open a directory in vi. Try this command:
    vi /etc/apt/sources.list.d/cloudera.list
    It will open a file named cloudera.list.
    Then switch to insert mode (press i) and type the lines given in the tutorial above.

  3. Hi all,
    please run these commands before starting on Ubuntu 10.04 LTS:
    $ sudo add-apt-repository "deb http://archive.canonical.com/ lucid partner"

    $ sudo apt-get update

  4. Thanks a lot.

    I was facing problems on Ubuntu 'Natty'. Then I tried this on Lucid and it worked perfectly for me. :)

    Thanks,
    Vishwesh

  5. Rahul, on my system the 'jps' command does not show the output you have shown above. It shows this instead:

    $jps
    28217 Jps

    But the important thing is that everything is working fine. I can run the examples and all.

    Any thoughts?

    Thanks,
    Vishwesh

  6. Ohhh... the above is a 'sudo' issue.

    $sudo jps
    28281 Jps
    4286 NameNode
    4038 DataNode
    4417 SecondaryNameNode
    3757 RunJar
    4593 TaskTracker

    is the proper output as expected.

    Thanks,
    Vishwesh

  7. Hi Vishwesh,
    sorry for the late reply.
    Your jps output does not show the JobTracker;
    please check your logs.

  8. Nice post Rahul, it helped me get started really easily. Thanks.

  9. Hi Rahul, I'm getting an error when I install Java on Ubuntu 10.04:
    sudo apt-get install sun-java6-jdk
    Reading package lists... Done
    Building dependency tree
    Reading state information... Done
    Package sun-java6-jdk is not available, but is referred to by another package.
    This may mean that the package is missing, has been obsoleted, or
    is only available from another source
    E: Package sun-java6-jdk has no installation candidate

    1. You need to add the partner repository by running this command:
      $ sudo add-apt-repository "deb http://archive.canonical.com/ lucid partner"
      and then update:
      $ sudo apt-get update
      then install Java:
      $ sudo apt-get install sun-java6-jdk

  10. Hi Rahul, I want to install Cloudera Manager on CentOS.
    Please tell me: do we also need to install CentOS on the slaves,
    or can we use any OS on the slaves?


    thanks in advance

    1. Yes, as far as I know, all the machines in your cluster should have the same OS.
      You can also do the whole installation without Cloudera Manager by installing all the components manually.
      If you install everything manually, you will understand how all the pieces fit together.

  11. Thanks Rahul,
    but my team lead told me to do it using Cloudera Manager.
    Is there any problem with that?

    Which is the best and easiest way?

    Must we install Cloudera Manager on all the slaves too,
    or is it enough to install it on the master only?

    Your suggestions are helping me a lot.
    Thanks once again

    1. Yes, using Cloudera Manager is a very easy way to install Hadoop.

      Cloudera Manager needs to be installed only on the master.

  12. Please tell me how to use clusters in Cloudera Manager on CentOS, and also email me how to retrieve data from HBase using clusters in Cloudera.

    1. It's very easy once Cloudera Manager is installed, since it then provides a UI.
      To install it, run this command:
      ./scm-installer.bin

  13. Hi,

    OS: CentOS

    Which is the best database for Hadoop, HBase or MongoDB?

    Which one will work with Cloudera Manager?

    1. Choosing between HBase and MongoDB depends on your requirements.

      Of the two, Cloudera Manager can install only HBase.

  14. Hi,
    Why does HBase show a timestamp by default? How can we display only the values,
    like this:
    name sai
