Objective:
This tutorial describes how to install and configure a single-node Hadoop cluster on Ubuntu. A single-node Hadoop cluster is also known as Hadoop pseudo-distributed mode. The tutorial is short and to the point, so you can install Hadoop in about 10 minutes. Once the installation is done, you can perform Hadoop Distributed File System (HDFS) and Hadoop MapReduce operations.
Recommended Platform:
- OS: Linux is supported as a development and production platform. You can use Ubuntu 14.04 or later (other Linux distributions such as CentOS or Red Hat also work).
- Hadoop: Cloudera Distribution for Apache Hadoop CDH5.x (you can also use Apache Hadoop 2.x)
Prerequisites:
Install Java 7 (Oracle Java recommended)
Install Python Software Properties
$ sudo apt-get install python-software-properties
Add Repository
$ sudo add-apt-repository ppa:webupd8team/java
Update the source list
$ sudo apt-get update
Install Java
$ sudo apt-get install oracle-java7-installer
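Before moving on, it can help to confirm which Java version is actually on the PATH. The following is a minimal sketch, not part of the original steps: `java_major_version` is a hypothetical helper name, and it assumes `java -version` prints a line containing `version "X"`, which is the usual format for Oracle Java and OpenJDK.

```shell
# Hypothetical helper: extract the major Java version from `java -version` output.
# Handles the legacy "1.7.0_80" scheme as well as the newer "11.0.2" scheme.
java_major_version() {
    ver=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
    case "$ver" in
        1.*) ver=${ver#1.}; printf '%s\n' "${ver%%.*}" ;;   # 1.7.0_80 -> 7
        "")  return 1 ;;                                     # no version line found
        *)   printf '%s\n' "${ver%%[.-]*}" ;;               # 11.0.2 -> 11
    esac
}

# `java -version` writes to stderr, so redirect it to stdout before parsing.
if command -v java >/dev/null 2>&1; then
    echo "Java major version: $(java_major_version "$(java -version 2>&1)")"
else
    echo "java not found on PATH"
fi
```

For this tutorial the reported major version should be 7.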
Configure SSH
Install OpenSSH server and client
$ sudo apt-get install openssh-server openssh-client
Generate a key pair
$ ssh-keygen -t rsa -P ""
Configure password-less SSH
$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
Verify by connecting to localhost over SSH
$ ssh localhost
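The SSH steps above can be wrapped in a repeatable form. This is a sketch under stated assumptions: `authorize_key` and `check_passwordless_ssh` are hypothetical helper names, the default paths are the ones used in this tutorial, and the check assumes the localhost host key has already been accepted once through an interactive `ssh localhost`.

```shell
# Hypothetical helper: append a public key to an authorized_keys file only if
# it is not already there, so re-running the step does not create duplicates.
authorize_key() {
    pub=${1:-$HOME/.ssh/id_rsa.pub}
    auth=${2:-$HOME/.ssh/authorized_keys}
    touch "$auth" || return 1
    # -x: whole-line match, -F: fixed string, -f: read the pattern from the key file
    grep -qxFf "$pub" "$auth" || cat "$pub" >> "$auth"
    # Restrictive permissions: sshd with StrictModes (the default) rejects
    # key files or directories that other users can write to.
    chmod 700 "$(dirname "$auth")" && chmod 600 "$auth"
}

# Hypothetical helper: succeed only if ssh can log in without prompting.
# BatchMode=yes makes ssh fail instead of asking for a password.
check_passwordless_ssh() {
    ssh -o BatchMode=yes -o ConnectTimeout=5 localhost true
}
```

A typical use would be `authorize_key && check_passwordless_ssh && echo "password-less SSH works"`.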
Install Hadoop
Download Hadoop
http://archive.cloudera.com/cdh5/cdh/5/hadoop-2.5.0-cdh5.3.2.tar.gz
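The download can also be done from the command line. The sketch below assumes `wget` is installed and that the URL above is still reachable. `fetch_hadoop` is a hypothetical helper name introduced here for illustration.

```shell
HADOOP_TARBALL_URL="http://archive.cloudera.com/cdh5/cdh/5/hadoop-2.5.0-cdh5.3.2.tar.gz"

# Hypothetical helper: download the tarball into the given directory
# (default: current directory) unless it is already there.
fetch_hadoop() {
    dest=${1:-.}
    file="$dest/$(basename "$HADOOP_TARBALL_URL")"
    if [ -f "$file" ]; then
        echo "already downloaded: $file"
    else
        # Download to a temporary name first so that a failed transfer
        # is never mistaken for a complete tarball on the next run.
        wget -O "$file.part" "$HADOOP_TARBALL_URL" && mv "$file.part" "$file"
    fi
}
```

Calling `fetch_hadoop` with no arguments places the tarball in the current directory, which is where the extraction step expects it.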
Extract the tarball
$ tar xzf hadoop-2.5.0-cdh5.3.2.tar.gz
Note: All the required jars, scripts, configuration files, etc. are available in the HADOOP_HOME directory (hadoop-2.5.0-cdh5.3.2).
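Since later steps refer to HADOOP_HOME, it is convenient to export it as an environment variable. The following is a minimal sketch that assumes the tarball was extracted in your home directory. Adjust the path if you extracted it elsewhere.

```shell
# Assumption: the tarball was extracted in $HOME; change the path if not.
export HADOOP_HOME="$HOME/hadoop-2.5.0-cdh5.3.2"

# Hadoop 2.x keeps user commands in bin/ and daemon scripts in sbin/.
export PATH="$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin"

# Warn early if the directory is not where we assumed it would be.
[ -d "$HADOOP_HOME" ] || echo "warning: $HADOOP_HOME does not exist" >&2
```

Adding these lines to `$HOME/.bashrc` makes them apply to every new shell. Running `hadoop version` afterwards is a quick way to confirm that the scripts are on the PATH.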