
Hadoop 1.2.1 Installation and Configuration on Single Node

14/12/2016

I have experienced difficulties in installing and configuring Hadoop, so I want to provide one easy guide for the installation and configuration of Hadoop 1.x. I am assuming that readers have knowledge of basic Linux commands, so I am not going to explain those commands in depth.

I have used Hadoop-1.2.1, JDK 7 and Ubuntu (Linux) in this setup.

Install SSH:

  • We require SSH for remote login to the different machines so that Map Reduce tasks can run on the Hadoop cluster

Commands:

  • sudo apt-get update
    • updates list of packages
  • sudo apt-get install openssh-server
    • Installs OpenSSH Server
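
Before moving on, it is worth confirming that the SSH server actually came up. A minimal check (assuming the standard Ubuntu service name "ssh"):

    # confirm the OpenSSH daemon is running (service name assumed to be "ssh" on Ubuntu)
    sudo service ssh status
    # the sshd process should also appear in the process list
    ps aux | grep "[s]shd"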

Generate Keys:

  • Hadoop logs in to remote machines many times while running a Map Reduce task. Therefore, we need to set up passwordless login for Hadoop to all the nodes in our cluster.

Commands:

  • ssh <hostname>
    • write your system’s host name in place of <hostname>. It asks for a password
  • ssh-keygen
    • generates SSH Keys
  • ENTER FILE NAME:
    • no need to type anything; simply press Enter, as we want to use the default settings
  • ENTER PASSPHRASE:
    • no need to type anything; simply press Enter, as we want to use the default settings
  • cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
    • appends the id_rsa.pub key to authorized_keys to enable passwordless login for the user
  • ssh <hostname>
    • now it should not ask for a password
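
If you want a stricter check than just logging in, the small sketch below forces SSH to fail instead of prompting, so it prints OK only when the key-based (passwordless) login works. Replace <hostname> with your system’s host name:

    # BatchMode=yes disables password prompts, so this succeeds only with key-based login
    ssh -o BatchMode=yes <hostname> 'echo OK'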

Install Java

  • I prefer offline Java installation, so I have already downloaded the Java tar ball and placed it in my Downloads directory

Commands:

  • sudo mkdir -p /usr/lib/jvm/
    • create directory for Java
  • sudo tar xvf ~/Downloads/jdk-7u67-linux-x64.tar.gz -C /usr/lib/jvm
    • extract and copy content of Java tar ball to Java directory
  • cd /usr/lib/jvm
    • go to Java directory
  • sudo ln -s jdk1.7.0_67 java-1.7.0-sun-amd64
    • generate symbolic link to jdk directory which will be used in Hadoop configuration
  • sudo update-alternatives --config java
    • checking and setting Java alternatives
  • sudo nano $HOME/.bashrc
    • setting Java path. Add following two lines at the end of this file
      • export JAVA_HOME="/usr/lib/jvm/jdk1.7.0_67"
      • export PATH="$PATH:$JAVA_HOME/bin"
  • exec bash
    • restarts bash (the terminal)
  • java
    • it should not show a "command not found" error!
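
If update-alternatives reports that there is nothing to configure, the new JDK is probably not registered as an alternative yet. A minimal sketch, assuming the symbolic link created above, that registers it and then verifies the installation:

    # register the extracted JDK as an alternative for the "java" command (priority 1 is arbitrary)
    sudo update-alternatives --install /usr/bin/java java /usr/lib/jvm/java-1.7.0-sun-amd64/bin/java 1
    # choose it from the list, then confirm the reported version is 1.7.0_67
    sudo update-alternatives --config java
    java -version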

Install Hadoop:

  • First we need to download the required tar ball of Hadoop and place it into the home directory.

Commands:

  • sudo mkdir -p /usr/local/hadoop/
    • create Hadoop directory
  • sudo tar xvf ~/hadoop-1.2.1-bin.tar.gz
    • extracts the Hadoop tar ball (run this from your home directory)
  • sudo cp -r ~/hadoop-1.2.1/* /usr/local/hadoop
    • copies the extracted Hadoop files to the Hadoop directory
  • sudo nano $HOME/.bashrc
    • setting Hadoop path. Add following lines at the end of this file
      • export HADOOP_PREFIX=/usr/local/hadoop
      • export PATH=$PATH:$HADOOP_PREFIX/bin
  • exec bash
    • restarts bash (the terminal)
  • hadoop
    • it should not show a "command not found" error!
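
A quick way to confirm both the PATH change and the copied files is to ask Hadoop for its version. The output shown in the comment is approximate:

    hadoop version
    # Hadoop 1.2.1 (followed by build and checksum details)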

Configuration of Hadoop

  • We are setting some environment variables and changing some configuration files according to our cluster setup.

Commands:

  • cd /usr/local/hadoop/conf
    • go to configuration directory of Hadoop
  • sudo nano hadoop-env.sh
    • open the environment variables file and add the following two lines in their respective places. Similar lines are already present in the file with different values; keep those as they are and add these lines after them
      • export JAVA_HOME=/usr/lib/jvm/java-1.7.0-sun-amd64
      • export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true
  • sudo nano core-site.xml
    • open the HDFS configuration file and set the name server address and tmp dir values. You have to use your host name instead of "coed161". Add the following properties between the <configuration> tags:
      <property>
        <name>fs.default.name</name>
        <value>hdfs://coed161:10001</value>
      </property>
      <property>
        <name>hadoop.tmp.dir</name>
        <value>/usr/local/hadoop/tmp</value>
      </property>
  • sudo nano mapred-site.xml
    • open the Map Reduce configuration file and set the job tracker value. You have to write your host name instead of "coed161". Add the following property between the <configuration> tags:
      <property>
        <name>mapred.job.tracker</name>
        <value>coed161:10002</value>
      </property>
  • sudo mkdir /usr/local/hadoop/tmp
    • create tmp directory to store all files on data node
  • sudo chown <username> /usr/local/hadoop/tmp
    • change the owner of the directory to avoid access control issues. Write your username instead of <username>
  • sudo chown <username> /usr/local/hadoop
    • change the owner of the directory to avoid access control issues. Write your username instead of <username>
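
To sanity-check the configuration before formatting, you can grep the two edited files and confirm the ownership change. This is just a verification sketch; the <value> lines should show your own host name rather than coed161:

    # each grep should print the property name followed by its <value> line
    grep -A 1 "fs.default.name" /usr/local/hadoop/conf/core-site.xml
    grep -A 1 "mapred.job.tracker" /usr/local/hadoop/conf/mapred-site.xml
    # the tmp directory should now be owned by your user, not root
    ls -ld /usr/local/hadoop/tmp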

Format DFS (skip this step if you are going for Multi-Node setup):

  • Now we are ready to format our Distributed File System (DFS)

Command:

  • hadoop namenode -format
    • Check for the message “namenode successfully formatted”
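
Besides the log message, you can check that the formatted name node image was written to disk. This assumes the Hadoop 1.x default of dfs.name.dir = ${hadoop.tmp.dir}/dfs/name, with hadoop.tmp.dir set to /usr/local/hadoop/tmp as above:

    # a freshly formatted name node creates fsimage, edits and VERSION files here
    ls /usr/local/hadoop/tmp/dfs/name/current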

Start all process:

  • We are ready to start our Hadoop cluster (though it is only a single node)

Commands:

  • start-all.sh
    • to start all (name node, secondary name node, data node, job tracker, task tracker)
  • jps
    • to check whether all services (i.e. name node, secondary name node, data node, job tracker, task tracker) started or not
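
A healthy single-node cluster shows five Hadoop daemons plus Jps itself. The output below is only illustrative; the process IDs will differ on your machine:

    jps
    # 4321 NameNode
    # 4522 DataNode
    # 4725 SecondaryNameNode
    # 4918 JobTracker
    # 5121 TaskTracker
    # 5342 Jps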

To check cluster details on web interface:

  • NameNode web UI: http://<hostname>:50070 (use your own host name)
  • JobTracker web UI: http://<hostname>:50030

Stop all processes:

  • If you want to stop (shut down) all your Hadoop cluster services

Command:

  • stop-all.sh

For any queries you can write in a comment or mail me at: “brijeshbmehta@gmail.com”

Courtesy: Mr. Anand Kumar, NIT, Trichy

Comments please...