
Hadoop 1.2.1 Installation and Configuration on Single Node

14/12/2016

I have experienced difficulties in installing and configuring Hadoop, so I want to provide one easy guide for the installation and configuration of Hadoop 1.x. I am assuming that readers have knowledge of basic Linux commands, so I am not going to explain those commands in depth.

I have used Hadoop-1.2.1, JDK 7 and Ubuntu (Linux) in this setup.

Install SSH:

  • We require SSH for remote login to the different machines so that Map Reduce tasks can run on the Hadoop cluster

Commands:

  • sudo apt-get update
    • updates list of packages
  • sudo apt-get install openssh-server
    • Installs OpenSSH Server
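
Before moving on, it is worth confirming that the SSH server actually came up. A minimal check (assuming the standard Ubuntu service name "ssh"):

    # confirm the OpenSSH daemon is running (service name assumed to be "ssh" on Ubuntu)
    sudo service ssh status
    # the sshd process should also appear in the process list
    ps aux | grep "[s]shd"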

Generate Keys:

  • Hadoop logs in to remote machines many times while running a Map Reduce task. Therefore, we need to set up passwordless login for Hadoop to all the nodes in our cluster.

Commands:

  • ssh <hostname>
    • write your system’s host name in place of <hostname>. It asks for a password
  • ssh-keygen
    • generates SSH Keys
  • ENTER FILE NAME:
    • no need to type anything; simply press Enter, as we want to use the default settings
  • ENTER PASSPHRASE:
    • no need to type anything; simply press Enter, as we want to use the default settings
  • cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
    • appends the id_rsa.pub key to authorized_keys to enable passwordless login for the user
  • ssh <hostname>
    • now it should not ask for a password
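
If you want a stricter check than just logging in, the small sketch below forces SSH to fail instead of prompting, so it prints OK only when the key-based (passwordless) login works. Replace <hostname> with your system’s host name:

    # BatchMode=yes disables password prompts, so this succeeds only with key-based login
    ssh -o BatchMode=yes <hostname> 'echo OK'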

Install Java

  • I prefer offline Java installation, so I have already downloaded the Java tar ball and placed it in my Downloads directory

Commands:

  • sudo mkdir -p /usr/lib/jvm/
    • create directory for Java
  • sudo tar xvf ~/Downloads/jdk-7u67-linux-x64.tar.gz -C /usr/lib/jvm
    • extract and copy content of Java tar ball to Java directory
  • cd /usr/lib/jvm
    • go to Java directory
  • sudo ln -s jdk1.7.0_67 java-1.7.0-sun-amd64
    • generate symbolic link to jdk directory which will be used in Hadoop configuration
  • sudo update-alternatives --config java
    • checking and setting Java alternatives
  • sudo nano $HOME/.bashrc
    • setting Java path. Add following two lines at the end of this file
      • export JAVA_HOME="/usr/lib/jvm/jdk1.7.0_67"
      • export PATH="$PATH:$JAVA_HOME/bin"
  • exec bash
    • restarts bash (the terminal)
  • java
    • it should not show a "command not found" error!
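
If update-alternatives reports that there is nothing to configure, the new JDK is probably not registered as an alternative yet. A minimal sketch, assuming the symbolic link created above, that registers it and then verifies the installation:

    # register the extracted JDK as an alternative for the "java" command (priority 1 is arbitrary)
    sudo update-alternatives --install /usr/bin/java java /usr/lib/jvm/java-1.7.0-sun-amd64/bin/java 1
    # choose it from the list, then confirm the reported version is 1.7.0_67
    sudo update-alternatives --config java
    java -version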

Install Hadoop:

  • First we need to download the required tar ball of Hadoop and place it into the home directory.

Commands:

  • sudo mkdir -p /usr/local/hadoop/
    • create Hadoop directory
  • sudo tar xvf ~/hadoop-1.2.1-bin.tar.gz
    • extracts the Hadoop tar ball (run this from your home directory)
  • sudo cp -r ~/hadoop-1.2.1/* /usr/local/hadoop
    • copies the extracted Hadoop files to the Hadoop directory
  • sudo nano $HOME/.bashrc
    • setting Hadoop path. Add following lines at the end of this file
      • export HADOOP_PREFIX=/usr/local/hadoop
      • export PATH=$PATH:$HADOOP_PREFIX/bin
  • exec bash
    • restarts bash (the terminal)
  • hadoop
    • it should not show a "command not found" error!
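
A quick way to confirm both the PATH change and the copied files is to ask Hadoop for its version. The output shown in the comment is approximate:

    hadoop version
    # Hadoop 1.2.1 (followed by build and checksum details)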

Configuration of Hadoop

  • We are setting some environment variables and changing some configuration files according to our cluster setup.

Commands:

  • cd /usr/local/hadoop/conf
    • go to configuration directory of Hadoop
  • sudo nano hadoop-env.sh
    • open the environment variables file and add the following two lines in their respective places. Similar lines are already present in the file with different values; keep those as they are and add these lines after them
      • export JAVA_HOME=/usr/lib/jvm/java-1.7.0-sun-amd64
      • export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true
  • sudo nano core-site.xml
    • open the HDFS configuration file and set the name server address and tmp dir values. You have to use your host name instead of "coed161". Add the following properties between the <configuration> tags:
      <property>
        <name>fs.default.name</name>
        <value>hdfs://coed161:10001</value>
      </property>
      <property>
        <name>hadoop.tmp.dir</name>
        <value>/usr/local/hadoop/tmp</value>
      </property>
  • sudo nano mapred-site.xml
    • open the Map Reduce configuration file and set the job tracker value. You have to write your host name instead of "coed161". Add the following property between the <configuration> tags:
      <property>
        <name>mapred.job.tracker</name>
        <value>coed161:10002</value>
      </property>
  • sudo mkdir /usr/local/hadoop/tmp
    • create tmp directory to store all files on data node
  • sudo chown <username> /usr/local/hadoop/tmp
    • change the owner of the directory to avoid access control issues. Write your username instead of <username>
  • sudo chown <username> /usr/local/hadoop
    • change the owner of the directory to avoid access control issues. Write your username instead of <username>
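
To sanity-check the configuration before formatting, you can grep the two edited files and confirm the ownership change. This is just a verification sketch; the <value> lines should show your own host name rather than coed161:

    # each grep should print the property name followed by its <value> line
    grep -A 1 "fs.default.name" /usr/local/hadoop/conf/core-site.xml
    grep -A 1 "mapred.job.tracker" /usr/local/hadoop/conf/mapred-site.xml
    # the tmp directory should now be owned by your user, not root
    ls -ld /usr/local/hadoop/tmp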

Format DFS (skip this step if you are going for Multi-Node setup):

  • Now we are ready to format our Distributed File System (DFS)

Command:

  • hadoop namenode -format
    • Check for the message “namenode successfully formatted”
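
Besides the log message, you can check that the formatted name node image was written to disk. This assumes the Hadoop 1.x default of dfs.name.dir = ${hadoop.tmp.dir}/dfs/name, with hadoop.tmp.dir set to /usr/local/hadoop/tmp as above:

    # a freshly formatted name node creates fsimage, edits and VERSION files here
    ls /usr/local/hadoop/tmp/dfs/name/current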

Start all process:

  • We are ready to start our Hadoop cluster (though it is only a single node)

Commands:

  • start-all.sh
    • to start all (name node, secondary name node, data node, job tracker, task tracker)
  • jps
    • to check whether all services (i.e. name node, secondary name node, data node, job tracker, task tracker) started or not
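
A healthy single-node cluster shows five Hadoop daemons plus Jps itself. The output below is only illustrative; the process IDs will differ on your machine:

    jps
    # 4321 NameNode
    # 4522 DataNode
    # 4725 SecondaryNameNode
    # 4918 JobTracker
    # 5121 TaskTracker
    # 5342 Jps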

To check cluster details on web interface:

  • NameNode web UI: http://<hostname>:50070 (use your own host name)
  • JobTracker web UI: http://<hostname>:50030

Stop all processes:

  • If you want to stop (shut down) all your Hadoop cluster services

Command:

  • stop-all.sh

For any queries you can write in a comment or mail me at: “brijeshbmehta@gmail.com”

Courtesy: Mr. Anand Kumar, NIT, Trichy

Comments please...