Hadoop 1.2.1 Installation and Configuration on Single Node
I have experienced difficulties installing and configuring Hadoop, so I want to provide one easy guide for installing and configuring Hadoop 1.x. I am assuming that readers have a knowledge of basic Linux commands, so I am not going to explain those commands in depth.
I have used Hadoop 1.2.1, JDK 7, and Ubuntu (Linux) in this setup.
Install SSH:
- We require SSH so Hadoop can log in remotely to the different machines in the cluster to run Map Reduce tasks
Commands:
- sudo apt-get update
- updates list of packages
- sudo apt-get install openssh-server
- Installs OpenSSH Server
Generate Keys:
- Hadoop logs in to remote machines many times while running a Map Reduce task. Therefore, we need to set up passwordless entry for Hadoop to all the nodes in our cluster.
Commands:
- ssh <hostname>
- write your system's host name in place of <hostname>. It asks for a password
- ssh-keygen
- generates SSH Keys
- ENTER FILE NAME:
- no need to write anything simply press enter as we want to use default settings
- ENTER PASSPHRASE:
- no need to write anything simply press enter as we want to use default settings
- cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
- Copy the id_rsa.pub key into authorized_keys to enable passwordless entry for this user
- ssh <hostname>
- now it should not ask for a password
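The key-generation steps above can be sketched as a single script. This is a safe illustration that writes the key pair to a throwaway directory; in the real setup you accept the default `~/.ssh/id_rsa` path instead, as described above.

```shell
# Sketch of the passwordless-SSH key setup, using a temporary
# directory so it does not touch your real ~/.ssh
KEYDIR=$(mktemp -d)

# -N "" sets an empty passphrase (the "just press enter" step);
# -f names the key file; -q suppresses the banner output
ssh-keygen -t rsa -N "" -f "$KEYDIR/id_rsa" -q

# Append the public key to authorized_keys, as in the cat step above
cat "$KEYDIR/id_rsa.pub" >> "$KEYDIR/authorized_keys"

# The public key should now appear exactly once
grep -c "ssh-rsa" "$KEYDIR/authorized_keys"   # prints 1
```

On a real node you would run `ssh-keygen` with no arguments and append `~/.ssh/id_rsa.pub` to `~/.ssh/authorized_keys`, exactly as in the commands above.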
Install Java
- I prefer offline Java installation, so I have already downloaded the Java tarball and placed it in my Downloads directory
Commands:
- sudo mkdir -p /usr/lib/jvm/
- create directory for Java
- sudo tar xvf ~/Downloads/jdk-7u67-linux-x64.tar.gz -C /usr/lib/jvm
- extract and copy content of Java tar ball to Java directory
- cd /usr/lib/jvm
- go to Java directory
- sudo ln -s jdk1.7.0_67 java-1.7.0-sun-amd64
- generate symbolic link to jdk directory which will be used in Hadoop configuration
- sudo update-alternatives --config java
- checking and setting Java alternatives
- sudo nano $HOME/.bashrc
- setting Java path. Add following two lines at the end of this file
- export JAVA_HOME="/usr/lib/jvm/jdk1.7.0_67"
- export PATH="$PATH:$JAVA_HOME/bin"
- exec bash
- restarts bash(terminal)
- java
- it should not show command not found error!
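The symlink step above creates a relative link inside `/usr/lib/jvm`. A small sketch in a temporary directory shows the mechanics without touching `/usr/lib/jvm` (the directory names are the ones this guide uses):

```shell
# Imitate /usr/lib/jvm in a throwaway directory
JVM=$(mktemp -d)
mkdir "$JVM/jdk1.7.0_67"

# Same relative symlink as in the guide:
# java-1.7.0-sun-amd64 -> jdk1.7.0_67
ln -s jdk1.7.0_67 "$JVM/java-1.7.0-sun-amd64"

# The link target is relative, so it keeps working if the
# parent directory is ever moved as a whole
readlink "$JVM/java-1.7.0-sun-amd64"   # prints jdk1.7.0_67
```

Because the link target is relative, it must be created from inside the directory (hence the `cd /usr/lib/jvm` step above).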
Install Hadoop:
- First we need to download required tar ball of Hadoop and place it in to home directory.
Commands:
- sudo mkdir -p /usr/local/hadoop/
- create Hadoop directory
- sudo tar xvf ~/hadoop-1.2.1-bin.tar.gz
- sudo cp -r ~/hadoop-1.2.1/* /usr/local/hadoop
- extract and copy Hadoop files from tar ball to Hadoop directory
- sudo nano $HOME/.bashrc
- setting Hadoop path. Add following lines at the end of this file
- export HADOOP_PREFIX=/usr/local/hadoop
- export PATH=$PATH:$HADOOP_PREFIX/bin
- exec bash
- restarts bash(terminal)
- hadoop
- it should not show command not found error!
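The two `.bashrc` lines above can be checked with a short sketch. Run it in a throwaway shell, since it modifies PATH for the current session; `/usr/local/hadoop` is the install path this guide uses.

```shell
# The same two lines the guide adds to .bashrc
export HADOOP_PREFIX=/usr/local/hadoop
export PATH=$PATH:$HADOOP_PREFIX/bin

# Confirm the Hadoop bin directory is now on PATH
# (one path entry per line, exact match)
echo "$PATH" | tr ':' '\n' | grep -x "$HADOOP_PREFIX/bin"
```

If the `grep` prints `/usr/local/hadoop/bin`, the `hadoop` command will be found once the tarball contents are in place.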
Configuration of Hadoop
- We set some environment variables and change some configuration files according to our cluster setup.
Commands:
- cd /usr/local/hadoop/conf
- go to configuration directory of Hadoop
- sudo nano hadoop-env.sh
- open the environment variable file and add the following two lines in their respective places. Entries with different values already exist in the file; keep those as they are and add these lines after them
- export JAVA_HOME=/usr/lib/jvm/java-1.7.0-sun-amd64
- export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true
- sudo nano core-site.xml
- open the HDFS configuration file and set the name server address and the tmp directory value. Add the following properties inside the existing <configuration> element, using your host name instead of "coed161":
  <property>
    <name>fs.default.name</name>
    <value>hdfs://coed161:10001</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/tmp</value>
  </property>
- sudo nano mapred-site.xml
- open the Map Reduce configuration file and set the job tracker value. Add the following property inside the existing <configuration> element, using your host name instead of "coed161":
  <property>
    <name>mapred.job.tracker</name>
    <value>coed161:10002</value>
  </property>
- sudo mkdir /usr/local/hadoop/tmp
- create tmp directory to store all files on data node
- sudo chown <username> /usr/local/hadoop/tmp
- change the owner of the directory to avoid access control issues. Write your username instead of <username>
- sudo chown <username> /usr/local/hadoop
- change the owner of the directory to avoid access control issues. Write your username instead of <username>
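One way to write the core-site.xml properties above without hand-editing is a heredoc that substitutes your host name automatically. This sketch writes to a temporary directory; on a real node the target is `/usr/local/hadoop/conf/core-site.xml`, and the port and paths are the ones this guide uses.

```shell
# Your machine's host name (the guide uses "coed161")
HOST=$(hostname)

# Stand-in for /usr/local/hadoop/conf in this sketch
CONF=$(mktemp -d)

# Generate core-site.xml with the two properties from the guide;
# the unquoted EOF lets $HOST expand inside the heredoc
cat > "$CONF/core-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://$HOST:10001</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/tmp</value>
  </property>
</configuration>
EOF

# Sanity check: both properties were written
grep -c "<property>" "$CONF/core-site.xml"   # prints 2
```

The same pattern works for mapred-site.xml with the single `mapred.job.tracker` property.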
Format DFS (skip this step if you are going for Multi-Node setup):
- Now we are ready to format our Distributed File System (DFS)
Command:
- hadoop namenode -format
- Check for the message “namenode successfully formatted”
Start all process:
- We are ready to start our Hadoop cluster (though it is only a single node)
Commands:
- start-all.sh
- to start all (name node, secondary name node, data node, job tracker, task tracker)
- jps
- to check whether all services (i.e. name node, secondary name node, data node, job tracker, task tracker) started or not
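The `jps` check above can be automated with a small loop over the five expected daemon names. Since a live cluster is needed for real `jps` output, the sketch below uses a sample listing as a stand-in; on your node, replace `SAMPLE="..."` with `SAMPLE=$(jps)`.

```shell
# Stand-in for the output of `jps` on a healthy single-node cluster
SAMPLE="1234 NameNode
2345 SecondaryNameNode
3456 DataNode
4567 JobTracker
5678 TaskTracker"

# Verify each expected daemon appears in the listing
# (-w matches whole words, so NameNode does not match SecondaryNameNode)
missing=0
for d in NameNode SecondaryNameNode DataNode JobTracker TaskTracker; do
  echo "$SAMPLE" | grep -qw "$d" || { echo "missing: $d"; missing=1; }
done
[ "$missing" -eq 0 ] && echo "all daemons running"   # prints all daemons running
```

If any daemon is missing, check its log file under `/usr/local/hadoop/logs` for the reason.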
To check cluster details on web interface:
- Open any browser and go to the following addresses:
- http://coed161:50070/dfshealth.jsp
- for DFS (name node) details. Write your host name instead of “coed161”
- http://coed161:50030/jobtracker.jsp
- for Map Reduce (job tracker) details. Write your host name instead of “coed161”
Stop all processes:
- If you want to stop (shut down) all your Hadoop cluster services
Command:
- stop-all.sh
For any queries you can write in a comment or mail me at: “brijeshbmehta@gmail.com”
Courtesy: Mr. Anand Kumar, NIT, Trichy