
Hadoop 1.2.1 Installation and Configuration on Multiple Nodes

05/03/2017

There are a few changes you have to make to go from a single-node setup to a multi-node one. First, complete the single-node setup up to the DFS formatting step.

There are five main steps to go from a single-node to a multi-node setup:

STEPS:

  1. SSH COPY ID to all nodes
  2. Configure masters and slaves
  3. Configure CORE-SITE.XML and MAPRED-SITE.XML
  4. Format DFS
  5. START-ALL.SH

Now I am going to explain these steps in detail:

Step-1 SSH COPY ID to all nodes:

From the NAME NODE, we need to generate an SSH key and distribute it to all the SLAVE NODES and also to the SECONDARY NAME NODE (if any)

Command:

ssh-copy-id -i $HOME/.ssh/id_rsa.pub hadoop@coed159

Here “hadoop” is the user name and “coed159” is a host name; change them according to your setup.

When asked to confirm the host FINGERPRINT, answer yes

Do the same for all DATA NODES and for SECONDARY NAME NODE (if any)

Check whether the key was copied successfully:

ssh coed159

It should log in without asking for a password.
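Taken together, the key distribution for Step 1 can be sketched as a small script. This is only a sketch, assuming the node names from this article (coed159, coed160, coed162, coed163) and the hadoop user; it prints the ssh-copy-id commands so you can review them before running each one.

```shell
# Hypothetical node list - replace with your own DATA NODES and
# SECONDARY NAME NODE host names.
NODES="coed159 coed160 coed162 coed163"

# Print one ssh-copy-id command per node; run each one (or pipe the
# output to sh) once you are happy with the list.
for node in $NODES; do
  echo "ssh-copy-id -i \$HOME/.ssh/id_rsa.pub hadoop@$node"
done
```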

Step-2 Configure masters and slaves:

We need to do this on the NAME NODE alone (not on the DATA NODES or the SECONDARY NAME NODE)

Go to NAME NODE

Command:

cd /usr/local/hadoop/conf

Find the two files: masters, and slaves

The masters file is for the NAME NODE and SECONDARY NAME NODE

The slaves file is for the DATA NODES

Command:

sudo nano /usr/local/hadoop/conf/masters

By default it contains ‘localhost’. Change it to the name of the NAME NODE (i.e. coed161 in my case)

Ctrl + o to save

Enter

Ctrl + x to exit

sudo nano /usr/local/hadoop/conf/slaves

By default it contains ‘localhost’. Change it to list the names of all DATA NODES, one per line; in my case:

coed159

coed160

coed162

coed163

Ctrl + o to save

Enter

Ctrl + x to exit
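After editing, the two files should end up looking like this (these are the host names from my setup; yours will differ):

```
# /usr/local/hadoop/conf/masters
coed161

# /usr/local/hadoop/conf/slaves
coed159
coed160
coed162
coed163
```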

Step-3 Configure CORE-SITE.XML and MAPRED-SITE.XML

Go to each SLAVE and to the SECONDARY NAME NODE; we need to make them point to the master

Command:

sudo nano /usr/local/hadoop/conf/core-site.xml

Check whether the ‘fs.default.name’ property is pointing to the NAME NODE (i.e. coed161 in my case); if it is pointing to localhost:10001, replace localhost with coed161

Ctrl + o to save

Enter

Ctrl + x to exit

Do the same for MAPRED-SITE.XML

Command:

sudo nano /usr/local/hadoop/conf/mapred-site.xml

Check whether the ‘mapred.job.tracker’ property is pointing to the JOB TRACKER / NAME NODE (i.e. coed161 in my case)

If it is ‘localhost:10002’, update it as ‘coed161:10002’
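For reference, the relevant properties should end up looking roughly like this on every slave. The hdfs:// prefix is the usual Hadoop 1.x form for fs.default.name; ports 10001 and 10002 are the ones used in this setup, so adjust both host and ports to match yours.

```xml
<!-- core-site.xml -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://coed161:10001</value>
</property>

<!-- mapred-site.xml -->
<property>
  <name>mapred.job.tracker</name>
  <value>coed161:10002</value>
</property>
```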

Remove the LOCALHOST entries from the /ETC/HOSTS file

Command:

sudo nano /etc/hosts

Remove the localhost line and the entries for 127.0.0.1
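A cleaned-up /etc/hosts on each node might look like this. The IP addresses below are placeholders I made up for illustration; use your machines' real addresses, and make sure every node can resolve every other node's name.

```
192.168.1.59   coed159
192.168.1.60   coed160
192.168.1.61   coed161
192.168.1.62   coed162
192.168.1.63   coed163
```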

Step-4 Format DFS:

If you are converting an existing single-node installation, you must delete /USR/LOCAL/HADOOP/TMP and create it again on all the nodes, and then format HDFS from the NAME NODE alone. If you have not yet formatted HDFS during the single-node setup, skip straight to the formatting step.

Command:

To remove directory:

sudo rm -r /usr/local/hadoop/tmp

Create tmp directory

sudo mkdir /usr/local/hadoop/tmp

Change the ownership of the tmp directory as well as the hadoop directory:

sudo chown hadoop /usr/local/hadoop/tmp

sudo chown hadoop /usr/local/hadoop
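Since the tmp directory must be recreated on every node, the per-node work can be sketched as a loop. Again this is only a sketch, assuming the node names from this article; it prints one ssh command per node (passwordless login was set up in Step 1) so you can review them before running each one.

```shell
# Hypothetical node list - every node in the cluster, master included.
NODES="coed159 coed160 coed161 coed162 coed163"

# Print the recreate-and-chown command for each node; run each line
# by hand (or pipe the output to sh) once you have checked the list.
for node in $NODES; do
  echo "ssh hadoop@$node 'sudo rm -rf /usr/local/hadoop/tmp && sudo mkdir /usr/local/hadoop/tmp && sudo chown hadoop /usr/local/hadoop/tmp'"
done
```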

Format NAME NODE

hadoop namenode -format

Check for ‘name node successfully formatted’ message

Step-5 START-ALL.SH

To start the hadoop cluster in multi-node mode, we have to run this command from the NAME NODE; it starts the respective services on all NODES

Command:

start-all.sh

jps

Run jps on each system separately to see which JVMs are running on it
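On a Hadoop 1.x cluster, jps would typically show daemons along these lines (the SecondaryNameNode process appears on whichever machine you chose to run it):

```
# On the NAME NODE (coed161 in my case):
NameNode
JobTracker

# On each DATA NODE:
DataNode
TaskTracker
```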

Check the number of live nodes in the web GUI (it will take a few minutes)

To stop the cluster, run stop-all.sh from the NAME NODE.

For any queries you can write in a comment or mail me at: “brijeshbmehta@gmail.com”

Courtesy: Mr. Anand Kumar, NIT, Trichy

Comments please...