How to install a 3-node Hadoop cluster on VMware Player

By Tzu-Cheng Chuang 4-25-2014

Purpose: Easily set up a Hadoop environment for testing and evaluation

This demonstration has been tested with the following software versions: CentOS 6.5 x86_64, Ambari 1.5.1, and VMware Player 6.0.2. (The Ambari Web UI will be used to install HDFS, YARN+MapReduce2, Pig, Hive, HBase, Oozie, ZooKeeper, etc.)

Steps:

1. Download CentOS-6.5-x86_64-bin-DVD1.iso Image from CentOS website.

2. Install CentOS 6.5 with virtual machine name: hadoop-1, user name: user1, password: password, disk size: 20GB, memory: 1048MB, 1 processor.

3. Repeat the installation to create the hadoop-2 and hadoop-3 guest OSes.

4. (Optional) Install "Development Tools" (such as the GNU C and C++ compilers) on all 3 hosts:
    [root@localhost jasontgi]# yum groupinstall "Development Tools"
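
A quick way to confirm the toolchain is in place is to check the compiler version:
    [root@localhost jasontgi]# gcc --version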
    

5. Change the hostname and domain name on all 3 hosts. On each host, run this command with the corresponding hostname (hadoop-1, hadoop-2, or hadoop-3) to set the hostname immediately, without rebooting the box:
    [root@localhost jasontgi]# hostname hadoop-1.chuangtc.com
    
On each host, modify the /etc/sysconfig/network file:
    [root@localhost jasontgi]# vim  /etc/sysconfig/network
    
Modify the HOSTNAME value to the corresponding name:
    HOSTNAME="hadoop-1.chuangtc.com"
    
Save and close the file.
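
For reference, a minimal /etc/sysconfig/network on CentOS 6 typically contains just these two lines (the NETWORKING line should already be present from the installer):
    NETWORKING=yes
    HOSTNAME="hadoop-1.chuangtc.com"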

Check the IP address:
    [root@localhost jasontgi]# ifconfig -a
    
Edit the hosts file, modifying /etc/hosts.
Every host needs entries mapping each hostname to its IP address:
    127.0.0.1 localhost
    192.168.61.132 hadoop-1.chuangtc.com hadoop-1
    192.168.61.134 hadoop-2.chuangtc.com hadoop-2
    192.168.61.136 hadoop-3.chuangtc.com hadoop-3
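
Each host should now be able to resolve the others by name; a quick check is to ping one of them:
    [root@localhost jasontgi]# ping -c 1 hadoop-2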
    
Restart the CentOS networking and other services (if any)
    [root@localhost jasontgi]# service network restart
    
Log out and log back in to verify the network hostname:
    [root@hadoop-1 jasontgi]# hostname -f
    hadoop-1.chuangtc.com
    [root@hadoop-1 jasontgi]# dnsdomainname
    chuangtc.com
    
6. Install the Java 7 JDK
(1) Install java-1.7.0-openjdk-devel on each host:
 
    [root@hadoop-1 jasontgi]# yum install java-1.7.0-openjdk-devel.x86_64
    
(2) Set environment variables by adding the following two lines to the end of the ~/.bashrc file:
    export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk.x86_64
    export PATH=$PATH:$JAVA_HOME/bin
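
To confirm the JDK is installed and the variables are picked up, reload the file and check:
    [root@hadoop-1 jasontgi]# source ~/.bashrc
    [root@hadoop-1 jasontgi]# java -version
    [root@hadoop-1 jasontgi]# echo $JAVA_HOME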
    

7. Configure the SSH server so that ssh from hadoop-1 to hadoop-2 and hadoop-3 doesn't need a passphrase
(1) Generate an RSA key pair on each host:
    [root@hadoop-1 ~]# ssh-keygen -t rsa
    
(2) Enable SSH access to the local machine on each host:
    [root@hadoop-1 ~]# cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
    
(3) Enable SSH access from hadoop-1 to the hadoop-2 and hadoop-3 machines, and back to hadoop-1:
    [root@hadoop-1 ~]# ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop-2
    [root@hadoop-1 ~]# ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop-3
    [root@hadoop-2 ~]# ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop-1
    [root@hadoop-3 ~]# ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop-1
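
Passwordless SSH can then be verified from hadoop-1; each command should print the remote hostname without asking for a password:
    [root@hadoop-1 ~]# ssh hadoop-2 "hostname -f"
    [root@hadoop-1 ~]# ssh hadoop-3 "hostname -f"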
    

8. Disable IPv6 by adding the following two lines to the end of the /etc/sysctl.conf file:
    #disable ipv6
    net.ipv6.conf.all.disable_ipv6 = 1
    net.ipv6.conf.default.disable_ipv6 = 1
    
To disable IPv6 in the running system without rebooting:
    [root@hadoop-1 ~]# sysctl -w net.ipv6.conf.all.disable_ipv6=1
    [root@hadoop-1 ~]# sysctl -w net.ipv6.conf.default.disable_ipv6=1
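
To apply the /etc/sysctl.conf settings and verify, reload them and check for remaining IPv6 addresses; the grep should print nothing once IPv6 is off:
    [root@hadoop-1 ~]# sysctl -p
    [root@hadoop-1 ~]# ip addr show | grep inet6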
    

9. Disable the firewall (this makes the Hadoop installation easier). On each host, run the following commands:
    [root@hadoop-1 ~]# service iptables save
    [root@hadoop-1 ~]# service iptables stop
    [root@hadoop-1 ~]# chkconfig iptables off
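
To confirm, check the status on each host; it should report that the firewall is not running:
    [root@hadoop-1 ~]# service iptables status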
    
10. Disable SELinux on all 3 hosts:
    [root@hadoop-1 ~]# setenforce 0
    [root@hadoop-1 ~]# ssh hadoop-2 "setenforce 0"
    [root@hadoop-1 ~]# ssh hadoop-3 "setenforce 0"
    
11. Update OpenSSL (the default openssl package in CentOS 6.5 has issues). On each host, run the following:
    [root@hadoop-1 ~]# yum update openssl
    
12. Enable and start ntpd on each host:
    [root@hadoop-1 ~]# chkconfig ntpd on
    [root@hadoop-1 ~]# ssh hadoop-2 "chkconfig ntpd on"
    [root@hadoop-1 ~]# ssh hadoop-3 "chkconfig ntpd on"
    [root@hadoop-1 ~]# service ntpd restart
    [root@hadoop-1 ~]# ssh hadoop-2 "service ntpd restart"
    [root@hadoop-1 ~]# ssh hadoop-3 "service ntpd restart"
    
13. Install Ambari
(1) Download the Ambari repository to hadoop-1. Refer to https://cwiki.apache.org/confluence/display/AMBARI/Install+Ambari+1.5.1+from+Public+Repositories
    [root@hadoop-1 ~]# cd /etc/yum.repos.d/
    [root@hadoop-1 yum.repos.d]# wget http://public-repo-1.hortonworks.com/ambari/centos6/1.x/updates/1.5.1/ambari.repo
    

(2) Install, set up, and start the Ambari server:
    [root@hadoop-1 ~]# yum install ambari-server
    [root@hadoop-1 ~]# ambari-server setup
    [root@hadoop-1 ~]# ambari-server start
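
ambari-server setup runs an interactive configuration; accepting the defaults is fine for a test cluster like this one. Once started, confirm the server is up:
    [root@hadoop-1 ~]# ambari-server status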
    

(3) Deploy the Hadoop cluster using the Ambari Web UI.
Open a web browser on any host and go to http://<IP address of hadoop-1.chuangtc.com>:8080.
Log in with username admin and password admin, then follow the on-screen instructions.

The installation takes around 1 to 1.5 hours.



14. After the installation, make sure PostgreSQL starts on boot:
    [root@hadoop-1 ~]# chkconfig postgresql on
    [root@hadoop-1 ~]# service postgresql start
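
A quick status check confirms the database is running:
    [root@hadoop-1 ~]# service postgresql status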
    
15. Log on to hadoop-1 and list files in the HDFS /user directory:
    [user1@hadoop-1 ~]$ hadoop fs -ls /user
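
As a simple smoke test (the /user/user1 path is an assumption based on the user created in step 2, and must be created by the hdfs superuser first), copy a file into HDFS and list it:
    [root@hadoop-1 ~]# su - hdfs -c "hadoop fs -mkdir /user/user1"
    [root@hadoop-1 ~]# su - hdfs -c "hadoop fs -chown user1 /user/user1"
    [user1@hadoop-1 ~]$ hadoop fs -put /etc/hosts /user/user1/
    [user1@hadoop-1 ~]$ hadoop fs -ls /user/user1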