How to install a Hadoop cluster (3-node cluster) on VMware Player.
By Tzu-Cheng Chuang, 4-25-2014
Purpose: Easily set up a Hadoop environment for testing and evaluation.
This demonstration has been tested with the following software versions:
CentOS 6.5 x86_64, Ambari 1.5.1, VMware Player 6.0.2 (the Ambari Web UI will be used to install HDFS, YARN+MapReduce2, Pig, Hive, HBase, Oozie, ZooKeeper, etc.)
Steps:
1. Download CentOS-6.5-x86_64-bin-DVD1.iso Image from
CentOS website.
2. Install CentOS 6.5 in a virtual machine named hadoop-1, with user name user1, password password, disk size 20GB, memory 1048MB, and 1 processor.
3. Repeat the same installation to create the hadoop-2 and hadoop-3 guest OSes.
4. (Optional) Install "Development Tools" (such as the GNU C and C++ compilers) on all 3 hosts:
[root@localhost jasontgi]# yum groupinstall "Development Tools"
5. Change the hostname and domain name on all 3 hosts.
On each host, run this command with the corresponding hostname (hadoop-1, hadoop-2, hadoop-3) to set the hostname immediately, without rebooting the box:
[root@localhost jasontgi]# hostname hadoop-1.chuangtc.com
On each host, modify the /etc/sysconfig/network file:
[root@localhost jasontgi]# vim /etc/sysconfig/network
Change the HOSTNAME value to the corresponding name:
HOSTNAME="hadoop-1.chuangtc.com"
Save and close the file.
Check the IP address:
[root@localhost jasontgi]# ifconfig -a
Edit the hosts file by modifying /etc/hosts.
Set or change the lines that map each server's IP address to its hostname:
127.0.0.1 localhost
192.168.61.132 hadoop-1.chuangtc.com hadoop-1
192.168.61.134 hadoop-2.chuangtc.com hadoop-2
192.168.61.136 hadoop-3.chuangtc.com hadoop-3
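The three mappings above can be appended in one idempotent pass. Below is a sketch using the example IPs from this guide; it runs against a scratch file so you can preview the result, then point HOSTS_FILE at /etc/hosts (as root) on the real machines.

```shell
# Append each cluster mapping only if it is not already present, so the
# loop is safe to re-run. HOSTS_FILE targets a scratch file here for a
# dry run; set it to /etc/hosts (as root) on the actual nodes.
HOSTS_FILE=$(mktemp)
for entry in \
  "192.168.61.132 hadoop-1.chuangtc.com hadoop-1" \
  "192.168.61.134 hadoop-2.chuangtc.com hadoop-2" \
  "192.168.61.136 hadoop-3.chuangtc.com hadoop-3"; do
  grep -qF "$entry" "$HOSTS_FILE" || echo "$entry" >> "$HOSTS_FILE"
done
```

Because each line is checked with grep before being appended, running the loop twice does not duplicate entries.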
Restart the CentOS networking and other services (if any)
[root@localhost jasontgi]# service network restart
Log out and log back in to verify the network hostname:
[root@hadoop-1 jasontgi]# hostname -f
hadoop-1.chuangtc.com
[root@hadoop-1 jasontgi]# dnsdomainname
chuangtc.com
6. Install the Java 7 JDK
(1) Install java-1.7.0-openjdk-devel on each host:
[root@hadoop-1 jasontgi]# yum install java-1.7.0-openjdk-devel.x86_64
(2) Set the environment variables by modifying the ~/.bashrc file; put the following two lines at the end of the file:
export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk.x86_64
export PATH=$PATH:$JAVA_HOME/bin
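The two export lines can be added safely even if this step is re-run, by checking first whether they are already present. A sketch, using the same JDK path as above:

```shell
# Append the JAVA_HOME exports to ~/.bashrc only once.
BASHRC="$HOME/.bashrc"
if ! grep -q 'JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk.x86_64' "$BASHRC" 2>/dev/null; then
  cat >> "$BASHRC" <<'EOF'
export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk.x86_64
export PATH=$PATH:$JAVA_HOME/bin
EOF
fi
```

After reloading the file with `source ~/.bashrc`, `echo $JAVA_HOME` should print the JDK path.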
7. Configure the SSH server so that SSH from hadoop-1 to hadoop-2 and hadoop-3 doesn’t need a passphrase
(1) Generate an RSA key pair on each host:
[root@hadoop-1 ~]# ssh-keygen -t rsa
(2) Enable SSH access to the local machine on each host:
[root@hadoop-1 ~]# cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
(3) Enable passwordless SSH between hadoop-1 and the other two machines, in both directions:
[root@hadoop-1 ~]# ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop-2
[root@hadoop-1 ~]# ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop-3
[root@hadoop-2 ~]# ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop-1
[root@hadoop-3 ~]# ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop-1
8. Disable IPv6 by putting the following lines at the end of the /etc/sysctl.conf file:
#disable ipv6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
To disable it in the running system without a reboot:
[root@hadoop-1 ~]# sysctl -w net.ipv6.conf.all.disable_ipv6=1
[root@hadoop-1 ~]# sysctl -w net.ipv6.conf.default.disable_ipv6=1
9. Disable the firewall (this makes the Hadoop installation easier)
On each host, run the following commands:
[root@hadoop-1 ~]# service iptables save
[root@hadoop-1 ~]# service iptables stop
[root@hadoop-1 ~]# chkconfig iptables off
10. Disable SELinux
Note that setenforce 0 only disables SELinux until the next reboot; to disable it permanently, set SELINUX=disabled in /etc/selinux/config on each host.
[root@hadoop-1 ~]# setenforce 0
[root@hadoop-1 ~]# ssh hadoop-2 "setenforce 0"
[root@hadoop-1 ~]# ssh hadoop-3 "setenforce 0"
11. Update OpenSSL (the default openssl package in CentOS 6.5 has issues)
On each host, run the following to update OpenSSL:
[root@hadoop-1 ~]# yum update openssl
12. Enable and restart ntpd on each host (Hadoop services need the cluster clocks to stay in sync):
[root@hadoop-1 ~]# chkconfig ntpd on
[root@hadoop-1 ~]# ssh hadoop-2 "chkconfig ntpd on"
[root@hadoop-1 ~]# ssh hadoop-3 "chkconfig ntpd on"
[root@hadoop-1 ~]# service ntpd restart
[root@hadoop-1 ~]# ssh hadoop-2 "service ntpd restart"
[root@hadoop-1 ~]# ssh hadoop-3 "service ntpd restart"
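Several steps in this guide run the same command locally and then over SSH on the other two nodes. A small helper function keeps that repetition down. This is a sketch: `run_all` is a made-up name, and it assumes the passwordless SSH configured earlier.

```shell
# Run a command on this node (hadoop-1), then on each remote node over SSH.
run_all() {
  cmd="$*"
  sh -c "$cmd"                      # on this node (hadoop-1)
  for node in hadoop-2 hadoop-3; do
    ssh "$node" "$cmd"              # on each remote node
  done
}
# Example: run_all "service ntpd restart"
```

With this in place, the three ntpd commands above collapse to a single `run_all "service ntpd restart"`, and the same pattern covers the SELinux and chkconfig steps.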
13. Install Ambari
(1) Download Ambari repository to hadoop-1
Please refer to
https://cwiki.apache.org/confluence/display/AMBARI/Install+Ambari+1.5.1+from+Public+Repositories
[root@hadoop-1 ~]# cd /etc/yum.repos.d/
[root@hadoop-1 yum.repos.d]# wget http://public-repo-1.hortonworks.com/ambari/centos6/1.x/updates/1.5.1/ambari.repo
(2) Install, set up, and start the Ambari server:
[root@hadoop-1 ~]# yum install ambari-server
[root@hadoop-1 ~]# ambari-server setup
[root@hadoop-1 ~]# ambari-server start
(3) Deploy the Hadoop cluster using the Ambari Web UI.
Open a web browser on any host and go to http://<ip_address_of_hadoop-1>:8080
Log in with username admin and password admin, then follow the on-screen instructions.
The installation takes around 1 to 1.5 hours.
14. After the installation, make sure PostgreSQL starts on boot:
[root@hadoop-1 ~]# chkconfig postgresql on
[root@hadoop-1 ~]# service postgresql start
15. Log on to hadoop-1 and list the files in the HDFS /user directory:
[user1@hadoop-1 ~]$ hadoop fs -ls /user
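Beyond listing /user, a quick round-trip confirms that HDFS accepts writes. This is a sketch: /user/user1 is an assumed home directory for user1 (adjust for your cluster), and the `command -v hadoop` guard simply makes the snippet a no-op on a machine without the Hadoop client.

```shell
# Write a small file into HDFS and read it back.
# /user/user1 is an assumed home directory; adjust for your cluster.
echo "hello hadoop" > /tmp/hello.txt
if command -v hadoop >/dev/null 2>&1; then
  hadoop fs -mkdir -p /user/user1
  hadoop fs -put -f /tmp/hello.txt /user/user1/hello.txt
  hadoop fs -cat /user/user1/hello.txt
fi
```

If everything is healthy, the final command prints the file's contents back from HDFS.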