Hadoop Installation with a Single DataNode (VMware or Oracle VirtualBox)
Download the latest version of VMware from the link below:
http://www.traffictool.net/vmware/
Download Oracle VirtualBox from the site below and install it on your local system.
http://www.oracle.com/technetwork/server-storage/virtualbox/downloads/index.html
Run the VirtualBox application (VirtualBox.exe).
Click on New ->
Then click Next -> Next and create the virtual machine.
Once that is done, the new machine appears in VirtualBox. Select the downloaded Ubuntu package.
Start the virtual machine, then enter the password of the user you want to log in as.
Once the virtual machine has started, the Ubuntu desktop is displayed.
Open the terminal by right-clicking on the screen, or search for "terminal" and open it.
Command: update the Ubuntu package lists
1. sudo apt-get update
Once the update is complete
Command: install openssh server
2. sudo apt-get install openssh-server
Command: create a hadoop directory
3. sudo mkdir /usr/local/hadoop
Download the latest version of Hadoop from the link below:
http://hadoop.apache.org/releases.html
Copy it to the virtual machine and extract the tar file.
Here I extracted it under /usr/local/hadoop/
Command: to extract the tar file
4. tar -xvf
After extracting, enter the command ls -lrt to see the list of folders related to hadoop.
Command: create a group called hadoop
5. sudo addgroup hadoop
Command: create new user called hduser
6. sudo adduser --ingroup hadoop hduser
Command: add hduser to the sudo group
7. sudo adduser hduser sudo
Command: change the owner of /usr/local/hadoop to hduser
8. sudo chown -R hduser:hadoop /usr/local/hadoop
Command: switch to hduser
9. su - hduser
Command: install ssh
10. sudo apt-get install ssh
Command: generate an ssh key with an empty passphrase
11. ssh-keygen -t rsa -P ""
Command: copy id_rsa.pub key to authorized_keys
12. cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
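Running step 12 more than once appends the same key again each time. Here is a minimal sketch of a repeatable version. The helper name add_key_once is my own (it is not part of OpenSSH), and the demo works on a scratch directory with a dummy key line, so it never touches the real ~/.ssh. It also sets the permissions that sshd usually expects (700 on the directory, 600 on authorized_keys).

```shell
#!/bin/sh
# add_key_once PUBKEY_FILE AUTH_FILE  (hypothetical helper, not part of OpenSSH)
# Appends the public key to AUTH_FILE only if that exact line is not there yet.
add_key_once() {
    pub="$1"
    auth="$2"
    mkdir -p "$(dirname "$auth")"
    touch "$auth"
    # -x whole-line match, -F fixed strings, -f read the pattern from the key file
    if ! grep -qxFf "$pub" "$auth"; then
        cat "$pub" >> "$auth"
    fi
    chmod 700 "$(dirname "$auth")"
    chmod 600 "$auth"
}

# Demo on a scratch directory; the key below is a dummy line, not a real key.
demo_dir=$(mktemp -d)
echo "ssh-rsa AAAAexample hduser@ubuntu" > "$demo_dir/id_rsa.pub"
add_key_once "$demo_dir/id_rsa.pub" "$demo_dir/.ssh/authorized_keys"
add_key_once "$demo_dir/id_rsa.pub" "$demo_dir/.ssh/authorized_keys"
# The file still holds a single line after two calls.
wc -l < "$demo_dir/.ssh/authorized_keys"
```

On the VM the call would be: add_key_once ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys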
Command: install vim editor
13. sudo apt-get install vim
Command: Edit the sysctl.conf file to disable IPv6
14. sudo gedit /etc/sysctl.conf or sudo vi /etc/sysctl.conf
Add below lines
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
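If you prefer not to open an editor, the three IPv6 settings (for the all, default and lo interfaces) can be appended from the shell. This is only a sketch: the ensure_line helper and the SYSCTL_CONF variable are names I made up for the example, and by default it writes to a scratch file instead of /etc/sysctl.conf.

```shell
#!/bin/sh
# ensure_line LINE FILE  (hypothetical helper)
# Appends LINE to FILE unless an identical line already exists.
ensure_line() {
    grep -qxF -- "$1" "$2" 2>/dev/null || printf '%s\n' "$1" >> "$2"
}

# SYSCTL_CONF is a hypothetical variable for this sketch. It defaults to a
# scratch file; set it to /etc/sysctl.conf (as root) to edit the real one.
conf="${SYSCTL_CONF:-$(mktemp)}"

for key in all default lo; do
    ensure_line "net.ipv6.conf.$key.disable_ipv6 = 1" "$conf"
done

cat "$conf"
# On the VM, after changing the real file, reload and check:
#   sudo sysctl -p
#   cat /proc/sys/net/ipv6/conf/all/disable_ipv6    (1 means IPv6 is disabled)
```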
Command: connect to localhost over ssh to confirm the key works
15. ssh localhost
Command: get the updates
16. sudo apt-get update
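Command: edit the .bashrc file to add the paths of java and hadoop
17. vi ~/.bashrc or gedit ~/.bashrc
Add below lines (use the location where Java is installed if it is not /usr)
export HADOOP_HOME=/usr/local/hadoop
export JAVA_HOME=/usr
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
The PATH line is what lets the shell find the hadoop command in step 20.
Command: source the .bashrc file
18. source ~/.bashrc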
Command: Now check the version of java and hadoop
19. java -version
20. hadoop version
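If either of the two version commands fails, the exports are usually the cause. A small sketch to check them follows. The check_hadoop_env function is a hypothetical helper written for this post; it only tests whether bin/java and bin/hadoop exist and are executable under the two directories, falling back to the paths used above when the variables are not set.

```shell
#!/bin/sh
# check_hadoop_env JAVA_HOME_DIR HADOOP_HOME_DIR  (hypothetical helper)
# Prints one line per problem and returns non-zero if anything is missing.
check_hadoop_env() {
    java_home="$1"
    hadoop_home="$2"
    status=0
    if [ ! -x "$java_home/bin/java" ]; then
        echo "no executable java under $java_home/bin"
        status=1
    fi
    if [ ! -x "$hadoop_home/bin/hadoop" ]; then
        echo "no executable hadoop under $hadoop_home/bin"
        status=1
    fi
    return "$status"
}

# Uses the values exported in ~/.bashrc, else the paths from this post.
if check_hadoop_env "${JAVA_HOME:-/usr}" "${HADOOP_HOME:-/usr/local/hadoop}"; then
    echo "environment looks fine"
else
    echo "fix the exports in ~/.bashrc, then run: source ~/.bashrc"
fi
```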
Command: create a data directory inside /usr/local/hadoop
21. mkdir /usr/local/hadoop/data
Command: edit the hadoop-env.sh file to add the configuration
22. sudo gedit /usr/local/hadoop/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr
export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true -Djava.library.path=$HADOOP_PREFIX/lib"
Command: edit the yarn-env.sh file to add the configuration
23. sudo gedit /usr/local/hadoop/etc/hadoop/yarn-env.sh
export HADOOP_CONF_LIB_NATIVE_DIR=${HADOOP_PREFIX:-"lib/native"}
export HADOOP_OPTS="-Djava.library.path=$HADOOP_PREFIX/lib"
Now we need to edit some of the Hadoop configuration files to start the single node.
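Go to /usr/local/hadoop/etc/hadoop
Command: Edit the existing file and add the below configuration inside the <configuration> element
24. sudo gedit core-site.xml
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000</value>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/usr/local/hadoop/data</value>
</property>
Command: Rename mapred-site.xml.template to mapred-site.xml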
Go to /usr/local/hadoop/etc/hadoop
25. mv mapred-site.xml.template mapred-site.xml
26. sudo gedit mapred-site.xml
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
Then close this file
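The core-site.xml and mapred-site.xml settings from the steps above can also be written straight from the shell. The sketch below uses the same property names and values as this post. HADOOP_CONF_OUT is a hypothetical variable I added so that the example writes to a scratch directory by default and does not overwrite a real configuration. Note that newer Hadoop 2.x releases report fs.default.name as deprecated in favour of fs.defaultFS; I kept the name used in this post.

```shell
#!/bin/sh
# HADOOP_CONF_OUT is a hypothetical variable for this sketch. It defaults to a
# scratch directory; point it at /usr/local/hadoop/etc/hadoop for the real files.
conf_dir="${HADOOP_CONF_OUT:-$(mktemp -d)}"

# NameNode address and base directory for Hadoop's working data.
cat > "$conf_dir/core-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/data</value>
  </property>
</configuration>
EOF

# Run MapReduce jobs on YARN.
cat > "$conf_dir/mapred-site.xml" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
EOF

echo "wrote core-site.xml and mapred-site.xml to $conf_dir"
```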
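Edit the hdfs-site.xml
Command: to edit the hdfs-site.xml
27. sudo gedit hdfs-site.xml
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
(With a single DataNode only one replica of each block can actually be stored, so a value of 1 also works and avoids under-replicated block warnings.)
Command: edit the yarn-site.xml
28. sudo gedit yarn-site.xml
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
  <name>yarn.resourcemanager.resource-tracker.address</name>
  <value>localhost:8025</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.address</name>
  <value>localhost:8030</value>
</property>
<property>
  <name>yarn.resourcemanager.address</name>
  <value>localhost:8050</value>
</property>
Command: format the namenode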
29. /usr/local/hadoop/bin/hadoop namenode -format
Once the format is done, start DFS and YARN
30. /usr/local/hadoop/sbin/start-dfs.sh
31. /usr/local/hadoop/sbin/start-yarn.sh
Command: to display all the running Hadoop daemons (NameNode, DataNode and so on)
32. jps
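On a single-node setup like this one, jps should normally list NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager. A quick sketch to spot a daemon that did not start: missing_daemons is a hypothetical helper of mine, and the sample text stands in for real jps output (the process IDs are made up).

```shell
#!/bin/sh
# missing_daemons JPS_OUTPUT  (hypothetical helper)
# Prints the name of each expected daemon that is absent from the jps output.
missing_daemons() {
    for daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
        # jps prints "<pid> <class name>"; compare the whole second field so
        # that SecondaryNameNode is not mistaken for NameNode
        printf '%s\n' "$1" |
            awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }' ||
            echo "$daemon"
    done
}

# Sample text standing in for real jps output; the process IDs are made up.
sample="2401 NameNode
2533 DataNode
2719 SecondaryNameNode
3120 Jps"

# Prints ResourceManager and NodeManager, one per line, for this sample.
missing_daemons "$sample"
# On the VM itself: missing_daemons "$(jps)"
```

If a daemon is reported missing, its log file under /usr/local/hadoop/logs is the first place to look.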
This is how we can set up Hadoop on a single node using Oracle VirtualBox or VMware.
Thank you for viewing this post.