Step 1: Download Spark (Download Spark from here)
1. Choose a Spark release (whichever version you want to work with)
2. Choose a package type (pre-built for any version of Hadoop)
3. Choose a download type
4. Click on Download Spark.
Step 2: After a successful download, we need to run Spark.
For that, we need to follow a few steps.
1. Install Java 7 and set JAVA_HOME and the PATH in the environment variables.
2. Download a Hadoop version (here I have downloaded Hadoop 2.4).
3. Untar the tar file, set HADOOP_HOME, and update the PATH in the environment variables.
4. If Hadoop is not installed, then download the winutils.exe file and save it on your local system.
(This is needed to work in a Windows environment.)
5. After downloading, set HADOOP_HOME in the environment variables to the directory where our winutils.exe file resides (a sketch of the commands follows below).
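From a command prompt, the variables can be set persistently with setx. This is only a minimal sketch; the install paths below are examples, so substitute your own:
C:\> setx JAVA_HOME "C:\Program Files\Java\jdk1.7.0_79"
C:\> setx HADOOP_HOME "C:\hadoop"
C:\> setx PATH "%PATH%;%JAVA_HOME%\bin;%HADOOP_HOME%\bin"
Note that setx only affects new sessions, so open a fresh command prompt afterwards.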
Step 3: Once everything has been done, we need to check whether Spark is working.
1. Go to the command prompt and run:
C:\> spark-shell
Spark will start with a lot of logs; to avoid the INFO logs we need to change the log level.
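Once the shell is up, a quick sanity check (sc is the SparkContext the shell creates for you):
scala> sc.version // should print the Spark version, e.g. 1.5.0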
Step 4: Go to the conf directory inside the Spark home.
1. Copy log4j.properties.template, paste it in the same location, and rename it to log4j.properties.
2. Edit it and change the INFO level to the ERROR level:
log4j.rootCategory=INFO, console
change to
log4j.rootCategory=ERROR, console
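A minimal way to make the copy from the command prompt (assuming the current directory is the Spark home):
C:\> copy conf\log4j.properties.template conf\log4j.properties
Then open conf\log4j.properties in any text editor and apply the change above.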
Step 5: After changing the log level, if we run spark-shell again from the command prompt, you can see the difference.
1. This is how we can install Spark in a Windows environment.
2. If you are facing any issues while starting Spark:
3. First check the Hadoop home path using the following command:
C:\> echo %HADOOP_HOME%
4. It should print the Hadoop home path where our winutils.exe file is available.
5. Set the permissions for the Hadoop temp folder:
C:\> %HADOOP_HOME%\bin\winutils.exe ls \tmp\hive
C:\> %HADOOP_HOME%\bin\winutils.exe chmod 777 \tmp\hive
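If \tmp\hive does not exist yet, the commands above will fail; you can create it first (assuming Spark runs from the C: drive, so \tmp\hive resolves to C:\tmp\hive):
C:\> mkdir C:\tmp\hive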
Step 6: Now we will try a word count example using Spark, the same task we usually do in Hadoop MapReduce to count the words in a given file.
1. After spark-shell has started, we get two contexts: the SparkContext as sc and the SQLContext as sqlContext.
2. Using the SparkContext sc, we will read the file, do the manipulation, and write the output to a file.
val textFile = sc.textFile("file:///C:/spark/spark-1.5.0-bin-hadoop2.4/README.md")
// Read the first line of the file
textFile.first
// Split each line using space as the delimiter
val tokenizedFileData = textFile.flatMap(line => line.split(" "))
// Prepare counts using map
val countPrep = tokenizedFileData.map(word => (word, 1))
// Sum the counts using reduceByKey
val counts = countPrep.reduceByKey((accumValue, newValue) => accumValue + newValue)
// Sort the key-value pairs by count, descending
val sortedCounts = counts.sortBy(kvPair => kvPair._2, false)
// Save the sorted counts into an output directory called ReadMeWordCount
sortedCounts.saveAsTextFile("file:///C:/spark/ReadMeWordCount")
// If we want counts without the manual map/reduce, countByValue is built in
tokenizedFileData.countByValue
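To peek at the results without writing anything to disk, you can pull a few rows back to the driver; a small sketch:
// Show the five most frequent words and their counts
sortedCounts.take(5).foreach(println)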
Step 7: A few more commands to save the output file to the local system.
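saveAsTextFile writes one part file per partition; if you want a single output file, merge the partitions first. A minimal sketch (the ReadMeWordCountSingle path is just an example):
// coalesce(1) merges everything into one partition, so a single part file is written
sortedCounts.coalesce(1).saveAsTextFile("file:///C:/spark/ReadMeWordCountSingle")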
Step 8: The output will be stored as part files inside the output directory, as shown below.
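For example, listing the output directory should show a _SUCCESS marker plus one part file per partition (the exact number depends on how many partitions the RDD had):
C:\> dir C:\spark\ReadMeWordCount
_SUCCESS
part-00000
part-00001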
Thank you very much for viewing this post.