Oozie - Framework
Oozie is a workflow/coordination system that you can use to manage Apache Hadoop jobs.
The Oozie server is a web application that runs in a Java servlet container (the standard Oozie distribution uses Tomcat).
The server supports reading and executing Workflow, Coordinator, and Bundle definitions.
Oozie is a framework used to schedule and run Hadoop jobs.
It is similar to scheduling tools such as AutoSys, cron, and Control-M.
hPDL – Hadoop Process Definition Language – defines the job details: start node, end node, input directory, output directory, and so on.
Main features:
Execute and Monitor workflows in Hadoop
Periodic scheduling of workflows
Trigger execution on data availability
HTTP and command-line interfaces and a web console
Starting the Oozie server
Go to the Oozie installation directory and run the start script.
Ex: cd /usr/lib/oozie-4.0.0/
./bin/oozie-start.sh
Once it has started, open the URL below to check whether Oozie is running:
http://localhost:11000/oozie
In the Oozie web console, you can see job information (logs, configuration, etc.).
Oozie Workflow states:
PREP: When a workflow job is first created, it is in the PREP state: the job is defined but not running.
RUNNING: When a PREP workflow job is started, it goes into the RUNNING state; it remains in the RUNNING state until it reaches its end state, ends in error, or is suspended.
SUSPENDED: A RUNNING workflow job can be suspended; it remains in the SUSPENDED state until the workflow job is resumed or killed.
Scheduling with Oozie
A Coordinator launches MapReduce jobs at regular intervals, and the MapReduce jobs write their output files to HDFS.
Oozie – workflow.xml
The workflow definition language is XML-based and is called hPDL (Hadoop Process Definition Language).
At a minimum, workflow.xml must specify a name, a starting point, and an ending point.
Ex:
<workflow-app name="WorkFlowRunnerTest" xmlns="uri:oozie:workflow:0.2">
...
</workflow-app>
Flow control nodes: These provide a way to control the workflow execution path.
Start node (<start>): This specifies the starting point of an Oozie workflow.
End node (<end>): This specifies the end point of an Oozie workflow.
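The flow-control nodes above fit together in a minimal workflow skeleton like the following sketch (the action name, kill node, and namespace version are illustrative assumptions, not from the original example):

```xml
<workflow-app name="WorkFlowRunnerTest" xmlns="uri:oozie:workflow:0.2">
    <start to="wordcount"/>              <!-- entry point: jump to the first action -->
    <action name="wordcount">
        <!-- action body (e.g. a map-reduce element) goes here -->
        <ok to="end"/>                   <!-- on success, finish the workflow -->
        <error to="fail"/>               <!-- on failure, go to the kill node -->
    </action>
    <kill name="fail">
        <message>Workflow failed</message>
    </kill>
    <end name="end"/>                    <!-- normal termination point -->
</workflow-app>
```

Every path through the workflow must eventually reach the end node (or a kill node on failure).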
To this we need to add an action, and within it we specify the map-reduce parameters:

<action name="wordcount">
    <map-reduce>
        <job-tracker>localhost:8032</job-tracker>
        <name-node>hdfs://localhost:9000</name-node>
        <configuration>
            <property>
                <name>mapred.input.dir</name>
                <value>${inputDir}</value>
            </property>
            <property>
                <name>mapred.output.dir</name>
                <value>${outputDir}</value>
            </property>
        </configuration>
    </map-reduce>
    <ok to="end"/>
    <error to="fail"/>
</action>

Each action requires <ok> and <error> tags to direct the next action on success or failure.
The job.properties file specifies details such as the input and output directories.
The job.properties file does not need to be moved to HDFS; it is read from the local filesystem.
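A minimal job.properties for this workflow might look like the following (host names, ports, and paths are illustrative assumptions):

```properties
nameNode=hdfs://localhost:9000
jobTracker=localhost:8032

# Directories referenced as ${inputDir} and ${outputDir} in workflow.xml
inputDir=${nameNode}/WordCountTest/input
outputDir=${nameNode}/WordCountTest/output

# HDFS path of the directory that contains workflow.xml
oozie.wf.application.path=${nameNode}/WordCountTest
```

The oozie.wf.application.path property is what tells Oozie where to find the workflow definition in HDFS.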
Running an Oozie Application
1. Create a directory for the Oozie job (WordCountTest).
2. Write an application and create a jar (ex: a MapReduce jar). Move this jar to the lib folder in the WordCountTest directory.
3. Place job.properties and workflow.xml inside the WordCountTest directory.
4. Move this directory to HDFS.
5. Run the application:
oozie job -oozie http://localhost:11000/oozie -config job.properties -run
(job.properties should be read from the local path)
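After steps 1–3 and before copying to HDFS in step 4, the application directory would look roughly like this (the jar name is illustrative):

```text
WordCountTest/
├── job.properties    (a local copy is also passed to -config at submit time)
├── workflow.xml
└── lib/
    └── wordcount.jar
```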
Workflow job status command
oozie job -info job_123
Workflow job log
oozie job -log job_123
Workflow job definition
oozie job -definition job_123
Oozie version
oozie admin -oozie http://localhost:11000/oozie -version
Oozie Coordinator
The Oozie Coordinator supports the automated starting of Oozie workflow jobs.
It is typically used for the design and execution of recurring invocations of workflows, triggered by time and/or data availability.
Ex: a time-based coordinator definition (coordinator.xml):

<coordinator-app name="WordCountTest_TimeBased" frequency="${coord:hours(1)}"
                 start="..." end="..." timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.1">
    <datasets>
        <dataset name="inputLogs" frequency="${coord:hours(1)}" initial-instance="...">
            <uri-template>hdfs://bar:9000/app/logs/${YEAR}/${MONTH}/${DAY}/${HOUR}</uri-template>
        </dataset>
    </datasets>
    <input-events>
        <data-in name="inputData" dataset="inputLogs">
            <instance>${coord:current(0)}</instance>
        </data-in>
    </input-events>
    <action>
        <workflow>
            <app-path>hdfs://localhost:9000/WordCountTest_TimeBased</app-path>
            <configuration>
                <property>
                    <name>inputData</name>
                    <value>${coord:dataIn('inputData')}</value>
                </property>
            </configuration>
        </workflow>
    </action>
</coordinator-app>
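A coordinator is submitted the same way as a workflow, with a properties file whose oozie.coord.application.path points at the HDFS directory holding coordinator.xml (host names and paths here are illustrative assumptions):

```properties
nameNode=hdfs://localhost:9000
jobTracker=localhost:8032

# HDFS path of the directory that contains coordinator.xml
oozie.coord.application.path=${nameNode}/WordCountTest_TimeBased
```

It is then started with: oozie job -oozie http://localhost:11000/oozie -config coordinator.properties -run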
Oozie commands
Checking multiple workflow jobs:
oozie jobs -oozie http://localhost:11000/oozie -localtime -len -filter status=RUNNING
Checking the status of multiple coordinator jobs:
oozie jobs -oozie http://localhost:11000/oozie -jobtype coordinator
Killing a workflow, coordinator, or bundle job:
oozie job -oozie http://localhost:11000/oozie -kill job_123
Checking the status of a workflow, coordinator, or bundle job, or a coordinator action:
oozie job -oozie http://localhost:11000/oozie -info job_123
Hope this guides you in working with the Oozie framework.