Monday, February 1, 2016

Executing Python and Python3 scripts in Oozie workflows

it is not completely obvious but you can certainly run Python scripts within
Here’s a sample job.properties file, nothing special about it.
nameNode=hdfs://sandbox.hortonworks.com:8020
jobTracker=sandbox.hortonworks.com:8050
queueName=defaultexamplesRoot=oozie
oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/apps/python
Here’s a sample workflow that will look for a script called script.py inside scripts folder
<workflow-app xmlns="uri:oozie:workflow:0.4" name="python-wf">
    <start to="python-node"/>
    <action name="python-node">
        <shell xmlns="uri:oozie:shell-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <exec>script.py</exec>
        <file>scripts/script.py</file>  
            <capture-output/>
        </shell>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
    <message>Python action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
here’s my sample script.py
#! /usr/bin/env pythonimport os, pwd, sysprint "who am I? " + pwd.getpwuid(os.getuid())[0]print "this is a Python script" 
print "Python Interpreter Version: " + sys.version
directory tree for my workflow assuming the workflow directory is called python is as such
[root@sandbox python]# tree
.
├── job.properties
├── scripts
│   └── script.py
└── workflow.xml


1 directory, 3 files
now you can execute the workflow like any other Oozie workflow.
If you wanted to leverage Python3, make sure Python3 is installed on every node. My Python3 script.py looks like this
#! /usr/bin/env /usr/local/bin/python3.3
import os, pwd, sys
print("who am I? " + pwd.getpwuid(os.getuid())[0])
print("this is a Python script")
print("Python Interpreter Version: " + sys.version)
Post a Comment