Wednesday, October 28, 2015

Running HCatalog commands within Pig scripts in Oozie


Apache Pig introduced an option to execute HCatalog commands from inside Pig scripts and the Grunt shell. For details on getting started with it, take a look at my previous post.


When you try to execute a Pig script containing an HCatalog command in Oozie, you will get the same error regardless of whether you changed the pig.properties file as described in the previous article. The error is below:


ERROR 2997: Encountered IOException. hcat.bin is not defined. Define it to be your hcat script (Usually $HCAT_HOME/bin/hcat

java.io.IOException: hcat.bin is not defined. Define it to be your hcat script (Usually $HCAT_HOME/bin/hcat
at org.apache.pig.tools.grunt.GruntParser.processSQLCommand(GruntParser.java:1283)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:501)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:230)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:205)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:81)
at org.apache.pig.Main.run(Main.java:502)
at org.apache.pig.PigRunner.run(PigRunner.java:49)
at org.apache.oozie.action.hadoop.PigMain.runPigJob(PigMain.java:288)
at org.apache.oozie.action.hadoop.PigMain.run(PigMain.java:231)
at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:47)
at org.apache.oozie.action.hadoop.PigMain.main(PigMain.java:76)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:236)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
================================================================================
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.PigMain], exit code [2]

To make this work with Oozie, you need to do a couple of things. First, make sure these properties exist in job.properties:


oozie.use.system.libpath=true
hcatNode=thrift://sandbox.hortonworks.com:9083
db=default
table=sample08


Then, at the beginning of your Pig script, add the following:
set hcat.bin /usr/bin/hcat;


In your workflow.xml, specify the script in the Pig action and add a file element pointing to lib/hive-site.xml.
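A minimal sketch of what such a workflow.xml could look like is below. It is an illustration under stated assumptions rather than the exact workflow from this post: the workflow name, the action name pig-node, the script name sample.pig, and the ${jobTracker} and ${nameNode} variables are all hypothetical, and the two variables would have to be defined in job.properties alongside the properties listed above.

```xml
<!-- Sketch only: names and variables here are assumptions, not taken from the post. -->
<workflow-app xmlns="uri:oozie:workflow:0.4" name="pig-hcat-wf">
    <start to="pig-node"/>
    <action name="pig-node">
        <pig>
            <!-- Assumed to be defined in job.properties -->
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <!-- Hypothetical script name; it should start with: set hcat.bin /usr/bin/hcat; -->
            <script>sample.pig</script>
            <!-- Ships hive-site.xml from the workflow's lib directory with the action -->
            <file>lib/hive-site.xml</file>
        </pig>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Pig action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
```

Inside the pig element, the file element has to come after script (and after any param or argument elements), because the Oozie workflow schema fixes the order of child elements.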





Then create a directory called lib inside your workflow directory and place your hive-site.xml file in it.
Your workflow directory should then contain job.properties, workflow.xml, your Pig script, and lib/hive-site.xml. A README is optional.



I have a sample workflow on my GitHub.
