Thursday, December 24, 2015

Pig Dynamic Invoker

I must’ve been living under a rock because I’d just learned about Pig’s dynamic invokers. What if I told you that besides UDFs, you have another option to run your Java code without compiling your UDFs. I will let you read the docs on your own but even though I find it quite handy to use it, it is pretty limited in features. You’re limited to passing primitives only and only static methods work. There’s an example of using a non-static method “StringConcat” but I haven’t been able to make it work.
So for the demo:
suppose you have a file with numbers 4, 9, 16, etc, one on each line 
upload the file to hdfs
hdfs dfs -put numbers /user/guest/
then suppose you’d like to use Java Math’s Sqrt function to get square root of each number, you can of course write use built-in SQRT function but for the example purposes bare with me. The code to make it work with Pig and Java would look like so:
DEFINE Sqrt InvokeForDouble('java.lang.Math.sqrt', 'double');
numbers = load '/user/guest/numbers' using PigStorage() as (num:double);
sqrts = FOREACH numbers GENERATE Sqrt($0);
dump sqrts;
then all that’s left doing is execute the script
pig -x tez scriptname.pig
I think this feature has a lot of promise, especially if it can be opened up to non-primitive types and of course not just static methods. It was introduced in Pig 0.8 and I haven't seen any changes since to extend feature set. It's unfortunate but then again, you can extend Pig's functionality with UDFs. 
Another thing to keep in mind, since this relies on reflection, your Pig scripts will run slower than if you'd write the same functions as UDFs. I do see value in this though, when you're lazy to develop, compile and test your UDFs and need something quick!
as usual, you can find the code at my github repo.
post written with
Post a Comment