import org.apache.pig.scripting.groovy.OutputSchemaFunction;
import org.apache.pig.PigWarning;
class GroovyUDFs {
@OutputSchemaFunction('toUpperSchema')
public static allcaps(byte[] b) {
if(b==null || b.size() == 0) {
return null;
}
try {
String s = new String(b, "UTF-8");
return s.toUpperCase();
} catch (ClassCastException e) {
warn("casting error", PigWarning.UDF_WARNING_1);
} catch ( Exception e) {
warn("Error", PigWarning.UDF_WARNING_1);
}
return null;
}
public static toUpperSchema(input) {
return input;
}
}
Then in your Pig scriptregister 'udfs.groovy' using org.apache.pig.scripting.groovy.GroovyScriptEngine as myfuncs;
a = load '../input.txt' using PigStorage();
b = foreach a generate myfuncs.allcaps($0);
dump b;
And that's all she wrote. The sample code is on my github page, along with sample Python UDFs. Notice in my Python UDF, I'm casting the input to str() function to take advantage of .upper() method. WARNING, the Python script may need some more error handling and it automatically assumes you're passing a String object. In my Groovy script, I'm checking for most common errors.return str(word).upper()
You can execute the sample pig scripts on Sandbox using tez_local mode.pig -x tez_local example.pig
As usual, please provide feedback, Markdown was generated using Dillinger.
No comments:
Post a Comment