
Showing posts from November, 2015

Apache Pig Groovy UDF end to end examples

Apache Pig 0.11 added the ability to write Pig UDFs in Groovy. The other languages you can write Pig UDFs in are Java, Python/Jython, Ruby and JavaScript. There are plenty of UDF examples in Python, but the documentation does not give beginners enough to get started with Groovy, and I found writing a Groovy UDF a lot more complicated than writing one in Python. The first misconception is that you need to add Groovy's groovy-all.jar to Pig's libraries. You don't, because Pig ships with Groovy by default. For the same reason, you don't need to install Groovy on the client or on any other machine. The other issue I ran into was type mismatch errors. Tuple fields arrive as byte arrays, at least with the PigStorage loader function, so before applying your custom logic you need to cast the input to the appropriate class.

import org.apache.pig.scripting.groovy.OutputSchemaFunction;
import org.apache.pig.PigWarning;
class GroovyUDFs { ...
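The casting step can be sketched in plain Groovy. This is a minimal, hypothetical example, not the code from the full post. It assumes that an untyped PigStorage field reaches a Groovy UDF as a Java byte[] (the bytearray mapping described in Pig's Groovy UDF documentation), and that the bytes hold UTF-8 text of a whole number. The names GroovyUDFs.toLong and GroovyUDFs.square are made up for illustration. A real Pig UDF would usually also carry an @OutputSchema annotation (org.apache.pig.builtin.OutputSchema) so Pig knows the result type. It is left out here so the sketch runs with plain Groovy and no Pig jars on the classpath.

```groovy
// Hypothetical UDF class. In Pig it could be registered with something like:
//   REGISTER 'udfs.groovy' USING groovy AS myudfs;
class GroovyUDFs {

  // Untyped PigStorage fields are bytearrays, which a Groovy UDF is assumed
  // to receive as byte[]. Decode the bytes as UTF-8 text and parse them into
  // a Long before doing any arithmetic.
  static Long toLong(Object field) {
    if (field == null) {
      return null
    }
    if (field instanceof byte[]) {
      return Long.valueOf(new String((byte[]) field, 'UTF-8').trim())
    }
    // Already typed, e.g. the field was declared as int or long in the schema.
    return field as Long
  }

  // The parameter is Object rather than long, so passing a byte[] does not
  // fail with a type mismatch; the value is converted explicitly instead.
  static Long square(Object x) {
    Long v = toLong(x)
    return v == null ? null : v * v
  }
}
```

The design choice is to accept a loosely typed argument and convert it explicitly, rather than declaring a numeric parameter that a bytearray cannot satisfy. The alternative is to cast in Pig Latin before calling the UDF, or to declare field types in the LOAD schema.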

Using mongo-hadoop connector to interact with MongoDB using Pig, Hive and Spark (Update)

I published a set of Pig, Hive and Spark scripts that interact with MongoDB using the mongo-hadoop connector. Some of the Mongo and Hadoop tutorials published on the Databricks and MongoDB sites no longer work, so I decided to update them for HDP 2.3. Some things are still wonky. For example, Hive queries fail if you try to run anything other than a SELECT. Either way, give it a try and provide feedback. One more thing: I'm using the Sandbox with HDP 2.3.2, and Mongo is installed as an Ambari service using a tutorial from GitHub user nikunjness, which made my work so much easier. The code is published on my GitHub page as well as on the Hortonworks Community Site. Thanks and enjoy.

Sample tutorial on HDP integration with MongoDB using Ambari, Spark, Hive and Pig

Prerequisites
HDP 2.3.2 Sandbox
Mongo 2.6.11
Install the MongoDB service as per https://github.com/nikunjness/mongo-ambari
IMPORTANT: make sure you change directory to home after completing the mongo-ambari service insta...