Posts

Showing posts from 2016

Quick light-weight sandbox environment with Apache Bigtop

I'm a long-time user of Apache Bigtop; my experience with Hadoop and Bigtop predates Ambari. I started using Bigtop with version 0.3, and I remember pulling the bigtop.repo file and installing Hadoop, Pig, and Hive for some quick development. Bigtop made it convenient and easy. Bigtop has matured since then and there are now multiple ways to deploy it. You can still pull the repo and install manually, but there are better ways now with Vagrant and Docker. I won't rehash how to deploy Bigtop using Docker, as it was beautifully described here. Admittedly, I'm running it on a Mac and was not able to provision a cluster using Docker; I did not try on a non-OSX machine. This post is about Vagrant. Let's get started:

- Install VirtualBox and Vagrant
- Download the 1.1.0 release: wget http://www.apache.org/dist/bigtop/bigtop-1.1.0/bigtop-1.1.0-project.tar.gz
- Uncompress the tarball: tar -xvzf bigtop-1.1.0-project.tar.gz
- Change directory to bigtop-1.1.0/bigtop-deploy/vm/vagrant-puppet-vm
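Laid out as a shell session, the steps above might look like the following sketch. The final vagrant up is an assumption implied by the vagrant-puppet-vm workflow rather than something shown in the excerpt:

```shell
# Fetch and unpack the Bigtop 1.1.0 release
wget http://www.apache.org/dist/bigtop/bigtop-1.1.0/bigtop-1.1.0-project.tar.gz
tar -xvzf bigtop-1.1.0-project.tar.gz

# Enter the Vagrant deployment directory
cd bigtop-1.1.0/bigtop-deploy/vm/vagrant-puppet-vm

# Provision the VM (assumes VirtualBox and Vagrant are already installed)
vagrant up
```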

Executing Python and Python3 scripts in Oozie workflows

It is not completely obvious, but you can certainly run Python scripts within Oozie workflows. Here's a sample job.properties file, nothing special about it:

nameNode=hdfs://sandbox.hortonworks.com:8020
jobTracker=sandbox.hortonworks.com:8050
queueName=default
examplesRoot=oozie
oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/apps/python

Here's a sample workflow that will look for a script called script.py inside the scripts folder:

<workflow-app xmlns="uri:oozie:workflow:0.4" name="python-wf">
  <start to="python-node"/>
  <action name="python-node">
    <shell xmlns="uri:oozie:shell-action:0.2">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <configuration>
        <property>
          <name>mapred.j
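The excerpt cuts off inside the workflow configuration. For a sense of what the shell action would run, here is a minimal sketch of the kind of script.py it could invoke; the logic below (counting comma-separated fields from stdin) is an illustrative assumption, not the original post's script:

```python
#!/usr/bin/env python
# Hypothetical script.py for an Oozie shell action: reads lines from
# stdin and prints the number of comma-separated fields in each one.
import sys


def field_count(line):
    """Return the number of comma-separated fields in a line."""
    return len(line.strip().split(","))


if __name__ == "__main__":
    for line in sys.stdin:
        print(field_count(line))
```

Because the shell action simply executes the file, any script that runs on the cluster nodes' Python (or Python3) works the same way.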

Apache Hive Groovy UDF examples

One of many reasons to be part of a vibrant and rich open source community is access to a treasure trove of information. One evening, I was reading through the Hive user mailing list and noticed a user suggesting writing Groovy code to parse JSON. It was strange to suggest that approach when there are at least three ways to do so in Hive, and they're built-in! Then again, this feature is not very well documented. I decided to dig into it and wrote a couple of examples myself; the last two examples were contributed by Gopal from Hortonworks on that same mailing list. Now for the main event:

Groovy UDF example

This can be compiled at run time. Currently it only works in the "hive" shell and does not work in beeline.

su guest
hive

Paste the following code into the hive shell. It uses the Groovy String replace function to replace all instances of lowercase 'e' with 'E':

compile `import org.apache.hadoop.hive.ql.exec.UDF \; import org.apache.h
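The excerpt truncates inside the compile statement. To give the general shape, here is a hedged sketch of how Hive's run-time Groovy compilation is typically used from the hive shell; the class name Replace and the exact body are illustrative assumptions, not the original post's code:

```
-- Sketch of an inline Groovy UDF in the hive shell (names are assumptions).
-- The backtick-quoted block is compiled at run time; semicolons inside it
-- are escaped as \; so the CLI does not treat them as statement terminators.
compile `import org.apache.hadoop.hive.ql.exec.UDF \;
public class Replace extends UDF {
  public String evaluate(String s) {
    if (s == null) return null \;
    return s.replace('e', 'E') \;
  }
}` AS GROOVY NAMED Replace.groovy;

CREATE TEMPORARY FUNCTION Replace AS 'Replace';
SELECT Replace('hello');
```

Once compiled, the class is registered like any other UDF with CREATE TEMPORARY FUNCTION, and the final SELECT would uppercase each 'e' in its argument.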