Posts

GOgroove BlueVIBE DLX Hi-Def Bluetooth Headphones Review

This post is all about gadget envy. I just picked up a pair of Bluetooth headphones following a Lifehacker deals post a few days ago. Here's the link to the post. I am by no means an audiophile, and I was against getting Bluetooth headphones for a long while, but it finally dawned on me that a BT headset is the only way to go when, for the nth time, my wired headphones stopped working in one ear. This deal is great, and you may still have time to pick these up: originally $80, with the $50-off coupon provided at the link these headphones are a steal at $30. They use the old Bluetooth 2.1 A2DP profile, but so far I haven't had any glaring performance issues. Right off the bat, the fit is very comfortable, something I've had trouble with before with over-the-ear headphones. These don't squeeze tightly and don't make me feel like my head is in a vise. They come with a nice hard case and an included 3.5mm cable, so you can use the headphones just like any other wired set or use BT...

Gotchas discovered going from Hadoop 2.2 to Hadoop 2.4.

I was trying to execute some run-of-the-mill MapReduce against HBase on a development cluster running HDP 2.1 and discovered some gotchas going from HDP 2.0 to HDP 2.1. For the unaware, the HDP 2.0 stack is Hadoop 2.2.0 with HBase 0.96.0, and HDP 2.1 is Hadoop 2.4.0 with HBase 0.98.0. The stack I used was 2.1.3.0, and then I desperately upgraded to the brand-new 2.1.4.0 stack with its YARN bug fixes. Spoiler: that did not solve the actual problem. On a side note, there's a lot of "coincidental" advice out there, and I was not able to fix my issues by following this link: http://www.srccodes.com/p/article/46/noclassdeffounderror-org-apache-hadoop-service-compositeservice-shell-exitcodeexception-classnotfoundexception. That article did, however, put me on the right path: I went to the NodeManager where the failing task attempt executed and looked up the "real" error. The problem was a class-not-found error for a library that was moved from the hadoop-common Maven artifact to hadoop-yarn-comm...
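The excerpt cuts off here, but the fix it points toward is declaring the relocated artifact explicitly in the job's pom.xml. A minimal sketch, assuming the destination artifact is hadoop-yarn-common (the truncated name above) and a version matching the Hadoop 2.4.0 stack described earlier:

    <!-- Hypothetical sketch: explicitly pull in the YARN artifact that now
         hosts the moved class so it ends up on the job classpath.
         Artifact name and version are assumptions based on the post. -->
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-yarn-common</artifactId>
        <version>2.4.0</version>
    </dependency>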

Fix for infamous Oozie error "Error: E0501 : E0501: Could not perform authorization operation, User: oozie is not allowed to impersonate"

I just had a breakthrough moment when I realized why this error shows up when you run an Oozie workflow. We use Ambari for cluster management, and by default Ambari has core-site.xml configured with the proxyuser properties shown below. The issue lies in the oozie.groups property: you need to make sure that a user executing a workflow belongs to the "users" Linux group on the namenode server. The failsafe is definitely to set an asterisk for either property, as most people recommend, but I think this is a more granular approach. The idea dawned on me when I saw the following in the Ambari admin tab: the user executing the workflow needs to belong to the proxy group controlled by the hadoop.proxyuser.oozie.groups property.
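As a sketch, the core-site.xml entries being described typically look like this; the hosts value is a placeholder for your Oozie server:

    <!-- Sketch of the Ambari proxyuser defaults described above;
         the hosts value is a placeholder, and "users" is the Linux
         group a workflow submitter must belong to. -->
    <property>
      <name>hadoop.proxyuser.oozie.hosts</name>
      <value>oozie-host.example.com</value>
    </property>
    <property>
      <name>hadoop.proxyuser.oozie.groups</name>
      <value>users</value>
    </property>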

Work-around for Isilon HDFS and Hadoop 2.2+ HDFS client incompatibility

If you're running an Isilon NAS and use its MapReduce functionality for your workloads, you're probably still on Hadoop 1.x. If you're thinking of moving to Hadoop 2, and you have a secondary standalone cluster running Hadoop 2.2+ and want to move data back and forth with utilities like distcp, I have bad news for you: Isilon does not support Hadoop 2.2. Sometime in the third quarter, they will release a OneFS compatible with Hadoop 2.3+. The problem is an underlying protobuf version incompatibility. There are some work-arounds available, like doing distcp over webhdfs going from Isilon to the standalone cluster, but it doesn't work the other way around; at least I couldn't get it to work. On top of that, I'd lose packets during distcp via webhdfs, and jobs would fail due to mismatched checksums. So what is the solution? Well, you can also distcp using the hftp protocol; it's a client-independent protocol built specifically for incompatible HDFS cli...
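A minimal sketch of that hftp-based copy, assuming default ports (50070 for hftp on the source side, 8020 for HDFS on the destination) and placeholder hostnames and paths; hftp is read-only, so the job runs on the destination cluster and reads from the Isilon side:

    # Run from the destination (Hadoop 2.x) cluster; hftp is read-only,
    # so the source is always read over hftp. Hostnames, ports, and
    # paths are placeholders.
    hadoop distcp hftp://isilon.example.com:50070/data/src \
                  hdfs://dest-nn.example.com:8020/data/dst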

Book review: Securing Hadoop

Everyone is talking about security nowadays when it comes to Hadoop and its ecosystem. Judging by the last two major acquisitions by Hortonworks and Cloudera, the major players are not taking it lightly either. I've been wary of the security implications of maintaining an insecure Hadoop as well. Most Hadoop books dedicate a chapter or two to Hadoop security, and up until now there were no books dedicated solely to it. Choices were slim. Enter Securing Hadoop. This book is only 120 pages, and I was able to read it cover to cover on my commute to work in about a week. I will not provide a chapter-by-chapter summary of what this book offers; the book has a one-page description of each chapter that describes everything better than I ever could. What I will say in this review is what this book does best and what it can improve on in the next iteration. This book is a "good to have" but not a "must have", unfortunately. The book does a good job at...

(Update) Maven tip for building "fat" jars and making them slim

The other day I was working on MapReduce code over HBase tables and discovered something really cool. Usually I'd have to package all the HBase, ZooKeeper, etc. libraries, or else I'd get a ClassNotFoundException. I found this tip in the HBase: The Definitive Guide book. Apparently, if you specify the "provided" scope in your Maven pom.xml file, Maven will not package those jars; it will instead expect them to be available on the cluster's classpath. I will spare you my poor interpretation of this feature and point you to the Maven documentation; the feature is called Dependency Scope. This is how I define my dependencies now (see the sketch below). Just to give you an idea, my jar size before adding this tag was 44 MB, and after, it was 11 KB. It definitely saves time on transmitting jars back and forth. Granted, this may not be a new tip to most people; I had actually seen this feature used when I was playing with Apache Storm, specifically the storm-starter project, but it never occurred to...
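As a sketch, a provided-scope dependency looks like this; the artifact and version are illustrative, chosen to match the HBase 0.98 line mentioned earlier:

    <!-- Sketch: "provided" keeps this jar out of the fat jar; the
         cluster classpath supplies it at runtime. Artifact and
         version here are illustrative. -->
    <dependency>
        <groupId>org.apache.hbase</groupId>
        <artifactId>hbase-client</artifactId>
        <version>0.98.0-hadoop2</version>
        <scope>provided</scope>
    </dependency>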

Book review: Apache Hadoop YARN

I've been looking for a comprehensive book on Apache Hadoop 2 and the YARN architecture; there are a few MEAPs available. This book in particular was finally released a few months back with all chapters complete. Like all Hortonworks documentation, this book is well written and very easy to read, which made the choice of this book over the others simple. On top of that, it's written by the Hadoop committers, so it's basically from the horse's mouth. The current edition of the book has 12 chapters plus additional material. The first chapter goes into the history of how Hadoop came about and the challenges the team at Yahoo faced early on. This chapter opened my eyes to how grandiose the project architecture was in the past and what it has become. It is very easy to take things for granted, and this chapter does a great job explaining the choices the team made. Chapter 2 gives a quick intro on how to deploy a single-node cluster and start playing with ...