Thursday, October 30, 2014

MobaXterm release 7.3 is out with Shellshock bugfix

A new release of my favorite terminal emulator, MobaXterm, is out with much-needed fixes for the latest security vulnerabilities (including Shellshock) as well as some new improvements, like Cygwin repository support. I literally just found out, so don't expect an in-depth overview, just go get it. If you want to read the release notes and download the latest version, go here.

Sunday, October 26, 2014

Where did all my images go?

I was just browsing through my past blog posts and noticed that most of my recent entries are missing their images. Did Google have an "oops" moment? I will try to locate the images and repost them, but most are unfortunately lost. Lesson learned: don't use Blogger as my documentation repository... Bummer!

Tuesday, August 26, 2014

GOgroove BlueVIBE DLX Hi-Def Bluetooth Headphones Review

This post is all about gadget envy. I just picked up a pair of Bluetooth headphones following a Lifehacker deals post a few days ago. Here's the link to the post. I am by no means an audiophile, and I was against getting Bluetooth headphones for a long while. It finally dawned on me that a BT headset was the only way to go when, for the nth time, my wired headphones stopped working in one ear. This deal is great and you may still have time to pick these up: originally they're $80, and after the $50-off coupon provided at the link, these headphones are a steal at $30. They use the older Bluetooth 2.1 A2DP profile, but so far I haven't had any glaring performance issues.

Off the bat, the fit is very comfortable, something I've had issues with before with over-the-ear headphones. These don't clamp tightly and don't make me feel like my head is in a vise. They come with a nice hard case, and a 3.5mm cable is included, so you can use them like any other wired set or over BT. The set also includes a USB charging cable, which is very convenient to have; two thumbs up to the manufacturer for making the charger standard, since I can charge my phone with the same cable. The headphones fold easily and have accessible buttons for volume, skip, stop, and play. What sealed the deal for me is the built-in mic: I barely ever need to take my phone out of my pocket anymore.

These headphones do sound a bit quieter than my wired Monoprice set, but for the commute they're better than any in-ear set I've owned. The over-the-ear design muffles background noise well, so the lower volume doesn't hurt. Again, I am not an audiophile; these are great for podcasts and such, but for classical music you may need something else. I am very impressed, and for $30 they're an incredible find. I forgot to mention that they also look great: they have an "expensive" look to them and don't feel cheap. I recommend the GOgroove BlueVIBE DLX for casual listening.

until another time...

Monday, August 25, 2014

Gotchas discovered going from Hadoop 2.2 to Hadoop 2.4

I was trying to execute some run-of-the-mill MapReduce against HBase on a development cluster running HDP 2.1 and discovered some gotchas going from HDP 2.0 to HDP 2.1. For the unaware, the HDP 2.0 stack is Hadoop 2.2.0 with HBase 0.96.0, and HDP 2.1 is Hadoop 2.4.0 with HBase 0.98.0. The stack I used was 2.1.3.0, and then I desperately upgraded to the brand-new 2.1.4.0 stack with YARN bug fixes. Spoiler: that did not solve the actual problem.

On a side note, there's a lot of "coincidental" advice out there, and I was not able to fix my issues by following this link: http://www.srccodes.com/p/article/46/noclassdeffounderror-org-apache-hadoop-service-compositeservice-shell-exitcodeexception-classnotfoundexception. The article did, however, put me on the right path: I went to the NodeManager where the failing task attempt executed and looked up the "real" error. The problem was a class-not-found error for a library that was moved from the hadoop-common Maven artifact to hadoop-yarn-common. I will update the post with the exact error later. Once that was fixed, basic jobs started running.

I did encounter another issue, and that one is described in this Jira. Following the same method as before, I went to the NodeManager logs, found the failing method call, and found it to be the one described in the Jira above. So I compiled my code against version 2.4.1 of Hadoop and, lo and behold, the job executed successfully.

In summary, when moving your MapReduce code from Hadoop 2.2 to Hadoop 2.4, make sure you change the artifactId from hadoop-common to hadoop-yarn-common and compile with version 2.4.1; a rough sketch of the dependency change is below.
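If your project builds with Maven, the change looks roughly like this. This is only a sketch: the version numbers are the ones mentioned above, and whatever other Hadoop artifacts your code pulls in should be bumped to 2.4.1 as well.

    <!-- Before: the missing class used to come from hadoop-common 2.2.x -->
    <!-- After: pull in hadoop-yarn-common and build against 2.4.1 -->
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-yarn-common</artifactId>
      <version>2.4.1</version>
    </dependency>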

Good luck.

Friday, June 13, 2014

Fix for infamous Oozie error "Error: E0501 : E0501: Could not perform authorization operation, User: oozie is not allowed to impersonate"

I just had a breakthrough moment when I realized why this error shows up when you run an Oozie workflow. We use Ambari for cluster management and by default Ambari has core-site.xml configured with these properties:
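The values below are what you'd typically see out of the box; the host name here is just a placeholder, so check your own core-site.xml for the actual values:

    <property>
      <name>hadoop.proxyuser.oozie.hosts</name>
      <value>oozie-server.example.com</value>
    </property>
    <property>
      <name>hadoop.proxyuser.oozie.groups</name>
      <value>users</value>
    </property>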

The issue lies in the hadoop.proxyuser.oozie.groups property. You need to make sure that the user executing a workflow belongs to the "users" Linux group on the NameNode server. The failsafe is definitely to set an asterisk for either property, as most people recommend, but I think this is a more granular approach. The idea dawned on me when I saw the following in the Ambari admin tab:


This means exactly that: the user executing the workflow needs to belong to the proxy group controlled by the hadoop.proxyuser.oozie.groups property.
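A quick way to check and fix that on the NameNode; the user name here is a made-up example, and the group is whatever your hadoop.proxyuser.oozie.groups property says:

    # show the groups the workflow user currently belongs to
    id -Gn workflowuser

    # add the user to the "users" group if it's missing (run as root)
    usermod -a -G users workflowuser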




Work-around for Isilon HDFS and Hadoop 2.2+ HDFS client incompatibility

If you're running an Isilon NAS and use its MapReduce functionality for your workloads, you're probably still on Hadoop 1.x. If you're thinking of moving to Hadoop 2, and you have a secondary standalone cluster running Hadoop 2.2+ and want to move data back and forth using utilities like distcp, I have bad news for you: Isilon does not support Hadoop 2.2. Sometime in the third quarter they will release a version of OneFS compatible with Hadoop 2.3+. The problem is an underlying protobuf version incompatibility.

There are some work-arounds available, like running distcp over webhdfs going from Isilon to the standalone cluster, but it doesn't work the other way around; at least I couldn't get it to work. On top of that, I'd lose packets during distcp via webhdfs and jobs would fail due to mismatched checksums. So what is the solution? You can also distcp using the hftp protocol, a client-independent protocol built specifically for copying between incompatible HDFS versions. The catch is that hftp is read-only on the source side, so while you can easily move data from the standalone cluster to Isilon, that is still useless for customers who need to move data the other way, Isilon to standalone. I haven't found an hftp option on Isilon; that doesn't mean it doesn't exist, I just was not able to find one.

The one solution I found that actually works both ways and without drawbacks is to mount the Isilon HDFS share over NFS. Granted, the mounted share looks like a local file system, but at that point you can use the hdfs command-line utilities put/get to move data from HDFS to "local" and back. If you use the HDFS Java API, I imagine you could even have your code write and read in the same step; I haven't tried that yet. A rough sketch of both approaches is below.

I wish this were documented somewhere; I can't say the Isilon HDFS documentation is in-depth. This was kind of a "duh" moment when the idea came to me, but it was not really obvious. I hope you find this trick useful.
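Here's roughly what both approaches look like on the command line. The host names, ports, paths and NFS mount point are made up for illustration and will depend on how your Isilon export and Hadoop clients are configured:

    # standalone -> Isilon: read from the standalone cluster over the
    # read-only hftp protocol and write into the Isilon-backed HDFS
    hadoop distcp hftp://standalone-nn.example.com:50070/data/in \
        hdfs://isilon.example.com:8020/data/in

    # Isilon -> standalone: mount the Isilon export over NFS, then use
    # plain hdfs put against a client pointed at the standalone cluster
    mount -t nfs isilon.example.com:/ifs /mnt/isilon
    hdfs dfs -put /mnt/isilon/data/out /data/from_isilon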

Friday, June 6, 2014

Book review: Securing Hadoop

Everyone is talking about security nowadays when it comes to Hadoop and its ecosystem. Judging by the last two major acquisitions by Hortonworks and Cloudera, the major players are not taking it lightly either. I've been wary of the security implications of maintaining an insecure Hadoop as well. Most Hadoop books dedicate a chapter or two to security, and up until now there were no books dedicated solely to Hadoop security. Choices were slim. Enter Securing Hadoop. The book is only 120 pages, and I was able to read it cover to cover on my commute to work in about a week. I will not provide a chapter-by-chapter summary of what this book offers; the book has a one-page description of each chapter that describes everything better than I ever could. What I will say in this review is what this book does best and what it can improve on in the next iteration.
This book is a "good to have" but not a "must have", unfortunately. It does a good job of whetting my appetite but doesn't provide a full-course meal. The table of contents can easily set your expectations too high, and in my opinion the book doesn't deliver on everything it could have been. Of course, as with anything, one needs to do their own homework, practice on their own, and use the information in this book as a guide. What the book is good at is giving you a starting point, and it certainly has a lot to offer in that department; what it lacks is examples. There are sample configurations sprinkled all over the book, but I don't think they're enough to truly grasp the topic. After reading it I still approach Hadoop security as a "black art". So far my review seems negative, but it's not by any means intended to be: I really enjoyed the book, I just wanted "more" from it. Overall, I am very grateful to the author and publisher for writing a comprehensive reference, and I urge them to publish the next iteration as soon as possible, especially covering the XA Secure acquisition by Hortonworks mentioned earlier. The book does, however, cover Gazzang's block-level encryption solution, which is now part of Cloudera. It covers Apache Knox, Project Rhino and some other solutions I'd never heard of, and it covers security for HBase, Hive, Hue and Oozie, which I haven't seen in any other book so far. For that, I am very grateful.
One funny anecdote from the book is its coverage of Intel's distribution for Hadoop, which at the time was not yet in partnership with Cloudera. The book states that Intel's Hadoop distribution leverages OpenSSL for data encryption and that the OpenSSL version is 1.0.1c, which has since been found to be vulnerable to the Heartbleed bug. Whether that is relevant for Hadoop security is yet to be determined, but I just found it funny how quickly things change in the real world. Intel is now partnered with Cloudera, and I don't know whether Intel's distribution will continue, or whether Project Rhino will become its own project with Intel contributing to it independently of its distribution's future. We also now know of multiple OpenSSL vulnerabilities; what we don't know is how version compatibility affects the encryption in Intel's offering.

To summarize: could I secure Hadoop without this book? Certainly. Would the book make it easier? Absolutely. On a scale of 1 to 5, I give this book a 4.