Monday, August 25, 2014

Gotchas discovered going from Hadoop 2.2 to Hadoop 2.4.

I was trying to execute some run-of-the-mill Mapreduce against HBase on a development cluster running HDP 2.1 and discovered some gotchas going from HDP 2.0 to HDP 2.1. For the unaware, HDP 2.0 stack is Hadoop 2.2.0 and HBase 0.96.0 and HDP 2.1 is Hadoop 2.4.0 and HBase 0.98.0. The stack I used was and then desperately upgraded to the brand-new stack with YARN bug fixes. Spoiler, that did not solve the actual problem. On a side-note, there's a lot of "coincidental" advice out-there and I was not able to fix my issues on following this link This article did however put me on the right path, I went to the nodemanager where the failing task attempt executed and looked up the "real" error. The problem was class not found for one library that was moved from hadoop-common Maven artifact to hadoop-yarn-common. I will update the post with the right error later. Once that's done basic jobs started running. I did encounter another issue and that issue is described in this Jira. I followed the same method as described previously, I went to the nodemanager logs and found the failing method call and found it to be described in the Jira above. So I just compiled my code using 2.4.1 version of Hadoop and lo and behold it executed the job successfully. So in summary, going from Hadoop 2.2 to Hadoop 2.4 in your Mapreduce code, make sure you change artifactId to hadoop-yarn-common from hadoop-common and compile with version 2.4.1.

Good luck.
Post a Comment