Friday, June 13, 2014

Work-around for Isilon HDFS and Hadoop 2.2+ hdfs client incompatibility

If you're running Isilon NAS and use it's Map/Reduce functionality for your workloads, you're probably still using Hadoop 1.x. If you're thinking of moving to Hadoop 2 and you have a secondary standalone cluster running Hadoop 2.2+ and you want to move data back and forth using utilities like distcp, I have bad news for you. Isilon does not support Hadoop 2.2. Sometime in the third quarter, they will release OneFS compatible with Hadoop 2.3+. The problem is with the underlying protobuf version incompatibility. There are some work-arounds available like doing distcp with webhdfs going from Isilon to standalone cluster but it doesn't work the other way around, at least I couldn't get it to work. On top of that, I'd lose packets during distcp via webhdfs and jobs would fail due to mismatched checksums. Great, so what is the solution, well you can also distcp using hftp protocol, it's a client independent protocol specifically built for incompatible hdfs clients. The source has to be read-only so you can easily move data from standalone to Isilon. That unfortunately is still useless for customers that need to move data the other way, Isilon to standalone. I haven't found hftp option on Isilon, doesn't mean it doesn't exist, I just was not able to find one. The one solution that I found that actually works both ways and without drawbacks is to mount Isilon hdfs share with NFS. Granted, the mounted share will look as if it's a local file system but you can at that point use hdfs command line utilities put/get to move data around from hdfs to local and back. If you use hdfs Java API, I guess you may even have your code write/read in the same step, I didn't try that yet. I wish this was published, I can't say Isilon hdfs documentation is in-depth, this was kind of a "duh" moment when this idea came to me. It was not really obvious. I hope you find this trick useful.

Post a Comment