Everyone is talking about security nowadays when it comes to Hadoop and its ecosystem. Judging by the last two major acquisitions by Hortonworks and Cloudera, the major players are not taking it lightly either. I've been wary of the security implications of maintaining an insecure Hadoop as well. Most Hadoop books dedicate a chapter or two to Hadoop security, and up until now there were no books dedicated solely to the topic. Choices were slim. Enter Securing Hadoop. This book is only 120 pages, and I was able to read it cover to cover on my commute to work in about a week. I will not provide a chapter-by-chapter summary of what this book offers; the book has a one-page description of each chapter, which describes everything better than I ever could. What I will say in this review is what this book does best and what it can improve on in the next iteration.
This book is a "good to have" but not a "must have", unfortunately. It does a good job of whetting my appetite, but it doesn't provide a full-course meal. The table of contents can easily set your expectations too high, and in my opinion the book doesn't deliver on everything it could have been. Of course, as with anything, one needs to do their homework, practice on their own, and use the information in this book as a guide. What this book is good at is giving one a starting point, and it certainly has a lot to offer in that department. What it lacks is examples. There are sample configurations sprinkled all over the book, but I don't think they're enough to truly grasp the topic. After reading this book I still approach Hadoop security as a "black art". So far my review seems negative, but it's not by any means intended to be. I really enjoyed the book; I just wanted "more" from it. Overall, I am very grateful to the author and publisher for producing comprehensive reference material. I urge them to publish the next iteration as soon as possible, especially to cover XA Secure, the Hortonworks acquisition mentioned earlier. This book does, however, cover the Gazzang solution for block-level encryption, which is now part of Cloudera. It also covers Apache Knox, Project Rhino, and some other solutions I'd never heard of, as well as security for HBase, Hive, Hue and Oozie, which I haven't seen in any other book so far. For that, I am very grateful.
One funny anecdote from the book is where it covers Intel's distribution for Hadoop, which at the time of writing was not yet in partnership with Cloudera. The book states that Intel's Hadoop distribution leverages OpenSSL for data encryption and that the version of OpenSSL is 1.0.1c, which has since been found to be vulnerable to the Heartbleed bug. Whether that is relevant for Hadoop security is yet to be determined, but I just found it funny how quickly things change in the real world. Intel is now partnered with Cloudera, and I don't know whether Intel's distribution will continue, or whether Project Rhino will become its own project with Intel contributing to it independently of its Hadoop distribution's future. We now know of multiple vulnerabilities in OpenSSL; what we don't know is how version compatibility affects the encryption in Intel's offering.
To summarize what I'm trying to say: could I secure Hadoop without this book? Certainly. But would having this book make it easier? Absolutely. On a scale of 1 to 5, I give this book a 4.