Friday, May 16, 2014
Another paper on InfoSphere Streams vs. Storm
I found this recent paper, mentioned on the Storm mailing lists, which is yet another performance comparison of Streams and Storm. To give IBM credit, this time around the paper seems to have been written by developers and not salespeople. Here's the direct link to the pdf. Most of the paper is based on Storm 0.8, but a comparison with Storm 0.9.0.1 is also referenced at the end. The comparison was done using Storm 0.8 with ZeroMQ and Storm 0.9 with Netty as the transport layer. It is an interesting read for a change. I was also surprised to see Apache Avro used for serialization. I will not cloud your judgement by stating my opinion, but I remain skeptical of these papers. I urge the Storm community to publish findings from its own comparison.

One thing I'd like to point out is that IBM again claims it is much faster to implement a use case with Streams than with Storm. In my own experience, I was able to install Storm 0.8, configure my IDE to develop and test topologies, and implement my use case within a couple of hours, without any previous knowledge of Storm. With Streams, I've wasted literally weeks trying to implement the same use case, and I can't even get past configuring their "state-of-the-art" development environment based on Eclipse. Streams requires a GUI for development. It does support remote development, and that's what we're trying to set up because of security concerns about running a GUI on a Linux server. Even IBM recommends not using that feature.

For the record, my use case was to query a SQL Server database every 10 seconds or so, process the results with a streaming engine and store the processed data in HBase. This worked out perfectly with Storm. With Streams, not so much. Again, please don't take my word for it; try it out for yourselves.
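
For anyone who wants a feel for the shape of that use case before picking an engine, here is a minimal sketch in plain Python. It is not the Storm topology described above and uses neither Storm nor Streams. Every name in it (`run_pipeline`, `poll_source`, `to_upper`, `put`, the `cf:payload` column) is hypothetical, the in-memory list stands in for the SQL Server table and the dict stands in for the HBase table. The only thing taken from the post is the pattern: poll every 10 seconds or so, process each row, store the result.

```python
import time
from typing import Callable, Dict, List, Tuple

Row = Tuple[int, str]  # (id, payload): a hypothetical row shape


def run_pipeline(
    poll: Callable[[int], List[Row]],
    process: Callable[[Row], Dict[str, str]],
    store: Callable[[str, Dict[str, str]], None],
    interval_seconds: float = 10.0,
    max_polls: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll for new rows, process and store each one; return the last id seen."""
    last_id = 0
    for i in range(max_polls):
        # Ask the source only for rows newer than the last one handled.
        for row in poll(last_id):
            store(f"row-{row[0]}", process(row))
            last_id = max(last_id, row[0])
        # Wait between polls, but not after the final one.
        if i < max_polls - 1:
            sleep(interval_seconds)
    return last_id


# In-memory stand-ins (assumptions, not real database clients).
SOURCE: List[Row] = [(1, "a"), (2, "b"), (3, "c")]  # plays the SQL table
SINK: Dict[str, Dict[str, str]] = {}                # plays the HBase table


def poll_source(after_id: int) -> List[Row]:
    """Return rows whose id is greater than after_id."""
    return [row for row in SOURCE if row[0] > after_id]


def to_upper(row: Row) -> Dict[str, str]:
    """Toy processing step: upper-case the payload."""
    return {"cf:payload": row[1].upper()}


def put(row_key: str, columns: Dict[str, str]) -> None:
    """Write one processed row to the stand-in sink."""
    SINK[row_key] = columns
```

Calling `run_pipeline(poll_source, to_upper, put)` polls three times with a 10 second pause between polls. In a streaming engine the three callables would roughly correspond to a source, a processing step and a sink, and the engine would take care of scheduling, parallelism and failure handling, which this sketch does not attempt.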