Review: processing billions of events a day with Kafka, Zookeeper and Storm

On Tuesday, January 12th 2016, four guys from Server Density, a server monitoring service, held a tech chat session about event processing with Kafka, Zookeeper and Storm. They use this setup to process lots of data points, which made me think it would be a good session to watch for future reference or water cooler talk.

It was the first time I followed a live webinar through Google Plus Hangouts on Air. Normally I expect to sit and wait for the webinar to start, but somehow that didn’t work for me this time. After a few attempts I got the stream running, no later than 10 minutes in. This meant I missed the introductions, but I guess not much else.

The first part consisted of questions among the guys present in the room. Most of the questions were geared towards general architecture and (personal) experience with this event processing stack. Some details might have been lost on me during the live feed because of room acoustics, microphone volume or because I was too tired to process some pronunciations.

Once the viewer questions started, I either paid less attention to the audio problems or they actually became less of a problem. This was also the point where all the guys joined in, instead of the dialog that had been going on before.

One of the first questions asked, really more of a suggestion, was to move the microphone. There weren’t many questions coming in, but still more than time permitted, so I felt the need to vote on the ones I wanted answered. There weren’t many people voting either, so I guess my votes did make some impact.

It surprised me when they announced the last question to be answered. One or two more top questions seemed interesting, but time was limited and 5 minutes is too short for two questions.

My general impression is that starting with Kafka, Zookeeper or Storm might seem difficult, but it won’t take long until you get comfortable. Although the majority of the tooling is Java or JVM-based software, the interfaces for other languages are sufficient. It won’t matter if, like Server Density, most of your software is written in Python. You’ll bump your head a few times, get past some (seeming) quirks and you’ll realise it just works.

Key points I got from this session:

  • Redundancy prevents data loss
  • In this case data is kept no longer than 10 minutes
  • Most data is processed near real-time
  • Latency between workers is reduced by either keeping them on the same machine or in close proximity
  • You should monitor your network bandwidth usage: things might get chatty and you’ll hit the limits
  • Debugging is much easier if you keep things small, one process doing one thing. That way it’s easier to reproduce bugs.
  • Monitoring and improving the code keeps the process healthy
  • People might ask two questions in one, which makes voting difficult. Do you vote or not if you don’t care about the other question?
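
To make the 10 minute retention point a bit more concrete for myself, here is a minimal sketch in Python of a buffer that drops anything older than a fixed window. This is not how Server Density does it (the talk did not go into that level of detail); the `RetentionBuffer` class, its method names and the use of plain numeric timestamps are all my own assumptions, and it expects events to arrive in time order.

```python
from collections import deque

# Assumption: the 10 minute window mentioned in the talk, in seconds.
RETENTION_SECONDS = 10 * 60


class RetentionBuffer:
    """Hypothetical in-memory buffer that forgets events older than a window.

    Events are expected to be added with non-decreasing timestamps.
    """

    def __init__(self, retention_seconds=RETENTION_SECONDS):
        self.retention_seconds = retention_seconds
        self._events = deque()  # (timestamp, payload) pairs, oldest first

    def add(self, timestamp, payload):
        """Store an event and drop whatever has fallen out of the window."""
        self._events.append((timestamp, payload))
        self.expire(now=timestamp)

    def expire(self, now):
        """Remove events strictly older than `now - retention_seconds`."""
        cutoff = now - self.retention_seconds
        while self._events and self._events[0][0] < cutoff:
            self._events.popleft()

    def payloads(self):
        """Return the payloads still inside the window, oldest first."""
        return [payload for _, payload in self._events]


if __name__ == "__main__":
    buf = RetentionBuffer()
    buf.add(0, "a")
    buf.add(300, "b")
    buf.add(601, "c")  # "a" is now more than 600 seconds old and is dropped
    print(buf.payloads())
```

It also fits the "one process doing one thing" advice: a piece this small is easy to feed a handful of timestamps and check by hand when something looks off.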

Notes I made on giving tech talks or presentations:

  • Google Hangouts is a nice way to give a talk and have viewers ask questions
  • Test or rehearse tech talks to avoid sound problems
  • Prepare some graphics to support the story

I liked the format and content of this talk, so I will be sure to watch another session the next time one pops up in my e-mail or another feed.