Iteratees in Big Data at Klout

Our users and clients expect data to be up to date and accurate and it has been a significant technical challenge to reliably meet these goals. In this blog post we describe the usage of Play! Iteratees in our redesigned data collection pipeline.

This post is not meant to be a tutorial on the concept of Iteratees, for which there are many great posts already such as James Roper’s post and Josh Suereth’s post. Rather, this post is a detailed look at how Klout uses Iteratees in the context of large scale data collection and why it is an appropriate and effective programming abstraction for this use case. In a later post we will describe our distributed Akka-based messaging infrastructure, which allowed us to scale and distribute our Iteratee based collectors across clusters of machines.