Distributed Collections for Scala

By Alvin Alexander. Last updated: September 30, 2012

URL

https://github.com/scala-incubator/distributed-collections

Distributed Collections for Scala is a library for large-scale data processing that uses different cluster computing frameworks as its back-end. The library inherits the generic interface of the Scala 2.9.1 collections and enriches it with additional methods such as join and reduceByKey. Currently the library uses only Hadoop as the back-end processing engine, but the authors aim to extend it to work with other frameworks such as Spark, HaLoop and Nephele. The library is still in the early phases of development: many features do not work yet, and it is still UNUSABLE. The project timeline can be found here.
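To give a feel for what methods like reduceByKey and join mean, here is a minimal sketch written against ordinary in-memory Scala collections. It is not the library's API: the object name KeyedOpsSketch and both helper signatures are assumptions made up for illustration, and a distributed back-end such as Hadoop would run the same logic across a cluster rather than in one JVM.

```scala
// Illustration only: plain Scala collections, NOT the Distributed Collections API.
// KeyedOpsSketch and both helper signatures are hypothetical.
object KeyedOpsSketch {

  // Combine all values that share a key using the function f.
  def reduceByKey[K, V](pairs: Seq[(K, V)])(f: (V, V) => V): Map[K, V] =
    pairs.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2).reduce(f) }

  // Inner join: keep only the keys that appear on both sides.
  def join[K, A, B](left: Seq[(K, A)], right: Seq[(K, B)]): Seq[(K, (A, B))] =
    for {
      (k, a)  <- left
      (k2, b) <- right
      if k == k2
    } yield (k, (a, b))

  def main(args: Array[String]): Unit = {
    // The two "a" entries are summed; "b" is left as it is.
    println(reduceByKey(Seq("a" -> 1, "b" -> 2, "a" -> 3))(_ + _))

    // Only "a" is present in both inputs, so only "a" survives the join.
    println(join(Seq("a" -> 1, "b" -> 2), Seq("a" -> "x", "c" -> "y")))
  }
}
```

The nested loop in join is quadratic and only meant to show the semantics; a real distributed implementation would typically group both inputs by key first.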

