In Scala you can still use Java threads, but the “Actor model” is the preferred approach for concurrency. The Actor model is at a much higher level of abstraction than threads, and once you understand the model, it lets you focus on solving the problem at hand, rather than worrying about the low-level problems of threads, locks, and shared data.
Although earlier versions of Scala shipped with their own Actors library (scala.actors), Scala 2.10.0 began the official transition to the Akka actor library from Typesafe, which is more robust than the original library. The original scala.actors library was deprecated in Scala 2.10.1.
In general, actors give you a high level of abstraction for achieving concurrency and parallelism. Beyond that, the Akka actor library adds these benefits:
- Lightweight, event-driven processes. The Akka documentation states that there can be approximately 2.7 million actors per gigabyte of RAM.
- Fault tolerance. Akka actors can be used to create “self-healing systems.” (The Akka “team blog” is located at http://letitcrash.com/.)
- Location transparency. Akka actors can span multiple JVMs and servers; they’re designed to work in a distributed environment using pure message passing.
A “high level of abstraction” can also be read as “ease of use.” It doesn’t take very long to understand the Actor model, and once you do, you’ll be able to write complex, concurrent applications much more easily than you can with the basic Java libraries. I wrote a speech interaction application (speech recognition input, text-to-speech output) named SARAH that makes extensive use of Akka actors, with agents constantly working on tasks in the background. Writing this code with actors was much easier than the equivalent threading approach.
I like to think of an actor as being like a web service on someone else’s servers that I can’t control. I can send messages to that web service to ask it to do something, or I can query it for information, but I can’t reach into the web service to directly modify its state or access its resources; I can only work through its API, which is just like sending immutable messages. In one way, this is a little limiting, but in terms of safely writing parallel algorithms, this is very beneficial.
The Actor Model
Before digging into the recipes in this chapter, it can help to understand the Actor model.
The first thing to understand about the Actor model is the concept of an actor:
- An actor is the smallest unit when building an actor-based system, like an object in an OOP system.
- Like an object, an actor encapsulates state and behavior.
- You can’t peek inside an actor to get its state. You can send an actor a message requesting state information (like asking a person how they’re feeling), but you can’t reach in and execute one of its methods, or access its fields.
- An actor has a mailbox (an inbox), and its purpose in life is to process the messages in its mailbox.
- You communicate with an actor by sending it an immutable message. These messages go into the actor’s mailbox.
- When an actor receives a message, it’s like taking a letter out of its mailbox. It opens the letter, processes the message using one of its algorithms, then moves on to the next letter in the mailbox. If there are no more messages, the actor waits until it receives one.
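The points above can be sketched with a small counter actor. This is a minimal sketch, not a definitive implementation: it assumes the classic Akka API (the `akka.actor` package, Akka 2.4 or later, where `system.terminate()` is available), and the names `Counter`, `Increment`, `GetCount`, and `CounterDemo` are made up for illustration.

```scala
import akka.actor.{Actor, ActorSystem, Props}
import akka.pattern.ask
import akka.util.Timeout
import scala.concurrent.Await
import scala.concurrent.duration._

// Messages are immutable values (hypothetical names for this sketch).
case object Increment
case object GetCount

// The actor encapsulates its state; nothing outside can touch `count`.
class Counter extends Actor {
  private var count = 0

  // Messages are taken from the mailbox and handled one at a time.
  def receive: Receive = {
    case Increment => count += 1
    case GetCount  => sender() ! count // reply with a message, never a field
  }
}

object CounterDemo {
  def run(): Int = {
    val system = ActorSystem("CounterSystem")
    try {
      val counter = system.actorOf(Props(new Counter), "counter")

      // Fire-and-forget sends: these go into the actor's mailbox.
      counter ! Increment
      counter ! Increment

      // Ask for the state by message and wait (briefly) for the reply.
      implicit val timeout: Timeout = Timeout(3.seconds)
      val reply = (counter ? GetCount).mapTo[Int]
      Await.result(reply, 3.seconds)
    } finally {
      system.terminate()
    }
  }
}
```

Note that the only way `CounterDemo` learns the count is by sending `GetCount` and receiving a reply; there is no method to call and no field to read.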
In an application, actors form hierarchies, like a family, or a business organization:
- The Typesafe team recommends thinking of an actor as being like a person, such as a person in a business organization.
- An actor has one parent (supervisor): the actor that created it.
- An actor may have children. Thinking of this as a business, a president may have a number of vice presidents. Those VPs will have many subordinates, and so on.
- An actor may have siblings. For instance, there may be 10 VPs in an organization.
- A best practice of developing actor systems is to “delegate, delegate, delegate,” especially if behavior will block. In a business, the president may want something done, so he delegates that work to a VP. That VP delegates work to a manager, and so on, until the work is eventually performed by one or more subordinates.
- Delegation is important. Imagine that the work takes several man-years. If the president had to handle that work himself, he couldn’t respond to other needs (while the VPs and other employees would all be idle).
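The hierarchy and the “delegate, delegate, delegate” advice can be sketched as a parent actor that creates a child and hands work down to it. This is only an illustration under assumptions: it uses the classic Akka API, and `Manager`, `Worker`, `Work`, and `Done` are hypothetical names.

```scala
import akka.actor.{Actor, ActorRef, Props}

// Hypothetical messages for this sketch.
final case class Work(description: String)
final case class Done(description: String)

// The subordinate that actually performs the task.
class Worker extends Actor {
  def receive: Receive = {
    case Work(description) => sender() ! Done(description)
  }
}

// The "VP": it creates its own child and passes work down instead of
// doing it, so it stays free to respond to other messages.
class Manager extends Actor {
  // context.actorOf makes Worker a child of this Manager,
  // which also makes Manager the Worker's supervisor.
  private val worker: ActorRef = context.actorOf(Props(new Worker), "worker")

  def receive: Receive = {
    // forward keeps the original sender, so Worker replies to the requester.
    case work: Work => worker.forward(work)
  }
}
```

A real system would likely create several workers, or a new worker per task, but the shape is the same: the parent owns the children and routes work to them.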
A final piece of the Actor model is handling failure. When performing work, something may go wrong, and an exception may be thrown. When this happens, an actor suspends itself and all of its children, and sends a message to its supervisor, signaling that a failure has occurred. (A bit like Scotty calling Captain Kirk with a problem.)
Depending on the nature of the work and the nature of the failure, the supervising actor has a choice of four options at this time:
- Resume the subordinate, keeping its internal state
- Restart the subordinate, giving it a clean state
- Terminate the subordinate
- Escalate the failure
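In classic Akka, these four options correspond to the `Resume`, `Restart`, `Stop`, and `Escalate` directives of a supervisor strategy. The following is a minimal sketch modeled on the example in the Akka fault-tolerance documentation; which exception maps to which directive is an arbitrary choice for illustration, and `Supervisor` and `Child` are hypothetical names.

```scala
import akka.actor.{Actor, OneForOneStrategy, Props, SupervisorStrategy}
import akka.actor.SupervisorStrategy.{Escalate, Restart, Resume, Stop}
import scala.concurrent.duration._

class Supervisor extends Actor {
  // Decide what to do with a failed child, based on the exception it threw.
  override val supervisorStrategy: SupervisorStrategy =
    OneForOneStrategy(maxNrOfRetries = 10, withinTimeRange = 1.minute) {
      case _: ArithmeticException      => Resume   // keep the child's state
      case _: NullPointerException     => Restart  // give it a clean state
      case _: IllegalArgumentException => Stop     // terminate the child
      case _: Exception                => Escalate // pass the failure upward
    }

  // Create a child from the given Props and reply with its ActorRef.
  def receive: Receive = {
    case props: Props => sender() ! context.actorOf(props)
  }
}

// A child that fails on demand, so the strategy can be observed.
class Child extends Actor {
  private var state = 0

  def receive: Receive = {
    case ex: Exception => throw ex
    case x: Int        => state = x
    case "get"         => sender() ! state
  }
}
```

With this strategy, a `Child` that has been sent `42` still reports `42` after an `ArithmeticException` (resumed), and reports `0` after a `NullPointerException` (restarted with fresh state).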
In addition to those general statements about actors, there are a few important things to know about Akka’s implementation of the Actor model:
- You can’t reach into an actor to get information about its state. When you instantiate an Actor in your code, Akka gives you an ActorRef, which is essentially a façade between you and the actor.
- Behind the scenes, Akka runs actors on real threads; many actors may share one thread.
- There are different mailbox implementations to choose from, including variations of unbounded, bounded, and priority mailboxes. You can also create your own mailbox type.
- Akka does not let actors scan their mailbox for specific messages.
- When an actor terminates (intentionally or unintentionally), messages in its mailbox go into the system’s “dead letter mailbox.”
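Two of these points, the ActorRef façade and the dead letter mailbox, can be seen together in a short sketch. It assumes the classic Akka API (Akka 2.4 or later); `Echo`, `DeadLetterListener`, and `DeadLetterDemo` are hypothetical names, and passing a `Promise` into the listener is just a convenience for getting the result out of the demo.

```scala
import akka.actor.{Actor, ActorRef, ActorSystem, DeadLetter, Props}
import akka.pattern.gracefulStop
import scala.concurrent.{Await, Promise}
import scala.concurrent.duration._

class Echo extends Actor {
  def receive: Receive = { case msg => sender() ! msg }
}

// Subscribes to dead letters and records the first String payload it sees.
class DeadLetterListener(seen: Promise[String]) extends Actor {
  def receive: Receive = {
    case DeadLetter(message: String, _, _) => seen.trySuccess(message)
  }
}

object DeadLetterDemo {
  def run(): String = {
    val system = ActorSystem("DeadLetterSystem")
    try {
      val seen = Promise[String]()
      val listener = system.actorOf(Props(new DeadLetterListener(seen)), "listener")
      system.eventStream.subscribe(listener, classOf[DeadLetter])

      // actorOf returns an ActorRef, not the Echo instance itself.
      val echo: ActorRef = system.actorOf(Props(new Echo), "echo")

      // Stop the actor and wait until it has actually terminated.
      Await.result(gracefulStop(echo, 3.seconds), 3.seconds)

      // The ActorRef is still usable, but the message now becomes a dead letter.
      echo ! "anyone home?"
      Await.result(seen.future, 3.seconds)
    } finally {
      system.terminate()
    }
  }
}
```

Because the caller only ever holds an `ActorRef`, sending to a stopped actor does not throw; the message is simply rerouted and published as a `DeadLetter` on the system's event stream.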
Hopefully these notes about the general Actor model, and the Akka implementation specifically, will be helpful in understanding the recipes in this chapter.
Scala offers other conveniences for writing code that performs operations in parallel. A future can be used for simple, “one-off” tasks that require concurrency. The Scala collections library also includes special parallel collections, which can be used to improve the performance of large collections and certain algorithms.
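As a minimal sketch of the “one-off” case, the following uses only the standard library's `scala.concurrent.Future`. The object and method names (`FutureSketch`, `sumTo`) are made up for illustration, and blocking with `Await.result` is done here only to keep the example short.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object FutureSketch {
  // Runs the sum on a thread from the global execution context.
  def sumTo(n: Int): Future[Int] = Future {
    (1 to n).sum
  }

  def main(args: Array[String]): Unit = {
    // Block for at most two seconds waiting for the result.
    val total = Await.result(sumTo(100), 2.seconds)
    println(total) // prints 5050
  }
}
```

The parallel-collections counterpart would be along the lines of `(1 to n).par.sum`. On Scala 2.13 and later, `.par` requires the separate scala-parallel-collections module and `import scala.collection.parallel.CollectionConverters._`.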
There are interesting debates about what the terms concurrency and parallelism mean. I tend to use them interchangeably, but for one interesting discussion of their differences—such as concurrency being one vending machine with two lines, and parallelism being two vending machines and two lines—see this blog post.
The Scala Cookbook
This article is sponsored by the Scala Cookbook, which I wrote for O’Reilly.