How to process a Scala String one character at a time

This is an excerpt from the Scala Cookbook (partially modified for the internet). This is Recipe 1.6, “Processing a Scala String One Character at a Time.”

Problem

You want to iterate through each character in a Scala string, performing an operation on each character as you traverse the string.

Solution

Depending on your needs and preferences, you can use the map or foreach methods, a for loop, or other approaches. Here’s a simple example of how to create an uppercase string from an input string using map:

scala> val upper = "hello, world".map(c => c.toUpper)
upper: String = HELLO, WORLD

As you’ll see in many examples throughout this book, you can shorten that code using the magic of Scala’s underscore character:

scala> val upper = "hello, world".map(_.toUpper)
upper: String = HELLO, WORLD

With any collection, such as a sequence of characters in a string, you can also chain collection methods together to achieve a desired result. In the following example, I call the filter method on the original String to create a new String with all occurrences of the lowercase letter “l” removed, and then use that String as input to the map method to convert the remaining characters to uppercase:

scala> val upper = "hello, world".filter(_ != 'l').map(_.toUpper)
upper: String = HEO, WORD
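Each method in that chain returns a new String, which becomes the input to the next method. The sketch below splits the chain into named steps so you can see the intermediate value. It also shows that the order of the calls matters: if map runs first, no lowercase “l” is left for filter to remove. The object and value names are just for illustration.

```scala
object ChainSteps {
  val original = "hello, world"

  // step 1: filter keeps only the characters for which the predicate is true
  val withoutL = original.filter(_ != 'l')      // "heo, word"

  // step 2: map transforms each remaining character
  val upper = withoutL.map(_.toUpper)           // "HEO, WORD"

  // reversing the order changes the result: after map runs,
  // every 'l' has become 'L', so filter removes nothing
  val mapFirst = original.map(_.toUpper).filter(_ != 'l')   // "HELLO, WORLD"
}
```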

When you first start with Scala, you may not be comfortable with the map method, in which case you can use Scala’s for loop to achieve the same results. This first example simply prints each character:

scala> for (c <- "hello") println(c)
h
e
l
l
o

To make a for loop work like a map method, add a yield expression to the end of the loop. This for/yield loop is equivalent to the first two map examples:

scala> val upper = for (c <- "hello, world") yield c.toUpper
upper: String = HELLO, WORLD

Adding yield to a for loop essentially places the result from each loop iteration into a temporary holding area. When the loop completes, all of the elements in the holding area are returned as a single collection.
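What ends up in that holding area depends on what the yield expression produces. This is a small sketch with illustrative names. When each iteration yields a Char, the result is a String, just like with map. When it yields something that can't be stored in a String, such as the Int value of each Char, the result is a general indexed sequence instead.

```scala
object ForYieldResults {
  val s = "hello, world"

  // yielding a Char from a String builds a new String, just like map
  val upperFor = for (c <- s) yield c.toUpper
  val upperMap = s.map(c => c.toUpper)

  // yielding an Int can't produce a String, so the result is an
  // indexed sequence of Int values (97, 98, 99) instead
  val codes = for (c <- "abc") yield c.toInt
}
```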

This for/yield loop achieves the same result as the third map example:

val result = for {
    c <- "hello, world"
    if c != 'l'
} yield c.toUpper
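To see why the guard version matches the filter/map chain, it helps to write out roughly what the compiler turns it into. The if guard becomes a call to withFilter (a lazy variant of filter), and the yield becomes a call to map. The sketch below puts the three forms side by side. The object and value names are illustrative, and the translation shown is a simplification of the full scheme.

```scala
object GuardTranslation {
  val s = "hello, world"

  // the for/yield loop with an if guard
  val viaFor = for {
    c <- s
    if c != 'l'
  } yield c.toUpper

  // the same result written with explicit collection methods
  val viaFilterMap = s.filter(_ != 'l').map(_.toUpper)

  // roughly what the compiler generates for the for/yield loop:
  // the guard becomes withFilter, and the yield becomes map
  val viaWithFilter = s.withFilter(c => c != 'l').map(c => c.toUpper)
}
```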

Whereas the map or for/yield approaches are used to transform one collection into another, the foreach method is typically used to operate on each element without returning a result. This is useful for situations like printing:

scala> "hello".foreach(println)
h
e
l
l
o
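Because foreach returns Unit rather than a new collection, any useful work it does happens through side effects, such as printing or updating a variable. As a sketch, the hypothetical countVowels method below updates a local var inside foreach. For comparison, it is followed by a version that uses the count collection method and needs no var.

```scala
object ForeachSideEffects {
  // hypothetical helper: foreach returns Unit, so the result is
  // accumulated by mutating a local var
  def countVowels(s: String): Int = {
    var count = 0
    s.foreach { c =>
      if ("aeiou".indexOf(c.toLower) >= 0) count += 1
    }
    count
  }

  // the same result without a var, using count instead of foreach
  def countVowelsFunctional(s: String): Int =
    s.count(c => "aeiou".indexOf(c.toLower) >= 0)
}
```

For "hello, world", both methods return 3.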

Discussion

Because Scala treats a string as a sequence of characters, and because of Scala’s background as both an object-oriented and a functional programming language, you can iterate over the characters in a string with any of the approaches shown. Compare those examples with a common Java approach:

String s = "hello, world";
StringBuilder sb = new StringBuilder();
for (int i = 0; i < s.length(); i++) {
    char c = s.charAt(i);
    // do something with the character, such as converting it to uppercase
    sb.append(Character.toUpperCase(c));
}
String result = sb.toString();   // "HELLO, WORLD"

You’ll see that the Scala approach is more concise, but still very readable. This combination of conciseness and readability lets you focus on solving the problem at hand. Once you get comfortable with Scala, it feels like the imperative code in the Java example obscures your business logic.

Wikipedia describes imperative programming like this:

“Imperative programming is a programming paradigm that describes computation in terms of statements that change a program state ... imperative programs define sequences of commands for the computer to perform.”

This is shown in the Java example, which defines a series of explicit statements that tell a computer how to achieve a desired result.

Understanding how Scala’s map method works

Depending on your coding preferences, you can pass large blocks of code to a map method. These two examples demonstrate the syntax for passing an algorithm to a map method:

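// first example: adding 32 to an uppercase ASCII letter gives its lowercase form
"HELLO".map(c => (c.toByte+32).toChar)

// second example: the same algorithm, written as a block
"HELLO".map{ c =>
    (c.toByte+32).toChar
}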

Notice that the algorithm operates on one Char at a time. This is because the map method in this example is called on a String, and map treats a String as a sequential collection of Char elements. The map method has an implicit loop, and in that loop it passes one Char at a time to the algorithm it’s given.
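To make that implicit loop visible, here is a simplified, hypothetical mapChars helper. It does by hand roughly what map does for a String when the algorithm returns a Char. This is a sketch for illustration only; the Scala library's real map is more general and is not implemented this way.

```scala
object ImplicitLoopSketch {
  // hypothetical helper: loops over every Char in s, passes each one
  // to the algorithm f, and collects the results into a new String
  def mapChars(s: String)(f: Char => Char): String = {
    val sb = new StringBuilder
    var i = 0
    while (i < s.length) {
      sb.append(f(s.charAt(i)))   // one Char at a time, as map does
      i += 1
    }
    sb.toString
  }
}
```

Calling ImplicitLoopSketch.mapChars("HELLO")(c => (c.toByte+32).toChar) returns "hello", the same result as the map examples.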

Although the (c.toByte+32).toChar algorithm is still short, imagine for a moment that it is longer. In that case, to keep your code clear, you might want to write it as a method (or function) that you can pass into the map method.

To write a method that you can pass into map to operate on the characters in a String, define it to take a single Char as input, perform your logic on that Char inside the method, and return the result.

Though the algorithm in the following example is still short, it demonstrates how to create a custom method and pass that method into map:

// write your own method that operates on a character
scala> def toLower(c: Char):Char = (c.toByte+32).toChar
toLower: (c: Char)Char

// use that method with map
scala> "HELLO".map(toLower)
res0: String = hello

As an added benefit, the same method also works with the for/yield approach:

scala> val s = "HELLO"
s: java.lang.String = HELLO

scala> for (c <- s) yield toLower(c)
res1: String = hello
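One caveat about that toLower method: adding 32 only produces the right answer for the uppercase ASCII letters A through Z. Applied to other characters, it returns unrelated ones. For example, the comma becomes “L” and the space becomes “@”. A safer version can check the range first and still follow the same recipe of taking a Char and returning a Char. The asciiToLower name is hypothetical. In real code, Char’s own toLower method, used in the Solution, already handles this case.

```scala
object SaferToLower {
  // hypothetical replacement for toLower: takes one Char and returns one Char,
  // so it still works with map and for/yield. Only 'A' to 'Z' are shifted
  // by 32; every other character is returned unchanged.
  def asciiToLower(c: Char): Char =
    if (c >= 'A' && c <= 'Z') (c + 32).toChar else c
}
```

With this version, "HELLO, WORLD".map(c => SaferToLower.asciiToLower(c)) returns "hello, world". The original toLower returns "helloL@world" for the same input.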

Methods vs functions

I’ve used the word “method” in this discussion, but you can also use functions here instead of methods. What’s the difference between a Scala method and a function?

Here’s a quick look at a function that’s equivalent to the toLower method shown earlier:

val toLower = (c: Char) => (c.toByte+32).toChar

This function can be passed into map in the same way the previous toLower method was used:

scala> "HELLO".map(toLower)
res0: String = hello

For more information on functions, and the differences between methods and functions, see Chapter 9.
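In brief, a method is defined with def and belongs to a class or object, while a function is a value whose type is a function type such as Char => Char. When a function type is expected, the compiler can convert a method into a function value automatically. This conversion is called eta-expansion, and it is why both forms can be passed to map. The names below are illustrative.

```scala
object MethodsVsFunctions {
  // a method: defined with def as a member of this object
  def toLowerMethod(c: Char): Char = (c.toByte + 32).toChar

  // a function: a value of type Char => Char stored in a val
  val toLowerFunction: Char => Char = c => (c.toByte + 32).toChar

  // eta-expansion: the expected type Char => Char makes the compiler
  // turn the method into a function value
  val fromMethod: Char => Char = toLowerMethod
}
```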

A complete example

The following example demonstrates how to call the getBytes method on a String, and then pass a block of code into a foreach method to help calculate the Adler-32 checksum of a String:

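package tests

import java.nio.charset.StandardCharsets

/**
 * Calculate the Adler-32 checksum using Scala.
 * @see http://en.wikipedia.org/wiki/Adler-32
 */
object Adler32Checksum {

  val MOD_ADLER = 65521

  def main(args: Array[String]): Unit = {
    val sum = adler32sum("Wikipedia")
    printf("checksum (int) = %d\n", sum)
    printf("checksum (hex) = %s\n", sum.toHexString)
  }

  def adler32sum(s: String): Int = {
    var a = 1
    var b = 0
    // encode explicitly so the result doesn't depend on the platform's default charset
    s.getBytes(StandardCharsets.UTF_8).foreach { byte =>
      // Byte is signed (-128 to 127); mask it to get the unsigned value 0 to 255
      a = ((byte & 0xff) + a) % MOD_ADLER
      b = (b + a) % MOD_ADLER
    }
    // note: Int is 32 bits, which this requires; for large sums the Int is
    // negative, but toHexString still shows the correct unsigned 32-bit value
    b * 65536 + a     // or (b << 16) + a
  }

}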
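The same checksum can also be written without var. In the sketch below, foldLeft threads the (a, b) pair through the bytes, and the JDK’s built-in java.util.zip.Adler32 class serves as a cross-check. The object and method names are made up for this example. The sketch assumes UTF-8 encoding and returns a Long, so the result is never negative.

```scala
import java.nio.charset.StandardCharsets
import java.util.zip.Adler32

object Adler32Functional {
  val MOD_ADLER = 65521

  // same algorithm as adler32sum, but foldLeft carries the (a, b) pair
  // from one byte to the next instead of updating two vars
  def adler32(s: String): Long = {
    val (a, b) = s.getBytes(StandardCharsets.UTF_8).foldLeft((1, 0)) {
      case ((sumA, sumB), byte) =>
        val newA = (sumA + (byte & 0xff)) % MOD_ADLER   // unsigned byte value
        val newB = (sumB + newA) % MOD_ADLER
        (newA, newB)
    }
    (b.toLong << 16) | a
  }

  // reference value computed by the JDK's own Adler-32 implementation
  def jdkAdler32(s: String): Long = {
    val checksum = new Adler32()
    checksum.update(s.getBytes(StandardCharsets.UTF_8))
    checksum.getValue
  }
}
```

For "Wikipedia", both methods return 0x11E60398 (300286872), the value given in the Wikipedia article.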

The getBytes method returns the bytes of a String as an Array[Byte] (when no charset argument is given, it uses the platform’s default character encoding), as shown here:

scala> "hello".getBytes
res0: Array[Byte] = Array(104, 101, 108, 108, 111)

Adding the foreach method call after getBytes lets you operate on each Byte value:

scala> "hello".getBytes.foreach(println)
104
101
108
108
111

I use foreach in this example instead of map because I want to loop over each Byte in the String, but I don’t want to return anything from the loop.
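Every byte in the "hello" example is below 128, so the values look like ordinary character codes. Scala’s Byte type is signed (-128 to 127), though. As soon as a String contains non-ASCII text, some bytes come out negative. Code that needs unsigned byte values, such as a checksum, typically masks each byte with 0xff. A small sketch, assuming UTF-8 encoding and using illustrative names:

```scala
import java.nio.charset.StandardCharsets

object SignedBytes {
  // "\u00e9" is 'é', which UTF-8 encodes as the two bytes 0xC3 0xA9;
  // because Byte is signed, they appear as -61 and -87
  val raw: Array[Byte] = "\u00e9".getBytes(StandardCharsets.UTF_8)

  // masking with 0xff recovers the unsigned values 195 and 169
  val unsigned: Array[Int] = raw.map(_ & 0xff)
}
```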

See Also

  • Under the covers, the Scala compiler translates a for loop into a foreach method call. This gets more complicated if the loop has one or more if guards or a yield expression. This is discussed in detail in Recipe 3.1, “Looping with for and foreach,” and I also provide examples on my website in The Scala ‘for’ loop translation scheme. The full details are presented in Section 6.19 of the current Scala Language Specification, “For Comprehensions and For Loops.”
  • The Adler-32 checksum algorithm on Wikipedia
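As a rough illustration of the for-to-foreach translation mentioned in the first item above, the two hypothetical methods below should behave identically. The first uses a for loop with no yield. The second makes the foreach call that the compiler roughly rewrites the loop into.

```scala
import scala.collection.mutable.ListBuffer

object ForTranslation {
  // a for loop without yield...
  def withFor(s: String): List[Char] = {
    val seen = ListBuffer[Char]()
    for (c <- s) seen += c
    seen.toList
  }

  // ...is translated into a foreach call like this one
  def withForeach(s: String): List[Char] = {
    val seen = ListBuffer[Char]()
    s.foreach(c => seen += c)
    seen.toList
  }
}
```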