How to process a Scala String one character at a time (with map, for, and foreach)

Scala FAQ: How can I iterate through each character in a Scala String, performing an operation on each character as I traverse the string?

Solution

Depending on your needs and preferences, you can use the Scala map or foreach methods, a for loop, or other approaches.

The ‘map’ method

Here’s a simple example of how to create an uppercase string from an input string, using the map method that’s available on all Scala sequential collections:

scala> val upper = "hello, world".map(c => c.toUpper)
upper: String = HELLO, WORLD

As you see in many examples in the Scala Cookbook, you can shorten that code using the magic of Scala’s underscore character:

scala> val upper = "hello, world".map(_.toUpper)
upper: String = HELLO, WORLD
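One detail worth knowing about map on a String: you get a String back only when the function you pass in returns a Char. If it returns some other type, such as an Int, map can’t build a String, so it returns an IndexedSeq instead. Here’s a small sketch of both cases (MapResultTypes is just a hypothetical wrapper name):

```scala
object MapResultTypes {
  // Char => Char: map on a String builds a new String
  val upper: String = "hello, world".map(_.toUpper)

  // Char => Int: the result can't be a String, so map builds an IndexedSeq[Int]
  val codes: IndexedSeq[Int] = "abc".map(_.toInt)
}
```

In the REPL you’d see the second result printed as a Vector of character codes rather than as a String.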

With any Scala collection, such as the sequence of characters in a string, you can also chain collection methods together to achieve a desired result. In the following example, the filter method is called on the original String to create a new String with all occurrences of the lowercase letter “l” removed. That String is then used as input to the map method to convert the remaining characters to uppercase:

scala> val upper = "hello, world".filter(_ != 'l').map(_.toUpper)
upper: String = HEO, WORD

The ‘for’ loop

When you first start with Scala, you may not be comfortable with the map method, in which case you can use Scala’s for loop instead. This example shows how to loop over a string and print each character:

scala> for (c <- "hello") println(c)
h
e
l
l
o

To make a for loop work like the map method, add a yield expression at the end of the loop. This for/yield loop is equivalent to the first two map examples:

scala> val upper = for (c <- "hello, world") yield c.toUpper
upper: String = HELLO, WORLD

Adding yield to a for loop essentially places the result from each loop iteration into a temporary holding area. When the loop completes, all of the elements in the holding area are returned as a single collection.

This for/yield loop achieves the same result as the third map example:

val result = for {
    c <- "hello, world"
    if c != 'l'
} yield c.toUpper
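It can help to see roughly what these loops turn into. Under the covers the compiler rewrites a plain for loop into a foreach call, a for/yield into a map call, and a guard (the if line) into a withFilter call. The following is a minimal sketch of those hand-written equivalents, not literal compiler output (ForTranslations is a hypothetical name):

```scala
object ForTranslations {
  val s = "hello, world"

  // for (c <- s) println(c)
  // is translated into roughly:
  def printAll(): Unit = s.foreach(c => println(c))

  // for (c <- s) yield c.toUpper
  // is translated into roughly:
  val mapped: String = s.map(c => c.toUpper)

  // for { c <- s; if c != 'l' } yield c.toUpper
  // is translated into roughly:
  val filtered: String = s.withFilter(c => c != 'l').map(c => c.toUpper)
}
```

Both `mapped` and `filtered` produce the same Strings as the corresponding for/yield loops above (“HELLO, WORLD” and “HEO, WORD”).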

The ‘foreach’ method

Whereas the map or for/yield approaches are used to transform one collection into another, the foreach method is typically used to operate on each element without returning a result. This is useful for situations like printing:

scala> "hello".foreach(println)
h
e
l
l
o
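Because foreach returns Unit, the only way to get something out of it is through a side effect, such as updating a variable. This sketch (the object and method names are hypothetical) counts the “l” characters in a string with foreach, and then does the same thing without a side effect using the count method:

```scala
object ForeachSideEffects {
  // foreach returns Unit, so the "result" comes from a side effect:
  // updating a local var each time an 'l' is seen
  def countLs(s: String): Int = {
    var count = 0
    s.foreach { c =>
      if (c == 'l') count += 1
    }
    count
  }

  // the same result with no side effect, using the count method
  def countLsFunctional(s: String): Int = s.count(_ == 'l')
}
```

Both methods return 3 for "hello, world".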

Note: Having used Scala for a few years now, I can say that using map is the most common Scala idiom for use cases like this. Using for/yield is also common when there are multiple lines of processing to perform, and in my experience, the foreach method isn’t used that often in Scala. That being said, feel free to use whatever you're comfortable using.

Discussion

Because Scala treats a string as a sequence of characters, and because of Scala’s background as both an object-oriented and functional programming language, you can iterate over the characters in a string with the approaches shown. Compare those examples with a common Java approach:

String s = "Hello";
StringBuilder sb = new StringBuilder();
for (int i = 0; i < s.length(); i++) {
    char c = s.charAt(i);
    // do something with the character ...
    // sb.append ...
}
String result = sb.toString();

You’ll see that the Scala approach is more concise, but still very readable. This combination of conciseness and readability lets you focus on solving the problem at hand. Once you get comfortable with Scala, it feels like the imperative code in the Java example obscures your business logic (IMHO).

Imperative programming

Wikipedia describes imperative programming like this:

“Imperative programming is a programming paradigm that describes computation in terms of statements that change a program state ... imperative programs define sequences of commands for the computer to perform.”

This is shown in the Java example, which defines a series of explicit statements that tell a computer how to achieve a desired result.

Understanding how ‘map’ works

Depending on your coding preferences, you can pass large blocks of code to a map method. These two examples demonstrate the syntax for passing an algorithm to a map method:

// first example
"HELLO".map(c => (c.toByte+32).toChar)

// second example
"HELLO".map{ c =>
    (c.toByte+32).toChar
}

Notice that the algorithm operates on one Char at a time. This is because the map method in this example is called on a String, and map treats a String as a sequential collection of Char elements. The map method has an implicit loop, and in that loop, it passes one Char at a time to the algorithm it’s given.
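To picture that implicit loop, here’s a hypothetical mapChars method. It’s a sketch of the idea, not how the standard library actually implements map: it walks the String by index, much like the Java loop shown earlier, and hands each Char to the function it’s given.

```scala
object ImplicitLoop {
  // hypothetical helper, NOT the standard library's implementation of map;
  // it only illustrates the "implicit loop" for Char => Char functions
  def mapChars(s: String)(f: Char => Char): String = {
    val sb = new StringBuilder(s.length)
    var i = 0
    while (i < s.length) {
      sb.append(f(s.charAt(i))) // pass one Char at a time to the algorithm
      i += 1
    }
    sb.toString
  }
}
```

Calling `ImplicitLoop.mapChars("HELLO")(c => (c.toByte+32).toChar)` gives the same "hello" result as the map examples above.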

Although this algorithm is still short, imagine for a moment that it’s longer. In that case, to keep your code clear, you might want to write it as a method (or function) that you can pass into the map method.

To write a method that you can pass into map to operate on the characters in a String, define it to take a single Char as input, then perform the logic on that Char inside the method. When the logic is complete, return whatever it is that your algorithm returns.

Though the following algorithm is still short, it demonstrates how to create a custom method and pass that method into map:

// write your own method that operates on a character

scala> def toLower(c: Char): Char = (c.toByte+32).toChar
toLower: (c: Char)Char
// use that method with map

scala> "HELLO".map(toLower)
res0: String = hello

As an added benefit, the same method also works with the for/yield approach:

scala> val s = "HELLO"
s: java.lang.String = HELLO

scala> for (c <- s) yield toLower(c)
res1: String = hello
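One caveat about the (c.toByte+32).toChar algorithm: it only works for the uppercase ASCII letters A through Z. Applied to "HELLO, WORLD", it turns the comma into 'L' and the space into '@', giving "helloL@world". A guarded version might look like this sketch (AsciiLower and toLowerAscii are hypothetical names; in real code, Char’s own toLower method is usually the better choice):

```scala
object AsciiLower {
  // the +32 shift only maps 'A' to 'Z' onto 'a' to 'z';
  // every other Char (digits, spaces, punctuation, lowercase) passes through unchanged
  def toLowerAscii(c: Char): Char =
    if (c >= 'A' && c <= 'Z') (c + 32).toChar else c
}
```

Passed into map, it turns "HELLO, WORLD" into "hello, world".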

Scala methods vs functions

I’ve used the word “method” in this discussion, but you can also use functions here instead of methods. What’s the difference between a method and a function?

Here’s a quick look at a function that’s equivalent to the toLower method shown:

val toLower = (c: Char) => (c.toByte+32).toChar

This function can be passed into map in the same way the previous toLower method was used:

scala> "HELLO".map(toLower)
res0: String = hello
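To make the distinction a little more concrete: a method is defined with def as a member of a class or object, while a function is a value of a function type such as Char => Char that you can store in a variable and pass around. When you pass a method where a function is expected, the compiler converts it into a function value for you (eta-expansion), which is why both forms work with map. A sketch with hypothetical names:

```scala
object MethodsVsFunctions {
  // a method: defined with def, as a member of this object
  def toLowerMethod(c: Char): Char = (c.toByte + 32).toChar

  // a function: a value of type Char => Char
  val toLowerFunction: Char => Char = c => (c.toByte + 32).toChar

  // where a function type is expected, the compiler converts
  // the method into a function value automatically (eta-expansion)
  val methodAsFunction: Char => Char = toLowerMethod
}
```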

For more information on functions and the differences between methods and functions, see Chapter 9 of the Scala Cookbook, Functional Programming.

A complete example

The following example demonstrates how to call the getBytes method on a String, and then pass a block of code into a foreach method to help calculate the Adler-32 checksum of that String:

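package tests

import java.nio.charset.StandardCharsets

/**
* Calculate the Adler-32 checksum using Scala.
* @see http://en.wikipedia.org/wiki/Adler-32
*/
object Adler32Checksum {
    val MOD_ADLER = 65521
    def main(args: Array[String]): Unit = {
        val sum = adler32sum("Wikipedia")
        printf("checksum (int) = %d\n", sum)
        printf("checksum (hex) = %s\n", sum.toHexString)
    }
    def adler32sum(s: String): Int = {
        var a = 1
        var b = 0
        // encode explicitly so the result doesn't depend on the platform's default charset
        s.getBytes(StandardCharsets.UTF_8).foreach { byte =>
            // Byte is signed, so mask with 0xff to use its unsigned value (0 to 255)
            a = ((byte & 0xff) + a) % MOD_ADLER
            b = (b + a) % MOD_ADLER
        }
        // note: Int is 32 bits, which this requires; b fills the high 16 bits, a the low 16
        b * 65536 + a     // or (b << 16) + a
    }
}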

The getBytes method returns a sequential collection of bytes from a String, as shown here:

scala> "hello".getBytes
res0: Array[Byte] = Array(104, 101, 108, 108, 111)
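Two details about getBytes are worth keeping in mind: called without an argument, it encodes the String with the platform’s default charset, and Scala’s Byte type is signed, so any byte above 127 comes back as a negative number. This sketch (SignedBytes is a hypothetical name) uses the UTF-8 encoding of “é”, which is the two bytes 0xC3 and 0xA9:

```scala
import java.nio.charset.StandardCharsets

object SignedBytes {
  // encode explicitly instead of relying on the platform default charset
  val bytes: Array[Byte] = "\u00e9".getBytes(StandardCharsets.UTF_8)

  // Byte is signed (-128 to 127), so UTF-8 bytes above 127 come out negative
  val signed: Seq[Int] = bytes.map(_.toInt).toSeq

  // masking with 0xff recovers the unsigned value (0 to 255)
  val unsigned: Seq[Int] = bytes.map(_ & 0xff).toSeq
}
```

Algorithms such as Adler-32 are defined in terms of byte values from 0 to 255, so for non-ASCII input the masked form is the one to use.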

Adding the foreach method call after getBytes lets you operate on each Byte value:

scala> "hello".getBytes.foreach(println)
104
101
108
108
111

You use foreach in this example instead of map because the goal is to do something with each Byte (update the running a and b values), not to build and return a new collection.
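If you’d rather avoid the two var fields, the same calculation can be written with foldLeft instead of foreach. This is a swapped-in technique rather than the approach used in the recipe, and only a sketch: Adler32Fold is a hypothetical name, and it assumes UTF-8 encoding with each byte masked to the range 0 to 255.

```scala
import java.nio.charset.StandardCharsets

object Adler32Fold {
  val MOD_ADLER = 65521

  // same idea as adler32sum, but foldLeft threads the (a, b) pair
  // through the bytes, so no vars are needed
  def adler32(s: String): Int = {
    val (a, b) = s.getBytes(StandardCharsets.UTF_8).foldLeft((1, 0)) {
      case ((accA, accB), byte) =>
        val newA = ((byte & 0xff) + accA) % MOD_ADLER
        val newB = (accB + newA) % MOD_ADLER
        (newA, newB)
    }
    (b << 16) | a
  }
}
```

For "Wikipedia" this returns 300286872, which is 0x11e60398 in hex.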

See Also

  • Under the covers, the Scala compiler translates a for loop into a foreach method call. This gets more complicated if the loop has one or more if statements (guards) or a yield expression. This is discussed in detail in Recipe 3.1 in the Scala Cookbook, “Looping with for and foreach.” The full details are presented in “For Comprehensions and For Loops” in Section 6.19 of the current Scala Language Specification.
  • The Adler-32 checksum algorithm