This is an excerpt from the Scala Cookbook (partially modified for the internet). This is Recipe 1.6, “Processing a Scala String One Character at a Time.”
Problem
You want to iterate through each character in a Scala string, performing an operation on each character as you traverse the string.
Solution
Depending on your needs and preferences, you can use the map or foreach methods, a for loop, or other approaches. Here’s a simple example of how to create an uppercase string from an input string using map:
scala> val upper = "hello, world".map(c => c.toUpper)
upper: String = HELLO, WORLD
As you’ll see in many examples throughout this book, you can shorten that code using the magic of Scala’s underscore character:
scala> val upper = "hello, world".map(_.toUpper)
upper: String = HELLO, WORLD
With any collection, such as the sequence of characters in a string, you can also chain collection methods together to achieve a desired result. In the following example, I call the filter method on the original String to create a new String with all occurrences of the lowercase letter “l” removed, and then use that String as input to the map method to convert the remaining characters to uppercase:
scala> val upper = "hello, world".filter(_ != 'l').map(_.toUpper)
upper: String = HEO, WORD
When you first start with Scala you may not be comfortable with the map method, in which case you can use Scala’s for loop to achieve the same result. This first example simply prints each character in a string:
scala> for (c <- "hello") println(c)
h
e
l
l
o
To write a for loop to work like a map method, add a yield statement to the end of the loop. This for/yield loop is equivalent to the first two map examples:
scala> val upper = for (c <- "hello, world") yield c.toUpper
upper: String = HELLO, WORLD
Adding yield to a for loop essentially places the result from each loop iteration into a temporary holding area. When the loop completes, all of the elements in the holding area are returned as a single collection.
This for/yield loop achieves the same result as the third map example:
val result = for {
    c <- "hello, world"
    if c != 'l'
} yield c.toUpper
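To make that equivalence concrete, here is a minimal sketch that builds the same string both ways and compares the results. The object and value names are my own, chosen only for illustration:

```scala
object GuardVersusFilter {
  val input = "hello, world"

  // Chained collection methods, as in the filter/map example
  val chained: String = input.filter(_ != 'l').map(_.toUpper)

  // The same logic written as a for/yield loop with a guard
  val looped: String = for {
    c <- input
    if c != 'l'
  } yield c.toUpper

  def main(args: Array[String]): Unit = {
    println(chained)            // HEO, WORD
    println(chained == looped)  // true
  }
}
```

Which form you use is largely a matter of taste; the guard version can be easier to read once several conditions are involved.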
Whereas the map or for/yield approaches are used to transform one collection into another, the foreach method is typically used to operate on each element without returning a result. This is useful for situations like printing:
scala> "hello".foreach(println)
h
e
l
l
o
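The difference in return values can be sketched as follows. This is only an illustration with names I made up: map hands back a new collection, while foreach returns Unit, so anything it produces has to come from a side effect (here, appending to a StringBuilder):

```scala
object MapVersusForeach {
  // map returns a new collection built from the function's results
  val codes: Seq[Int] = "hello".map(_.toInt)

  // foreach returns Unit, so any result has to come from a side effect,
  // such as appending to this StringBuilder
  val sb = new StringBuilder
  val result: Unit = "hello".foreach(c => sb.append(c.toUpper))

  def main(args: Array[String]): Unit = {
    println(codes)        // the character codes 104, 101, 108, 108, 111
    println(sb.toString)  // HELLO
  }
}
```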
Discussion
Because Scala treats a string as a sequence of characters — and because of Scala’s background as both an object-oriented and functional programming language — you can iterate over the characters in a string with the approaches shown. Compare those examples with a common Java approach:
String s = "Hello";
StringBuilder sb = new StringBuilder();
for (int i = 0; i < s.length(); i++) {
    char c = s.charAt(i);
    // do something with the character ...
    // sb.append ...
}
String result = sb.toString();
You’ll see that the Scala approach is more concise, but still very readable. This combination of conciseness and readability lets you focus on solving the problem at hand. Once you get comfortable with Scala, it feels like the imperative code in the Java example obscures your business logic.
Wikipedia describes imperative programming like this:
“Imperative programming is a programming paradigm that describes computation in terms of statements that change a program state ... imperative programs define sequences of commands for the computer to perform.”
This is shown in the Java example, which defines a series of explicit statements that tell a computer how to achieve a desired result.
Understanding how Scala’s map method works
Depending on your coding preferences, you can pass large blocks of code to a map method. These two examples demonstrate the syntax for passing an algorithm to a map method:
// first example
"HELLO".map(c => (c.toByte+32).toChar)
// second example
"HELLO".map { c =>
    (c.toByte+32).toChar
}
Notice that the algorithm operates on one Char at a time. This is because the map method in this example is called on a String, and map treats a String as a sequential collection of Char elements. The map method has an implicit loop, and in that loop it passes one Char at a time to the algorithm it’s given.
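To picture that implicit loop, here is a rough sketch of what map does conceptually when it’s called on a String. The mapChars helper is hypothetical and written only for illustration; it is not how the standard library implements map:

```scala
object ImplicitLoop {
  // Hypothetical helper: an explicit version of the loop that map hides
  def mapChars(s: String)(f: Char => Char): String = {
    val sb = new StringBuilder(s.length)
    var i = 0
    while (i < s.length) {
      sb.append(f(s.charAt(i)))  // pass one Char at a time to the algorithm
      i += 1
    }
    sb.toString
  }

  val viaHelper: String = mapChars("HELLO")(c => (c.toByte + 32).toChar)
  val viaMap: String = "HELLO".map(c => (c.toByte + 32).toChar)

  def main(args: Array[String]): Unit = {
    println(viaHelper)  // hello
    println(viaMap)     // hello
  }
}
```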
Although this algorithm is still short, imagine for a moment that it is longer. In this case, to keep your code clear, you might want to write it as a method (or function) that you can pass into the map method.
To write a method that you can pass into map to operate on the characters in a String, define a method that takes a single Char as input, then perform your logic on that Char inside your method, and return whatever it is that your algorithm returns.
Though the algorithm in the following example is still short, it demonstrates how to create a custom method, and pass that method into map:
// write your own method that operates on a character
scala> def toLower(c: Char):Char = (c.toByte+32).toChar
toLower: (c: Char)Char
// use that method with map
scala> "HELLO".map(toLower)
res0: String = hello
As an added benefit, the same method also works with the for/yield approach:
scala> val s = "HELLO"
s: java.lang.String = HELLO
scala> for (c <- s) yield toLower(c)
res1: String = hello
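One caveat about the toLower method: adding 32 only gives the lowercase letter when the input is an uppercase ASCII letter, so mixed-case input or punctuation comes out garbled. The following sketch shows a guarded variant next to Scala’s built-in toLower method on Char. The toLowerAscii name is my own and used only for illustration:

```scala
object SaferToLower {
  // Hypothetical variant: only shift characters in the range 'A' to 'Z'
  def toLowerAscii(c: Char): Char =
    if (c >= 'A' && c <= 'Z') (c + 32).toChar else c

  // The guarded method leaves lowercase letters and punctuation alone
  val guarded: String = "Hello, World".map(toLowerAscii)

  // For real code, the toLower method available on Char is the simpler choice
  val builtIn: String = "Hello, World".map(_.toLower)

  def main(args: Array[String]): Unit = {
    println(guarded)  // hello, world
    println(builtIn)  // hello, world
  }
}
```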
Methods vs functions
I’ve used the word “method” in this discussion, but you can also use functions here instead of methods. What’s the difference between a Scala method and a function?
Here’s a quick look at a function that’s equivalent to the toLower method shown:
val toLower = (c: Char) => (c.toByte+32).toChar
This function can be passed into map in the same way the previous toLower method was used:
scala> "HELLO".map(toLower)
res0: String = hello
For more information on functions, and the differences between methods and functions, see Chapter 9.
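As a brief sketch of one practical difference, and not a full treatment: a function is a value, so you can assign it to a variable and call methods such as andThen on it, while a method defined with def is converted to a function value when one is expected. The names below are my own:

```scala
object MethodsAndFunctions {
  // A method, defined with def
  def toLowerMethod(c: Char): Char = (c.toByte + 32).toChar

  // A function value, defined with val
  val toLowerFunction: Char => Char = (c: Char) => (c.toByte + 32).toChar

  // A method is converted to a function value where a function is expected
  val fromMethod: Char => Char = toLowerMethod

  // Because a function is a value, it has its own methods, such as andThen
  val roundTrip: Char => Char = toLowerFunction.andThen(_.toUpper)

  def main(args: Array[String]): Unit = {
    println("HELLO".map(fromMethod))  // hello
    println("HELLO".map(roundTrip))   // HELLO
  }
}
```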
A complete example
The following example demonstrates how to call the getBytes method on a String, and then pass a block of code into a foreach method to help calculate an Adler-32 checksum value on a String:
package tests

/**
 * Calculate the Adler-32 checksum using Scala.
 * @see http://en.wikipedia.org/wiki/Adler-32
 */
object Adler32Checksum {

    val MOD_ADLER = 65521

    def main(args: Array[String]): Unit = {
        val sum = adler32sum("Wikipedia")
        printf("checksum (int) = %d\n", sum)
        printf("checksum (hex) = %s\n", sum.toHexString)
    }

    def adler32sum(s: String): Int = {
        var a = 1
        var b = 0
        s.getBytes.foreach { byte =>
            // mask with 0xff so bytes above 127 are treated as unsigned
            a = ((byte & 0xff) + a) % MOD_ADLER
            b = (b + a) % MOD_ADLER
        }
        // note: Int is 32 bits, which this requires
        b * 65536 + a     // or (b << 16) + a
    }
}
The getBytes method returns a sequential collection of bytes from a String, as shown here:
scala> "hello".getBytes
res0: Array[Byte] = Array(104, 101, 108, 108, 111)
Adding the foreach method call after getBytes lets you operate on each Byte value:
scala> "hello".getBytes.foreach(println)
104
101
108
108
111
I use foreach in this example instead of map because I want to loop over each Byte in the String, but I don’t want to return anything from the loop.
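If you want to sanity-check a hand-written checksum like this one, the JDK ships its own implementation in java.util.zip.Adler32. The sketch below is only a cross-check, not part of the recipe; the object and method names are mine, and the choice of UTF-8 is an assumption (the recipe’s getBytes call uses the platform default charset):

```scala
import java.nio.charset.StandardCharsets
import java.util.zip.Adler32

object Adler32Check {
  // Compute the Adler-32 checksum with the JDK class
  def jdkAdler32(s: String): Long = {
    val checksum = new Adler32()
    checksum.update(s.getBytes(StandardCharsets.UTF_8))  // UTF-8 is an assumption
    checksum.getValue
  }

  def main(args: Array[String]): Unit = {
    val sum = jdkAdler32("Wikipedia")
    println(sum)              // 300286872
    println(sum.toHexString)  // 11e60398
  }
}
```

The value for "Wikipedia" matches the worked example in the Wikipedia article on Adler-32.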
See Also
- Under the covers, the Scala compiler translates a for loop into a foreach method call. This gets more complicated if the loop has one or more if statements (guards) or a yield expression. This is discussed in detail in Recipe 3.1, “Looping with for and foreach,” and I also provide examples on my website in The Scala ‘for’ loop translation scheme. The full details are presented in Section 6.19 of the current Scala Language Specification, “For Comprehensions and For Loops.”
- The Adler-32 checksum algorithm on Wikipedia
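As a rough illustration of the for loop translation mentioned in the first item above, each for expression below is paired with the method calls it approximately corresponds to. This is a simplified sketch with names of my own; the specification describes the exact rules:

```scala
object ForTranslation {
  val s = "hello, world"

  // A for loop with yield corresponds roughly to a map call
  val yielded: String = for (c <- s) yield c.toUpper
  val mapped: String = s.map(c => c.toUpper)

  // A guard adds a withFilter call before the map
  val guarded: String = for (c <- s if c != 'l') yield c.toUpper
  val filtered: String = s.withFilter(c => c != 'l').map(c => c.toUpper)

  // A for loop without yield corresponds roughly to a foreach call
  val viaFor = new StringBuilder
  for (c <- s) viaFor.append(c)
  val viaForeach = new StringBuilder
  s.foreach(c => viaForeach.append(c))

  def main(args: Array[String]): Unit = {
    println(yielded == mapped)                          // true
    println(guarded == filtered)                        // true
    println(viaFor.toString == viaForeach.toString)     // true
  }
}
```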