Using Scala match expressions with XML

Problem: You want to use match expressions as another way to access the information contained in XML data when writing your Scala applications.

Solution

Given this XML literal:

val pizzaNode = 
  <pizza>
    <crust type="thin" size="14" />
    <topping>cheese</topping>
    <topping>sausage</topping>
  </pizza>

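you can recognize <topping> and <crust> elements with a Scala match expression like the one below. A match expression tests a single node, so when it's applied to the top-level <pizza> element itself it falls through to the default case; the examples that follow pass it individual <topping> and <crust> nodes: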

pizzaNode match {
  case <topping>{value}</topping> => println(s"Got a topping: $value")
  case <crust /> => println("Got a <crust/> tag")
  case _ => println("D'oh!")
}

You’ll usually put a match expression like this in a method, so let’s do that here:

import scala.xml.Node

/**
 * Version 1
 * A pizza node can have <topping> and <crust> tags.
 */
def handlePizzaNode(pizzaNode: Node): Unit = {
  pizzaNode match {
    case <topping>{value}</topping> => println(s"Got a topping: $value")
    case <crust /> => println("Got a <crust/> tag")
    case _ => println("D'oh!")
  }
}

A few examples in the REPL demonstrate how this works. First, a <topping> element:

scala> val node = <topping>cheese</topping>
node: scala.xml.Elem = <topping>cheese</topping>

scala> handlePizzaNode(node)
Got a topping: cheese

Next, a <crust> element:

scala> val node = <crust type="thin" size="14" />
node: scala.xml.Elem = <crust type="thin" size="14"/>

scala> handlePizzaNode(node)
Got a <crust/> tag

The following example demonstrates how to iterate over all the <topping> nodes in the <pizza>:

scala> for (topping <- pizzaNode \ "topping") handlePizzaNode(topping)
Got a topping: cheese
Got a topping: sausage
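
If you iterate over every child of the <pizza> element instead of selecting only the <topping> nodes, keep in mind that the whitespace between the tags in the XML literal is stored as Text nodes, and those land in the default case. The following is a minimal sketch of one way to deal with that. The PizzaChildren object and its describe method are hypothetical names, not part of the recipe, and describe returns an Option instead of printing so the whitespace nodes can be dropped:

```scala
import scala.xml.{Elem, Node}

// Hypothetical helper object, used only for this illustration
object PizzaChildren {
  val pizzaNode: Elem =
    <pizza>
      <crust type="thin" size="14" />
      <topping>cheese</topping>
      <topping>sausage</topping>
    </pizza>

  // Same cases as handlePizzaNode, but returns a description instead of printing
  def describe(node: Node): Option[String] = node match {
    case <topping>{value}</topping> => Some(s"topping: ${value.text}")
    case <crust /> => Some("crust")
    case _ => None // whitespace Text nodes and anything else end up here
  }

  // Keep only the children that matched one of the first two cases
  val descriptions: List[String] =
    pizzaNode.child.toList.flatMap(n => describe(n).toList)
}
```

With the pizzaNode shown here, descriptions should contain "crust", "topping: cheese", and "topping: sausage", in document order.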

Use @ to access node attributes

Accessing the attributes of the <crust> tag and using them on the right side of its case statement takes a bit more work.

Recipe 3.12 of the Scala Cookbook, “Using Pattern Matching in Match Expressions,” demonstrates a solution to this problem, adding a variable to a pattern match:

variableName @ pattern

As mentioned in that recipe, this creates a variable-binding pattern. If the pattern succeeds, it sets the variable to the object it matches.

Using this approach to solve the current problem, rewrite the <crust> tag case statement like this:

case crust @ <crust /> => println(s"Got a <crust/> tag: $crust")

The modified version of the method now looks like this:

/**
 * Version 2: Get access to the <crust> tag attributes.
 */
def handlePizzaNode(pizzaNode: Node): Unit = {
  pizzaNode match {
    case <topping>{value}</topping> => 
           println(s"Got a topping: $value")
    case crust @ <crust /> => 
           val crustSize = crust \ "@size"
           val crustType = crust \ "@type"
           println(s"crustSize: $crustSize, crustType: $crustType")    
    case _ => 
           println("D'oh!")
  }
}

Running the code in the REPL again, you can now access the <crust> data on the right side of the case statement:

scala> val node = <crust type="thin" size="14" />
node: scala.xml.Elem = <crust type="thin" size="14"/>

scala> handlePizzaNode(node)
crustSize: 14, crustType: thin
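
Note that crust \ "@size" yields a NodeSeq, not a String; the output above looks right only because string interpolation converts it to text. If you need typed values, something like the following sketch may help. The CrustAttributes object and its attr and crustSize methods are hypothetical names introduced for this illustration, and the sketch assumes that a missing attribute and an empty attribute can be treated the same way:

```scala
import scala.util.Try
import scala.xml.Node

// Hypothetical helper object, used only for this illustration
object CrustAttributes {
  // The text of an attribute, or None if it is missing (or empty)
  def attr(node: Node, name: String): Option[String] = {
    val value = (node \ s"@$name").text
    if (value.isEmpty) None else Some(value)
  }

  // The size attribute as an Int, or None if it is missing or not a number
  def crustSize(node: Node): Option[Int] =
    attr(node, "size").flatMap(s => Try(s.toInt).toOption)
}
```

Returning an Option here is a design choice: it keeps the "attribute not present" case visible to the caller instead of silently producing an empty string.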

A real-world method should either return the nodes that were found by each case match, or call a method to act on each node. The following code demonstrates the second approach, how you might act on each node by calling addTopping, setCrustSize, and setCrustType methods (which are left as an exercise for the reader to implement):

/**
 * Version 3: Call useful methods
 */
def handlePizzaNode(pizzaNode: Node): Unit = {
  pizzaNode match {
    case <topping>{value}</topping> => 
         addTopping(value)
    case crust @ <crust /> => 
         setCrustSize((crust \ "@size").text)
         setCrustType((crust \ "@type").text)
    case _ => println("D'oh!")
  }
}
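
The first approach, returning what each case found, might look something like the following sketch. It is a variation that returns small case classes rather than the raw nodes; the PizzaPart, Topping, and Crust types and the parsePizzaNode method are hypothetical names made up for this example:

```scala
import scala.xml.Node

// Hypothetical types and method, used only for this illustration
object PizzaParts {
  sealed trait PizzaPart
  final case class Topping(name: String) extends PizzaPart
  final case class Crust(size: String, crustType: String) extends PizzaPart

  // Returns the matched data, or None for a node that is not recognized
  def parsePizzaNode(node: Node): Option[PizzaPart] = node match {
    case <topping>{value}</topping> =>
      Some(Topping(value.text))
    case crust @ <crust /> =>
      Some(Crust((crust \ "@size").text, (crust \ "@type").text))
    case _ =>
      None
  }
}
```

Because the method has no side effects, the caller decides what to do with each result, which also makes the match expression easier to test.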

Handling an array of elements

Now that you’ve seen the use of the @ symbol in case statements, you can extend the approach to handle multiple elements of the same type, such as multiple <stock> elements contained in this XML literal:

val stocks = 
  <stocks>
    <stock>AAPL</stock>
    <stock>AMZN</stock>
    <stock>GOOG</stock>
  </stocks>

The following match expression shows the formula to access and print the value of each <stock> element:

stocks match {
  case <stocks>{children @ _*}</stocks> =>
    for (stock @ <stock>{_*}</stock> <- children)
      println(s"stock: ${stock.text}")
}

When that code is run, it yields the following output:

stock: AAPL
stock: AMZN
stock: GOOG
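
If you would rather collect the values than print them, the same pattern can be wrapped in a method. This is only a sketch; the StockSymbols object and its symbols method are hypothetical names, and it uses collect rather than a for loop so that the whitespace Text nodes between the <stock> elements are skipped explicitly:

```scala
import scala.xml.Node

// Hypothetical helper object, used only for this illustration
object StockSymbols {
  // The text of every <stock> child of a <stocks> element
  def symbols(root: Node): List[String] = root match {
    case <stocks>{children @ _*}</stocks> =>
      // collect keeps only the <stock> elements and drops everything else
      children.toList.collect { case stock @ <stock>{_*}</stock> => stock.text }
    case _ =>
      Nil // not a <stocks> element
  }
}
```

Calling StockSymbols.symbols(stocks) with the XML literal shown earlier should give you a List containing AAPL, AMZN, and GOOG.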

Discussion

There are a few pitfalls to be aware of when using match expressions with XML. For example, I showed earlier that a match expression like this works:

// this works
<p>Hello, world</p> match {
  case <p>{text}</p> => println(text)
}

But it’s important to know that in a match expression the element has to match the pattern exactly, including its children: {text} binds exactly one child node. The following match expression blows up with a MatchError because I’m looking for text between matching <topping> tags, but the text is given in an attribute of an empty <topping> tag:

// throws a MatchError
<topping value="Hello, world" /> match {
  case <topping>{text}</topping> => println(text)
}

Although that demonstrates a communication error that’s likely to throw off most algorithms, you’ll also get a MatchError if the <topping> tags are formed correctly, but are empty:

// throws a MatchError
<topping></topping> match {
  case <topping>{text}</topping> => println(text)
}

With that case statement, you’ll also get a MatchError if the <topping> tags contain another XML tag:

// throws a MatchError
<topping>Green<br/>olives</topping> match {
  case <topping>{text}</topping> => println(text)
}

To solve these problems, and other similar problems, the correct approach is to write one or more case statements to handle the XML you want to allow, and then add a default case statement to prevent a MatchError from being thrown by your match expression, as shown here:

<topping></topping> match {
  case <topping>{text}</topping> => println(text)
  case _ => println("got something else")
}

Handling unexpected tags

XML tags embedded in other XML tags can also cause problems in match expressions. I showed in other recipes that you can extract the text from the following XML literal, even though it contains a <br> tag:

scala> <p>mystery<br/></p>.text
res0: String = mystery

But attempting to match a similar XML literal in the following match expression throws a MatchError:

// this throws a MatchError
<p>Hello, <br/>world</p> match {
  case <p>{content}</p> => println(content)
}

A quick trip into the REPL helps us understand the problem:

scala> val hello = <p>Hello, <br/>world</p>
hello: scala.xml.Elem = <p>Hello, <br/>world</p>

scala> hello.child.foreach(e => println(e.getClass))
class scala.xml.Text
class scala.xml.Elem
class scala.xml.Text

hello was expected to contain plain text, but it contains three child nodes: a Text node, an Elem, and another Text node. Contrast that output with the output from this example, where the <br> tag is removed:

scala> val hello = <p>Hello, world</p>
hello: scala.xml.Elem = <p>Hello, world</p>

scala> hello.child.foreach(e => println(e.getClass))
class scala.xml.Text

When only <p> tags are used, hello contains a single Text node, and you can successfully get a match in your match expression; but when one or more <br> tags are included inside the <p> tags, the match expression fails.

To solve this problem, you need to change the left side of the case statement. Using the @ approach shown in the Solution helps you get where you need to be:

// this works
<p>Hello, <br/>world</p> match {
  case <p>{ n @ _* }</p> => n.foreach(println)
}

This prints the following output in the REPL:

Hello, 
<br/>
world

In summary, use the { n @ _* } syntax on the left side of the case statement to bind all of the child nodes, and manipulate the resulting elements as needed on the right side of the statement.
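
The same idea applies to the <topping> examples that threw a MatchError at the start of this Discussion. In the following sketch, the pattern {_*} matches zero or more child nodes, so empty elements and elements with nested tags both match. The ToppingText object and toppingText method are hypothetical names used only for this example:

```scala
import scala.xml.Node

// Hypothetical helper object, used only for this illustration
object ToppingText {
  // {_*} accepts any number of children, including none;
  // text concatenates the text of all the descendants
  def toppingText(node: Node): Option[String] = node match {
    case t @ <topping>{_*}</topping> => Some(t.text)
    case _ => None // not a <topping> element
  }
}
```

With this version an empty <topping> yields an empty string instead of a MatchError, and any nested tags are simply dropped from the text, so whether this is appropriate depends on how strict you want to be about the XML you accept.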

Can’t match attributes

As shown in the Solution, you can’t easily match node attributes in a match expression. To get around this problem, use one of the following approaches with your case statements.

If your XML node only has one attribute, like this:

val node = <crust type="thin" />

you can use this approach:

node match {
  case c @ <crust/> if (c \ "@type").text == "thin" => println("type is thin")
  case c @ <crust/> if (c \ "@type").text == "thick" => println("type is thick")
  case _ => println("got something else")
}

With multiple attributes you’ll probably want to use the approach shown in the Solution, handling the attributes on the right side of the case statement:

val node = <crust type="thin" size="14" />

node match {
  case crust @ <crust /> => 
       val crustSize = crust \ "@size"
       val crustType = crust \ "@type"
       println(s"crustSize: $crustSize, crustType: $crustType")    
}
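
If you find yourself repeating those guards, another option is to move the attribute lookup into a custom extractor object, so the case statements can match on the attribute values directly. This is only a sketch of that idea, not something from the Solution; the CrustMatching object, the Crust extractor, and the describe method are all hypothetical names:

```scala
import scala.xml.Node

// Hypothetical helper object, used only for this illustration
object CrustMatching {
  // Extractor that yields (type, size) for a <crust> element;
  // a missing attribute comes back as an empty string
  object Crust {
    def unapply(node: Node): Option[(String, String)] = node match {
      case c @ <crust /> => Some(((c \ "@type").text, (c \ "@size").text))
      case _ => None
    }
  }

  def describe(node: Node): String = node match {
    case Crust("thin", size)  => s"thin crust, size $size"
    case Crust("thick", size) => s"thick crust, size $size"
    case Crust(other, _)      => s"unknown crust type: '$other'"
    case _                    => "not a crust"
  }
}
```

The extractor keeps the XML-specific code in one place, at the cost of an extra definition for each kind of element you want to match this way.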

See Also

  • The pattern matching approach shown in this recipe is similar to other recipes in this chapter, and can also be combined with them, specifically Recipes 16.3 through 16.6 of the Scala Cookbook.
  • Recipe 3.12 of the Scala Cookbook, “Using Pattern Matching in Match Expressions,” demonstrates more examples of using @ in match expressions.