Problem: When writing a Scala application, you want to search an XML tree for the data you need using XPath expressions.
Solution
Use the \ and \\ methods, which are analogous to the XPath / and // expressions. The \ method returns all matching elements directly under the current node, and \\ returns all matching elements from all nodes under the current node (all descendant nodes).
To demonstrate this difference, create this XML literal:
scala> val a = <div><p>Hello,<br/>world</p></div>
a: scala.xml.Elem = <div><p>Hello,<br/>world</p></div>
The \ method finds the <p> tag because it’s directly under the <div> tag:
scala> a \ "p"
res0: scala.xml.NodeSeq = NodeSeq(<p>Hello,<br/>world</p>)
But it won’t find the <br/> tag in the XML literal, because it’s not directly under the <div> tag:
scala> a \ "br"
res1: scala.xml.NodeSeq = NodeSeq()
However, the \\ method can find it, because it searches through all descendant nodes (children, grandchildren, etc.) under the <div> tag:
scala> a \\ "br"
res2: scala.xml.NodeSeq = NodeSeq(<br/>)
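The same comparison can be written as a small compiled example instead of a REPL session. The sketch below assumes the scala-xml module is on the classpath (it has shipped separately from the standard library since Scala 2.11), and the object name SearchDepthDemo is made up for illustration. Its last value relies on a detail of the scala-xml implementation that is worth confirming in your own REPL: \\ tests the starting node itself as well as its descendants, much like the XPath descendant-or-self axis.

```scala
import scala.xml.{Elem, NodeSeq}

// Hypothetical demo object; requires scala-xml on the classpath.
object SearchDepthDemo {
  val a: Elem = <div><p>Hello,<br/>world</p></div>

  // The single-backslash method checks only the children of <div>,
  // so <br/> is not found.
  val direct: NodeSeq = a \ "br"

  // The double-backslash method checks every level under <div>,
  // so <br/> is found.
  val deep: NodeSeq = a \\ "br"

  // The double-backslash method also tests the starting node,
  // so <div> matches itself.
  val self: NodeSeq = a \\ "div"

  def main(args: Array[String]): Unit = {
    println(s"direct matches: ${direct.length}")
    println(s"deep matches:   ${deep.length}")
    println(s"self matches:   ${self.length}")
  }
}
```

By contrast, a \ "div" is empty, because <div> is not a child of itself.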
Using the \ method
As a deeper demonstration of the \ method, first create this XML literal:
val x = <stocks>
  <stock>AAPL</stock>
  <stock>AMZN</stock>
  <stock>GOOG</stock>
</stocks>
Given this XML, the following expression returns all <stock> elements:
scala> x \ "stock"
res0: scala.xml.NodeSeq = NodeSeq(<stock>AAPL</stock>, <stock>AMZN</stock>, <stock>GOOG</stock>)
As shown in the REPL output, this returns an instance of a NodeSeq, which is a simple wrapper around Seq[Node] (a sequence of nodes). Like the Elem class, NodeSeq supports the \ and \\ search methods, as well as the usual large variety of collection methods.
Once you have a NodeSeq, you can work with the data it contains. For instance, you can create a list of stock symbols like this:
scala> (x \ "stock").map(_.text)
res1: scala.collection.immutable.Seq[String] = List(AAPL, AMZN, GOOG)
If this is confusing, it can help to see it broken down into smaller steps. First, get a sequence of elements with the \ method, and assign the result to a variable:
scala> val nodes = x \ "stock"
nodes: scala.xml.NodeSeq = NodeSeq(<stock>AAPL</stock>, <stock>AMZN</stock>, <stock>GOOG</stock>)
You can see that nodes is a variable of type NodeSeq. Each individual node is of type Elem:
scala> for (n <- nodes) println(n.getClass)
class scala.xml.Elem
class scala.xml.Elem
class scala.xml.Elem
Each element contains its XML tag as well as its data:
scala> for (n <- nodes) println(n)
<stock>AAPL</stock>
<stock>AMZN</stock>
<stock>GOOG</stock>
So, to extract only the data from each node, call the text method:
scala> for (n <- nodes) println(n.text)
AAPL
AMZN
GOOG
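Keep in mind that text is not limited to leaf elements. Called on an element that has child elements, it returns the text of everything beneath it joined together, which is rarely what you want from a container element. A minimal sketch, assuming scala-xml is on the classpath (the object name TextDemo is hypothetical):

```scala
// Hypothetical demo object; requires scala-xml on the classpath.
object TextDemo {
  val a = <div><p>Hello,<br/>world</p></div>

  // <br/> has no character data, so its text is the empty string.
  val brText: String = (a \\ "br").text

  // text on a container concatenates the text of all its descendants.
  val divText: String = a.text

  def main(args: Array[String]): Unit = {
    println(s"br text:  [$brText]")
    println(s"div text: [$divText]")
  }
}
```

Here divText is "Hello,world", with no separator where the <br/> element sat, so narrow the search to the elements you need before calling text.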
Putting this together, you can create a list of stock names using a for/yield loop:
scala> val stockNames = for (n <- nodes) yield n.text
stockNames: scala.collection.immutable.Seq[String] = List(AAPL, AMZN, GOOG)
That loop is equivalent to this map method call:
scala> val stockNames = nodes.map(_.text)
stockNames: scala.collection.immutable.Seq[String] = List(AAPL, AMZN, GOOG)
Because NodeSeq has all the usual sequence methods, it’s easy to get the information you want from the XML. For instance, you can find the number of nodes, or filter the results to get only the stocks you want:
// same as 'nodes.length'
scala> (x \ "stock").length
res1: Int = 3
scala> nodes.map(_.text).filter(_.startsWith("A"))
res2: scala.collection.immutable.Seq[String] = List(AAPL, AMZN)
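Filtering can also be applied to the nodes themselves rather than to the extracted strings, which keeps the XML around for further processing. The following sketch filters the <stock> elements first and extracts the text afterward. The object and value names are illustrative, and scala-xml is assumed to be on the classpath.

```scala
// Hypothetical demo object; requires scala-xml on the classpath.
object StockFilterDemo {
  val x =
    <stocks>
      <stock>AAPL</stock>
      <stock>AMZN</stock>
      <stock>GOOG</stock>
    </stocks>

  val nodes = x \ "stock"

  // Keep only the <stock> elements whose text starts with "A".
  val aStocks = nodes.filter(_.text.startsWith("A"))

  // Other sequence methods, such as exists, work on nodes the same way.
  val hasGoog: Boolean = nodes.exists(_.text == "GOOG")

  // Extract the symbols only once the nodes have been narrowed down.
  val symbols: List[String] = aStocks.map(_.text).toList

  def main(args: Array[String]): Unit = {
    aStocks.foreach(println)
    println(symbols)
    println(hasGoog)
  }
}
```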
Using the \\ method
As mentioned, the \ method only returns matches on immediate subelements. To search deeper, through the entire XML tree, use the \\ method.
Given this XML:
val x = <portfolio>
  <stocks>
    <stock>AAPL</stock>
    <stock>AMZN</stock>
    <stock>GOOG</stock>
  </stocks>
  <reits>
    <reit>Super REIT 1</reit>
  </reits>
</portfolio>
the \ method returns an empty NodeSeq when searching for <stock> elements:
scala> x \ "stock"
res0: scala.xml.NodeSeq = NodeSeq()
You can solve this problem by exactly specifying the path to the <stock> elements with multiple \ method calls:
scala> x \ "stocks" \ "stock"
res1: scala.xml.NodeSeq = NodeSeq(<stock>AAPL</stock>, <stock>AMZN</stock>, <stock>GOOG</stock>)
But the \\ method can be a simpler approach to finding the desired elements. It searches the entire XML tree to find all elements that match your search query:
scala> (x \\ "stock").foreach(println)
<stock>AAPL</stock>
<stock>AMZN</stock>
<stock>GOOG</stock>
As shown before, you can convert the XML data to a list of strings, if desired:
scala> (x \\ "stock").map(_.text)
res2: scala.collection.immutable.Seq[String] = List(AAPL, AMZN, GOOG)
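The convenience of \\ comes with a trade-off: because it matches at every depth, it also returns elements you may not have intended when the same tag name appears in different parts of the tree. The XML below is a made-up variation of the portfolio document in which both <stock> and <reit> contain a <name> element. The sketch assumes scala-xml is on the classpath, and the object name is hypothetical.

```scala
// Hypothetical demo object; requires scala-xml on the classpath.
object DeepSearchCaveat {
  // Invented variation of the portfolio XML: <name> appears in two places.
  val doc =
    <portfolio>
      <stocks>
        <stock><name>AAPL</name></stock>
        <stock><name>AMZN</name></stock>
      </stocks>
      <reits>
        <reit><name>Super REIT 1</name></reit>
      </reits>
    </portfolio>

  // The deep search returns every <name>, including the one under <reit>.
  val allNames: List[String] = (doc \\ "name").map(_.text).toList

  // An explicit path returns only the names of stocks.
  val stockNames: List[String] =
    (doc \ "stocks" \ "stock" \ "name").map(_.text).toList

  def main(args: Array[String]): Unit = {
    println(allNames)
    println(stockNames)
  }
}
```

When tag names are reused like this, an explicit \ path, or a \\ search started from a narrower node such as doc \ "stocks", is the safer choice.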
Discussion
In addition to the approaches shown, you can also use the _ wildcard with the \ and \\ methods. For instance, given this XML that represents a group of people you know:
val people = <people>
  <family>
    <person>Mom</person>
  </family>
  <friends>
    <person>Bill</person>
    <person>Candy</person>
  </friends>
</people>
You can list family members like this:
scala> val family = people \ "family" \ "person"
family: scala.xml.NodeSeq = NodeSeq(<person>Mom</person>)
You can list friends like this:
scala> val friends = people \ "friends" \ "person"
friends: scala.xml.NodeSeq = NodeSeq(<person>Bill</person>, <person>Candy</person>)
You can list everyone you know by using the _ wildcard in place of specifying family or friends:
scala> val allPeople = people \ "_" \ "person"
allPeople: scala.xml.NodeSeq = NodeSeq(<person>Mom</person>, <person>Bill</person>, <person>Candy</person>)
Without the wildcard character, you’d have to create the lists of family and friends and then merge them together manually.
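For comparison, here is a sketch of that manual approach next to the wildcard version. It assumes scala-xml is on the classpath, and the object and value names are made up for illustration.

```scala
// Hypothetical demo object; requires scala-xml on the classpath.
object WildcardDemo {
  val people =
    <people>
      <family>
        <person>Mom</person>
      </family>
      <friends>
        <person>Bill</person>
        <person>Candy</person>
      </friends>
    </people>

  // Manual approach: query each group, then concatenate the results.
  val family = people \ "family" \ "person"
  val friends = people \ "friends" \ "person"
  val merged: List[String] = (family ++ friends).map(_.text).toList

  // Wildcard approach: "_" matches every child element of <people>.
  val viaWildcard: List[String] =
    (people \ "_" \ "person").map(_.text).toList

  def main(args: Array[String]): Unit = {
    println(merged)
    println(viaWildcard)
  }
}
```

Both values contain Mom, Bill, and Candy in document order. The wildcard version also picks up any new group added under <people> without a code change.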
Once you have the list of people, you can access the elements one at a time:
scala> allPeople(0)
res0: scala.xml.Node = <person>Mom</person>
scala> allPeople(1)
res1: scala.xml.Node = <person>Bill</person>
You can also iterate over all of the elements as usual:
scala> allPeople.foreach(println)
<person>Mom</person>
<person>Bill</person>
<person>Candy</person>
scala> for (person <- allPeople) println(person.text)
Mom
Bill
Candy
scala> allPeople.map(_.text)
res2: scala.collection.immutable.Seq[String] = List(Mom, Bill, Candy)
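Index access such as allPeople(5) throws an exception when the index is out of range, just as it does for other Scala sequences. If the number of matches isn’t known ahead of time, the Option-returning methods that NodeSeq inherits from Seq are a safer choice. A sketch under the same scala-xml assumption, with illustrative names:

```scala
// Hypothetical demo object; requires scala-xml on the classpath.
object SafeAccessDemo {
  val people =
    <people>
      <family>
        <person>Mom</person>
      </family>
      <friends>
        <person>Bill</person>
        <person>Candy</person>
      </friends>
    </people>

  val allPeople = people \ "_" \ "person"

  // headOption returns None for an empty result instead of throwing.
  val first: Option[String] = allPeople.headOption.map(_.text)

  // lift turns index access into an Option; index 5 does not exist here.
  val sixth: Option[String] = allPeople.lift(5).map(_.text)

  // A search with no matches yields an empty NodeSeq, not an error.
  val pets = people \\ "pet"

  def main(args: Array[String]): Unit = {
    println(first)
    println(sixth)
    println(pets.isEmpty)
  }
}
```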
See Also
- The Scala Elem class: http://www.scala-lang.org/api/current/scala/xml/Elem.html
- The Scala NodeSeq class: http://www.scala-lang.org/api/current/scala/xml/NodeSeq.html