Problem: You need to perform deep XML searches in Scala, combining the \ and \\ methods, and possibly searching directly for tag attributes.
Solution
Combine the \\ and \ methods as needed to search the XML. When you need to extract tag attributes, place an @ character before the attribute name.
Given this simplified version of the Yahoo Weather RSS Feed:
val weather =
  <rss>
    <channel>
      <title>Yahoo! Weather - Boulder, CO</title>
      <item>
        <title>Conditions for Boulder, CO at 2:54 pm MST</title>
        <forecast day="Thu" date="10 Nov 2011" low="37" high="58" text="Partly Cloudy" code="29" />
      </item>
    </channel>
  </rss>
you can access the <forecast> tag like this:
scala> val forecast = weather \ "channel" \ "item" \ "forecast"
forecast: scala.xml.NodeSeq = NodeSeq(<forecast day="Thu" date="10 Nov 2011" low="37" high="58" text="Partly Cloudy" code="29"/>)
You can also directly access the attributes of the <forecast> element with these expressions:
val day = weather \ "channel" \ "item" \ "forecast" \ "@day"
val date = weather \ "channel" \ "item" \ "forecast" \ "@date"
However, once you’ve created the forecast variable, it’s easier to access the attributes like this:
val day = forecast \ "@day"     // Thu
val date = forecast \ "@date"   // 10 Nov 2011
val low = forecast \ "@low"     // 37
val high = forecast \ "@high"   // 58
val text = forecast \ "@text"   // Partly Cloudy
Each of these attributes is returned as a NodeSeq:
scala> val day = forecast \ "@day"
day: scala.xml.NodeSeq = Thu
You can convert that to a String with the text method:
scala> val day = (forecast \ "@day").text
day: String = Thu
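As a rough sketch of how these conversions might be combined, the following script-style snippet collects a few attributes into a case class. The Forecast class and the forecastNode name are hypothetical, introduced only for illustration. The first line assumes scala-cli and the scala-xml module (the version shown is an assumption; drop the line if scala-xml is already on your classpath).

```scala
//> using dep org.scala-lang.modules::scala-xml:2.1.0
import scala.xml.NodeSeq

// Hypothetical holder for the parsed values; not part of the feed or of scala.xml
case class Forecast(day: String, low: Int, high: Int)

val weather =
  <rss>
    <channel>
      <item>
        <forecast day="Thu" date="10 Nov 2011" low="37" high="58" text="Partly Cloudy" code="29" />
      </item>
    </channel>
  </rss>

val forecastNode: NodeSeq = weather \ "channel" \ "item" \ "forecast"

// .text turns each attribute's NodeSeq into a String;
// toInt throws NumberFormatException if the text is not a valid integer
val parsed = Forecast(
  day  = (forecastNode \ "@day").text,
  low  = (forecastNode \ "@low").text.toInt,
  high = (forecastNode \ "@high").text.toInt
)

println(parsed)
```

Because toInt throws on bad input, real feed data would probably call for some validation before the conversion.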
A nice feature of this approach is that if an attribute is missing, it kindly returns an empty NodeSeq:
scala> val foo = forecast \ "@foo"
foo: scala.xml.NodeSeq = NodeSeq()
This makes it easy to iterate over the results whether or not any elements are found.
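If you would rather make a missing attribute explicit, one possible approach is to wrap the lookup in an Option. This is only a sketch: attrOption is a hypothetical helper, not part of scala.xml, and the shortened forecast element is used just to keep the example small. The first line again assumes scala-cli with an assumed scala-xml version.

```scala
//> using dep org.scala-lang.modules::scala-xml:2.1.0
import scala.xml.NodeSeq

// Hypothetical helper: Some(value) when the attribute exists, None when the lookup comes back empty
def attrOption(node: NodeSeq, name: String): Option[String] = {
  val result = node \ s"@$name"
  if (result.isEmpty) None else Some(result.text)
}

val forecast: NodeSeq = <forecast day="Thu" low="37" high="58" />

val day = attrOption(forecast, "day")   // attribute is present
val foo = attrOption(forecast, "foo")   // attribute is missing
val label = foo.getOrElse("n/a")        // fall back to a default for the missing case

println(s"$day $foo $label")
```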
I created the forecast variable by specifying the full path to the <forecast> tag, but you can simplify the expression by using \\ instead of \:
scala> val day = weather \\ "forecast" \ "@day"
day: scala.xml.NodeSeq = Thu
If you’re comfortable with your data (for instance, if you know there is only one day attribute that can be found), you can shorten that expression to only this:
scala> val day = weather \\ "@day"
day: scala.xml.NodeSeq = NodeSeq(Thu)
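To illustrate why knowing your data matters here, the sketch below runs the same search against a document with two forecast elements. The twoDays literal, including its Fri entry, is made-up sample data rather than part of the feed shown earlier, and the dependency line assumes scala-cli with an assumed scala-xml version.

```scala
//> using dep org.scala-lang.modules::scala-xml:2.1.0
import scala.xml.NodeSeq

// Made-up sample with two <forecast> elements (the "Fri" entry is hypothetical)
val twoDays =
  <item>
    <forecast day="Thu" low="37" high="58" />
    <forecast day="Fri" low="34" high="55" />
  </item>

// \\ "@day" collects the day attribute from every descendant element, in document order
val allDays = (twoDays \\ "@day").map(_.text)

// Take only the first match when a single value is wanted
val firstDay = (twoDays \\ "@day").headOption.map(_.text)

println(allDays)
println(firstDay)
```

With more than one match, the short form returns every value, so code that expects a single day needs to pick one explicitly.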
Discussion
To demonstrate more XPath-like search expressions, create this XML literal:
val xml =
  <order>
    <item name="Pizza" price="12.00">
      <pizza>
        <crust type="thin" size="14" />
        <topping>cheese</topping>
        <topping>sausage</topping>
      </pizza>
    </item>
    <item name="Breadsticks" price="4.00">
      <breadsticks />
    </item>
    <tax type="federal">0.80</tax>
    <tax type="state">0.80</tax>
    <tax type="local">0.40</tax>
  </order>
The following examples, which combine XPath and Scala expressions, show how to extract different pieces of information from that XML, including the elements and attributes. The comments before each expression state what I’m looking for:
// get the <item> elements from the order
scala> val items = xml \ "item"
items: scala.xml.NodeSeq = NodeSeq(<item name="Pizza" price="12.00"> <pizza> <crust type="thin" size="14"/> <topping>cheese</topping> <topping>sausage</topping> </pizza> </item>, <item name="Breadsticks" price="4.00"> <breadsticks/> </item>)

// number of items in the order
scala> val numItems = items.length
numItems: Int = 2

// list of item prices
scala> val prices = items.map(i => i \ "@price")
prices: scala.collection.immutable.Seq[scala.xml.NodeSeq] = List(12.00, 4.00)

// the subtotal price
scala> val subtotal = items.map(i => (i \ "@price").text.toDouble).sum
subtotal: Double = 16.0

// list of taxes
scala> val taxItems = xml \ "tax"
taxItems: scala.xml.NodeSeq = NodeSeq(<tax type="federal">0.80</tax>, <tax type="state">0.80</tax>, <tax type="local">0.40</tax>)

// the total tax
scala> val totalTax = taxItems.map(i => i.text.toDouble).sum
totalTax: Double = 2.0

// list of toppings on the pizza (searched from the items found above)
scala> val toppings = (items \ "pizza" \ "topping").map(_.text)
toppings: scala.collection.immutable.Seq[String] = List(cheese, sausage)
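The individual queries above could be pulled together along the following lines. This is a sketch under a few assumptions: LineItem is a hypothetical case class, the literal is named order rather than xml, and BigDecimal is swapped in for Double so that currency values are not subject to floating-point rounding. The dependency line assumes scala-cli with an assumed scala-xml version.

```scala
//> using dep org.scala-lang.modules::scala-xml:2.1.0
import scala.xml.NodeSeq

// Hypothetical model of one order line; BigDecimal is used here instead of Double
case class LineItem(name: String, price: BigDecimal)

val order =
  <order>
    <item name="Pizza" price="12.00">
      <pizza>
        <crust type="thin" size="14" />
        <topping>cheese</topping>
        <topping>sausage</topping>
      </pizza>
    </item>
    <item name="Breadsticks" price="4.00">
      <breadsticks />
    </item>
    <tax type="federal">0.80</tax>
    <tax type="state">0.80</tax>
    <tax type="local">0.40</tax>
  </order>

// Build one LineItem per <item>, reading the name and price attributes
val lineItems = (order \ "item").map { i =>
  LineItem((i \ "@name").text, BigDecimal((i \ "@price").text))
}

val subtotal = lineItems.map(_.price).sum                        // sum of item prices
val totalTax = (order \ "tax").map(t => BigDecimal(t.text)).sum  // sum of all <tax> values
val total    = subtotal + totalTax

println(s"subtotal=$subtotal tax=$totalTax total=$total")
```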
You can access individual tax items like this:
// get the federal tax
val federalTax = for {
  item <- taxItems
  if (item \ "@type").text == "federal"
} yield item.text
That code returns List(0.80), whose static type is Seq[String], which you can convert to a numeric value as shown in the earlier examples.
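One possible way to go straight to a single numeric value is to use find and convert the matching element's text, as sketched below. The names order, federalTax, and taxByType are illustrative choices, only the tax elements of the order are repeated here, and the dependency line assumes scala-cli with an assumed scala-xml version.

```scala
//> using dep org.scala-lang.modules::scala-xml:2.1.0
import scala.xml.NodeSeq

// Only the <tax> elements of the order are repeated, to keep the sketch short
val order =
  <order>
    <tax type="federal">0.80</tax>
    <tax type="state">0.80</tax>
    <tax type="local">0.40</tax>
  </order>

val taxItems: NodeSeq = order \ "tax"

// find returns the first matching node, if any; toDouble throws on non-numeric text
val federalTax: Option[Double] =
  taxItems.find(t => (t \ "@type").text == "federal").map(_.text.toDouble)

// Alternatively, index every tax amount by its type attribute
val taxByType: Map[String, Double] =
  taxItems.map(t => ((t \ "@type").text, t.text.toDouble)).toMap

println(federalTax)
println(taxByType)
```

Returning an Option keeps the "no federal tax element" case visible to the caller instead of producing an empty list.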