Scala: Deeper XML parsing, and extracting XML tag attributes

Problem: You need to perform deep XML searches in Scala, combining the \ and \\ methods, and possibly searching directly for tag attributes.

Solution

Combine the \\ and \ methods as needed to search the XML. When you need to extract tag attributes, place an @ character before the attribute name.

Given this simplified version of the Yahoo Weather RSS Feed:

val weather = 
<rss>
  <channel>
    <title>Yahoo! Weather - Boulder, CO</title>
    <item>
     <title>Conditions for Boulder, CO at 2:54 pm MST</title>
     <forecast day="Thu" date="10 Nov 2011" low="37" high="58" text="Partly Cloudy"   
               code="29" />
    </item>
  </channel>
</rss>

you can access the <forecast> tag like this:

scala> val forecast = weather \ "channel" \ "item" \ "forecast"
forecast: scala.xml.NodeSeq = NodeSeq(<forecast day="Thu" 
  date="10 Nov 2011" low="37" high="58" text="Partly Cloudy" code="29"/>)

You can also directly access the attributes of the <forecast> element with these expressions:

val day = weather \ "channel" \ "item" \ "forecast" \ "@day"
val date = weather \ "channel" \ "item" \ "forecast" \ "@date"

However, once you’ve created the forecast variable, it’s easier to access the attributes like this:

val day = forecast \ "@day"     // Thu
val date = forecast \ "@date"   // 10 Nov 2011
val low = forecast \ "@low"     // 37
val high = forecast \ "@high"   // 58
val text = forecast \ "@text"   // Partly Cloudy

Each of these attributes is returned as a NodeSeq:

scala> val day = forecast \ "@day"
day: scala.xml.NodeSeq = Thu

You can convert that to a String with the text method:

scala> val day = (forecast \ "@day").text
day: String = Thu

A nice feature of this approach is that if an attribute is missing, it kindly returns an empty NodeSeq:

scala> val foo = forecast \ "@foo"
foo: scala.xml.NodeSeq = NodeSeq()

Because you get an empty NodeSeq back rather than null or an exception, the same code can iterate over the results whether or not the attribute was found.
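As a minimal sketch of how the empty result can be put to use, the following wraps an attribute lookup in an Option. The attrOpt helper is hypothetical (it is not part of scala.xml), and the example assumes the scala-xml library is on the classpath:

```scala
import scala.xml.NodeSeq

val forecast = <forecast day="Thu" date="10 Nov 2011" low="37" high="58" />

// Hypothetical helper, not part of scala.xml: the attribute's text, or None if absent.
// Note: an attribute that is present but empty also comes back as None.
def attrOpt(nodes: NodeSeq, name: String): Option[String] =
  Option((nodes \ s"@$name").text).filter(_.nonEmpty)

val day = attrOpt(forecast, "day")   // Some(Thu)
val foo = attrOpt(forecast, "foo")   // None

// A missing attribute falls through to a default instead of throwing
val label = foo.getOrElse("n/a")
```

Whether an empty attribute should count as missing depends on your data, so treat that filter as one possible choice.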

I created the forecast variable by specifying the full path to the <forecast> tag, but you can simplify the expression by using \\ instead of \, because \\ searches all descendant elements rather than only direct children:

scala> val day = weather \\ "forecast" \ "@day"
day: scala.xml.NodeSeq = Thu

If you’re comfortable with your data (for instance, if you know that only one day attribute can be found), you can shorten that expression even further:

scala> val day = weather \\ "@day"
day: scala.xml.NodeSeq = NodeSeq(Thu)
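To illustrate why that shortcut depends on knowing your data, here is a small sketch with a hypothetical feed that contains two <forecast> elements. The feed and the names days and highs are assumptions made up for this example, and scala-xml is assumed to be on the classpath:

```scala
// Hypothetical feed with more than one forecast element
val feed =
  <channel>
    <item>
      <forecast day="Thu" high="58" />
      <forecast day="Fri" high="61" />
    </item>
  </channel>

// \\ "@day" gathers the day attribute from every descendant element
val days = (feed \\ "@day").map(_.text)   // List(Thu, Fri)

// Reading attributes one element at a time keeps related values together
val highs = (feed \\ "forecast").map { f =>
  (f \ "@day").text -> (f \ "@high").text.toInt
}
```

When more than one element carries the attribute, selecting the elements first and then reading their attributes is usually the safer pattern.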

Discussion

To demonstrate more XPath-like search expressions, create this XML literal:

val xml = 
  <order>
    <item name="Pizza" price="12.00">
      <pizza>
        <crust type="thin" size="14" />
        <topping>cheese</topping>
        <topping>sausage</topping>
      </pizza>
    </item>
    <item name="Breadsticks" price="4.00">
      <breadsticks />
    </item>
    <tax type="federal">0.80</tax>
    <tax type="state">0.80</tax>
    <tax type="local">0.40</tax>
  </order>

The following examples, which combine XPath-like searches with ordinary Scala expressions, show how to extract different pieces of information from that XML, including elements and attributes. The comment before each expression states what I’m looking for:

// get the <item> elements from the order
scala> val items = xml \ "item"
items: scala.xml.NodeSeq = 
NodeSeq(<item name="Pizza" price="12.00">
      <pizza>
        <crust type="thin" size="14"/>
        <topping>cheese</topping>
        <topping>sausage</topping>
      </pizza>
    </item>, <item name="Breadsticks" price="4.00">
      <breadsticks/>
    </item>)

// number of items in the order
scala> val numItems = items.length
numItems: Int = 2

// list of item prices
scala> val prices = items.map(i => i \ "@price")
prices: scala.collection.immutable.Seq[scala.xml.NodeSeq] = List(12.00, 4.00)

// the subtotal price
scala> val subtotal = items.map(i => (i \ "@price").text.toDouble).sum
subtotal: Double = 16.0

// list of taxes
scala> val taxItems = xml \ "tax"
taxItems: scala.xml.NodeSeq = NodeSeq(<tax type="federal">0.80</tax>,
<tax type="state">0.80</tax>, <tax type="local">0.40</tax>)

// the total tax
scala> val totalTax = taxItems.map(i => i.text.toDouble).sum
totalTax: Double = 2.0

// list of toppings on the pizza
scala> val toppings = (items \ "pizza" \ "topping").map(_.text)
toppings: scala.collection.immutable.Seq[String] = List(cheese, sausage)
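If you need more than one attribute per element, one possible approach is to map each <item> into a small case class. This is only a sketch: LineItem is a hypothetical class introduced for illustration, the order is a trimmed copy of the XML above, and scala-xml is assumed to be on the classpath:

```scala
// Trimmed copy of the order, so the example stands alone
val order =
  <order>
    <item name="Pizza" price="12.00"><pizza /></item>
    <item name="Breadsticks" price="4.00"><breadsticks /></item>
  </order>

// Hypothetical case class, introduced only for this illustration
case class LineItem(name: String, price: Double)

// Read both attributes from each <item> element
val lineItems = (order \ "item").map { i =>
  LineItem((i \ "@name").text, (i \ "@price").text.toDouble)
}

val names = lineItems.map(_.name)             // List(Pizza, Breadsticks)
val subtotalPrice = lineItems.map(_.price).sum
```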

You can access individual tax items like this:

// get the federal tax
val federalTax = for {
  item <- taxItems
  if (item \ "@type").text == "federal"
} yield item.text

That code returns List(0.80), typed as a Seq[String], which you can convert to a numeric value as shown in the earlier examples.
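As an alternative sketch, find returns the first matching element as an Option, and scala.util.Try guards the numeric conversion. The names federalAmount and taxByType are hypothetical, and scala-xml is assumed to be on the classpath:

```scala
import scala.util.Try

// Trimmed copy of the order, so the example stands alone
val taxOrder =
  <order>
    <tax type="federal">0.80</tax>
    <tax type="state">0.80</tax>
    <tax type="local">0.40</tax>
  </order>

val taxes = taxOrder \ "tax"

// find stops at the first match; Try turns non-numeric text into None
val federalAmount: Option[Double] =
  taxes
    .find(t => (t \ "@type").text == "federal")
    .flatMap(t => Try(t.text.toDouble).toOption)

// Hypothetical lookup table from tax type to amount
val taxByType: Map[String, Double] =
  taxes.map(t => (t \ "@type").text -> t.text.toDouble).toMap
```

The Option makes the "no federal tax element" case explicit, so callers are not left indexing into a possibly empty sequence.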