Scala XML: Searching XMLNS namespaces, XPath, and more

In my earlier tutorial on creating a Scala REST client using the Apache HttpClient library, I demonstrated how to download the contents of a Yahoo Weather API URL, and then parse those contents using the Scala XML library. I didn't discuss the XML searching/parsing process used in that source code, so in this article I'll take a few moments to look at that code.

First, here's a revised version of that earlier source code, this time showing how to load an XML file from disk using Scala:

import scala.xml.XML

object ScalaApacheHttpRestClient1 {

  def main(args: Array[String]): Unit = {

    // get the xml content from our sample file
    val xml = XML.loadFile("/Users/al/Projects/Scala/yahoo-weather.xml")

    // find the "temp" and "text" attributes of the yweather:condition element
    val temp = (xml \\ "channel" \\ "item" \ "condition" \ "@temp").text
    val text = (xml \\ "channel" \\ "item" \ "condition" \ "@text").text

    val currentWeather = "The current temperature is %s degrees, and the sky is %s.".format(temp, text.toLowerCase())
    println(currentWeather)

  }
}

As you can see, I'm loading a file from disk named yahoo-weather.xml, and I'm only concerned with the "temp" and "text" attributes of the yweather:condition element within that XML document.

I created this file by saving the contents from the Yahoo Weather API URL. That way I wouldn't have to keep hitting their URL during my tests. Here are the contents of that file:

<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>    
<rss version="2.0" xmlns:yweather="http://xml.weather.yahoo.com/ns/rss/1.0" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#">
<channel>
<title>Yahoo! Weather - Broomfield, CO</title>
<link>http://us.rd.yahoo.com/dailynews/rss/weather/Broomfield__CO/*http://weather.yahoo.com/forecast/USCO0044_f.html</link>
<description>Yahoo! Weather for Broomfield, CO</description>
<language>en-us</language>
<lastBuildDate>Thu, 10 Nov 2011 2:54 pm MST</lastBuildDate>
<ttl>60</ttl><yweather:location city="Broomfield" region="CO"   country="US"/>
<yweather:units temperature="F" distance="mi" pressure="in" speed="mph"/>
<yweather:wind chill="61"   direction="300"   speed="5" />
<yweather:atmosphere humidity="3"  visibility=""  pressure="30.11"  rising="2" />
<yweather:astronomy sunrise="6:39 am"   sunset="4:47 pm"/>
<image>
<title>Yahoo! Weather</title><width>142</width><height>18</height>
<link>http://weather.yahoo.com</link>
<url>http://l.yimg.com/a/i/brand/purplelogo//uh/us/news-wea.gif</url>
</image>
<item>
<title>Conditions for Broomfield, CO at 2:54 pm MST</title>
<geo:lat>39.92</geo:lat>
<geo:long>-105.09</geo:long>
<link>http://us.rd.yahoo.com/dailynews/rss/weather/Broomfield__CO/*http://weather.yahoo.com/forecast/USCO0044_f.html</link>
<pubDate>Thu, 10 Nov 2011 2:54 pm MST</pubDate>
<yweather:condition  text="Partly Cloudy"  code="30"  temp="61"  date="Thu, 10 Nov 2011 2:54 pm MST" />

<description>
<![CDATA[<img src="http://l.yimg.com/a/i/us/we/52/30.gif"/><br />
<b>Current Conditions:</b><br />Partly Cloudy, 61 F<BR />
<BR /><b>Forecast:</b>
<BR />Thu - Partly Cloudy. High: 58 Low: 37
<br />Fri - Mostly Cloudy. High: 58 Low: 39<br />
<br /><a href="http://us.rd.yahoo.com/dailynews/rss/weather/Broomfield__CO/*http://weather.yahoo.com/forecast/USCO0044_f.html">Full Forecast at Yahoo! Weather</a><BR/>
<BR/>(provided by <a href="http://www.weather.com" >The Weather Channel</a>)<br/>]]></description>

<yweather:forecast day="Thu" date="10 Nov 2011" low="37" high="58" text="Partly Cloudy" code="29" />
<yweather:forecast day="Fri" date="11 Nov 2011" low="39" high="58" text="Mostly Cloudy" code="28" />
<guid isPermaLink="false">USCO0044_2011_11_11_7_00_MST</guid>
</item>
</channel>
</rss>
<!-- api1.weather.sp2.yahoo.com uncompressed/chunked Thu Nov 10 14:23:09 PST 2011 -->

Searching XML using the Scala XML library

Cutting this down to just the basics, I know from looking at the Yahoo Weather XML document that I want to retrieve the "temp" attribute of the yweather:condition node:

<yweather:condition text="Partly Cloudy" 
                    code="30" 
                    temp="61" 
                    date="Thu, 10 Nov 2011 2:54 pm MST" />

To search this XML document in Scala, I write the following line of code, using Scala's XPath-like syntax:

val temp = (xml \\ "channel" \\ "item" \ "condition" \ "@temp").text

This code can be read as "Search the variable named 'xml' for the XML node attribute named 'temp', where that attribute is on a node named 'condition', which is a direct child of an 'item' tag, which is itself somewhere beneath a 'channel' tag."

In that XPath search pattern I'm being very explicit about what I'm searching for, specifying all of the parent and child nodes in the search path. Knowing what the XML document looks like, I could simplify that search path to just look like this, skipping the "channel" and "item" elements in the search path:

val temp = (xml \\ "condition" \ "@temp").text

I'm not an XML parsing wizard, so I'll leave the decision of which XPath expression to use up to you. Either way, after this line of code runs, the Scala variable named "temp" will have the value "61".
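To check that the two search paths really do find the same value, here's a minimal sketch. It assumes the scala-xml library is on the classpath, and the trimmed-down XML string and the SearchPathDemo name are my own stand-ins for this illustration, not part of the original program:

```scala
import scala.xml.XML

object SearchPathDemo {

  // a trimmed-down, hypothetical stand-in for yahoo-weather.xml
  val doc = XML.loadString(
    """<rss version="2.0" xmlns:yweather="http://xml.weather.yahoo.com/ns/rss/1.0">
      |  <channel>
      |    <item>
      |      <yweather:condition text="Partly Cloudy" code="30" temp="61" />
      |    </item>
      |  </channel>
      |</rss>""".stripMargin)

  // the explicit path: channel, then item, then condition
  val explicitTemp = (doc \\ "channel" \\ "item" \ "condition" \ "@temp").text

  // the shorter path: any condition element anywhere in the document
  val shortTemp = (doc \\ "condition" \ "@temp").text

  def main(args: Array[String]): Unit = {
    println("explicit path: " + explicitTemp)
    println("short path:    " + shortTemp)
  }
}
```

Both values come back as "61" for this sample document. The longer path is arguably the safer choice if an element named "condition" could ever show up somewhere else in the feed.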

The Scala XML \\, \, and @ operators

A couple of notes about the Scala XML \, \\, and @ operators:

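  • The \ operator only looks at the direct children of a node; it doesn't descend any further into the tree.
  • Therefore you have to use the \\ operator, which searches all descendants, to find deeper elements like "item" and "condition" when you start from the top of the document.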
  • You use the "@" character to search for XML tag attributes, such as the "temp" attribute of the yweather:condition tag.
  • \ and \\ are called "projection functions", and they return a NodeSeq object.
  • The \ and \\ operators (functions) are modeled on the XPath / and // operators, but Scala uses backslashes instead of forward-slashes because // already starts a comment in Scala (and / is the division operator).
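Here's a small sketch of how \ and \\ behave differently, assuming the scala-xml library is on the classpath. The sample XML and the ProjectionDemo name are made up for this illustration. I read the "day" attributes one node at a time because, in the scala-xml versions I'm aware of, applying \ "@day" directly to a sequence of several nodes comes back empty:

```scala
import scala.xml.XML

object ProjectionDemo {

  // hypothetical sample document with two forecast elements
  val doc = XML.loadString(
    """<rss>
      |  <channel>
      |    <item>
      |      <forecast day="Thu" high="58" />
      |      <forecast day="Fri" high="58" />
      |    </item>
      |  </channel>
      |</rss>""".stripMargin)

  // "channel" is a direct child of the root "rss" element, so \ finds it
  val directChannel = doc \ "channel"

  // "item" is a grandchild of the root, so \ finds nothing
  val directItem = doc \ "item"

  // \\ searches every level beneath the root, so it finds "item"
  val anyItem = doc \\ "item"

  // read the "day" attribute from each forecast node individually
  val days = (doc \\ "forecast").map(n => (n \ "@day").text).toList

  def main(args: Array[String]): Unit = {
    println("doc \\ channel found:  " + directChannel.length)
    println("doc \\ item found:     " + directItem.length)
    println("doc \\\\ item found:    " + anyItem.length)
    println("forecast days:        " + days.mkString(", "))
  }
}
```

Since both operators return a NodeSeq, an empty result is just an empty sequence rather than an exception, so it's worth checking the length (or the resulting text) when a search path might not match.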

Scala, XMLNS namespaces, and attributes

As I just wrote, you use the "@" character to search for XML tag attributes. However, I skipped over the XMLNS namespace issue, which caused me a lot of grief. In short, if you have an XML tag like this, which uses an XMLNS namespace prefix that has been properly declared at the beginning of the XML document:

<yweather:condition text="Partly Cloudy" code="30" temp="61" date="Thu, 10 Nov 2011 2:54 pm MST" />

you can ignore the "yweather" portion of the tag, because the search matches on the tag's local name, and just access the node (and in this case, the desired node attribute) like this:

val temp = (xml \\ "condition" \ "@temp").text
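Ignoring the prefix is convenient, but it also means a search for "condition" will match a tag with that name from any namespace. If that ever matters, one possible approach is to filter the results on each node's prefix or namespace URI. This is only a sketch, assuming the scala-xml library is on the classpath. The unprefixed condition tag below is a hypothetical addition that isn't in the real Yahoo document, and the NamespaceDemo name is my own:

```scala
import scala.xml.XML

object NamespaceDemo {

  val yweatherUri = "http://xml.weather.yahoo.com/ns/rss/1.0"

  // hypothetical sample: one prefixed and one unprefixed "condition" element
  val doc = XML.loadString(
    """<rss xmlns:yweather="http://xml.weather.yahoo.com/ns/rss/1.0">
      |  <channel>
      |    <item>
      |      <yweather:condition temp="61" />
      |      <condition temp="-1" />
      |    </item>
      |  </channel>
      |</rss>""".stripMargin)

  // the search matches on the local name only, so both elements are found
  val allConditions = doc \\ "condition"

  // keep only the elements written with the "yweather" prefix
  val byPrefix = allConditions.filter(_.prefix == "yweather")

  // or keep only the elements whose prefix resolves to the yweather URI
  val byUri = allConditions.filter(_.namespace == yweatherUri)

  val temp = (byPrefix \ "@temp").text

  def main(args: Array[String]): Unit = {
    println("all condition elements: " + allConditions.length)
    println("yweather temp:          " + temp)
  }
}
```

Filtering on the namespace URI is the more robust of the two, since a document is free to bind a different prefix to the same URI.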

Scala XML parsing, searching, XMLNS namespaces, and XPath - Summary

I ended up covering a lot more ground in this Scala XML tutorial than I planned to. In the end, this tutorial covered all of these topics:

  • How to load an XML document from a file in Scala.
  • How to search XML documents using Scala's XPath-like syntax.
  • The difference between the \ and \\ operators.
  • How to deal with XMLNS namespaces in Scala.
  • How to find XML node attributes.

In summary, I hope this article has been helpful. I know the Scala code looks pretty simple, but finding this solution caused me a surprising amount of grief, particularly dealing with the XMLNS namespace in Scala.