Problem: Your XML data has an array of elements, and you need to extract the first element, second element, or more generally, the Nth element, using Scala.
Solution
The following simplified version of the XML from the Yahoo Weather API has three <forecast> elements:
    val weather =
      <rss>
        <channel>
          <title>Yahoo! Weather - Boulder, CO</title>
          <item>
            <!-- multiple yweather:forecast elements -->
            <forecast day="Thu" date="10 Nov 2011" low="37" high="58" text="Partly Cloudy" code="29" />
            <forecast day="Fri" date="11 Nov 2011" low="39" high="58" text="Mostly Cloudy" code="28" />
            <forecast day="Sat" date="12 Nov 2011" low="32" high="49" text="Cloudy" code="27" />
          </item>
        </channel>
      </rss>
To access the data in the first <forecast> element, wrap the XPath-like expression in parentheses and append (0) to it. You can access the first element using a series of \ method calls:
    val day = (weather \ "channel" \ "item" \ "forecast")(0) \ "@day"
    val date = (weather \ "channel" \ "item" \ "forecast")(0) \ "@date"
Or you can access it with a single \\ method call, if you prefer:
    val low = (weather \\ "forecast")(0) \ "@low"
    val high = (weather \\ "forecast")(0) \ "@high"
Either approach yields this result:
    scala> val date = (weather \\ "forecast")(0) \ "@date"
    date: scala.xml.NodeSeq = 10 Nov 2011
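The difference between the two operators can be sketched as follows. This is a minimal illustration, assuming Scala 2 with the scala-xml library on the classpath; the trimmed-down `weather` value and the names `direct`, `deep`, and `viaPath` are stand-ins introduced only for this example.

```scala
import scala.xml._

// Assumption: a trimmed-down copy of the weather document from this recipe.
val weather: Elem =
  <rss>
    <channel>
      <item>
        <forecast day="Thu" date="10 Nov 2011" low="37" high="58" />
        <forecast day="Fri" date="11 Nov 2011" low="39" high="58" />
        <forecast day="Sat" date="12 Nov 2011" low="32" high="49" />
      </item>
    </channel>
  </rss>

// \ only looks at the direct children of <rss>, so it finds no <forecast> nodes.
val direct = weather \ "forecast"

// \\ searches all descendants, so it finds all three.
val deep = weather \\ "forecast"

// Spelling out the full path with \ reaches the same three nodes.
val viaPath = weather \ "channel" \ "item" \ "forecast"

println(direct.length)  // 0
println(deep.length)    // 3
println(viaPath.length) // 3
```

The \\ form is shorter, but it matches <forecast> elements at any depth, so the explicit path may be the safer choice when the same tag name can appear in more than one place.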
Better yet, create a forecasts value first, and then extract the attributes from it:
    // 1) create a NodeSeq with the three <forecast> elements
    val forecasts = weather \ "channel" \ "item" \ "forecast"
    // 2) extract the attributes
    val day = forecasts(0) \ "@day"     // Thu (as a NodeSeq)
    val date = forecasts(0) \ "@date"   // 10 Nov 2011
    val low = forecasts(0) \ "@low"     // 37
    val high = forecasts(0) \ "@high"   // 58
    val text = forecasts(0) \ "@text"   // Partly Cloudy
This approach returns each attribute as a NodeSeq:
    scala> val day = forecasts(0) \ "@day"
    day: scala.xml.NodeSeq = Thu
To extract the attributes as a String instead, add the text method to the end of the expression:
    scala> val day = (forecasts(0) \ "@day").text
    day: String = Thu
If the attribute doesn’t exist, this returns an empty string:
    scala> val foo = ((weather \\ "forecast")(0) \ "@FOO").text
    foo: String = ""
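Because an empty string cannot distinguish a missing attribute from one that is present but empty, it may help to return an Option instead. The following is a minimal sketch, assuming Scala 2 with scala-xml on the classpath; `attrOpt` is a hypothetical helper, not part of the library.

```scala
import scala.xml._

// Assumption: a single <forecast> element like the ones in this recipe.
val forecast: Elem =
  <forecast day="Thu" date="10 Nov 2011" low="37" high="58" />

// Hypothetical helper: Some(value) if the attribute exists, None otherwise.
// It relies on `\ "@name"` returning an empty NodeSeq for a missing attribute.
def attrOpt(node: Node, name: String): Option[String] =
  (node \ s"@$name").headOption.map(_.text)

println(attrOpt(forecast, "day")) // Some(Thu)
println(attrOpt(forecast, "FOO")) // None
```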
You can access data from other <forecast> elements in the same way. Here’s the date from the second element in the sequence:
    scala> val date = ((weather \\ "forecast")(1) \ "@date").text
    date: String = 11 Nov 2011
As with any indexed sequence, you need to be careful: if you try to access an element that doesn’t exist, you’ll get an IndexOutOfBoundsException:
    scala> val date = ((weather \\ "forecast")(49) \ "@date").text
    java.lang.IndexOutOfBoundsException: 49
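One way to avoid the exception is to look up the index with lift, which a NodeSeq inherits from Scala's Seq, and get an Option back. This is a sketch under the assumption of Scala 2 with scala-xml; the `item` value and the variable names are illustrative only.

```scala
import scala.xml._

// Assumption: a trimmed-down <item> holding the three forecasts from this recipe.
val item: Elem =
  <item>
    <forecast day="Thu" date="10 Nov 2011" />
    <forecast day="Fri" date="11 Nov 2011" />
    <forecast day="Sat" date="12 Nov 2011" />
  </item>

val forecasts: NodeSeq = item \ "forecast"

// lift turns index access into a total function: Some(node) or None.
val secondDate: Option[String] = forecasts.lift(1).map(n => (n \ "@date").text)
val missingDate: Option[String] = forecasts.lift(49).map(n => (n \ "@date").text)

println(secondDate)  // Some(11 Nov 2011)
println(missingDate) // None
```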
Iterating over the elements
If, instead of accessing the <forecast> nodes by index, you want to handle the same data in a loop, first grab all of the <forecast> nodes with an XPath-like expression, and then iterate over them:
    val forecastNodes = (weather \\ "forecast")
    forecastNodes.foreach { n =>
      val day = (n \ "@day").text
      val date = (n \ "@date").text
      val low = (n \ "@low").text
      println(s"$day, $date, Low: $low")
    }
This results in the following output:
    Thu, 10 Nov 2011, Low: 37
    Fri, 11 Nov 2011, Low: 39
    Sat, 12 Nov 2011, Low: 32
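If you need typed values rather than printed strings, the same iteration can build a list of case class instances. The sketch below assumes Scala 2 with scala-xml; the `Forecast` case class and the `toForecast` helper are hypothetical names introduced for illustration, and nodes whose low or high values are not integers are simply skipped.

```scala
import scala.util.Try
import scala.xml._

// Assumption: a trimmed-down copy of the weather document from this recipe.
val weather: Elem =
  <rss>
    <channel>
      <item>
        <forecast day="Thu" date="10 Nov 2011" low="37" high="58" />
        <forecast day="Fri" date="11 Nov 2011" low="39" high="58" />
        <forecast day="Sat" date="12 Nov 2011" low="32" high="49" />
      </item>
    </channel>
  </rss>

// Hypothetical model type for one forecast.
final case class Forecast(day: String, date: String, low: Int, high: Int)

// Hypothetical helper: None if low/high are missing or not valid integers.
def toForecast(n: Node): Option[Forecast] =
  Try {
    Forecast(
      day  = (n \ "@day").text,
      date = (n \ "@date").text,
      low  = (n \ "@low").text.toInt,
      high = (n \ "@high").text.toInt
    )
  }.toOption

val forecasts: List[Forecast] = (weather \\ "forecast").toList.flatMap(toForecast)

println(forecasts.map(_.low)) // List(37, 39, 32)
```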
Discussion
To explain this approach, it helps to see that when you access elements by index, the first portion of the expression finds the <forecast> elements and returns them as a NodeSeq:
    scala> weather \\ "forecast"
    res0: scala.xml.NodeSeq = NodeSeq(
      <forecast high="58" low="37" day="Thu" code="29" date="10 Nov 2011" text="Partly Cloudy"></forecast>,
      <forecast high="58" low="39" day="Fri" code="28" date="11 Nov 2011" text="Mostly Cloudy"></forecast>,
      <forecast high="49" low="32" day="Sat" code="27" date="12 Nov 2011" text="Cloudy"></forecast>)
Enclosing the expression in parentheses and adding (0) after it returns the first element (index 0) of the sequence:
    scala> (weather \\ "forecast")(0)
    res1: scala.xml.Node = <forecast high="58" low="37" day="Thu" code="29" date="10 Nov 2011" text="Partly Cloudy"></forecast>
Each element in the NodeSeq is an Elem instance:
    scala> (weather \\ "forecast")(0).getClass
    res0: Class[_ <: scala.xml.Node] = class scala.xml.Elem
Therefore, once you’re working with one <forecast> element, you can access its tag attributes, such as the day attribute:
    scala> (weather \\ "forecast")(0) \ "@day"
    res1: scala.xml.NodeSeq = Thu
As with any Scala sequence, add (1), (2), etc. to access the other <forecast> elements.
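Since a NodeSeq is a Scala sequence, the usual sequence methods are available alongside index access. The following sketch assumes Scala 2 with scala-xml; the `item` value and the variable names are illustrative only.

```scala
import scala.xml._

// Assumption: a trimmed-down <item> holding the three forecasts from this recipe.
val item: Elem =
  <item>
    <forecast day="Thu" date="10 Nov 2011" />
    <forecast day="Fri" date="11 Nov 2011" />
    <forecast day="Sat" date="12 Nov 2011" />
  </item>

val forecasts: NodeSeq = item \ "forecast"

// Standard Seq operations work on the NodeSeq.
val count = forecasts.length
val firstDay = (forecasts.head \ "@day").text
val lastDay = (forecasts.last \ "@day").text

// Convert to a List before mapping to a non-XML type.
val days: List[String] = forecasts.toList.map(n => (n \ "@day").text)

println(count)               // 3
println(firstDay)            // Thu
println(lastDay)             // Sat
println(days.mkString(", ")) // Thu, Fri, Sat
```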
See Also
- The Yahoo Weather API: http://developer.yahoo.com/weather/