How to write an HTTP GET request client in Scala (with a timeout)

This is an excerpt from the Scala Cookbook (partially modified for the internet). This is Recipe 15.9, “How to write a simple HTTP GET request client in Scala.”

Problem

You want a Scala HTTP client you can use to make GET request calls.

Solution

There are many potential solutions to this problem. This recipe demonstrates three approaches:

  • A simple use of the scala.io.Source.fromURL method
  • Adding a timeout wrapper around scala.io.Source.fromURL to make it more robust
  • Using the Apache HttpClient library

These solutions are demonstrated in the following sections.

A simple use of scala.io.Source.fromURL

If it doesn’t matter that your web service client won’t time out in a controlled manner, you can use this simple method to download the contents from a URL:

/**
 * Returns the text (content) from a URL as a String.
 * Warning: This method does not time out when the service is non-responsive.
 */
def get(url: String): String = {
    val source = scala.io.Source.fromURL(url)
    try source.mkString finally source.close()  // release the underlying stream
}

This GET request method lets you call the given RESTful URL to retrieve its content. You can use it to download web pages, RSS feeds, or any other content using an HTTP GET request.
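You can try the simple approach without relying on an outside web service. The following sketch starts a throwaway server with the JDK’s built-in com.sun.net.httpserver.HttpServer, serves a small RSS-like document from it, and downloads that document with Source.fromURL. The local server is an assumption and is not part of the recipe. The startServer helper and the feed content are hypothetical.

```scala
import com.sun.net.httpserver.{HttpExchange, HttpServer}
import java.net.InetSocketAddress
import java.nio.charset.StandardCharsets.UTF_8
import scala.io.{Codec, Source}

// Hypothetical helper: a throwaway local server on a free port that
// answers every request with the same body.
def startServer(body: String): HttpServer = {
  val server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0)
  server.createContext("/", (exchange: HttpExchange) => {
    val bytes = body.getBytes(UTF_8)
    exchange.sendResponseHeaders(200, bytes.length.toLong)
    val out = exchange.getResponseBody
    try out.write(bytes) finally out.close()
  })
  server.start()
  server
}

// The simple get, closing the Source once its content has been read.
def get(url: String): String = {
  val source = Source.fromURL(url)(Codec.UTF8)
  try source.mkString finally source.close()
}

val server = startServer("<rss><channel><title>Example</title></channel></rss>")
val url = s"http://127.0.0.1:${server.getAddress.getPort}/feed.xml"
val feed = try get(url) finally server.stop(0)
println(feed)
```

Binding to port 0 lets the operating system pick a free port, so the sketch doesn’t collide with anything already running.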

Under the covers, the Source.fromURL method uses classes such as java.net.URL and java.io.InputStream, so this method can throw exceptions that extend java.io.IOException. As a result, you may want to annotate your method to indicate that:

@throws(classOf[java.io.IOException])
def get(url: String): String = {
    val source = scala.io.Source.fromURL(url)
    try source.mkString finally source.close()
}
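You may prefer not to let those exceptions propagate. One option is to wrap the call in scala.util.Try and pattern match on the result. The following is a minimal sketch of that approach, and tryGet is a hypothetical name. To keep it self-contained, the failure is provoked with a file: URL that points at a just-deleted temporary file. Opening that URL throws java.io.FileNotFoundException, which is a subclass of IOException.

```scala
import java.io.IOException
import java.nio.file.Files
import scala.io.Source
import scala.util.{Failure, Success, Try}

// Hypothetical wrapper: captures any exception thrown while opening or
// reading the URL in a Failure instead of letting it propagate.
def tryGet(url: String): Try[String] = Try {
  val source = Source.fromURL(url)
  try source.mkString finally source.close()
}

// Provoke an IOException without a network: point at a file that was
// just deleted, so opening the URL throws java.io.FileNotFoundException.
val missing = Files.createTempFile("missing", ".txt")
Files.delete(missing)
val result = tryGet(missing.toUri.toString)

result match {
  case Success(content)        => println(content)
  case Failure(e: IOException) => println(s"I/O problem: ${e.getClass.getName}")
  case Failure(e)              => println(s"Unexpected problem: $e")
}
```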

Setting the timeout while using scala.io.Source.fromURL

As mentioned, that simple solution suffers from a significant problem: it doesn’t time out if the URL you’re calling is unresponsive. If the web service you’re calling isn’t responding, your code will become unresponsive at this point as well.

Therefore, a better approach is to write a similar method that accepts timeout values. By combining java.net classes with the scala.io.Source.fromInputStream method, you can create a more robust method that lets you control both the connection timeout and the read timeout:

/**
  * Returns the text (content) from a REST URL as a String.
  * Inspired by http://matthewkwong.blogspot.com/2009/09/scala-scalaiosource-fromurl-blockshangs.html
  * and http://alvinalexander.com/blog/post/java/how-open-url-read-contents-httpurl-connection-java
  *
  * The `connectTimeout` and `readTimeout` descriptions come from the Java
  * URLConnection class Javadoc.
  * @param url The full URL to connect to.
  * @param connectTimeout Sets a specified timeout value, in milliseconds,
  * to be used when opening a communications link to the resource referenced
  * by this URLConnection. If the timeout expires before the connection can
  * be established, a java.net.SocketTimeoutException
  * is raised. A timeout of zero is interpreted as an infinite timeout.
  * Defaults to 5000 ms.
  * @param readTimeout If the timeout expires before there is data available
  * for read, a java.net.SocketTimeoutException is raised. A timeout of zero
  * is interpreted as an infinite timeout. Defaults to 5000 ms.
  * @param requestMethod Defaults to "GET". (Other methods have not been tested.)
  *
  * @example get("http://www.example.com/getInfo")
  * @example get("http://www.example.com/getInfo", 5000)
  * @example get("http://www.example.com/getInfo", 5000, 5000)
  */
@throws(classOf[java.io.IOException])
@throws(classOf[java.net.SocketTimeoutException])
def get(url: String,
        connectTimeout: Int = 5000,
        readTimeout: Int = 5000,
        requestMethod: String = "GET"): String =
{
    import java.net.{URL, HttpURLConnection}
    val connection = (new URL(url)).openConnection.asInstanceOf[HttpURLConnection]
    connection.setConnectTimeout(connectTimeout)
    connection.setReadTimeout(readTimeout)
    connection.setRequestMethod(requestMethod)
    val inputStream = connection.getInputStream
    // close the stream even if reading fails or times out
    try scala.io.Source.fromInputStream(inputStream).mkString
    finally inputStream.close()
}
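You can observe the read timeout locally with a java.net.ServerSocket that is bound but never accepts connections or writes anything. The TCP connection succeeds, but no HTTP response ever arrives, so getInputStream eventually throws SocketTimeoutException. This is a hedged sketch. It restates the method so the block is self-contained, and it also disconnects in a finally block. The 500 ms timeout and the URL path are arbitrary choices.

```scala
import java.net.{HttpURLConnection, ServerSocket, SocketTimeoutException, URI}
import scala.io.Source
import scala.util.{Failure, Success, Try}

// A variant of get that closes the stream and disconnects even when
// an exception is thrown.
def get(url: String, connectTimeout: Int = 5000, readTimeout: Int = 5000): String = {
  val connection = URI.create(url).toURL.openConnection().asInstanceOf[HttpURLConnection]
  connection.setConnectTimeout(connectTimeout)
  connection.setReadTimeout(readTimeout)
  connection.setRequestMethod("GET")
  try {
    val inputStream = connection.getInputStream
    try Source.fromInputStream(inputStream).mkString
    finally inputStream.close()
  } finally connection.disconnect()
}

// A listening socket that never accepts or writes: the TCP connection is
// established, but no HTTP response ever arrives.
val silentServer = new ServerSocket(0)
val url = s"http://127.0.0.1:${silentServer.getLocalPort}/waitForever"
val result = try Try(get(url, readTimeout = 500)) finally silentServer.close()

result match {
  case Failure(e: SocketTimeoutException) => println(s"Gave up: ${e.getMessage}")
  case Failure(e)                         => println(s"Other failure: $e")
  case Success(content)                   => println(content)
}
```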

As the Scaladoc shows, this method can be called in a variety of ways, including this:

try {
    val content = get("http://localhost:8080/waitForever")
    println(content)
} catch {
    // SocketTimeoutException is a subclass of IOException, so match it first
    case ste: java.net.SocketTimeoutException => // handle the timeout
    case ioe: java.io.IOException => // handle other I/O errors
}
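The order of the catch cases matters because java.net.SocketTimeoutException extends java.io.InterruptedIOException, which in turn extends java.io.IOException. A case for IOException that comes first would therefore also catch timeouts. A small, hypothetical describe function shows the ordering. It uses exception instances constructed directly, so no network access is needed.

```scala
import java.io.{FileNotFoundException, IOException}
import java.net.SocketTimeoutException

// Hypothetical classifier: the most specific exception type is matched first.
def describe(t: Throwable): String = t match {
  case _: SocketTimeoutException => "timed out"
  case _: IOException            => "other I/O problem"
  case _                         => "unexpected error"
}

val timeoutText = describe(new SocketTimeoutException("Read timed out"))
val ioText = describe(new FileNotFoundException("missing"))
val isSubclass = classOf[IOException].isAssignableFrom(classOf[SocketTimeoutException])
println(s"$timeoutText / $ioText / $isSubclass")
```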

I haven’t tested this method with other request types, such as PUT or DELETE, but I have allowed for this possibility by making the requestMethod an optional parameter.
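For other methods, HttpURLConnection.setRequestMethod accepts names such as PUT and DELETE, and a request body can only be written after calling setDoOutput(true). The following is a minimal sketch under those assumptions. It sends a PUT and a DELETE to a hypothetical echo server built on the JDK’s com.sun.net.httpserver.HttpServer. The send and startEchoServer helpers are illustrative names and are not part of the recipe.

```scala
import com.sun.net.httpserver.{HttpExchange, HttpServer}
import java.net.{HttpURLConnection, InetSocketAddress, URI}
import java.nio.charset.StandardCharsets.UTF_8
import scala.io.{Codec, Source}

// Hypothetical test server: replies with the request method followed by the request body.
def startEchoServer(): HttpServer = {
  val server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0)
  server.createContext("/", (exchange: HttpExchange) => {
    val in = exchange.getRequestBody
    val requestBody = try new String(in.readAllBytes(), UTF_8) finally in.close()
    val reply = s"${exchange.getRequestMethod} $requestBody".trim.getBytes(UTF_8)
    exchange.sendResponseHeaders(200, reply.length.toLong)
    val out = exchange.getResponseBody
    try out.write(reply) finally out.close()
  })
  server.start()
  server
}

// Hypothetical helper: same timeout settings as get, plus a configurable
// method and an optional request body.
def send(url: String, method: String, body: Option[String] = None,
         connectTimeout: Int = 5000, readTimeout: Int = 5000): (Int, String) = {
  val connection = URI.create(url).toURL.openConnection().asInstanceOf[HttpURLConnection]
  connection.setConnectTimeout(connectTimeout)
  connection.setReadTimeout(readTimeout)
  connection.setRequestMethod(method)
  try {
    body.foreach { text =>
      connection.setDoOutput(true) // must be set before writing a request body
      val out = connection.getOutputStream
      try out.write(text.getBytes(UTF_8)) finally out.close()
    }
    val code = connection.getResponseCode
    val in = connection.getInputStream
    val content = try Source.fromInputStream(in)(Codec.UTF8).mkString finally in.close()
    (code, content)
  } finally connection.disconnect()
}

val server = startEchoServer()
val itemUrl = s"http://127.0.0.1:${server.getAddress.getPort}/items/1"
val (putCode, putReply) = send(itemUrl, "PUT", Some("new value"))
val (deleteCode, deleteReply) = send(itemUrl, "DELETE")
server.stop(0)
println(s"$putCode $putReply")
println(s"$deleteCode $deleteReply")
```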

Using the Apache HttpClient

Another approach you can take is to use the Apache HttpClient library. Before I learned about the previous approaches, I wrote a getRestContent method using this library like this:

import org.apache.http.client.methods.HttpGet
import org.apache.http.impl.client.DefaultHttpClient
import org.apache.http.params.HttpConnectionParams

/**
  * Returns the text (content) from a REST URL as a String.
  * Returns an empty String if the response has no entity (body).
  * This function throws exceptions if there are problems trying
  * to connect to the URL.
  *
  * @param url A complete URL, such as "http://foo.com/bar"
  * @param connectionTimeout The connection timeout, in ms.
  * @param socketTimeout The socket timeout, in ms.
  */
def getRestContent(url: String,
                   connectionTimeout: Int,
                   socketTimeout: Int): String = {
    val httpClient = buildHttpClient(connectionTimeout, socketTimeout)
    try {
        val httpResponse = httpClient.execute(new HttpGet(url))
        val entity = httpResponse.getEntity
        if (entity == null) ""
        else {
            val inputStream = entity.getContent
            // mkString on the Source keeps the original line breaks
            try scala.io.Source.fromInputStream(inputStream).mkString
            finally inputStream.close()
        }
    } finally {
        // release the connection even if the request fails
        httpClient.getConnectionManager.shutdown()
    }
}

private def buildHttpClient(connectionTimeout: Int, socketTimeout: Int): DefaultHttpClient = {
    val httpClient = new DefaultHttpClient
    val httpParams = httpClient.getParams
    HttpConnectionParams.setConnectionTimeout(httpParams, connectionTimeout)
    HttpConnectionParams.setSoTimeout(httpParams, socketTimeout)
    httpClient.setParams(httpParams)
    httpClient
}

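This approach requires significantly more code than the Source.fromURL approaches, and it also adds the HttpClient library as a dependency. If you’re already using the Apache HttpClient library for other purposes, it’s a viable alternative. As shown in Recipes 15.12 and 15.13, the HttpClient library definitely has advantages in situations such as working with request headers. Note that this code uses the older HttpClient 4.x API. DefaultHttpClient and HttpConnectionParams were deprecated in HttpClient 4.3 in favor of HttpClientBuilder and RequestConfig.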
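On Java 11 and newer, the JDK also includes java.net.http.HttpClient. It supports connection timeouts, per-request timeouts, and request headers without an extra dependency. This client isn’t covered by the original recipe. The following sketch assumes Java 11 or later, and the getWithHeaders helper and the local test server are hypothetical.

```scala
import com.sun.net.httpserver.{HttpExchange, HttpServer}
import java.net.{InetSocketAddress, URI}
import java.net.http.{HttpClient, HttpRequest, HttpResponse}
import java.nio.charset.StandardCharsets.UTF_8
import java.time.Duration

// Hypothetical helper built on java.net.http.HttpClient (JDK 11+),
// not on Apache HttpClient.
def getWithHeaders(url: String,
                   headers: Map[String, String] = Map.empty,
                   connectTimeout: Duration = Duration.ofSeconds(5),
                   requestTimeout: Duration = Duration.ofSeconds(5)): (Int, String) = {
  val client = HttpClient.newBuilder()
    .version(HttpClient.Version.HTTP_1_1)
    .connectTimeout(connectTimeout) // limit for establishing the connection
    .build()
  val builder = HttpRequest.newBuilder(URI.create(url))
    .timeout(requestTimeout) // HttpTimeoutException if no response arrives in time
    .GET()
  headers.foreach { case (name, value) => builder.header(name, value) }
  val response = client.send(builder.build(), HttpResponse.BodyHandlers.ofString())
  (response.statusCode(), response.body())
}

// Hypothetical local server that echoes the Accept request header back.
val server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0)
server.createContext("/", (exchange: HttpExchange) => {
  val accept = Option(exchange.getRequestHeaders.getFirst("Accept")).getOrElse("")
  val reply = accept.getBytes(UTF_8)
  exchange.sendResponseHeaders(200, reply.length.toLong)
  val out = exchange.getResponseBody
  try out.write(reply) finally out.close()
})
server.start()

val infoUrl = s"http://127.0.0.1:${server.getAddress.getPort}/info"
val result = try getWithHeaders(infoUrl, Map("Accept" -> "text/plain")) finally server.stop(0)
println(result)
```

Pinning the version to HTTP/1.1 avoids an HTTP/2 upgrade attempt against a plain-HTTP server. That is a design choice for this sketch, not a requirement of the API.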

Discussion

There are several other approaches you can take to handle this timeout problem. One is to use a Scala Future as a wrapper around the Source.fromURL method call. See Recipe 13.9, “Simple Concurrency with Futures,” for an example of how to use that approach.
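Here is a minimal sketch of the Future-based approach. It assumes the standard library’s scala.concurrent.Future and Await, and the getWithin name is hypothetical. Await.result only stops the caller from waiting. The pool thread running Source.fromURL stays blocked until the socket itself fails or is closed.

```scala
import java.net.ServerSocket
import scala.concurrent.{Await, Future, blocking}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.io.Source
import scala.util.Try

// Hypothetical helper: runs the blocking Source.fromURL call on a pool
// thread and waits at most `timeout` for it to finish.
def getWithin(url: String, timeout: FiniteDuration): Try[String] = {
  val content = Future {
    blocking {
      val source = Source.fromURL(url)
      try source.mkString finally source.close()
    }
  }
  // Await.result throws a TimeoutException if the Future is not done in time.
  Try(Await.result(content, timeout))
}

// A listening socket that never answers, so the request would otherwise hang.
val silentServer = new ServerSocket(0)
val url = s"http://127.0.0.1:${silentServer.getLocalPort}/waitForever"
val result = getWithin(url, 500.millis)
println(result)
silentServer.close()
```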

Also, new libraries are always being released. A library named Newman was released by StackMob as this book was in the production process, and looks promising. The Newman DSL was inspired by the Dispatch library, but uses method names instead of symbols, and appears to be easier to use as a result. It also provides separate methods for the GET, POST, PUT, DELETE, and HEAD request methods.

See Also

  • Source.fromURL timeout approach (matthewkwong.blogspot.com)
  • If you prefer asynchronous programming, you can mix this recipe with Scala Futures, which are demonstrated in Chapter 13. Another option is the Dispatch library. As its documentation states, “Dispatch is a library for asynchronous HTTP interaction. It provides a Scala vocabulary for Java’s async-http-client.”