Scala examples: Reading HOCON configuration files, and screen-scraping with JSoup and Sttp

I’ve been trying to fix the pages for my free Scala and functional programming video courses, and to that end I needed to read some HOCON configuration files and screen-scrape my own website pages.

Therefore, without much explanation, here is some source code that I wrote over the last day or two, with the help of an A.I. tool or two. One thing to note is that the quality of the code isn’t very good, because I let the A.I. tools generate most of it, and I didn’t bother to clean it up.

That being said, it works, so if you need to read HOCON configuration files or do some screen-scraping, I hope these are helpful.

Hocon3SpecifyFilename.scala

This is some Scala code I/we created to read multiple HOCON configuration files in one Scala application:

//> using scala "3"
//> using dep "com.typesafe:config:1.4.3"

import com.typesafe.config.{Config, ConfigFactory}
import scala.jdk.CollectionConverters.*
import java.io.File

@main def run(): Unit =

    println("\nEMAIL CONFIG:")
    val emailConfig = ConfigFactory.parseFile(File("email.conf"))
    val username = emailConfig.getString("myapp.username")
    val email = emailConfig.getString("myapp.email")
    println(username)
    println(email)

    println("\nJDBC CONFIG:")
    val jdbcConfig = ConfigFactory.parseFile(File("jdbc.conf"))
    val driver = jdbcConfig.getString("jdbc.driver")
    val url = jdbcConfig.getString("jdbc.url")
    val password = jdbcConfig.getString("jdbc.password")
    println(s"driver =   $driver")
    println(s"url =      $url")
    println(s"password = $password")

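    // // config1, config2, and config3 are placeholders for parsed Config objects.
    // // With withFallback, the EARLIER configs win if there are duplicate keys;
    // // a fallback only supplies keys that are missing from the configs before it.
    // val combinedConfig = config1
    //     .withFallback(config2)
    //     .withFallback(config3)
    //     .resolve()  // This resolves any substitutions in the config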

    // // Now you can use the combinedConfig object to access your configuration
    // val videoSetting = combinedConfig.getString("video.quality")
    // val audioSetting = combinedConfig.getString("audio.format")
    //
    // println(s"Video quality: $videoSetting")
    // println(s"Audio format: $audioSetting")

Here are the test configuration files:

# email.conf

myapp {
    username = "alvin"
    email = "coyote@acme.com"
}

and:

# jdbc.conf

jdbc {
    driver  = "com.mysql.jdbc.Driver"
    url = "jdbc:mysql://127.0.0.1:8889/kbhr"
    username = "root"
    password = "root"
}
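The commented-out withFallback code in Hocon3SpecifyFilename.scala is easy to get backwards, so here is a minimal sketch of how the precedence works. It is not part of the original code: the two inline configs and the names primary, defaults, and combined are made up for illustration, and parseString stands in for reading separate files.

```scala
//> using scala "3"
//> using dep "com.typesafe:config:1.4.3"

import com.typesafe.config.{Config, ConfigFactory}

// Hypothetical inline configs that stand in for separate .conf files
val primary: Config = ConfigFactory.parseString("""
    video.quality = "1080p"
""")
val defaults: Config = ConfigFactory.parseString("""
    video.quality = "720p"
    audio.format = "aac"
""")

// primary wins on duplicate keys; defaults only fills in missing keys
val combined: Config = primary.withFallback(defaults).resolve()

@main def mergeDemo(): Unit =
    println(combined.getString("video.quality"))  // 1080p
    println(combined.getString("audio.format"))   // aac
```

In other words, put the config you want to win first, and chain the more general defaults after it.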

Main.scala

I used this Scala file to read the previous jdbc.conf HOCON file. Here’s the Scala source code:

//> using scala "3"
//> using dep "com.typesafe:config:1.4.3"

import com.typesafe.config.{Config, ConfigFactory}

/**
 * Run with this command:
 *     scala-cli run Main.scala '-Dconfig.file=jdbc.conf'
 */
@main def ReadConfigurationFile =

    // this DOES NOT WORK, because load(String) looks for a resource
    // on the classpath, not a file in the current directory:
    // val config: Config = ConfigFactory.load("jdbc.conf")

    // DOES WORK when the config filename is specified
    // on the scala-cli command line as:
    // scala-cli run Main.scala '-Dconfig.file=jdbc.conf'
    val config: Config = ConfigFactory.load()

    val driver = config.getString("jdbc.driver")
    val url = config.getString("jdbc.url")
    val username = config.getString("jdbc.username")
    val password = config.getString("jdbc.password")

    println(s"driver =   $driver")
    println(s"url =      $url")
    println(s"username = $username")
    println(s"password = $password")
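If you would rather not pass a system property on the command line, one alternative is to parse the file by its filesystem path, as the first example does. The following is only a sketch under a few assumptions: loadFromPath is a hypothetical helper name, and the temp file is just there so the example runs on its own. It also turns off the default behavior where a missing file is quietly treated as an empty configuration.

```scala
//> using scala "3"
//> using dep "com.typesafe:config:1.4.3"

import com.typesafe.config.{Config, ConfigFactory, ConfigParseOptions}
import java.io.File
import java.nio.file.Files
import scala.util.Try

// Hypothetical helper: read a config from a filesystem path and fail
// fast when the file does not exist, instead of returning an empty config.
def loadFromPath(path: String): Config =
    val options = ConfigParseOptions.defaults().setAllowMissing(false)
    ConfigFactory.parseFile(File(path), options).resolve()

@main def loadDemo(): Unit =
    // Write a throwaway config file so the sketch is self-contained
    val tmp = Files.createTempFile("jdbc", ".conf")
    Files.writeString(tmp, """jdbc { driver = "com.mysql.jdbc.Driver" }""")

    println(loadFromPath(tmp.toString).getString("jdbc.driver"))

    // A path that does not exist now produces a failure
    println(Try(loadFromPath("no-such-file.conf")).isFailure)
```

The tradeoff is that parseFile reads only that one file, while ConfigFactory.load() also layers in system properties and any reference.conf files on the classpath.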

ReadHoconVideoConfFile.scala

I used this Scala code to read the videos.conf HOCON file that follows. Here’s the Scala source code:

//> using scala "3"
//> using dep "com.typesafe:config:1.4.3"

import com.typesafe.config.{Config, ConfigFactory}
import scala.jdk.CollectionConverters.*

case class Video(
    uri: String,
    name: String,
    description: String,
    thumbnailUrl: String,
    contentUrl: String,
    uploadDate: String
)

/**
 * Run with this command:
 *     scala-cli run ThisFile.scala
 */
@main def ReadVideosConfiguration: Unit =

    val config = ConfigFactory.parseFile(java.io.File("videos.conf"))
    val videosList = config.getConfigList("videos").asScala.toList

    val videos = videosList.map { videoConfig =>
        Video(
            uri = videoConfig.getString("uri"),
            name = videoConfig.getString("name"),
            description = videoConfig.getString("description"),
            thumbnailUrl = videoConfig.getString("thumbnailUrl"),
            contentUrl = videoConfig.getString("contentUrl"),
            uploadDate = videoConfig.getString("uploadDate")
        )
    }

    // videos.foreach { video =>
    //     println(s"Video: ${video.name}")
    //     println(s"    URI: ${video.uri}")
    //     println(s"    Description: ${video.description}")
    //     println(s"    Thumbnail: ${video.thumbnailUrl}")
    //     println(s"    Content: ${video.contentUrl}")
    //     println(s"    Upload Date: ${video.uploadDate}")
    //     println()
    // }
   
    videos.filter(v => v.uri == "/video-course/advanced-scala-3/higher-order-functions-2/").foreach(println)

and here’s the HOCON file:

# videos.conf

videos = [
{
    uri = "/video-course/advanced-scala-3/opaque-types/"
    name = "Scala Opaque Types (video)"
    description = "In this free Scala training video I discuss Opaque Types in Scala 3. Opaque types were introduced in Scala 3, and they let you create type aliases that restrict visibility to the underlying representa"
    thumbnailUrl = "https://alvinalexander.com/videos/courses/advanced-scala-3/advanced-scala-3-video-course.jpg"
    contentUrl = "https://alvinalexander.com/videos/courses/advanced-scala-3/027--Scala-Opaque-Types.mp4"
    uploadDate = "2024-08-17T15:56:41.264611-04:00"
},
{
    uri = "/video-course/functional-programming-2/a-small-zio-2-example-application/"
    name = "ZIO 2: A Small Application (video)"
    description = "NOTE: At the moment I don’t have a text description of this video, but I hope to add one at some point."
    thumbnailUrl = "https://alvinalexander.com/videos/courses/functional-programming-in-depth/advanced-functional-programming-video-course.jpg"
    contentUrl = "https://alvinalexander.com/videos/courses/functional-programming-in-depth/240-ZIO-zMakeInt-Example.mp4"
    uploadDate = "2024-08-17T15:56:41.264611-04:00"
}
]
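Note that the two sample entries shown here do not include the higher-order-functions-2 URI that the Scala code filters on, so with only this sample data the filter prints nothing. As a small sketch of looking up an entry by its uri and getting an Option back, here is a trimmed-down variation. The VideoSummary class, the parseVideos helper, and the inline HOCON string are all hypothetical; I also parse uploadDate into an OffsetDateTime, which the original code keeps as a String.

```scala
//> using scala "3"
//> using dep "com.typesafe:config:1.4.3"

import com.typesafe.config.ConfigFactory
import scala.jdk.CollectionConverters.*
import java.time.OffsetDateTime

// Trimmed-down, hypothetical version of the Video case class
case class VideoSummary(uri: String, name: String, uploadDate: OffsetDateTime)

def parseVideos(hocon: String): List[VideoSummary] =
    ConfigFactory.parseString(hocon)
        .getConfigList("videos").asScala.toList
        .map { c =>
            VideoSummary(
                uri = c.getString("uri"),
                name = c.getString("name"),
                // ISO-8601 timestamps with an offset parse directly
                uploadDate = OffsetDateTime.parse(c.getString("uploadDate"))
            )
        }

@main def findVideo(): Unit =
    val hocon = """
        videos = [
        {
            uri = "/video-course/advanced-scala-3/opaque-types/"
            name = "Scala Opaque Types (video)"
            uploadDate = "2024-08-17T15:56:41.264611-04:00"
        }
        ]
    """
    // Index the videos by uri so a lookup returns an Option
    val byUri = parseVideos(hocon).map(v => v.uri -> v).toMap
    println(byUri.get("/video-course/advanced-scala-3/opaque-types/").map(_.name))
    println(byUri.get("/video-course/advanced-scala-3/higher-order-functions-2/"))
```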

ScrapeUrlData.scala (a first attempt at a screen-scraper to get information from my website)

I asked an A.I. tool to generate a screen-scraper for me that had a few requirements, and it generated this code that has no dependencies:

//> using scala "3"

import scala.io.Source
import scala.util.matching.Regex
import scala.util.{Try, Success, Failure}
import java.net.URL
import java.net.HttpURLConnection

object WebScraper:
    def getContent(url: String): Try[String] =
        Try:
            val connection = URL(url).openConnection().asInstanceOf[HttpURLConnection]
            connection.setRequestMethod("GET")
            val source = Source.fromInputStream(connection.getInputStream)
            // close the source (and its underlying stream) even if reading fails
            try source.mkString
            finally source.close()

    def extractVideoPath(content: String): Option[String] =
        // (?s) lets '.' match newlines, since the tags may span several lines
        val videoRegex: Regex = """(?s)<video.*?<source\s+src="([^"]+)"""".r
        videoRegex.findFirstMatchIn(content).map(_.group(1))

    def extractDescription(content: String): Option[String] =
        val descRegex: Regex = """(?s)<div class="video-course-notes">(.*?)</div>""".r
        descRegex.findFirstMatchIn(content).map: matchResult =>
            // strip any nested HTML tags, then cap the result at 160 characters
            val fullDesc = matchResult.group(1).replaceAll("<.*?>", "").trim
            if fullDesc.length > 157 then fullDesc.take(157) + "..." else fullDesc

    def processUrl(url: String): Unit =
        val uri = URL(url).getPath

        getContent(url) match
            case Success(content) =>
                val videoPath = extractVideoPath(content)
                val description = extractDescription(content)

                println(s"uri = \"$uri\"")
                videoPath.foreach(path => println(s"""contentUrl = "$path""""))
                description.foreach(desc => println(s"""description = "$desc""""))

            case Failure(exception) =>
                println(s"Failed to fetch content from $url: ${exception.getMessage}")

    def main(args: Array[String]): Unit =
        val urls = List(
            "https://alvinalexander.com/video-course/advanced-scala-3/opaque-types/",
            "https://alvinalexander.com/video-course/functional-programming-2/a-small-zio-2-example-application/",
            "https://alvinalexander.com/video-course/advanced-scala-3/higher-order-functions-2/"
        )

        urls.foreach: url =>
            processUrl(url)
            println() // Add a blank line between results
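One thing to watch for with regex-based scraping like this: by default the dot in a Scala/Java regex does not match newline characters, so a pattern such as `<video.*?<source` only matches when both tags are on the same line. Here is a minimal sketch of the difference. The inline HTML snippet and the names singleLine, dotAll, and firstSrc are made up for the example; they are not taken from my pages or from the code above.

```scala
//> using scala "3"

import scala.util.matching.Regex

// Hypothetical HTML where the <video> and <source> tags are on separate lines
val html = """<video controls>
  <source src="/videos/sample.mp4" type="video/mp4">
</video>"""

// Without (?s), '.' stops at line breaks
val singleLine: Regex = """<video.*?<source\s+src="([^"]+)"""".r
// With (?s) (DOTALL mode), '.' also matches line breaks
val dotAll: Regex = """(?s)<video.*?<source\s+src="([^"]+)"""".r

def firstSrc(regex: Regex, content: String): Option[String] =
    regex.findFirstMatchIn(content).map(_.group(1))

@main def regexDemo(): Unit =
    println(firstSrc(singleLine, html))  // None
    println(firstSrc(dotAll, html))      // Some(/videos/sample.mp4)
```

Even with DOTALL mode, regular expressions stay fragile against real-world HTML, which is one reason to prefer an HTML parser.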

Scrape2.scala (a Scala application to get the video data off of my website, i.e., a screen-scraper)

I didn’t like that previous code, so I worked with another A.I. tool to generate Scala code that I was a little happier with.

//> using scala "3"
//> using dep "com.softwaremill.sttp.client3::core::3.8.0"
//> using dep "org.jsoup:jsoup:1.16.1"

import sttp.client3._
import org.jsoup.Jsoup
import org.jsoup.nodes.Document

import java.time.ZonedDateTime
import java.time.format.DateTimeFormatter

case class VideoDetails(
    uri: String,
    name: String,
    description: String,
    thumbnailUrl: String,
    contentUrl: String,
    uploadDate: String
)

object VideoScraper:

    val uploadDate: String = ZonedDateTime.now().format(DateTimeFormatter.ISO_OFFSET_DATE_TIME)

    def fetchVideoDetails(url: String): Option[VideoDetails] =
        val backend = HttpURLConnectionBackend()
        val response = basicRequest.get(uri"$url").send(backend)
       
        // pause briefly between requests so the scraper doesn't hammer the server
        Thread.sleep(400)

        response.body match
            case Left(error) =>
                println(s"Failed to fetch URL $url: $error")
                None
            case Right(html) =>
                val doc: Document = Jsoup.parse(html)
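                val titleOpt = Option(doc.select("title").text().split('|').headOption.getOrElse("").trim)
                // attr() returns "" (never null) when nothing matches, so drop the
                // empty string; otherwise a page without a video would still yield Some
                val videoSrcOpt = Option(doc.select("video > source").attr("src")).filter(_.nonEmpty)
                val descriptionOpt = Option(doc.select("div.video-course-notes").text().replaceAll("\n", "").trim)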

                for
                    title <- titleOpt
                    videoSrc <- videoSrcOpt
                    description <- descriptionOpt
                yield
                    val contentUrl = s"https://alvinalexander.com$videoSrc"
                    val uri = url.stripPrefix("https://alvinalexander.com")
                    val newThumbnailUrl =
                        if uri.startsWith("/video-course/advanced-scala-3") then
                            "https://alvinalexander.com/videos/courses/advanced-scala-3/advanced-scala-3-video-course.jpg"
                        else if uri.startsWith("/video-course/intro-scala-3") then
                            "https://alvinalexander.com/videos/courses/intro-scala-3/introduction-to-scala-3-video-course.jpg"
                        else if uri.startsWith("/video-course/intro-fp") then
                            "https://alvinalexander.com/videos/courses/intro-fp/introduction-functional-programming-video-course.png"
                        else if uri.startsWith("/video-course/functional-programming-2") then
                            "https://alvinalexander.com/videos/courses/functional-programming-in-depth/advanced-functional-programming-video-course.jpg"
                        else
                            // "you should never come here" case
                            "https://alvinalexander.com/videos/courses/advanced-scala-3/advanced-scala-3-video-course.jpg"
                    VideoDetails(
                        uri,
                        name = title,
                        description = description.take(200),
                        thumbnailUrl = newThumbnailUrl,
                        contentUrl = contentUrl,
                        uploadDate = uploadDate
                    )

    def main(args: Array[String]): Unit =
        val urls = URLs.urls

        urls.foreach { url =>
            fetchVideoDetails(url) match
                case Some(details) =>
                    println(s"""
                        |{
                        |    uri = "${details.uri}"
                        |    name = "${details.name}"
                        |    description = "${details.description}"
                        |    thumbnailUrl = "${details.thumbnailUrl}"
                        |    contentUrl = "${details.contentUrl}"
                        |    uploadDate = "${details.uploadDate}"
                        |},
                    """.stripMargin.trim)
                case None => println(s"No video details found for URL: $url")
        }

object URLs:

    val urls = List(
        "https://alvinalexander.com/video-course/advanced-scala-3/opaque-types/",
        "https://alvinalexander.com/video-course/functional-programming-2/a-small-zio-2-example-application/",
        "https://alvinalexander.com/video-course/advanced-scala-3/higher-order-functions-2/",
        // more here ...
    )
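Two details in this kind of scraper are easy to miss: Jsoup returns an empty string rather than null when a selector matches nothing, and a scraped description that contains double quotes would break the HOCON text that gets printed. Here is a rough sketch of handling both, run against inline HTML so it needs no network access. The extract and renderEntry helpers and the sample HTML are hypothetical; ConfigUtil.quoteString comes from the same Typesafe Config library used earlier.

```scala
//> using scala "3"
//> using dep "org.jsoup:jsoup:1.16.1"
//> using dep "com.typesafe:config:1.4.3"

import org.jsoup.Jsoup
import com.typesafe.config.{ConfigFactory, ConfigUtil}

// Returns (videoSrc, description), or None when the page has no <video><source>
def extract(html: String): Option[(String, String)] =
    val doc = Jsoup.parse(html)
    // attr() and text() return "" (not null) when nothing matches
    val src = doc.select("video > source").attr("src")
    val desc = doc.select("div.video-course-notes").text().trim
    Option.when(src.nonEmpty)((src, desc))

// Renders one HOCON entry; quoteString escapes embedded quotes and backslashes
def renderEntry(src: String, desc: String): String =
    s"""{
       |    contentUrl = ${ConfigUtil.quoteString(src)}
       |    description = ${ConfigUtil.quoteString(desc)}
       |}""".stripMargin

@main def scrapeDemo(): Unit =
    val html = """<html><body>
        <video controls><source src="/videos/sample.mp4" type="video/mp4"></video>
        <div class="video-course-notes">He said "hello" in this video.</div>
        </body></html>"""

    extract(html).foreach { (src, desc) =>
        val entry = renderEntry(src, desc)
        println(entry)
        // Check that the rendered text parses back to the original description
        val parsed = ConfigFactory.parseString(s"video = $entry")
        println(parsed.getString("video.description") == desc)
    }

    // A page without a video yields None instead of an empty contentUrl
    println(extract("<html><body><p>No video here</p></body></html>"))
```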

As mentioned, some of that code DOES NOT represent best practices, and in fact probably represents worst practices. I was in a rush and planned to throw this code away when I was done, but I thought I’d share it here in case it helps anyone else.