If you’re interested in this sort of thing, I’ve been fixing the pages for my free Scala and functional programming video courses, and to that end I needed to read HOCON configuration files and screen-scrape my own website pages.
Therefore, without much explanation, here is some source code that I wrote over the last day or two, with the help of an A.I. tool or two. One thing to note is that the quality of the code isn’t very good, because I let the A.I. tools generate most of it, and I didn’t bother to clean it up.
That being said, it works, so if you need to read HOCON configuration files or do some screen-scraping, I hope these are helpful.
Hocon3SpecifyFilename.scala
This is some Scala code I/we created to read multiple HOCON configuration files in one Scala application:
//> using scala "3"
//> using dep "com.typesafe:config:1.4.3"
import com.typesafe.config.{Config, ConfigFactory}
import scala.jdk.CollectionConverters.*
import java.io.File
@main def run(): Unit =

  println("\nEMAIL CONFIG:")
  val emailConfig = ConfigFactory.parseFile(File("email.conf"))
  val username = emailConfig.getString("myapp.username")
  val email = emailConfig.getString("myapp.email")
  println(username)
  println(email)

  println("\nJDBC CONFIG:")
  val jdbcConfig = ConfigFactory.parseFile(File("jdbc.conf"))
  val driver = jdbcConfig.getString("jdbc.driver")
  val url = jdbcConfig.getString("jdbc.url")
  val password = jdbcConfig.getString("jdbc.password")
  println(s"driver = $driver")
  println(s"url = $url")
  println(s"password = $password")

  // // Unused idea for merging several configs (config1..config3 are placeholders).
  // // With withFallback, values in config1 win; config2 and config3 only
  // // supply keys that the configs before them do not define.
  // val combinedConfig = config1
  //   .withFallback(config2)
  //   .withFallback(config3)
  //   .resolve() // This resolves any substitutions in the config

  // // Now you can use the combinedConfig object to access your configuration
  // val videoSetting = combinedConfig.getString("video.quality")
  // val audioSetting = combinedConfig.getString("audio.format")
  //
  // println(s"Video quality: $videoSetting")
  // println(s"Audio format: $audioSetting")
Here are the test configuration files:
# email.conf
myapp {
    username = "alvin"
    email = "coyote@acme.com"
}
and:
# jdbc.conf
jdbc {
    driver = "com.mysql.jdbc.Driver"
    url = "jdbc:mysql://127.0.0.1:8889/kbhr"
    username = "root"
    password = "root"
}
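The commented-out lines at the end of Hocon3SpecifyFilename.scala hint at merging several configs into one. Here is a minimal sketch of that idea, assuming inline strings as stand-ins for the two files above. The `mergeConfigs` helper and the duplicate `myapp.username` key are hypothetical, added only to show which value wins:

```scala
//> using scala "3"
//> using dep "com.typesafe:config:1.4.3"

import com.typesafe.config.{Config, ConfigFactory}

// Hypothetical helper: merge configs so that EARLIER arguments take precedence.
// Each later config is only a fallback for keys the earlier ones do not define.
def mergeConfigs(configs: Config*): Config =
  configs
    .foldLeft(ConfigFactory.empty())((merged, next) => merged.withFallback(next))
    .resolve() // resolves any ${...} substitutions

@main def mergeDemo(): Unit =
  // Inline stand-ins for email.conf and jdbc.conf (assumed content)
  val emailConfig = ConfigFactory.parseString("""myapp { username = "alvin" }""")
  val jdbcConfig = ConfigFactory.parseString(
    """jdbc { username = "root" }
      |myapp { username = "ignored" }""".stripMargin)

  val combined = mergeConfigs(emailConfig, jdbcConfig)
  println(combined.getString("myapp.username")) // alvin (first config wins)
  println(combined.getString("jdbc.username"))  // root (supplied by the fallback)
```

The point to remember is that `withFallback` does not override anything: the config it is called on keeps its own values, and the argument only fills in the gaps.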
Main.scala
I used this Scala file to read the previous jdbc.conf HOCON file. Here’s the Scala source code:
//> using scala "3"
//> using dep "com.typesafe:config:1.4.3"
import com.typesafe.config.{Config, ConfigFactory}
/**
 * Run with this command:
 * scala-cli run Main.scala '-Dconfig.file=jdbc.conf'
 */
@main def ReadConfigurationFile =

  // this DOES NOT WORK, because load(String) looks for a classpath
  // resource with that name, not a file in the current directory:
  // val config: Config = ConfigFactory.load("jdbc.conf")

  // DOES WORK when the config filename is specified
  // on the scala-cli command line as:
  // scala-cli run Main.scala '-Dconfig.file=jdbc.conf'
  val config: Config = ConfigFactory.load()

  val driver = config.getString("jdbc.driver")
  val url = config.getString("jdbc.url")
  val username = config.getString("jdbc.username")
  val password = config.getString("jdbc.password")

  println(s"driver = $driver")
  println(s"url = $url")
  println(s"username = $username")
  println(s"password = $password")
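If you would rather not pass a `-D` flag on the command line, one option is to parse the file yourself and hand the result to `ConfigFactory.load(Config)`, which layers system properties and any reference.conf around it. This is only a sketch: the `loadConfigFile` helper is a hypothetical name, and the demo writes a small temporary file so it can run on its own:

```scala
//> using scala "3"
//> using dep "com.typesafe:config:1.4.3"

import com.typesafe.config.{Config, ConfigFactory}
import java.io.File
import java.nio.file.Files

// Hypothetical helper: read a HOCON file from the filesystem (not the classpath)
// and pass it through load() so substitutions and system properties still apply.
def loadConfigFile(path: String): Config =
  ConfigFactory.load(ConfigFactory.parseFile(File(path)))

@main def loadDemo(): Unit =
  // Write a throwaway config file so the example is self-contained
  val tmp = Files.createTempFile("jdbc", ".conf")
  Files.writeString(tmp, """jdbc { driver = "com.mysql.jdbc.Driver" }""")

  val config = loadConfigFile(tmp.toString)
  println(config.getString("jdbc.driver"))
```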
ReadHoconVideoConfFile.scala
I used this Scala code to read the videos.conf HOCON file that follows. Here’s the Scala source code:
//> using scala "3"
//> using dep "com.typesafe:config:1.4.3"
import com.typesafe.config.{Config, ConfigFactory}
import scala.jdk.CollectionConverters.*
case class Video(
    uri: String,
    name: String,
    description: String,
    thumbnailUrl: String,
    contentUrl: String,
    uploadDate: String
)

/**
 * Run with this command:
 * scala-cli run ThisFile.scala
 */
@main def ReadVideosConfiguration: Unit =

  val config = ConfigFactory.parseFile(java.io.File("videos.conf"))
  val videosList = config.getConfigList("videos").asScala.toList

  val videos = videosList.map { videoConfig =>
    Video(
      uri = videoConfig.getString("uri"),
      name = videoConfig.getString("name"),
      description = videoConfig.getString("description"),
      thumbnailUrl = videoConfig.getString("thumbnailUrl"),
      contentUrl = videoConfig.getString("contentUrl"),
      uploadDate = videoConfig.getString("uploadDate")
    )
  }

  // videos.foreach { video =>
  //   println(s"Video: ${video.name}")
  //   println(s"  URI: ${video.uri}")
  //   println(s"  Description: ${video.description}")
  //   println(s"  Thumbnail: ${video.thumbnailUrl}")
  //   println(s"  Content: ${video.contentUrl}")
  //   println(s"  Upload Date: ${video.uploadDate}")
  //   println()
  // }

  videos.filter(v => v.uri == "/video-course/advanced-scala-3/higher-order-functions-2/").foreach(println)
and here’s the HOCON file:
# videos.conf
videos = [
    {
        uri = "/video-course/advanced-scala-3/opaque-types/"
        name = "Scala Opaque Types (video)"
        description = "In this free Scala training video I discuss Opaque Types in Scala 3. Opaque types were introduced in Scala 3, and they let you create type aliases that restrict visibility to the underlying representa"
        thumbnailUrl = "https://alvinalexander.com/videos/courses/advanced-scala-3/advanced-scala-3-video-course.jpg"
        contentUrl = "https://alvinalexander.com/videos/courses/advanced-scala-3/027--Scala-Opaque-Types.mp4"
        uploadDate = "2024-08-17T15:56:41.264611-04:00"
    },
    {
        uri = "/video-course/functional-programming-2/a-small-zio-2-example-application/"
        name = "ZIO 2: A Small Application (video)"
        description = "NOTE: At the moment I don’t have a text description of this video, but I hope to add one at some point."
        thumbnailUrl = "https://alvinalexander.com/videos/courses/functional-programming-in-depth/advanced-functional-programming-video-course.jpg"
        contentUrl = "https://alvinalexander.com/videos/courses/functional-programming-in-depth/240-ZIO-zMakeInt-Example.mp4"
        uploadDate = "2024-08-17T15:56:41.264611-04:00"
    }
]
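The `Video` case class keeps `uploadDate` as a plain `String`, which was fine for my purposes. If you need a real timestamp, the values in this file are ISO-8601 offset date-times, so a sketch like the following should work with nothing but the JDK. The `parseUploadDate` name is my own invention for this example:

```scala
//> using scala "3"

import java.time.OffsetDateTime
import scala.util.Try

// Hypothetical helper: turn an uploadDate string into an OffsetDateTime,
// returning None instead of throwing when the text is not a valid timestamp.
def parseUploadDate(s: String): Option[OffsetDateTime] =
  Try(OffsetDateTime.parse(s)).toOption

@main def uploadDateDemo(): Unit =
  val parsed = parseUploadDate("2024-08-17T15:56:41.264611-04:00")
  parsed.foreach(d => println(s"year = ${d.getYear}, offset = ${d.getOffset}"))
  println(parseUploadDate("not a date")) // None
```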
ScrapeUrlData.scala (a first attempt at a screen-scraper to get information from my website)
I asked an A.I. tool to generate a screen-scraper for me that had a few requirements, and it generated this code that has no dependencies:
//> using scala "3"
import scala.io.Source
import scala.util.matching.Regex
import scala.util.{Try, Success, Failure}
import java.net.URL
import java.net.HttpURLConnection
object WebScraper:

  def getContent(url: String): Try[String] =
    Try:
      val connection = URL(url).openConnection().asInstanceOf[HttpURLConnection]
      connection.setRequestMethod("GET")
      val inputStream = connection.getInputStream
      val content = Source.fromInputStream(inputStream).mkString
      inputStream.close()
      content

  // (?s) lets '.' match newlines, so tags that span several lines still match
  def extractVideoPath(content: String): Option[String] =
    val videoRegex: Regex = """(?s)<video.*?<source\s+src="([^"]+)"""".r
    videoRegex.findFirstMatchIn(content).map(_.group(1))

  def extractDescription(content: String): Option[String] =
    val descRegex: Regex = """(?s)<div class="video-course-notes">(.*?)</div>""".r
    descRegex.findFirstMatchIn(content).map: matchResult =>
      // strip any nested tags, then cap the description at 160 characters
      val fullDesc = matchResult.group(1).replaceAll("<.*?>", "").trim
      if fullDesc.length > 157 then fullDesc.take(157) + "..." else fullDesc

  def processUrl(url: String): Unit =
    val baseUrl = "https://alvinalexander.com"
    val uri = URL(url).getPath

    getContent(url) match
      case Success(content) =>
        val videoPath = extractVideoPath(content)
        val description = extractDescription(content)
        println(s"uri = \"$uri\"")
        videoPath.foreach(path => println(s"""contentUrl = "$path""""))
        description.foreach(desc => println(s"""description = "$desc""""))
      case Failure(exception) =>
        println(s"Failed to fetch content from $url: ${exception.getMessage}")

  def main(args: Array[String]): Unit =
    val urls = List(
      "https://alvinalexander.com/video-course/advanced-scala-3/opaque-types/",
      "https://alvinalexander.com/video-course/functional-programming-2/a-small-zio-2-example-application/",
      "https://alvinalexander.com/video-course/advanced-scala-3/higher-order-functions-2/"
    )
    urls.foreach: url =>
      processUrl(url)
      println() // Add a blank line between results
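One thing to watch with regex-based scraping like this is that `.` does not match newlines by default, so a pattern such as `<video.*?<source` only matches when both tags sit on the same line. Here is a small sketch of the difference, using a made-up HTML snippet (the `/videos/example.mp4` path and the `firstSrc` helper are hypothetical):

```scala
//> using scala "3"

import scala.util.matching.Regex

// Made-up HTML where <video> and <source> are on different lines
val sampleHtml: String =
  """<video controls>
    |  <source src="/videos/example.mp4" type="video/mp4">
    |</video>""".stripMargin

// Without (?s), '.' stops at the end of the line
val singleLine: Regex = """<video.*?<source\s+src="([^"]+)"""".r
// With (?s) (DOTALL mode), '.' also matches newline characters
val dotAll: Regex = """(?s)<video.*?<source\s+src="([^"]+)"""".r

def firstSrc(regex: Regex, html: String): Option[String] =
  regex.findFirstMatchIn(html).map(_.group(1))

@main def regexDemo(): Unit =
  println(firstSrc(singleLine, sampleHtml)) // None
  println(firstSrc(dotAll, sampleHtml))     // Some(/videos/example.mp4)
```

This fragility is one of the reasons an HTML parser tends to be a more comfortable tool for this job than regular expressions.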
Scrape2.scala (a Scala application to get the video data off of my website, i.e., a screen-scraper)
I didn’t like that previous code, so I worked with another A.I. tool to generate Scala code that I was a little happier with.
//> using scala "3"
//> using dep "com.softwaremill.sttp.client3::core::3.8.0"
//> using dep "org.jsoup:jsoup:1.16.1"
import sttp.client3._
import org.jsoup.Jsoup
import org.jsoup.nodes.Document
import java.time.ZonedDateTime
import java.time.format.DateTimeFormatter
case class VideoDetails(
    uri: String,
    name: String,
    description: String,
    thumbnailUrl: String,
    contentUrl: String,
    uploadDate: String
)

object VideoScraper:

  val uploadDate: String = ZonedDateTime.now().format(DateTimeFormatter.ISO_OFFSET_DATE_TIME)

  def fetchVideoDetails(url: String): Option[VideoDetails] =
    val backend = HttpURLConnectionBackend()
    val response = basicRequest.get(uri"$url").send(backend)
    Thread.sleep(400)

    response.body match
      case Left(error) =>
        println(s"Failed to fetch URL $url: $error")
        None
      case Right(html) =>
        val doc: Document = Jsoup.parse(html)
        // jsoup returns "" (not null) when nothing matches, so drop empty
        // titles and video paths; otherwise the None case is never reached
        val titleOpt = doc.select("title").text().split('|').headOption.map(_.trim).filter(_.nonEmpty)
        val videoSrcOpt = Option(doc.select("video > source").attr("src")).filter(_.nonEmpty)
        val descriptionOpt = Option(doc.select("div.video-course-notes").text().replaceAll("\n", "").trim)

        for
          title <- titleOpt
          videoSrc <- videoSrcOpt
          description <- descriptionOpt
        yield
          val contentUrl = s"https://alvinalexander.com$videoSrc"
          val uri = url.stripPrefix("https://alvinalexander.com")
          // every video in a course shares that course's thumbnail image
          val newThumbnailUrl =
            if uri.startsWith("/video-course/advanced-scala-3") then
              "https://alvinalexander.com/videos/courses/advanced-scala-3/advanced-scala-3-video-course.jpg"
            else if uri.startsWith("/video-course/intro-scala-3") then
              "https://alvinalexander.com/videos/courses/intro-scala-3/introduction-to-scala-3-video-course.jpg"
            else if uri.startsWith("/video-course/intro-fp") then
              "https://alvinalexander.com/videos/courses/intro-fp/introduction-functional-programming-video-course.png"
            else if uri.startsWith("/video-course/functional-programming-2") then
              "https://alvinalexander.com/videos/courses/functional-programming-in-depth/advanced-functional-programming-video-course.jpg"
            else
              // "you should never come here" case
              "https://alvinalexander.com/videos/courses/advanced-scala-3/advanced-scala-3-video-course.jpg"
          VideoDetails(
            uri,
            name = title,
            description = description.take(200),
            thumbnailUrl = newThumbnailUrl,
            contentUrl = contentUrl,
            uploadDate = uploadDate
          )

  def main(args: Array[String]): Unit =
    val urls = URLs.urls
    urls.foreach { url =>
      fetchVideoDetails(url) match
        case Some(details) =>
          println(s"""
            |{
            |    uri = "${details.uri}"
            |    name = "${details.name}"
            |    description = "${details.description}"
            |    thumbnailUrl = "${details.thumbnailUrl}"
            |    contentUrl = "${details.contentUrl}"
            |    uploadDate = "${details.uploadDate}"
            |},
            """.stripMargin.trim)
        case None => println(s"No video details found for URL: $url")
    }

object URLs:
  val urls = List(
    "https://alvinalexander.com/video-course/advanced-scala-3/opaque-types/",
    "https://alvinalexander.com/video-course/functional-programming-2/a-small-zio-2-example-application/",
    "https://alvinalexander.com/video-course/advanced-scala-3/higher-order-functions-2/",
    // more here ...
  )
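If I were cleaning this up, the thumbnail `if`/`else if` chain is one place I might start. Here is a rough sketch of the same lookup written as a list of prefix/URL pairs. The `courseThumbnails` and `thumbnailFor` names are hypothetical, and unlike the code above this version returns an `Option` instead of silently falling back to a default image:

```scala
//> using scala "3"

// URI prefix -> course thumbnail, copied from the if/else chain above
val courseThumbnails: List[(String, String)] = List(
  "/video-course/advanced-scala-3" ->
    "https://alvinalexander.com/videos/courses/advanced-scala-3/advanced-scala-3-video-course.jpg",
  "/video-course/intro-scala-3" ->
    "https://alvinalexander.com/videos/courses/intro-scala-3/introduction-to-scala-3-video-course.jpg",
  "/video-course/intro-fp" ->
    "https://alvinalexander.com/videos/courses/intro-fp/introduction-functional-programming-video-course.png",
  "/video-course/functional-programming-2" ->
    "https://alvinalexander.com/videos/courses/functional-programming-in-depth/advanced-functional-programming-video-course.jpg"
)

// Hypothetical helper: first thumbnail whose prefix matches, or None
def thumbnailFor(uri: String): Option[String] =
  courseThumbnails.collectFirst:
    case (prefix, thumbnail) if uri.startsWith(prefix) => thumbnail

@main def thumbnailDemo(): Unit =
  println(thumbnailFor("/video-course/intro-fp/some-lesson/"))
  println(thumbnailFor("/some-other-page/")) // None
```

Adding a new course then means adding one line to the list rather than another branch.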
As mentioned, some of that code does not represent best practices, and in fact it probably represents worst practices. I was in a rush and planned to throw this code away when I was done, but I thought I’d share it here in case it helps anyone else.