My Scala Sed project is still a work in progress, but I made some headway on a new version this week. My initial need was to have Sed return a String rather than print directly to STDOUT, which gives me more ability to post-process a file. After that I realized it would be really useful if the custom function I pass to Sed had two more pieces of information available to it:
- The line number of the string Sed passed to it
- A Map of key/value pairs the helper function could use while processing the file
Note: In this article “Sed” refers to my project, and “sed” refers to the Unix command-line utility.
Basic use
In a “basic use” scenario, this is how I use the new version of Sed in a Scala shell script to change the “layout:” lines in 55 Markdown files whose names are in the files-to-process.txt file:
#!/bin/sh
exec scala (more here) ...
!#
import (more here) ...
val filenames = readFileAsList("files-to-process.txt")
for (filename <- filenames) {
  println(s"processing $filename ...")
  val source = Source.fromFile(filename)
  val sedResult: String = SedFactory.getSed(source, updateLayout _).run
  writeFile(filename, sedResult)
}

def updateLayout(currentLine: String): SedAction = {
  if (currentLine.startsWith("layout:")) {
    UpdateLine("layout: book")
  } else {
    UpdateLine(currentLine)
  }
}
As you can see in the updateLayout function, all I do is a simple search-and-replace on the layout: line. The important things are (a) my custom function named updateLayout, and (b) this line of code, where I pass that function to SedFactory to create an appropriate Sed interpreter:
val sedResult: String = SedFactory.getSed(source, updateLayout _).run
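To make the idea concrete, here is a minimal sketch of how a Sed-like interpreter could apply a custom function to every line and return the result as a String. Everything in it (the MiniSedSketch object, its run method, and the simplified SedAction types) is a hypothetical stand-in I'm assuming for illustration, not the project's actual code:

```scala
// Hypothetical, simplified model of the idea; NOT the real Sed project classes.
object MiniSedSketch {
  sealed trait SedAction
  final case class UpdateLine(text: String) extends SedAction
  case object DeleteLine extends SedAction

  // Apply `f` to each line, keep the lines it chose to emit,
  // and join them into a single String (no trailing newline).
  def run(lines: Iterator[String], f: String => SedAction): String =
    lines
      .map(f)
      .collect { case UpdateLine(text) => text }
      .mkString("\n")

  // Same logic as the updateLayout function shown above.
  def updateLayout(currentLine: String): SedAction =
    if (currentLine.startsWith("layout:")) UpdateLine("layout: book")
    else UpdateLine(currentLine)
}
```

With this sketch, MiniSedSketch.run(Source.fromFile(filename).getLines(), MiniSedSketch.updateLayout) would produce the rewritten file contents as one String.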
Using a Map
In a more complicated scenario, my custom function may need to update every line with some data kept in a key/value Map. In that scenario my function might look like this:
def updateHeader(
  currentLine: String,
  currentLineNum: Int,
  kvMap: Map[String, String]
): SedAction = {
  if (currentLine.startsWith("num:")) {
    // add the `next-page` and `prev-page` fields after the `num` field
    val nextPage = kvMap("next-page")
    val prevPage = kvMap("prev-page")
    val rez = s"${currentLine}\nprevious-page: ${prevPage}\nnext-page: ${nextPage}"
    UpdateLine(rez)
  } else {
    UpdateLine(currentLine)
  }
}
In this situation I need to create a unique map for each of the 55 files I’m processing, so some pseudocode for my program’s main loop looks like this:
for (filename <- filenames) {
  val source = Source.fromFile(filename)
  // do some work to derive the map variables, then this:
  val kvMap = Map(
    "num" -> s"$counter",
    "next-page" -> nextPage,
    "prev-page" -> prevPage
  )
  val sedResult = new Sed(source, updateHeader _, kvMap).run
  writeFile(filename, sedResult)
}
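The "derive the map variables" step depends on your files, but one possible approach is to compute each file's neighbors from its position in the ordered list. This is only a sketch under my own assumptions: the PageMapSketch object and buildMaps method are hypothetical names, page numbers start at 1, and the first and last pages get an empty string for the missing neighbor:

```scala
// Hypothetical helper; the names and the "empty string at the ends" rule are assumptions.
object PageMapSketch {
  // Returns each filename paired with the Map for that file.
  def buildMaps(filenames: List[String]): List[(String, Map[String, String])] = {
    val files = filenames.toVector // indexed access by position
    files.zipWithIndex.map { case (filename, i) =>
      val prevPage = if (i > 0) files(i - 1) else ""
      val nextPage = if (i < files.length - 1) files(i + 1) else ""
      val kvMap = Map(
        "num"       -> s"${i + 1}",
        "next-page" -> nextPage,
        "prev-page" -> prevPage
      )
      filename -> kvMap
    }.toList
  }
}
```

The main loop could then iterate over the (filename, kvMap) pairs instead of tracking a counter by hand.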
Match expressions
Despite showing if/else expressions in those examples, what I usually do is write match expressions inside my custom Sed functions. Here’s an example that demonstrates a typical match expression:
def rmNextPrevPageLines(currentLine: String): SedAction = currentLine match {
  case r"^next-page:.*" => DeleteLine
  case r"^previous-page:.*" => DeleteLine
  case _ => UpdateLine(currentLine)
}
This is much more sed-like. Please note that the code in this example is made possible by Jon Pretty’s Kaleidoscope library, which allows the use of regular expressions in match expressions. I write a little more about Kaleidoscope in How to use regex pattern matching in a Scala match expression, so see that article and the Kaleidoscope page for more details.
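If you’d rather not add a dependency, a similar effect is possible with the standard library’s scala.util.matching.Regex, whose instances can be used as extractors in a match expression. This is a different technique from the Kaleidoscope approach, shown here as a sketch with simplified, hypothetical SedAction types so it stands on its own:

```scala
// Standard-library alternative to Kaleidoscope; the SedAction types are simplified stand-ins.
object RegexMatchSketch {
  sealed trait SedAction
  final case class UpdateLine(text: String) extends SedAction
  case object DeleteLine extends SedAction

  // A Regex used as a pattern must match the entire input,
  // so the trailing `.*` covers the rest of the line.
  private val NextPage = "next-page:.*".r
  private val PrevPage = "previous-page:.*".r

  def rmNextPrevPageLines(currentLine: String): SedAction = currentLine match {
    case NextPage() => DeleteLine
    case PrevPage() => DeleteLine
    case _          => UpdateLine(currentLine)
  }
}
```

The trade-off is that each regex needs its own named value, which is a little less sed-like than writing the pattern inline.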
Sed limitations
Please note that because this version of Sed returns a String, the entire result is held in memory, so you probably won’t want to process very large files with it.
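For comparison, a streaming variant avoids that limit by writing each processed line to a java.io.Writer as soon as it is produced. The following is a sketch of that idea only; StreamingSedSketch and runTo are hypothetical names, and nothing like this is claimed to exist in the project:

```scala
import java.io.Writer

// Hypothetical streaming variant; NOT part of the Sed project as described here.
object StreamingSedSketch {
  sealed trait SedAction
  final case class UpdateLine(text: String) extends SedAction
  case object DeleteLine extends SedAction

  // Process one line at a time and write it out immediately,
  // so the full output is never accumulated in a single String.
  def runTo(lines: Iterator[String], f: String => SedAction, out: Writer): Unit =
    lines.foreach { line =>
      f(line) match {
        case UpdateLine(text) =>
          out.write(text)
          out.write("\n")
        case DeleteLine =>
          () // drop the line
      }
    }
}
```

The cost is that the output has to go to a different file than the one being read, which is why returning a String is more convenient for small files.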
My Sed project
If you’re interested in more details, here’s a link to my Scala Sed project:
That project has a couple of README files that explain some things. The new code I just demonstrated is in the Sed subproject, specifically under the com.alvinalexander.sed_tostring package. See the Sed class in that package and its associated tests for more details.
As a final warning, because things are very much a work in progress, the code may change dramatically in the future. But if you’re interested in doing Sed-like processing on many files using Scala rather than sed, I hope this is a helpful start.
Bonus: Factories and HOFs
If you’re interested in some gory details, I created a SedFactory because there are three different Sed classes, which give you flexibility in writing your custom Sed functions. You saw some of this above, where the custom functions had different signatures based on each function’s needs.
Therefore, SedFactory has three overloaded getSed methods:
object SedFactory {

  // currentLine, currentLineNum, map
  def getSed(
    source: Source,
    f: (String, Int, Map[String, String]) => SedAction,
    keyValueMap: Map[String, String] = Map("" -> "")
  ): SedTrait = {
    new Sed3Params(source, f, keyValueMap)
  }

  // currentLine, currentLineNum
  def getSed(
    source: Source,
    f: (String, Int) => SedAction
  ): SedTrait = {
    new SedCurrentLineAndNum(source, f)
  }

  // currentLine
  def getSed(
    source: Source,
    f: (String) => SedAction
  ): SedTrait = {
    new SedCurrentLine(source, f)
  }

  // more code ...
}
This approach lets you write custom functions to work with Sed that match these function signatures:
f:(String, Int, Map[String, String]) => SedAction //Function3
f:(String, Int) => SedAction //Function2
f:(String) => SedAction //Function1
If you’re used to writing functions that take functions as parameters — i.e., higher-order functions, or HOFs — this approach will look familiar. And if you’re not, I’ll take this moment to plug my “Functional Programming, Simplified” book.
As I note in the README-DEV.md file under the Sed subproject, this approach — along with JVM type erasure — has the potential to cause problems in the future, but for Version 0.3, this works okay.
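The overloads above work because Function1, Function2, and Function3 are distinct classes even after erasure. Two overloads taking functions of the same arity but different type parameters would erase to the same signature and collide. One way to sidestep overloading entirely is to adapt the smaller signatures to the three-parameter form using differently named helpers. This is a sketch of that alternative with hypothetical names (LiftSketch, SedFn, fromLine, fromLineAndNum), not what the project does:

```scala
// Hypothetical alternative to overloading; the SedAction types are simplified stand-ins.
object LiftSketch {
  sealed trait SedAction
  final case class UpdateLine(text: String) extends SedAction
  case object DeleteLine extends SedAction

  // The single signature an interpreter would need to accept.
  type SedFn = (String, Int, Map[String, String]) => SedAction

  // Adapters with distinct names, so nothing depends on overload resolution.
  def fromLine(f: String => SedAction): SedFn =
    (line, _, _) => f(line)

  def fromLineAndNum(f: (String, Int) => SedAction): SedFn =
    (line, num, _) => f(line, num)
}
```

With this design a single Sed class could accept a SedFn, at the cost of callers wrapping simpler functions explicitly, for example LiftSketch.fromLine(myFunction).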