sed

My Scala Sed project: More features, returning strings

Table of Contents

1 - Basic use
2 - Using a Map
3 - Match expressions
4 - Sed limitations
5 - My Sed project
6 - Bonus: Factories and HOFs

My Scala Sed project is still unfinished, but I made some progress on a new version this week. My initial need was to have Sed return a String rather than printing directly to STDOUT, which gives me more ability to post-process a file. After that I realized it would be really useful if the custom function I pass to Sed had two more pieces of information available to it:

  • The line number of the string Sed passed to it
  • A Map of key/value pairs the helper function could use while processing the file

Note: In this article “Sed” refers to my project, and “sed” refers to the Unix command-line utility.


Basic use

In a “basic use” scenario, this is how I use the new version of Sed in a Scala shell script to change the “layout:” lines in 55 Markdown files whose names are in the files-to-process.txt file:

A little Scala `sed` class

A few times during the past year I got tired of trying to remember the Unix/Linux sed syntax while wanting to make edits to many files, so this weekend I wrote a little sed-like Scala class.

A Scala shell script to insert text before a matching pattern

I don’t remember exactly why I wrote this Scala shell script, but I think I was having a problem getting sed to work properly, so I wrote this little script to insert an Amazon Kindle “break” tag before each <h1> tag in an HTML file:

How to replace newline character with sed on Mac OS X (macOS)

I don’t have much time to explain this today, but ... if you want to see how to use the sed command on a Mac OS X (macOS) system to search for newline characters in the input pattern and replace them with something else in the replacement pattern, this example might point you in the right direction.
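As a minimal sketch of one common approach (not necessarily the one in the original post), the loop below reads the whole input into sed's pattern space and then replaces each newline with a space. The sample input is made up for illustration. Each part is passed as a separate -e argument because the BSD sed that ships with macOS expects a label to end at the end of its argument.

```shell
# Hypothetical three-line input, for illustration only.
input='one
two
three'

# :a        define a label named "a"
# N         append the next input line to the pattern space
# $!ba      if this is not the last line, branch back to "a"
# s/\n/ /g  once all lines are joined, replace each newline with a space
joined=$(printf '%s\n' "$input" | sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/ /g')
printf '%s\n' "$joined"
```

With this input the command prints "one two three" on a single line. The same form works with GNU sed on Linux.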

A custom TextMate command that uses ‘sed’

In this post I share the contents of a custom TextMate command I just created that uses pandoc and sed to convert markdown content in the TextMate editor to a “pretty printer” version of HTML:

#!/bin/sh

PATH=$PATH:/usr/local/bin

# note: 'sed -E' enables extended regular expressions

# use pandoc to convert from markdown to html,
# then use sed to clean up the resulting html
pandoc -f markdown -t html |\
sed -Ee "/<p|<h2|<h3|<h4|<aside|<div|<ul|<ol/i\\
\\"

You can try to use a command like tidy to clean the HTML, but the version of tidy I have does not know about HTML5 tags. The TextMate Markdown plugin also doesn’t work the way I want it to. Besides that, I’m trying to learn more about writing TextMate commands anyway.

As an important note, when you set this up as a TextMate command and then run it, it will convert the TextMate editor contents from markdown to HTML.

(On a related note, serenity.de is also a good resource for TextMate command and bundle documentation.)

In summary, this code shows:

* How to execute a Unix shell command from TextMate
* Specifically, how to execute a sed command from TextMate
* How to use extended regular expressions with sed (the -E option)
* How to search for multiple regex search patterns with sed

How to use the Linux sed command to delete a range of lines

In a previous blog post I demonstrated how to use sed to insert text before or after a line in many files, and in this example I'd like to demonstrate how to delete a range of lines using sed.
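As a minimal sketch of the general idea, sed's d command accepts an address range, given either as two patterns or as two line numbers. The marker comments and sample lines below are hypothetical and are not taken from the Latex2HTML output.

```shell
# Hypothetical input; the marker comments are made up for illustration.
input='<p>keep 1</p>
<!-- junk start -->
<p>noise</p>
<!-- junk end -->
<p>keep 2</p>'

# Delete every line from the first marker through the second, inclusive.
cleaned=$(printf '%s\n' "$input" | sed '/<!-- junk start -->/,/<!-- junk end -->/d')
printf '%s\n' "$cleaned"

# The same deletion by line number: remove lines 2 through 4.
by_number=$(printf '%s\n' "$input" | sed '2,4d')
printf '%s\n' "$by_number"
```

Both commands leave only the two "keep" lines. The pattern form is usually the better fit when the same junk appears at different line numbers in many files.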

sed delete - How to delete a range of lines using sed

The problem I had today was that I just regenerated 99 HTML files for my Introduction to Unix/Linux tutorial using LaTeX2HTML, and it generates a bunch of "junk" in my HTML files that looks like this:

Solution to sed error message: “\1 not defined in the RE”

As a quick sed solution, if you get this “\1 not defined in the RE” error message when running a sed script:

$ sed -f sed.cmds c4.in.html > c4.out.html
sed: 2: sed.cmds: \1 not defined in the RE

the problem probably isn’t too bad. I usually get this error message when I forget to “escape” the parentheses I use in my search pattern. I usually write this, which is an error:

s/foo(.*)bar/\1/

when I need to write that sed command like this:
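A minimal sketch of the escaped form, using a made-up input string: in sed's default (basic) regular expression syntax a capture group is written with \( and \), while plain parentheses only form a group when the -E option is used.

```shell
# Basic regular expressions (sed's default): the group needs \( and \).
basic=$(printf 'fooXYZbar\n' | sed 's/foo\(.*\)bar/\1/')

# Extended regular expressions (-E): plain parentheses form the group.
extended=$(printf 'fooXYZbar\n' | sed -E 's/foo(.*)bar/\1/')

printf '%s\n%s\n' "$basic" "$extended"
```

Both commands print XYZ, the text captured between "foo" and "bar".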

Mac OS X: Unix sed commands I use to clean MacDown HTML output

FWIW, this is the source code for a sed script I use on my Mac OS X system to convert HTML output generated by MacDown into a format I need. MacDown generates some extra “cruft” that I don’t need, so I use these sed commands to clean up that HTML output: