My Scala Sed project is still a work in progress, but I made some progress on a new version this week. My initial need this week was to have Sed return a String rather than printing directly to STDOUT. This change gave me more ability to post-process a file. After that I realized it would really be useful if the custom function I pass to Sed had two more pieces of information available to it:
- The line number of the string Sed passed to it
- A Map of key/value pairs the helper function could use while processing the file
Note: In this article “Sed” refers to my project, and “sed” refers to the Unix command-line utility.
In a “basic use” scenario, this is how I use the new version of Sed in a Scala shell script to change the “layout:” lines in 55 Markdown files whose names are in the files-to-process.txt file:
Scala FAQ: How can I use regular expression (regex) pattern matching in a match expression (a Scala match/case expression)?
As I wrote in my Scala sed class post earlier today, Jon Pretty’s Kaleidoscope project lets you use string pattern-matching code in Scala match expressions. This enables regex pattern-matching code like this:
A few times during the past year I got tired of trying to remember the Unix/Linux sed syntax while wanting to make edits to many files, so this weekend I wrote a little sed-like Scala class.
I don’t remember exactly why I wrote this Scala shell script, but if I remember right I was having a problem getting sed to work properly, so I wrote this little script to insert an Amazon Kindle “break” tag before each <h1> tag in an HTML file:
As a brief note about the Linux/Unix sed command, today I learned how to append multiple lines of text to an HTML (or XML) file on macOS. The short answer is that I created a sed commands file named changes.sed with these contents:
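The actual contents of changes.sed are not reproduced in this excerpt. As a minimal sketch of the general technique, assuming a hypothetical `<h1>` match pattern and made-up lines to append, a commands file can use the `a\` command, ending every appended line except the last with a backslash:

```shell
#!/bin/sh
# Sketch only: the <h1> pattern and the appended lines are hypothetical,
# not the contents of the original changes.sed.
workdir=$(mktemp -d)

# Quoted heredoc delimiter ('EOF') keeps the backslashes intact.
cat > "$workdir/changes.sed" <<'EOF'
/<h1>/a\
<p>first added line</p>\
<p>second added line</p>
EOF

# Run the commands file against a small sample document.
result=$(printf '<h1>Title</h1>\n<p>body</p>\n' | sed -f "$workdir/changes.sed")
printf '%s\n' "$result"

rm -rf "$workdir"
```

Writing the text on its own lines after `a\` is the form that both the macOS (BSD) sed and GNU sed accept.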
I don’t have much time to explain this today, but ... if you want to see how to use the sed command on a Mac OS X (macOS) system to search for newline characters in the input pattern and replace them with something else in the replacement pattern, this example might point you in the right direction.
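The example from that post is not included in this excerpt. One commonly used approach, shown here as a sketch with made-up input, is to read the whole input into the pattern space with `N` in a loop, and then substitute the embedded newlines:

```shell
#!/bin/sh
# Sketch only: the sample input and the ", " replacement are hypothetical.
# :a     defines a label
# N      appends the next input line to the pattern space
# $!ba   branches back to the label unless this is the last line
# s/\n/, /g then replaces every embedded newline
# Each command gets its own -e because BSD sed ends a label at a newline.
result=$(printf 'one\ntwo\nthree\n' | sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/, /g')
printf '%s\n' "$result"
```

This holds the entire file in memory, so it is best suited to small and medium-sized files.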
In this post I share the contents of a custom TextMate command I just created that uses sed to convert markdown content in the TextMate editor to a “pretty printer” version of HTML:
#!/bin/sh

PATH=$PATH:/usr/local/bin

# note: 'sed -E' gives you the advanced regex's
# use pandoc to convert from markdown to html,
# then use sed to clean up the resulting html
pandoc -f markdown -t html |\
  sed -Ee "/<p|<h2|<h3|<h4|<aside|<div|<ul|<ol/i\\
\\"
You can try to use a command like tidy to clean the HTML, but the version of tidy I have does not know about HTML5 tags. The TextMate Markdown plugin also doesn’t work the way I want it. Besides that, I’m trying to learn more about writing TextMate commands anyway.
As an important note, when you set this up as a TextMate command and then run it, it will convert the TextMate editor contents from markdown to HTML.
(In a related note, serenity.de is also a good resource for TextMate command and bundle documentation.)
In summary, this code shows:
* How to execute a Unix shell command from TextMate
* Specifically, how to execute a sed command from TextMate
* How to use modern regular expressions with sed
* How to search for multiple regex search patterns with sed
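The last two points can be tried outside TextMate. The following sketch uses made-up HTML and a shorter tag list, and it swaps the script's `i\` command for the `x;p;x` idiom, which prints the (empty) hold space before each matching line and so behaves the same way in BSD and GNU sed:

```shell
#!/bin/sh
# Sketch only: the sample HTML is hypothetical.
html='<h2>Title</h2>
<p>text</p>
<span>inline</span>'

# -E enables extended regular expressions, so | separates the alternatives.
# For each line matching <p or <h2, print an empty line first (x;p;x).
result=$(printf '%s\n' "$html" | sed -E '/<p|<h2/{x;p;x;}')
printf '%s\n' "$result"
```

Lines that match none of the alternatives, such as the `<span>` line here, pass through unchanged.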
In a previous blog post I demonstrated how to use sed to insert text before or after a line in many files, and in this example I'd like to demonstrate how to delete a range of lines using sed.
sed delete - How to delete a range of lines using sed
The problem I had today was that I just re-generated 99 HTML files for my Introduction to Unix/Linux tutorial using Latex2HTML, and it generates a bunch of "junk" in my HTML files that looks like this:
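The Latex2HTML output itself is not reproduced in this excerpt. As a sketch of the technique, assuming hypothetical `JUNK-START` and `JUNK-END` marker lines, sed can delete everything from a line matching one pattern through the next line matching another by using an address range with the `d` command:

```shell
#!/bin/sh
# Sketch only: the JUNK-START / JUNK-END markers are hypothetical,
# not the text that Latex2HTML actually generates.
input='<p>keep 1</p>
<!-- JUNK-START -->
<p>junk</p>
<!-- JUNK-END -->
<p>keep 2</p>'

# /start/,/end/ selects the range (both marker lines included); d deletes it.
result=$(printf '%s\n' "$input" | sed '/JUNK-START/,/JUNK-END/d')
printf '%s\n' "$result"
```

A numeric range such as `sed '2,4d'` works the same way when the line numbers are known in advance.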
As a quick sed solution, if you get this “\1 not defined in the RE” error message when running a sed command:
$ sed -f sed.cmds c4.in.html > c4.out.html
sed: 2: sed.cmds: \1 not defined in the RE
the problem probably isn’t too bad. I usually get this error message when I forget to “escape” the parentheses that I use in my search pattern. I usually write this, which is an error:
when I need to write that sed command like this:
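The original commands are not reproduced in this excerpt, so here is a sketch with a hypothetical pattern. In sed's default basic regular expressions, unescaped parentheses are literal characters, so `\1` has no group to refer to; escaping them creates the capture group. With `-E`, the rule is reversed and unescaped parentheses group:

```shell
#!/bin/sh
# Sketch only: the 'foo bar' pattern is hypothetical.
# Fails in basic regex mode, because ( and ) are literal and no group exists:
#   sed 's/(foo) bar/\1 baz/'

# Basic regular expressions: escape the parentheses to capture.
bre=$(printf 'foo bar\n' | sed 's/\(foo\) bar/\1 baz/')

# Extended regular expressions (-E): plain parentheses capture.
ere=$(printf 'foo bar\n' | sed -E 's/(foo) bar/\1 baz/')

printf '%s\n%s\n' "$bre" "$ere"
```

Both commands produce the same result; the choice mostly comes down to which escaping style is easier to read for the pattern at hand.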
FWIW, this is the source code for a sed script I use on my Mac OS X system to convert HTML output generated by MacDown into a format I need. MacDown generates some extra “cruft” that I don’t need, so I use these sed commands to clean up that HTML output: