My Scala Sed project is still a work in progress, but I made some headway on a new version this week. My initial need was to have Sed return a String rather than printing directly to STDOUT. This change gave me more ability to post-process a file. After that I realized it would be really useful if the custom function I pass to Sed had two more pieces of information available to it:
- The line number of the string Sed passed to it
- A Map of key/value pairs the helper function could use while processing the file
Note: In this article “Sed” refers to my project, and “sed” refers to the Unix command-line utility.
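As a rough illustration of that design, here is a minimal Java sketch of a sed-like function that returns a String and hands its callback the line, the line number, and a map of key/value pairs. The names `SedSketch`, `LineHandler`, and `process` are hypothetical and are not the Sed project's actual API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class SedSketch {

    /** Hypothetical callback: receives the line, its 1-based line number, and shared key/value pairs. */
    @FunctionalInterface
    interface LineHandler {
        String handle(String line, int lineNumber, Map<String, String> kvMap);
    }

    /** Applies the handler to every line and returns the result as one String instead of printing it. */
    static String process(List<String> lines, Map<String, String> kvMap, LineHandler handler) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            out.add(handler.handle(lines.get(i), i + 1, kvMap));
        }
        return String.join("\n", out);
    }

    public static void main(String[] args) {
        List<String> lines = List.of("---", "layout: post", "title: Hello");
        Map<String, String> kvMap = Map.of("layout", "page");

        // Rewrite only the "layout:" line, using a value looked up in the map.
        String result = process(lines, kvMap, (line, n, kv) ->
            line.startsWith("layout:") ? "layout: " + kv.get("layout") : line);

        System.out.println(result);
    }
}
```

Because `process` returns a String, the caller can keep transforming the result or write it to a file rather than being limited to whatever was printed.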
In a “basic use” scenario, this is how I use the new version of Sed in a Scala shell script to change the “layout:” lines in 55 Markdown files whose names are in the files-to-process.txt file:
Scala FAQ: How can I use regular expression (regex) pattern matching in a match expression (a Scala match/case expression)?
As I wrote in my Scala sed class post earlier today, Jon Pretty’s Kaleidoscope project lets you use string pattern-matching code in Scala match expressions. This enables regex pattern-matching code like this:
A few times during the past year I got tired of trying to remember the Unix/Linux sed syntax while wanting to make edits to many files, so this weekend I wrote a little sed-like Scala class.
Kaleidoscope is a Scala pattern-matching library created in a string interpolator style.
This is an excerpt from the Scala Cookbook (partially modified for the internet). This is Recipe 1.9, “Extracting Parts of a String that Match Patterns.”
You want to extract one or more parts of a Scala String that match the regular-expression patterns you specify.
Define the regular-expression (regex) patterns you want to extract, placing parentheses around them so you can extract them as “regular-expression groups.” First, define the desired pattern:
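As a minimal sketch of the same idea using Java's regex classes (this is not the recipe's Scala code), the example below wraps two sub-patterns in parentheses and reads them back as groups. The pattern, the sample input, and the names `GroupExtraction` and `extract` are illustrative assumptions.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupExtraction {

    // Two capture groups: one or more digits, a space, then one or more letters.
    // The pattern itself is an illustrative assumption.
    static final Pattern COUNT_AND_WORD = Pattern.compile("([0-9]+) ([A-Za-z]+)");

    /** Returns {count, word} from the first match in the input, or empty if nothing matches. */
    static Optional<String[]> extract(String input) {
        Matcher m = COUNT_AND_WORD.matcher(input);
        if (m.find()) {
            // group(0) is the whole match; groups 1 and 2 are the parenthesized parts.
            return Optional.of(new String[] { m.group(1), m.group(2) });
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        extract("100 Bananas").ifPresent(parts ->
            System.out.println("count = " + parts[0] + ", word = " + parts[1]));
    }
}
```

Returning an `Optional` keeps the no-match case explicit, so callers do not have to check for null.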
Java FAQ: How can I use multiple regular expression patterns with the replaceAll method in the Java String class?
Here’s a little example that shows how to replace many regular expression (regex) patterns with one replacement string in Scala and Java. I’ll show all of this code in Scala’s interactive interpreter environment, but in this case Scala is very similar to Java, so the initial solution can easily be converted to Java.
Note: The code shown below is a bit old. If you want to perform a “search and replace” operation on all instances of a given pattern, all you have to do these days is use the replaceAll method on a Java String, like this:

    String s = "123 Main Street";
    String result = s.replaceAll("[0-9]", "-");

That second line of code returns the string “--- Main Street”. I kept the information below here for background information.
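To replace several patterns with one replacement string, one common approach is to join the patterns with the regex alternation operator `|` and make a single replaceAll call. The sketch below assumes that approach; the helper name `replaceAny` and the sample patterns are hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;

class MultiPatternReplace {

    /** Joins the given regexes with '|' (each in a non-capturing group) and replaces every match. */
    static String replaceAny(String input, List<String> patterns, String replacement) {
        String combined = patterns.stream()
            .map(p -> "(?:" + p + ")")   // group each pattern so '|' cannot split it
            .collect(Collectors.joining("|"));
        return input.replaceAll(combined, replacement);
    }

    public static void main(String[] args) {
        // Replaces each digit and the word "Street" with a dash.
        System.out.println(replaceAny("123 Main Street", List.of("[0-9]", "Street"), "-"));
    }
}
```

Note that replaceAll treats `$` and `\` in the replacement string specially, so a replacement containing those characters should be passed through `java.util.regex.Matcher.quoteReplacement` first.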
In this post I share the contents of a custom TextMate command I just created that uses sed to convert markdown content in the TextMate editor to a “pretty printer” version of HTML:

    #!/bin/sh

    PATH=$PATH:/usr/local/bin

    # note: 'sed -E' gives you the advanced regex's
    # use pandoc to convert from markdown to html,
    # then use sed to clean up the resulting html
    pandoc -f markdown -t html |\
    sed -Ee "/<p|<h2|<h3|<h4|<aside|<div|<ul|<ol/i\\
    \\"
You can try to use a command like tidy to clean the HTML, but the version of tidy I have does not know about HTML5 tags. The TextMate Markdown plugin also doesn’t work the way I want it to. Besides that, I’m trying to learn more about writing TextMate commands anyway.
As an important note, when you set this up as a TextMate command and then run it, it will convert the TextMate editor contents from markdown to HTML.
(In a related note, serenity.de is also a good resource for TextMate command and bundle documentation.)
In summary, this code shows:
* How to execute a Unix shell command from TextMate
* Specifically, how to execute a sed command from TextMate
* How to use modern regular expressions with sed
* How to search for multiple regex search patterns with sed
PHP FAQ: How do I remove all non-printable characters from a string in PHP?
I don’t know of any built-in PHP functions to remove all non-printable characters from a string, so the solution is to use the preg_replace function with an appropriate regular expression.
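The regex idea carries over to other languages. As a hedged sketch in Java (not the PHP preg_replace call itself), the example below assumes “printable” means the ASCII range from space (0x20) through tilde (0x7E) and removes everything else. The names `PrintableOnly` and `stripNonPrintable` are hypothetical.

```java
class PrintableOnly {

    /**
     * Removes every character outside the printable ASCII range (space through tilde).
     * Assumption: accented letters and other non-ASCII characters count as non-printable here.
     */
    static String stripNonPrintable(String input) {
        return input.replaceAll("[^\\x20-\\x7E]", "");
    }

    public static void main(String[] args) {
        // The bell character (\u0007) and the tab are both removed.
        System.out.println(stripNonPrintable("Hello\u0007,\tworld"));
    }
}
```

If tabs, newlines, or non-ASCII text need to survive, the character class has to be widened accordingly.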