regular expression

My Scala Sed project: More features, returning strings

Table of Contents

  1 - Basic use
  2 - Using a Map
  3 - Match expressions
  4 - Sed limitations
  5 - My Sed project
  6 - Bonus: Factories and HOFs

My Scala Sed project is still a work in progress, but I made some headway on a new version this week. My initial need was to have Sed return a String rather than printing directly to STDOUT, which gives me more ability to post-process a file. After that I realized it would be really useful if the custom function I pass to Sed had two more pieces of information available to it:

  • The line number of the string Sed passed to it
  • A Map of key/value pairs the helper function could use while processing the file
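To make that design concrete, here is a minimal sketch of the idea in Java. It is not the project's actual API: the names `SedSketch`, `LineFunction`, and `sed` are hypothetical, and the sketch only assumes what is described above, namely that the custom function receives the line, its line number, and a Map, and that the result comes back as a String instead of being printed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class SedSketch {

    // Hypothetical callback shape: the current line, its 1-based line
    // number, and a Map of key/value pairs the function may consult.
    @FunctionalInterface
    interface LineFunction {
        String apply(String line, int lineNumber, Map<String, String> kv);
    }

    // Applies f to every line and returns the result as one String
    // instead of printing it, so the caller can post-process it.
    static String sed(List<String> lines, Map<String, String> kv, LineFunction f) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            out.add(f.apply(lines.get(i), i + 1, kv));
        }
        return String.join("\n", out);
    }

    public static void main(String[] args) {
        List<String> lines = List.of("---", "layout: post", "title: Hello", "---");
        Map<String, String> kv = Map.of("layout", "page");

        // Rewrite only the "layout:" line, using the value from the Map.
        String result = sed(lines, kv, (line, n, m) ->
            line.startsWith("layout:") ? "layout: " + m.get("layout") : line);

        System.out.println(result);
    }
}
```

Returning a String keeps the function free of side effects, which is what makes the post-processing step possible.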

Note: In this article “Sed” refers to my project, and “sed” refers to the Unix command-line utility.

Basic use

In a “basic use” scenario, this is how I use the new version of Sed in a Scala shell script to change the “layout:” lines in 55 Markdown files whose names are in the files-to-process.txt file:

A little Scala `sed` class

A few times during the past year I got tired of trying to remember the Unix/Linux sed syntax while wanting to make edits to many files, so this weekend I wrote a little sed-like Scala class.

Scala: How to extract parts of a String that match regex patterns

This is an excerpt from the Scala Cookbook (partially modified for the internet). This is Recipe 1.9, “Extracting Parts of a String that Match Patterns.”

Problem

You want to extract one or more parts of a Scala String that match the regular-expression patterns you specify.

Solution

Define the regular-expression (regex) patterns you want to extract, placing parentheses around them so you can extract them as “regular-expression groups.” First, define the desired pattern:
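As a rough sketch of the idea (not the recipe's own code), the same technique can be shown with the `java.util.regex` classes that Scala's `Regex` is built on. The class name `GroupExtract`, the pattern, and the sample input are assumptions chosen only for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupExtract {

    // Two groups: one or more digits, a space, then one or more letters.
    private static final Pattern PATTERN = Pattern.compile("([0-9]+) ([A-Za-z]+)");

    // Returns {number, word} from the first match, or null if nothing matches.
    static String[] extract(String input) {
        Matcher m = PATTERN.matcher(input);
        if (!m.find()) {
            return null;
        }
        // group(1) and group(2) are the two parenthesized sub-patterns.
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] parts = extract("100 Bananas");
        System.out.println(parts[0] + " / " + parts[1]); // prints "100 / Bananas"
    }
}
```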

How to use multiple regex patterns with replaceAll (Java String class)

Table of Contents

  1 - 1) A simple string
  2 - 2) Replace multiple patterns in that string
  3 - 3) More explanation
  4 - Multiple search patterns
  5 - Summary

Java FAQ: How can I use multiple regular expression patterns with the replaceAll method in the Java String class?

Here’s a little example that shows how to replace many regular expression (regex) patterns with one replacement string in Scala and Java. I’ll show all of this code in Scala’s interactive interpreter environment, but in this case Scala is very similar to Java, so the initial solution can easily be converted to Java.
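One common way to do this, sketched here in Java, is to join the patterns with the regex alternation operator `|` so that a single `replaceAll` call handles all of them. The class name, the patterns, and the replacement string below are made up for illustration and may differ from the article's own example.

```java
class MultiPatternReplace {

    // The | operator joins the alternatives into one regex, so a single
    // replaceAll call replaces a match of any of them.
    static String replaceFruits(String s) {
        return s.replaceAll("apple|banana|cherry", "fruit");
    }

    public static void main(String[] args) {
        System.out.println(replaceFruits("I like apple pie and banana bread"));
        // prints "I like fruit pie and fruit bread"
    }
}
```

Because `replaceAll` is a method of `java.lang.String`, the same call works unchanged on a Scala String.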

A Java method to replace all instances of a pattern in a String with a replacement pattern

Note: The code shown below is a bit old. If you want to perform a “search and replace” operation on all instances of a given pattern, all you have to do these days is use the replaceAll method on a Java String, like this:

String s = "123 Main Street";
String result = s.replaceAll("[0-9]", "-");

That second line of code returns the string “--- Main Street”. I’ve kept the original information below for background.

A custom TextMate command that uses ‘sed’

In this post I share the contents of a custom TextMate command I just created that uses pandoc and sed to convert markdown content in the TextMate editor to a “pretty-printed” version of HTML:

#!/bin/sh

PATH=$PATH:/usr/local/bin

# note: 'sed -E' enables extended regular expressions

# use pandoc to convert from markdown to html,
# then use sed to clean up the resulting html
pandoc -f markdown -t html |\
sed -Ee "/<p|<h2|<h3|<h4|<aside|<div|<ul|<ol/i\\
\\"

You can try to use a command like tidy to clean the HTML, but the version of tidy I have does not know about HTML5 tags. The TextMate Markdown plugin also doesn’t work the way I want it to. Besides that, I’m trying to learn more about writing TextMate commands anyway.

As an important note, when you set this up as a TextMate command and then run it, it will convert the TextMate editor contents from markdown to HTML.

(In a related note, serenity.de is also a good resource for TextMate command and bundle documentation.)

In summary, this code shows:

* How to execute a Unix shell command from TextMate
* Specifically, how to execute a sed command from TextMate
* How to use modern regular expressions with sed (the -E option)
* How to search for multiple regex search patterns with sed

PHP: How to remove non-printable characters from strings

PHP FAQ: How do I remove all non-printable characters from a string in PHP?

I don’t know of any built-in PHP functions to remove all non-printable characters from a string, so the solution is to use the preg_replace function with an appropriate regular expression.
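The regular-expression idea itself is language-neutral, so here is a small sketch of it in Java, the language of the `replaceAll` example earlier on this page, rather than in PHP. It assumes that “printable” means the ASCII range from space (0x20) to tilde (0x7E), so it also strips tabs, newlines, and any non-ASCII characters; the class and method names are hypothetical.

```java
class PrintableOnly {

    // Removes every character outside the printable ASCII range
    // 0x20 (space) through 0x7E (tilde).
    static String stripNonPrintable(String s) {
        return s.replaceAll("[^\\x20-\\x7E]", "");
    }

    public static void main(String[] args) {
        // The tab and the newline are outside the range, so both are removed.
        System.out.println(stripNonPrintable("Tab\there\n")); // prints "Tabhere"
    }
}
```

A negated character class like this one can be passed to PHP’s preg_replace as well; adjust the range if you need to keep tabs, newlines, or non-ASCII text.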