Mac OS X: ‘sed’ commands I use to clean MacDown HTML output

FWIW, this is the source code for a sed script I use on my Mac OS X system to convert the HTML output generated by MacDown into the format I need. MacDown adds some extra “cruft” that I don’t need (id attributes on the header tags, and code tags nested inside pre tags), so I use these sed commands to strip it out:

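# clean h1 tags. also add blank lines before each h1.
# ([^"]* keeps the id match from running past its closing quote
# when the header text itself contains a "> sequence, such as a link)
s?<h1 id="toc_[^"]*">\(.*\)</h1>?\
\
\
<h1>\1</h1>?

# clean h2 tags
s?<h2 id="toc_[^"]*">\(.*\)</h2>?<h2>\1</h2>?

# clean h3 tags
s?<h3 id="toc_[^"]*">\(.*\)</h3>?<h3>\1</h3>?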

# clean pre tags
s?<pre><code>?<pre>?

s?</code></pre>?</pre>?

# these next two lines are unique to modifying "scala>" content
# inside pre tags (something i do while converting the scala cookbook)
s?^scala&gt; \(.*\)$?scala\&gt; <strong>\1</strong>?

s?^<pre>scala&gt; \(.*\)$?<pre>scala\&gt; <strong>\1</strong>?
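
Commands like these are normally saved in a file and passed to sed with its -f option. Here is a minimal sketch of that, assuming a hypothetical script file named clean-macdown.sed and a hypothetical input file named page.html. To keep it short it uses only the two pre-tag rules and a made-up two-line sample, and it works in a temporary directory so nothing real gets overwritten:

```shell
# Throwaway working directory (removed again at the end).
workdir=$(mktemp -d)

# Two-rule excerpt of the script; the file name is hypothetical.
cat > "$workdir/clean-macdown.sed" <<'EOF'
s?<pre><code>?<pre>?
s?</code></pre>?</pre>?
EOF

# Made-up sample standing in for HTML exported from MacDown.
printf '%s\n' '<pre><code>val x = 1' '</code></pre>' > "$workdir/page.html"

# -f tells sed to read its commands from a file.
result=$(sed -f "$workdir/clean-macdown.sed" "$workdir/page.html")
printf '%s\n' "$result"

rm -r "$workdir"
```

With a real export you would point sed at the full script and redirect the output to a new file, rather than capturing it in a variable.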

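I won’t explain those sed commands in detail; I just wanted to put them out here in case I need them again. One thing worth saying is that the “header” patterns use sed’s “search and replace” (s) command with ? as the delimiter instead of the usual /, so the slash in a closing tag like </h1> doesn’t need to be escaped. The \(.*\) group captures the header text, and \1 writes it back into the replacement.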
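
To see the header search-and-replace in isolation, here is a small sketch that runs one h2 rule against a single line. The sample header and its id value are made up, and the id part of the pattern is written as [^"]* so it stops at the closing quote:

```shell
# Made-up input line in the shape the header patterns expect.
input='<h2 id="toc_3">Installing sed</h2>'

# ? is the delimiter, so the / in </h2> needs no escaping.
# \(.*\) captures the header text; \1 puts it back in the replacement.
output=$(printf '%s\n' "$input" | sed 's?<h2 id="toc_[^"]*">\(.*\)</h2>?<h2>\1</h2>?')

printf '%s\n' "$output"
# prints: <h2>Installing sed</h2>
```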
