How to replace newline characters with sed on Mac OS X (macOS)

I don’t have much time to explain this today, but if you want to see how to use the sed command on a Mac OS X (macOS) system to match newline characters in the search pattern and replace them with something else, this example might point you in the right direction.

The problem

My problem is that I have a bunch of files, each with dozens to hundreds of paragraphs that look like this:

Lorem ipsum dolor sit amet, 
consectetur adipiscing elit, 

sed do eiusmod tempor incididunt 
ut labore et dolore magna aliqua.

(Those are very short sentences and paragraphs for this example.)

What I want are continuous paragraphs with no unnecessary line breaks, so I want to use sed to create output like this:

Lorem ipsum dolor sit amet, consectetur adipiscing elit, 

sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.

The solution

To solve the problem I first put this sed command in a file named sed.cmds:

s/([a-zA-Z,`])\n([a-zA-Z`])/\1 \2/g

When I then tried to run the command like this:

sed -E -f sed.cmds Input.txt > Output.txt

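the command didn’t work: none of the line breaks were replaced. The reason is that sed reads its input one line at a time and strips the trailing newline before it applies the script, so the pattern space never contains a \n for the search pattern to match. After a lot of searching I finally found this Stack Overflow thread, and in short, the solution is to run this sed command instead: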

sed -e ':a' -e 'N' -e '$!ba' -E -f sed.cmds Input.txt > Output.txt

When I run that sed command with my sed.cmds file, it successfully finds the newline characters in the sed input stream with the \n pattern, and then I replace the newline character with a blank space in the replacement pattern.
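
To make the difference concrete, here is a minimal sketch that runs the same substitution both ways on the sample text from above. It assumes the text is held in a shell variable and that the substitution is passed inline with -e instead of through sed.cmds; the variable names (input, subst, line_by_line, joined) are just for illustration. The three extra commands form a loop: :a defines a label, N appends the next input line (and the newline before it) to the pattern space, and $!ba branches back to a unless sed is on the last line, so the substitution finally sees the whole input at once.

```shell
# Sample input: two wrapped paragraphs separated by a blank line.
input='Lorem ipsum dolor sit amet,
consectetur adipiscing elit,

sed do eiusmod tempor incididunt
ut labore et dolore magna aliqua.'

# The same substitution that lives in sed.cmds, passed inline for this demo.
subst='s/([a-zA-Z,`])\n([a-zA-Z`])/\1 \2/g'

# Line by line: the pattern space never holds a newline, so nothing changes.
line_by_line=$(printf '%s\n' "$input" | sed -E -e "$subst")

# With the :a / N / $!ba loop: every line is appended to the pattern space
# first, so \n can match between the end of one line and the start of the next.
joined=$(printf '%s\n' "$input" | sed -E -e ':a' -e 'N' -e '$!ba' -e "$subst")

printf '%s\n' "$joined"
```

Because the loop collects the entire input in the pattern space before substituting, this approach is best suited to files that comfortably fit in memory.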

Using the search pattern in the replacement pattern

One other note: The \1 and \2 in the replacement pattern let me reuse the text matched by the two groups that I “capture” in the search pattern. Here’s a quick look at how they relate:

\1    ([a-zA-Z,`])
\2    ([a-zA-Z`])

The regex inside the () parentheses is a capture group, and \1 and \2 are backreferences to the text that each group matched, which you can use in the replacement pattern.
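
As a smaller illustration of capture groups on their own, this sketch swaps two words on a single line. The input string and the variable name swapped are made up for the example.

```shell
# Group 1 captures the first word and group 2 the second;
# the replacement emits them in reverse order.
swapped=$(printf '%s\n' 'hello world' | sed -E 's/([a-z]+) ([a-z]+)/\2 \1/')

printf '%s\n' "$swapped"   # prints: world hello
```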

2023 Update: Working with LaTeX

As a brief update, I can confirm that this command still works today. I am currently using it with LaTeX documents and the pandoc command:

# [1] sed.cmds file
# use this to convert LaTeX sentences
s/([a-zA-Z0-9,“’‘} ])\n([a-zA-Z0-9“‘{ ])/\1 \2/g

# [2] the pandoc+sed command I use
$ pandoc HOFs.tex --to=plain | sed -e ':a' -e 'N' -e '$!ba' -E -f sed.cmds

In this example, the pandoc command converts an input LaTeX document into plain text, and then sed converts multi-line paragraphs like this:

four score
and seven
years ago

into a single paragraph like this:

four score and seven years ago

which is what I need today.
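
If you want to try this without pandoc or a LaTeX file, the sketch below reproduces the sed half of the pipeline. It makes a few assumptions: printf stands in for the pandoc output, the commands file is a temporary file created with mktemp instead of a real sed.cmds, and the character classes are an ASCII-only subset of the ones shown above (the curly quotes are left out so the result does not depend on the locale).

```shell
# Temporary stand-in for sed.cmds (hypothetical path, removed at the end).
cmds=$(mktemp "${TMPDIR:-/tmp}/sed.cmds.XXXXXX")

# Quoted heredoc delimiter so the shell leaves \n, \1 and \2 untouched.
cat > "$cmds" <<'EOF'
# ASCII-only subset of the LaTeX pattern above
s/([a-zA-Z0-9,} ])\n([a-zA-Z0-9{ ])/\1 \2/g
EOF

# printf plays the role of `pandoc HOFs.tex --to=plain` here.
result=$(printf 'four score\nand seven\nyears ago\n' |
  sed -E -e ':a' -e 'N' -e '$!ba' -f "$cmds")

printf '%s\n' "$result"   # prints: four score and seven years ago

rm -f "$cmds"
```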

That’s all (for now)

I haven’t looked into all of those sed command line options to see which ones are truly needed and which ones aren’t, but again, at the moment I can confirm that this works with the sed command on macOS 10.12.1 (Sierra), properly finding the newline characters in sed’s input stream.