For a variety of reasons you can end up with text files on your Unix filesystem that have binary characters in them. In fact, I showed you how to do this to yourself in my blog post about the Unix script command. (There’s nothing wrong with that approach; it’s just a by-product of using the script command.)
Remove the garbage characters with the Unix 'tr' command
To fix this problem, and get the binary characters out of your files, there are several approaches you can take to fix this problem. Probably the easiest solution involves using the Unix tr command. Here’s all you have to remove non-printable binary characters (garbage) from a Unix text file:
tr -cd '\11\12\15\40-\176' < file-with-binary-chars > clean-file
This command uses the -c
and -d
arguments to the tr
command to remove all the characters from the input stream other than the ASCII octal values that are shown between the single quotes. This command specifically allows the following characters to pass through this Unix filter:
octal 11: tab octal 12: linefeed octal 15: carriage return octal 40 through octal 176: all the "good" keyboard characters
All the other binary characters — the “garbage” characters in your file — are stripped out during this translation process.
Remove character sequences with this Perl command
I recently had a file that contained content that looks like this:
^[[33mpackage ^[[0m<empty> { ^[[33mimport ^[[0msys.process.*
Because all of those ^[[
control sequences are made from multiple characters, my tr
command approach won’t work. Fortunately I found this Perl command that does work to remove those character sequences:
perl -pe 's/\x1b\[[0-9;]*[mG]//g' INFILE > OUTFILE
I found that Perl command on this superuser.com page.
For more information on ASCII characters
For more information on ASCII characters check out the ASCII character tables at either of these sites: