For a variety of reasons you can end up with text files on your Unix filesystem that have binary characters in them. In fact, I showed you how to do this to yourself in my blog post about the Unix script command. (There's nothing wrong with this approach; it's just a by-product of using the script command.)
There are several ways to fix this problem and get the binary characters out of your files. Probably the easiest solution involves using the Unix tr command. Here's all you have to do to remove non-printable binary characters (garbage) from a Unix text file:
tr -cd '\11\12\15\40-\176' < file-with-binary-chars > clean-file
This command uses the -c and -d arguments to the tr command: -d deletes characters, and -c complements the set, so together they remove every character from the input stream other than the ASCII octal values shown between the single quotes. This command specifically allows the following characters to pass through this Unix filter:
octal 11: tab
octal 12: linefeed
octal 15: carriage return
octal 40 through octal 176: all the "good" keyboard characters
All the other binary characters -- the "garbage" characters in your file -- are stripped out during this translation process.
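If you want to see the filter in action before running it on a real file, here is a minimal sketch. The function names clean_text and count_garbage are hypothetical helpers made up for this example, and the sample input is invented; only the tr character set comes from the command above.

```shell
# clean_text and count_garbage are hypothetical helper names used only
# for this illustration.

# Apply the same filter as the command above to standard input.
clean_text() {
    tr -cd '\11\12\15\40-\176'
}

# Without -c, tr deletes the "good" characters instead, so only the
# garbage bytes remain; wc -c then counts them.
count_garbage() {
    tr -d '\11\12\15\40-\176' | wc -c
}

# Sample input: printable text mixed with BEL (octal 7), ESC (octal 33),
# and DEL (octal 177), written with printf octal escapes.
printf 'hel\007lo\033[0m wor\177ld\tok\n' | clean_text | od -c

# Count how many bytes the filter would strip from the same input.
printf 'hel\007lo\033[0m wor\177ld\tok\n' | count_garbage
```

Notice that only the ESC byte itself is removed. The "[0m" that followed it is made of ordinary printable characters, so it passes through the filter. If your file came from the script command, you may still see leftovers like that after cleaning.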
For more information on ASCII characters check out the ASCII character tables at either of these sites:
Error! Missing the slash before 40
There's a slash missing before the 40. I had to figure out why when it didn't work right. The command is:
tr -cd '\11\12\15\40-\176' < file-with-binary-chars > clean-file
Sorry about that, and thanks for letting me know. I've updated the command above.
Typo in explanation
Hello
This oneliner saved my life!
Found a typo:
octal 140 through octal 176: all the "good" keyboard characters
should be
octal 40 through octal 176: all the "good" keyboard characters
Thanks again
Thanks for catching this typo. I just made the correction to the article.