When working with text files on a Solaris system, you'll occasionally run into a file that contains control characters or extended (8-bit) ASCII characters. In the vi editor these usually show up as sequences beginning with ^, such as ^M for a carriage return or ^[ for an escape character. For instance, vi shows ^M at the end of every line of a DOS text file that was transferred to a Solaris system using the ftp command in binary transfer mode, because binary mode does not convert DOS CR/LF line endings into UNIX LF line endings. Often you'll simply want to delete these characters from your files.
Having run into this problem over the years, we created a simple little program to remove these characters from our text files. The guts of the program is a one-line tr command that prints only the characters we tell it to print and removes all other characters.
A simple example of the tr command is shown below. This example converts the phrase "hello world" into "jello world", by replacing the letter 'h' in the input stream with the letter 'j' in the output stream:
$ echo "hello world" | tr h j
jello world
As a second example, the tr utility can also be used to delete characters as they are read in from the input stream and written to the output stream. For instance, the following command converts the word fred in the input stream into the word red in the output stream, by deleting the letter 'f' in the translation process:
$ echo "fred" | tr -d f
red
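If the only unwanted characters are the carriage returns left behind by DOS line endings, a narrower tr -d command is enough. The sketch below deletes octal 15 (the carriage return that vi displays as ^M) and nothing else; the sample input is made up for illustration.

```sh
# Sample DOS-style input: each line ends in CR (octal 15) followed by LF (octal 12).
# tr -d '\15' deletes every carriage return and passes everything else through.
printf 'line one\r\nline two\r\n' | tr -d '\15'
# prints "line one" and "line two" on separate lines, with no trailing ^M
```

On a real file, the same command would be run as tr -d '\15' < dosfile.txt > unixfile.txt, where both file names are placeholders.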
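Combining the complement (-c) and delete (-d) options gives us the one-line command at the heart of our program:
tr -cd '\11\12\40-\176' < "$INPUT_FILE" > "$OUTPUT_FILE"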
In this command, the variable INPUT_FILE must contain the name of the file you'll be reading from, and OUTPUT_FILE must contain the name of the file you'll be writing to. The -c option tells tr to work with the complement of the character set we give it (every character we did not list), and the -d option deletes the characters in that set. Used together, they delete everything except the characters we've listed on the command line, so those are the only characters tr writes to the standard output stream.
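One way to package this command as the "simple little program" is a small shell function. This is a minimal sketch, not our original program: the name clean_file is hypothetical, and running tr with LC_ALL=C is an added assumption so that tr treats the input as single bytes rather than as multibyte characters.

```sh
# clean_file (hypothetical name): copy INPUT_FILE to OUTPUT_FILE, keeping only
# TAB (octal 11), LINEFEED (octal 12), and octal 40 through 176.
# Usage: clean_file input_file output_file
clean_file() {
    if [ $# -ne 2 ]; then
        echo "usage: clean_file input_file output_file" >&2
        return 1
    fi
    INPUT_FILE=$1
    OUTPUT_FILE=$2

    # Redirecting output onto the file being read would truncate it
    # before tr reads anything, so refuse identical names.
    if [ "$INPUT_FILE" = "$OUTPUT_FILE" ]; then
        echo "clean_file: input and output must be different files" >&2
        return 1
    fi
    if [ ! -r "$INPUT_FILE" ]; then
        echo "clean_file: cannot read $INPUT_FILE" >&2
        return 1
    fi

    # -c complements the listed set and -d deletes it, so every character
    # outside TAB, LINEFEED, and octal 40-176 is dropped.
    # LC_ALL=C is an assumption added here, not part of the original command.
    LC_ALL=C tr -cd '\11\12\40-\176' < "$INPUT_FILE" > "$OUTPUT_FILE"
}
```

A call such as clean_file dosfile.txt cleanfile.txt (placeholder names) writes the filtered copy. The same-name check only catches the identical-string case; two different paths to the same file, such as ./notes and notes, would still slip through.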
Although it may not look very attractive, we use octal escapes in our tr command because they let us name characters such as TAB and LINEFEED, which are awkward to type on a command line, and describe the whole printable range compactly. Our command tells tr to retain only octal characters 11, 12, and 40 through 176 when writing to standard output. Octal 11 is the [TAB] character, and octal 12 is the [LINEFEED] character. Octal 40 through 176 are the standard visible keyboard characters, from the [Space] character (octal 40) through the ~ character (octal 176). These are the only characters tr retains. Everything else, including the carriage return (octal 15) that vi displays as ^M, is filtered out, leaving us with a clean ASCII file.
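The octal values above can be checked with od, and the filter can be tried on a small sample before it is pointed at a real file. The sample bytes below are made up for illustration, and LC_ALL=C is again an assumption so that tr handles the accented byte as a single byte.

```sh
# Show the octal byte values of TAB, LINEFEED, Space, and ~.
# After od's offset column, the bytes should read 011 012 040 176.
printf '\t\n ~' | od -b

# Sample input: an accented byte (octal 351), a DOS CR/LF line ending,
# and a terminal escape sequence that starts with ESC (octal 33).
printf 'caf\351\r\n\033[1mbold\033[0m\n' |
    LC_ALL=C tr -cd '\11\12\40-\176' | od -c
```

In the second command the accented byte, the carriage return, and both ESC characters are removed, but the printable remainder of each escape sequence ([1m and [0m) is kept. The filter drops non-printing characters one at a time; it does not recognize multi-character sequences.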