A Ruby script to remove binary (garbage) characters from a text file

Problem: You have a file that should be a plain text file, but for some reason it has a bunch of non-printable binary characters (also known as garbage characters) in it, and you'd like a Ruby script that can create a clean version of the file.

Solution: I've demonstrated how to do this in another blog post by using the Unix tr command, but in case you'd like a Ruby script to clean up a file like this, I thought I'd write up a quick program and share it here.

To that end, here's the source code for a Ruby script that reads a given input file, goes through it byte by byte, and writes only printable ASCII characters (plus tab, linefeed, and carriage return) to standard output:

#!/usr/bin/env ruby

#-------------------------------------------------------------------
#
# Program: PrintableCharsOnly.rb
#
# Purpose: A Ruby script that takes a file as input, strips out
#          all the undesirable characters from that file, and prints
#          out only "good" ASCII characters, i.e., more or less all
#          the keyboard characters, including TAB, newline, and
#          carriage return.
#
# Author:  alvin alexander, devdaily.com
#
#-------------------------------------------------------------------

# bail out unless we get the right number of command line arguments;
# messages go to STDERR so they don't end up in the redirected output file
unless ARGV.length == 1
  $stderr.puts "Dude, not the right number of arguments."
  $stderr.puts "Usage: ruby PrintableCharsOnly.rb YourInputFile > YourOutputFile"
  exit 1
end

# get the input filename from the command line
file = ARGV[0]

# open the file in binary mode ("rb") so Ruby applies no encoding or
# newline conversion, then walk through it one byte at a time
File.open(file, "rb") do |f|
  f.each_byte do |c|
    # only print the ascii characters we want to allow:
    # TAB (9), LF (10), CR (13), and space (32) through tilde (126)
    print c.chr if c == 9 || c == 10 || c == 13 || (c > 31 && c < 127)
  end
end
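If you'd rather reuse this filter from other Ruby code (or test it), one option is to pull the byte test into a method that returns the cleaned data as a String instead of printing it. This is a minimal sketch, not part of the original script: the method name printable_only is hypothetical, and it assumes the input file fits comfortably in memory, since File.binread reads the whole file at once.

```ruby
#!/usr/bin/env ruby

# Hypothetical helper (not from the original script): returns a copy of
# +data+ that keeps only TAB (9), LF (10), CR (13), and bytes 32..126.
def printable_only(data)
  kept = data.each_byte.select do |b|
    b == 9 || b == 10 || b == 13 || (b >= 32 && b <= 126)
  end
  # pack("C*") turns the array of byte values back into a binary String
  kept.pack("C*")
end

# Only act as a command-line filter when run directly with one argument;
# assumes the whole file fits in memory (File.binread reads it all at once).
if __FILE__ == $PROGRAM_NAME && ARGV.length == 1
  print printable_only(File.binread(ARGV[0]))
end
```

You'd run it the same way as the original script, e.g. ruby printable_only.rb YourInputFile > YourOutputFile (the script name is just an example). Building the result first and printing it once also avoids one print call per byte.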

Discussion

As you can see from the source code shown above, this Ruby script prints only the following characters (byte values) to standard output:

byte/decimal value 9: tab character
byte/decimal value 10: linefeed
byte/decimal value 13: carriage return
byte/decimal values 32 through 126 (octal 040 through 176): all the "good" keyboard characters, from the space character through the tilde (~)
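The same whitelist can also be written as a single character-set spec for Ruby's String#delete, which works much like the tr -cd approach mentioned at the top of this post. This is a swapped-in technique, not how the script above works, and the names KEEP_SPEC and strip_garbage are hypothetical.

```ruby
# Hypothetical whitelist spec for String#delete:
#   "^"       a leading caret negates the set ("delete everything NOT listed")
#   \t \n \r  bytes 9, 10, and 13
#   " -~"     the range from space (32) through tilde (126)
KEEP_SPEC = "^\t\n\r -~"

def strip_garbage(data)
  # .b makes a binary (ASCII-8BIT) copy, so each byte is treated as a
  # single character instead of being decoded as UTF-8
  data.b.delete(KEEP_SPEC)
end
```

To clean a whole file, something like print strip_garbage(File.binread(ARGV[0])) should produce the same output as the script above, again assuming the file fits in memory.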

For more information on ASCII characters

For more information on ASCII characters, check out the ASCII character tables at either of these sites:
