Problem: You have a file that should be a plain text file, but for some reason it has a bunch of non-printable binary characters (also known as garbage characters) in it, and you'd like a Ruby script that can create a clean version of the file.
Solution: I've demonstrated how to do this in another blog post by using the Unix tr command, but in case you'd like a Ruby script to clean up a file like this, I thought I'd write up a quick program and share it here.
To that end, here's the source code for a Ruby script that reads a given input file, goes through each byte in the file, and writes only printable ASCII characters (plus tab, linefeed, and carriage return) to standard output:
#!/usr/bin/ruby
#-------------------------------------------------------------------
#
# Program: PrintableCharsOnly.rb
#
# Purpose: A Ruby script that takes a file as input, and strips out
#          all the undesirable characters from that file, and prints
#          out only "good" ASCII characters, i.e., more or less all
#          the keyboard characters, including TAB, newline, and
#          carriage return.
#
# Author:  alvin alexander, devdaily.com
#
#-------------------------------------------------------------------

# bail out unless we get the right number of command line arguments.
# the messages go to stderr so they don't end up in the redirected
# output file, and we exit with a non-zero status to signal an error.
unless ARGV.length == 1
  $stderr.puts "Dude, not the right number of arguments."
  $stderr.puts "Usage: ruby PrintableCharsOnly.rb YourInputFile > YourOutputFile"
  exit 1
end

# get the input filename from the command line
file = ARGV[0]

# open the file in binary mode so every byte (including CR) is seen as-is
File.open(file, "rb") do |f|
  f.each_byte do |c|
    # only print the ascii characters we want to allow
    print c.chr if c == 9 || c == 10 || c == 13 || (c > 31 && c < 127)
  end
end
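If the file fits comfortably in memory, a shorter variant is possible: read the whole file as raw bytes and let Ruby's String#delete remove everything outside the allowed set, roughly what `tr -cd '\11\12\15\40-\176'` does on the command line. This is a sketch of an alternative, not the script above; the names KEEP_ONLY and clean_text are hypothetical, chosen just for illustration.

```ruby
# Character set for String#delete. The leading "^" negates the set, so
# every byte that is NOT tab, linefeed, carriage return, or in the range
# space..tilde (decimal 32..126) is removed.
KEEP_ONLY = "^\t\n\r -~"

# Hypothetical helper: returns a cleaned copy of the input string.
# raw.b makes a binary (ASCII-8BIT) copy first, so bytes that are
# invalid in the string's original encoding are simply treated as bytes.
def clean_text(raw)
  raw.b.delete(KEEP_ONLY)
end
```

To clean a whole file with it, something like `$stdout.write(clean_text(File.binread(ARGV[0])))` could stand in for the File.open loop in the script; File.binread reads the entire file as raw bytes, which is why this approach suits smaller files.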
Discussion
As you can see from the source code shown above, this Ruby script only prints the following characters (byte values) to standard output:
byte/decimal value 9: tab character
byte/decimal value 10: linefeed
byte/decimal value 13: carriage return
byte/decimal values 32 through 126 (octal 040 through 176): the space character and all the "good" keyboard characters
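The rules in that list can also be written as a single predicate, which makes them easy to test or reuse elsewhere. This is a sketch; keep_byte? and keep_printable are hypothetical names, not part of the original script, and the 32..126 range is the same test as `c > 31 && c < 127` in the script.

```ruby
# Hypothetical predicate: true for the bytes the script keeps.
def keep_byte?(c)
  case c
  when 9, 10, 13 then true  # tab, linefeed, carriage return
  when 32..126   then true  # space through tilde (octal 040..176)
  else false
  end
end

# Hypothetical filter: applies the predicate to every byte of a string
# and packs the survivors back into a new binary string.
def keep_printable(data)
  data.each_byte.select { |c| keep_byte?(c) }.pack("C*")
end
```

Written this way, the allowed set is exactly 98 byte values (3 control characters plus the 95 values from 32 through 126), and everything else, including any byte of 128 or above, is dropped.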
For more information on ASCII characters
For more information on ASCII characters, check out the ASCII character tables at either of these sites: