PHP: How to remove non-printable characters from strings

PHP FAQ: How do I remove all non-printable characters from a string in PHP?

I don’t know of any built-in PHP functions to remove all non-printable characters from a string, so the solution is to use the preg_replace function with an appropriate regular expression.

Solution: Allow only ASCII characters

I don’t have to work with Unicode characters, so one of the best solutions for my purposes is to strip all non-ASCII characters, along with the ASCII control characters, from the input string. That can be done with this preg_replace code:

$result = preg_replace('/[\x00-\x1F\x80-\xFF]/', '', $string);

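That code removes every byte in the decimal ranges 0-31 (hex 00-1F, the ASCII control characters, including tab, newline, and carriage return) and 128-255 (hex 80-FF, everything outside of ASCII). That leaves only the characters 32-127 (hex 20-7F) in the resulting string, which I call $result in this example. Note that 127 (hex 7F) is DEL, a control character that isn’t printable either; to remove it as well, start the second range at \x7F instead of \x80.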
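If you use this in more than one place, one option is to wrap it in a small function. This is a minimal sketch; the function name stripNonPrintableAscii and its $keepWhitespace parameter are my own hypothetical additions, not anything built into PHP. It starts the second range at \x7F so DEL is removed too, and it can optionally keep tab, newline, and carriage return, which the one-liner above removes.

```php
<?php
// Hypothetical helper (not a PHP built-in): removes the ASCII control
// characters (0-31 and 127) and every byte from 128-255, leaving only
// printable ASCII (32-126).
function stripNonPrintableAscii(string $string, bool $keepWhitespace = false): string
{
    if ($keepWhitespace) {
        // Same ranges, minus tab (\x09), newline (\x0A), and carriage return (\x0D).
        $pattern = '/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F-\xFF]/';
    } else {
        $pattern = '/[\x00-\x1F\x7F-\xFF]/';
    }
    return preg_replace($pattern, '', $string);
}

echo stripNonPrintableAscii("‘Hello,’ she said.\x7F"), "\n";           // Hello, she said.
echo json_encode(stripNonPrintableAscii("a\tb\r\nc\x00", true)), "\n"; // "a\tb\r\nc"
```

Like the original one-liner, this works on bytes, so it is only a good fit when you really want ASCII-only output.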

You can see how this works in the interactive PHP shell. In this example I just want to get rid of the characters ‘ and ’, which don’t work well in my current application:

myprompt> php -a
Interactive shell

php > $string = "‘Hello,’ she said.";
php > $result = preg_replace('/[\x00-\x1F\x80-\xFF]/', '', $string);
php > echo $result;
Hello, she said.

As you can see, the characters ‘ and ’ are not in the $result string.
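The reason this works on ‘ and ’ is that in a UTF-8 string each curly quote is stored as three bytes, all in the 80-FF range, and preg_replace without the u modifier matches individual bytes. This small sketch, assuming the file is saved as UTF-8, prints those bytes with bin2hex. It also shows the flip side: an accented letter such as the é in café is made of bytes in the same range, so it gets stripped too.

```php
<?php
// Assumes this file is saved as UTF-8, so each character below is multibyte.
$pattern = '/[\x00-\x1F\x80-\xFF]/';

foreach (['‘', '’', 'é'] as $char) {
    // bin2hex shows the raw bytes that make up each character.
    printf("%s => %s\n", $char, bin2hex($char));
}
// ‘ => e28098
// ’ => e28099
// é => c3a9

// Every one of those bytes is in the 80-FF range, so all of them are removed.
echo preg_replace($pattern, '', 'café'), "\n";   // caf
```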

Note: You can read more about hex and octal character sequences in the php.net documentation on PCRE escape sequences.

Also note that if you prefer octal characters to hexadecimal characters, this code should work as well:

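$result = preg_replace('/[\000-\037\200-\377]/', '', $string);

In octal, decimal 31 is 037 (031 would be decimal 25), 128 is 200, and 255 is 377. On my example string this gives the same result as the hex version, but I haven’t tested it with other strings. (An ASCII table that lists octal and hex values is a good resource for this.)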
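If you want more confidence than a single test string gives, one option is to compare the hex and octal patterns against all 256 possible byte values. This is a minimal sketch of that check; it also shows which bytes a first range ending at \031 (decimal 25) would miss.

```php
<?php
$hex      = '/[\x00-\x1F\x80-\xFF]/';
$octal    = '/[\000-\037\200-\377]/';
$badOctal = '/[\000-\031\200-\377]/';   // \031 is octal for decimal 25, not 31

$mismatches  = [];
$missedByBad = [];
for ($i = 0; $i < 256; $i++) {
    $byte = chr($i);
    // preg_match returns 1 if this single byte is in the character class, else 0.
    if (preg_match($hex, $byte) !== preg_match($octal, $byte)) {
        $mismatches[] = $i;
    }
    if (preg_match($hex, $byte) !== preg_match($badOctal, $byte)) {
        $missedByBad[] = $i;
    }
}

echo 'hex vs octal mismatches: ', count($mismatches), "\n";         // 0
echo 'bytes missed with \031: ', implode(',', $missedByBad), "\n";  // 26,27,28,29,30,31
```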

Possible solution: Use the 'print' regex

Another possible solution is to use the negated POSIX [:print:] character class with preg_replace, as shown in this example:

$result = preg_replace('/[[:^print:]]/', "", $string);

Per the PHP regex docs, the [:print:] class matches “any printable character,” so for my example I thought it would leave the ‘ and ’ characters in the resulting string, but to my surprise the output looks like this:

php > $string = "‘Hello,’ she said.";
php > $result = preg_replace('/[[:^print:]]/', "", $string);
php > echo $result;
?Hello,? she said.

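I didn’t expect those ? characters, and the most likely explanation is a byte-level effect. Without the u modifier, preg_replace treats the string as a sequence of bytes rather than UTF-8 characters, and each curly quote is three bytes long. Depending on the locale’s character tables, [:print:] can count one of those bytes as printable, so the pattern removes only part of each quote; the leftover byte isn’t valid UTF-8 on its own, and the terminal displays it as ?. Because of that, I’m calling this a “possible solution” rather than a solution. Note that if you just echo out the original string, it prints fine: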

php > echo $string;
‘Hello,’ she said.
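One way to get the behavior I originally expected is to add the u modifier, which tells preg_replace to treat the pattern and subject as UTF-8. As far as I know, PHP’s u modifier also turns on PCRE’s Unicode-property mode (PCRE2_UCP), so [:print:] then matches printable Unicode characters instead of single bytes. This is a minimal sketch that assumes the input is valid UTF-8; for invalid UTF-8, preg_replace returns null instead of a string.

```php
<?php
$string = "‘Hello,’ she said.";

// With /u the subject is read as UTF-8 characters, so ‘ and ’ are handled
// as whole characters and are no longer split into separate bytes.
$result = preg_replace('/[[:^print:]]/u', '', $string);
echo $result, "\n";   // ‘Hello,’ she said.

// Control characters such as NUL, TAB, and DEL are still removed.
$dirty = "\x00‘Hello,’\tshe said.\x7F";
echo preg_replace('/[[:^print:]]/u', '', $dirty), "\n";   // ‘Hello,’she said.

// Invalid UTF-8 input makes preg_replace return null rather than a string.
var_dump(preg_replace('/[[:^print:]]/u', '', "bad \xFF byte"));   // NULL
```

If you use the /u form, it’s worth checking for a null return value before using the result.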

More solutions (Unicode)

As I mentioned, I don’t currently have to concern myself with Unicode characters, so the original ASCII character solution I showed works fine for me. If you do need to handle Unicode characters, this Stack Overflow page shows a possible solution.
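For reference, one common Unicode-aware approach is to remove everything in the Unicode “Other” category, \p{C}, which covers control (Cc), format (Cf), private-use (Co), and unassigned (Cn) code points. This is a sketch under my own assumptions rather than a tested recipe, and the function name removeNonPrintableUtf8 is hypothetical.

```php
<?php
// Hypothetical helper: removes Unicode "Other" characters (\p{C}) from a
// UTF-8 string: control (Cc), format (Cf), private-use (Co), and
// unassigned (Cn) code points.
function removeNonPrintableUtf8(string $string): string
{
    $result = preg_replace('/\p{C}+/u', '', $string);
    if ($result === null) {
        // preg_replace returns null on failure, for example invalid UTF-8.
        throw new InvalidArgumentException('Input is not valid UTF-8.');
    }
    return $result;
}

// U+200B (zero-width space) is a format character and \x07 (BEL) is a
// control character, so both are removed; the curly quotes are kept.
echo removeNonPrintableUtf8("‘Hello,’\u{200B} she said.\x07"), "\n";   // ‘Hello,’ she said.
```

Because \p{C} also includes format characters such as the zero-width joiner (U+200D), which some emoji sequences rely on, a narrower class such as \p{Cc} (control characters only) may be a better fit for some inputs.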

More PHP regular expressions

Finally, while I’m in the neighborhood, here’s a list of the POSIX character classes from the php.net regex page, which calls them “ranges.” As that name implies, each one matches a range of characters in a PHP string:

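[:digit:]      Only the digits 0 to 9.
[:alnum:]      Any alphanumeric character: 0 to 9, A to Z, or a to z.
[:alpha:]      Any alpha character A to Z or a to z.
[:blank:]      Space and TAB characters only.
[:xdigit:]     Hexadecimal digits: 0 to 9, A to F, or a to f.
[:punct:]      Punctuation and symbols, such as . , " ' ? ! ; : ( ) # @
[:print:]      Any printable character, including space.
[:space:]      Any whitespace character: space, tab, newline, carriage return, vertical tab, form feed.
[:graph:]      Any printable character except space.
[:upper:]      Any alpha character A to Z.
[:lower:]      Any alpha character a to z.
[:cntrl:]      Control characters (ASCII 0 to 31 and 127).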

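As shown in my earlier example, you need two sets of brackets when you use these classes with preg_replace. A class name such as [:print:] is only valid inside a bracket expression, so the outer brackets form the character class, and the ^ in [:^print:] negates the POSIX class: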

$result = preg_replace('/[[:^print:]]/', "", $string);
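To see a few of these classes in action, here is a small sketch that keeps only the characters each class matches, by replacing everything in the negated bracket expression (for example [^[:digit:]]) with an empty string. The sample strings are made up for illustration, and there is no u modifier, so the patterns run in PHP’s default byte mode.

```php
<?php
// Made-up sample strings, one per class.
$samples = [
    '[:digit:]'  => 'Order #42, qty 7',
    '[:xdigit:]' => '#FF8800;',
    '[:punct:]'  => 'Hi, there! (ok)',
    '[:upper:]'  => 'Hello World',
];

foreach ($samples as $class => $subject) {
    // [^[:digit:]] matches any character NOT in the class, so replacing
    // those matches with '' keeps only the characters the class matches.
    $kept = preg_replace('/[^' . $class . ']/', '', $subject);
    printf("%-10s %-18s => %s\n", $class, $subject, $kept);
}

// Both negation styles match the same characters: [^[:digit:]] and [[:^digit:]].
var_dump(preg_replace('/[[:^digit:]]/', '', 'a1b2') === preg_replace('/[^[:digit:]]/', '', 'a1b2'));   // bool(true)
```

With these inputs, the loop keeps 427, FF8800, ,!() and HW respectively.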

Summary

In summary, if you wanted to see how to remove non-printable characters from strings in PHP, I hope these examples are helpful.