PHP: How to remove non-printable characters from strings

PHP FAQ: How do I remove all non-printable characters from a string in PHP?

I don’t know of any built-in PHP functions to remove all non-printable characters from a string, so the solution is to use the preg_replace function with an appropriate regular expression.

Solution: Allow only ASCII characters

I don’t have to work with Unicode characters, so one of the best solutions for my purposes is to strip all non-ASCII bytes, along with the ASCII control characters, from the input string. That can be done with this preg_replace code:

$result = preg_replace('/[\x00-\x1F\x80-\xFF]/', '', $string);

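That code removes any bytes in the decimal ranges 0-31 and 128-255 (hex 00-1F and 80-FF), leaving only the bytes 32-127 in the resulting string, which I call $result in this example. Note that byte 127 (DEL, \x7F) is also a control character; if you want to remove it too, change \x80 to \x7F in the pattern.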
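If you use this in more than one place, it may help to wrap the pattern in a small function. The following is only a minimal sketch: the function name strip_non_printable_ascii is hypothetical, and the choice to also strip DEL (\x7F) is my assumption, not part of the original pattern.

```php
<?php
// Hypothetical helper: keeps only bytes 32-126 (printable ASCII).
// Assumption: DEL (\x7F) is stripped too, unlike the pattern shown above.
function strip_non_printable_ascii(string $input): string
{
    // preg_replace() returns null on error, so fall back to an empty string.
    return preg_replace('/[\x00-\x1F\x7F-\xFF]/', '', $input) ?? '';
}

// "\xE2\x80\x98" is the UTF-8 byte sequence for the left curly quote.
$cleaned = strip_non_printable_ascii("a\x00b\x7Fc\xE2\x80\x98d");
echo $cleaned, PHP_EOL;
```

Keep in mind that tabs and newlines fall in the 0-31 range, so they are removed as well.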

You can see how this works in the interactive PHP shell. In this example I just want to get rid of the characters ‘ and ’, which don’t work well in my current application:

myprompt> php -a
Interactive shell

php > $string = "‘Hello,’ she said.";
php > $result = preg_replace('/[\x00-\x1F\x80-\xFF]/', '', $string);
php > echo $result;
Hello, she said.

As you can see, the characters ‘ and ’ are not in the $result string.

Note: You can read more about hex and octal character sequences on this page.

Also note that if you prefer octal characters to hexadecimal characters, this code should work as well:

$result = preg_replace('/[\000-\037\200-\377]/', '', $string);

Here octal 037 is decimal 31 (hex 1F), and octal 200-377 is decimal 128-255 (hex 80-FF). The pattern worked fine on my example, but I haven’t tested it with other strings. (This page is a good resource for basic octal and hex values.)
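Octal escapes are easy to get wrong, because hex 1F is octal 037, not 031. One way to check a hex pattern against its octal counterpart is to run both over every possible byte value and compare the results. This is just a sketch, and the variable names are my own:

```php
<?php
// Hex and octal spellings of the same byte ranges (0-31 and 128-255).
$hexPattern   = '/[\x00-\x1F\x80-\xFF]/';
$octalPattern = '/[\000-\037\200-\377]/';

// Collect every byte value on which the two patterns disagree.
$mismatches = [];
for ($byte = 0; $byte <= 255; $byte++) {
    $char = chr($byte);
    if (preg_match($hexPattern, $char) !== preg_match($octalPattern, $char)) {
        $mismatches[] = $byte;
    }
}

echo count($mismatches) === 0 ? "patterns agree" : "patterns differ", PHP_EOL;
```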

Possible solution: Use the 'print' regex

Another possible solution is to use the [:print:] POSIX character class, negated as [:^print:], with preg_replace:

$result = preg_replace('/[[:^print:]]/', "", $string);

Per the PHP regex doc, the [:print:] class stands for “any printable character,” so I thought it would leave the ‘ and ’ characters in the resulting string, but to my surprise the output looks like this:

php > $string = "‘Hello,’ she said.";
php > $result = preg_replace('/[[:^print:]]/', "", $string);
php > echo $result;
?Hello,? she said.

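I haven’t confirmed why that regex ends up putting ? characters in the resulting string. My best guess is that without the u modifier the regex works on individual bytes rather than whole UTF-8 characters, so it strips only some of the bytes in each multibyte quote and the terminal renders what is left as ?. For that reason I’m calling this a “possible solution” rather than a solution. Note that if you just echo out the original string, it prints fine: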

php > echo $string;
‘Hello,’ she said.
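The ? characters are consistent with byte-level matching. The sketch below doesn’t reproduce the exact output above, because which high bytes [:print:] accepts can depend on the locale, but it shows that a curly quote is three bytes in UTF-8 and that a leftover fragment is no longer valid UTF-8. The variable names are mine:

```php
<?php
// The left curly quote written as explicit UTF-8 bytes.
$quote = "\xE2\x80\x98";

$byteLength = strlen($quote);   // strlen() counts bytes, not characters
$hexBytes   = bin2hex($quote);  // hex dump of those bytes

// Without /u, "." matches one byte at a time; with /u, one UTF-8 character.
$byteMatches = preg_match_all('/./', $quote);
$charMatches = preg_match_all('/./u', $quote);

// If only part of the sequence is removed, the remainder is invalid UTF-8.
// preg_match() with /u returns false for an invalid UTF-8 subject.
$fragmentCheck = preg_match('//u', "\xE2");

printf("%d bytes (%s), %d byte matches, %d character match\n",
    $byteLength, $hexBytes, $byteMatches, $charMatches);
```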

More solutions (Unicode)

As I mentioned, I don’t currently have to concern myself with Unicode characters, so the original ASCII character solution I showed works fine for me. If you do need to handle Unicode characters, this SO page shows a possible solution.
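I can’t say whether it matches what that page recommends, but one common approach is to use the u modifier together with the Unicode property \p{C}, which covers control, format, unassigned, private-use, and surrogate code points. Treat the following as a sketch under those assumptions; the function name is hypothetical:

```php
<?php
// Hypothetical helper: removes Unicode "Other" (\p{C}) characters and
// leaves printable non-ASCII characters, such as curly quotes, in place.
// Returns null when $input is not valid UTF-8, because preg_replace()
// fails on invalid subjects when the /u modifier is used.
function strip_unicode_non_printable(string $input): ?string
{
    return preg_replace('/\p{C}+/u', '', $input);
}

// Curly quotes as UTF-8 bytes, plus a NUL and a BEL control character.
$dirty = "\xE2\x80\x98Hello,\xE2\x80\x99\x00 she\x07 said.";
$clean = strip_unicode_non_printable($dirty);
echo $clean, PHP_EOL;
```

Note that \p{C} also matches tabs and newlines, and which code points count as unassigned depends on the Unicode version of the PCRE library PHP was built with.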

More PHP regular expressions

Finally, while I’m in the neighborhood, here’s a list of the POSIX character classes (the regex page calls them “range” regular expressions). As the “range” name implies, each class matches a set of characters in PHP strings:

[:digit:]      Only the digits 0 to 9.
[:alnum:]      Any alphanumeric character: 0 to 9, A to Z, or a to z.
[:alpha:]      Any alpha character: A to Z or a to z.
[:blank:]      Space and TAB characters only.
[:xdigit:]     Hexadecimal digits: 0 to 9, A to F, or a to f.
[:punct:]      Punctuation symbols such as . , " ' ? ! ; :
[:print:]      Any printable character, including space.
[:space:]      Any whitespace character (space, TAB, newline, and so on).
[:graph:]      Any printable character except space.
[:upper:]      Any uppercase alpha character A to Z.
[:lower:]      Any lowercase alpha character a to z.
[:cntrl:]      Control characters.

As shown in my earlier example, these classes only work inside a bracket expression, so you end up with two sets of brackets when using them with preg_replace:

$result = preg_replace('/[[:^print:]]/', "", $string);
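Several classes can also be combined inside one bracket expression, and a leading ^ negates the whole set. Here is a short sketch with plain ASCII input; the variable names are only for illustration:

```php
<?php
$input = "Hello, World! 123";

// Keep only letters, digits, and whitespace by removing everything else.
$lettersDigitsSpaces = preg_replace('/[^[:alnum:][:space:]]/', '', $input);

// Check that a string consists solely of hexadecimal digits.
$isHex = preg_match('/^[[:xdigit:]]+$/', 'DEADbeef42');

// Count the uppercase letters in the input.
$upperCount = preg_match_all('/[[:upper:]]/', $input);

echo $lettersDigitsSpaces, PHP_EOL;
```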


In summary, if you wanted to see how to remove non-printable characters from strings in PHP, I hope these examples are helpful.