I want to do a search & replace in PHP with a symbol.
This is the symbol: ➤
I want to replace it with a dash, but that doesn’t work. The problem seems to be that the symbol cannot be found, even though it’s there.
Other ‘normal’ search and replace operations work as expected. But replacing this symbol does not.
Any ideas how to address this symbol, so that the search and replace function actually can find it and replace it?
Answer
Your problem is (almost certainly) related to text/character encoding.
Special characters such as the ➤ you are referring to are not part of the classical ISO-8859-1 character set; they are, however, part of Unicode (code point U+27A4, to be exact). This means that, in order to use this multibyte character, you have to use a Unicode encoding, which generally means UTF-8.
All the basic characters (think A-Z, numbers, spaces, …) overlap between UTF-8 and ISO-8859-1 (which is effectively the default character set), so when you don’t use any special characters, you could use the wrong charset and things will pretty much continue to work just fine; that is until you try to use a character that is not part of the basic set.
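To make the overlap concrete, here is a small sketch that prints the raw bytes of a basic character and of the ➤. It assumes PHP 7 or later, because it uses the "\u{...}" string escape; the variable name is just for illustration.

```php
<?php
// "\u{27A4}" (PHP 7+) produces the UTF-8 bytes for U+27A4,
// regardless of how this source file itself is saved.
$arrow = "\u{27A4}";

// A basic character is a single byte, identical in ISO-8859-1 and UTF-8.
echo bin2hex('A'), "\n";    // 41

// The arrow takes three bytes in UTF-8.
echo bin2hex($arrow), "\n"; // e29ea4

// strlen() counts bytes, not characters.
echo strlen($arrow), "\n";  // 3
```

If the bytes in your input differ from e29ea4 where the symbol should be, the input is not UTF-8, or the symbol is a different, similar-looking character.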
Since your problem takes place entirely on the server side (inside PHP), and doesn’t really touch upon the HTTP and HTML layers, we won’t have to go into UTF-8 content-type headers and the like, but you should be aware of them for future issues (if you weren’t already).
The issue you have should be resolved once you meet 2 criteria:
- Not all PHP functions are multibyte-aware. str_replace is not, although it compares raw bytes and so can still match a UTF-8 character as long as the search string and the subject text are in the same encoding. The preg_replace function with its u flag enabled definitely is multibyte-aware, and can serve the exact same function.
- The text editor or IDE that you used to create the .php file may or may not be set to UTF-8 encoding. If it wasn’t, then you should switch that in order to be able to use such characters literally inside the source code.
Something like this should function correctly assuming the .php-file is stored in UTF-8 format:
$output = preg_replace('#➤#u', '-', $input);
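If you cannot be sure how the .php file is saved, a variant that names the code point instead of typing the character may be more robust. The following is only a sketch: replaceArrow is a hypothetical helper name, and the "\u{...}" escape in the sample input assumes PHP 7 or later.

```php
<?php
// Hypothetical helper: replaces every U+27A4 in $input with a dash.
function replaceArrow(string $input): string
{
    // \x{27A4} names the code point inside the pattern itself, so the
    // encoding of this source file no longer matters. The u flag makes
    // PCRE treat both pattern and subject as UTF-8.
    $output = preg_replace('/\x{27A4}/u', '-', $input);

    // preg_replace returns null on failure, for example when the
    // subject is not valid UTF-8.
    if ($output === null) {
        throw new InvalidArgumentException('preg_replace failed; input may not be valid UTF-8');
    }

    return $output;
}

$input = "Step one \u{27A4} step two";
echo replaceArrow($input), "\n"; // Step one - step two
```

The null check is worth keeping: if the input turns out to be in another encoding, the u flag makes the call fail instead of silently leaving the text unchanged.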