How can I remove chars like ???????? from a string? Sometimes a YouTube video title contains characters like this. I don’t want to remove characters like !@#$%^&*().
I am currently using preg_replace('/[^A-Za-z0-9-]/', '', $VideoTitle);
Samples Array:
$VideoTitles[] = 'Sia 2017 Cheap Thrills 2017 live ????????';
$VideoTitles[] = 'TAYLOR SWIFT - SHAKE IT OFF ???????? #1989';
Expected Output:
Sia 2017 Cheap Thrills 2017 live
TAYLOR SWIFT - SHAKE IT OFF #1989
Answer
Code with sample input:
$VideoTitles = [
    'Kilian à Dijon #4 • Vlog #2 • Primark again !? ???? - YouTube',
    'Funfesty ???? ???? on Twitter: "Je commence à avoir mal à la tête à force',
    'Sia 2017 Cheap Thrills 2017 live ????????'
];
// Remove each out-of-range character together with the whitespace on one side of it only.
$VideoTitles = preg_replace('/[^ -\x{2122}]\s+|\s*[^ -\x{2122}]/u', '', $VideoTitles);
var_export($VideoTitles);
Output:
array (
  0 => 'Kilian à Dijon #4 • Vlog #2 • Primark again !? - YouTube',
  1 => 'Funfesty on Twitter: "Je commence à avoir mal à la tête à force',
  2 => 'Sia 2017 Cheap Thrills 2017 live',
)
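As a rough check against the question's own samples, the sketch below writes the two emoji as code point escapes (PHP 7.0+ syntax) and runs the pattern above on them. It also shows an edge case: when adjacent out-of-range characters have whitespace on both sides, the pattern consumes both spaces. The `$variant` pattern is a hypothetical alternative, not part of the original answer. It treats a run of out-of-range characters as one unit, removes only the whitespace before it, and trims the ends afterwards.

```php
<?php
// Emoji written as code point escapes so the file encoding cannot damage them.
$titles = [
    "Sia 2017 Cheap Thrills 2017 live \u{1F3A7}\u{1F3AC}",
    "TAYLOR SWIFT - SHAKE IT OFF \u{1F3A7}\u{1F3AC} #1989",
];

// Pattern from the answer: each out-of-range character takes the
// whitespace on one side with it.
$original = preg_replace('/[^ -\x{2122}]\s+|\s*[^ -\x{2122}]/u', '', $titles);

// Hypothetical variant (an assumption, not the answer's method): match a whole
// run of out-of-range characters plus the whitespace before it, then trim.
$variant = array_map(
    'trim',
    preg_replace('/\s*[^ -\x{2122}]+/u', '', $titles)
);

var_export($original);
echo PHP_EOL;
var_export($variant);
echo PHP_EOL;
```

With the answer's pattern the second title comes out as 'TAYLOR SWIFT - SHAKE IT OFF#1989'; the variant keeps the single space before #1989.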
The above regex pattern uses a character range from \x20 to \x{2122} (space to trade mark sign). I have selected this range because it should cover the vast majority of word-related characters, including letters with accents and non-English characters. Admittedly, it also includes many non-word-related characters. You may like to use two separate ranges for greater specificity, like /[^\x{20}-\x{60}\x{7B}-\x{FF}]/ui, which case-insensitively searches two ranges: space to grave accent, and left curly bracket to Latin small letter y with diaeresis. The i flag is what lets the uppercase letters in the first range cover a-z as well.
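A minimal sketch of the two-range pattern, assuming a made-up title with the emoji and bullet written as code point escapes. The bullet (U+2022) lies outside both ranges, so this narrower pattern strips it along with the emoji, while the accented à (U+00E0) survives. This version removes no whitespace, so gaps are left where characters were dropped.

```php
<?php
// Hypothetical sample title: U+2022 is the bullet, U+1F3AC the clapper board.
$title = "Kilian à Dijon #4 \u{2022} Vlog \u{1F3AC}";

// Keep U+0020..U+0060 and U+007B..U+00FF; the i flag lets A-Z also cover a-z.
$clean = preg_replace('/[^\x{20}-\x{60}\x{7B}-\x{FF}]/ui', '', $title);

var_export($clean);
echo PHP_EOL;
```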
If you find that this range is unnecessarily generous or takes too long to process, you can make your own decision about the appropriate character range.
For instance, you might like the much lighter but less generous /[^\x20-\x7E]/u (from space to tilde). However, if you apply it to either of my above French $VideoTitles, you will mangle the text by removing legitimate letters.
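To make the mangling concrete, here is a small sketch that applies both the space-to-tilde range and the wider space-to-trade-mark range to a fragment of the French sample. The variable names are illustrative only.

```php
<?php
$title = 'Je commence à avoir mal à la tête';

// Space to tilde: every accented letter is outside the range and gets removed.
$narrow = preg_replace('/[^\x20-\x7E]/u', '', $title);

// Space to trade mark sign: the accented letters are inside the range and are kept.
$wide = preg_replace('/[^ -\x{2122}]/u', '', $title);

echo $narrow, PHP_EOL; // Je commence  avoir mal  la tte
echo $wide, PHP_EOL;   // Je commence à avoir mal à la tête
```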
Here is a menu of characters and their unicode numbers to help you understand what is inside the aforementioned ranges and beyond.
*And remember to include the Unicode flag u after your closing delimiter.
For completeness, I should say the literal/narrow solution for removing the two emojis would be:
$VideoTitle = preg_replace('/[\x{1F3A7}\x{1F3AC}]/u', '', $VideoTitle); // omit 2 emojis
These emojis are called “clapper board (U+1F3AC)” and “headphone (U+1F3A7)”.