
Best practice to stop mb_strimwidth() from truncating in the middle of a multi-character HTML entity

I truncate blog post titles with PHP's built-in mb_strimwidth() function, and on the whole this has worked fine.

Then a user published a post on the site whose title included an apostrophe.

The apostrophe is being output as the HTML entity &#8217;, and in this particular example the truncation point fell in the middle of that 7-character entity.

So the title ended up appearing on the site as something like…

This is a test title to show you how the apostrophe was being cut in half &#82

Is there a strategy that lets me keep using mb_strimwidth() while avoiding this kind of situation?


Answer

Your data doesn’t contain an actual ’ character; it contains the entity string representation &#8217;. If you want mb_strimwidth() to treat it as a single character, you need to convert it back from the entity representation first. Or, ideally, take steps to ensure that you don’t have unexpected entity representations in your source data in the first place. [See: UTF-8 all the way through]

$input = "This is a test title to show you how the apostrophe was being cut in half &#8217; oh noes";

var_dump(
    $input,
    mb_strimwidth($input, 0, 78),
    $decoded = html_entity_decode($input),
    mb_strimwidth($decoded, 0, 78)
);

Output:

string(89) "This is a test title to show you how the apostrophe was being cut in half &#8217; oh noes"
string(78) "This is a test title to show you how the apostrophe was being cut in half &#82"
string(85) "This is a test title to show you how the apostrophe was being cut in half ’ oh noes"
string(80) "This is a test title to show you how the apostrophe was being cut in half ’ oh"
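
The decode-then-trim approach above could be wrapped in a small helper. This is only a sketch under a few assumptions: the function name truncate_title() and the "..." trim marker are hypothetical, the stored titles are assumed to be UTF-8 with entities mixed in, and the result is re-escaped with htmlspecialchars() on the assumption that it is echoed straight into HTML.

```php
<?php
// Hypothetical helper, not part of the original answer:
// decode entities, trim on whole characters, then escape for output.
function truncate_title(string $title, int $width, string $marker = '...'): string
{
    // Turn entities such as &#8217; back into real UTF-8 characters.
    $decoded = html_entity_decode($title, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // Trim by display width; the marker's width counts toward $width.
    $trimmed = mb_strimwidth($decoded, 0, $width, $marker, 'UTF-8');

    // Escape once, at output time, so the result is safe to place in HTML.
    return htmlspecialchars($trimmed, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

$input = "This is a test title to show you how the apostrophe was being cut in half &#8217; oh noes";
echo truncate_title($input, 78), PHP_EOL;
```

Because the entity is decoded before trimming, the cut can only fall between whole characters, never inside &#8217;. If the titles are stored as plain UTF-8 in the first place, the html_entity_decode() step becomes unnecessary.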
User contributions licensed under: CC BY-SA