Insert space after semi-colon, unless it’s part of an HTML entity

I’m trying to insert a space after each semi-colon, unless the semi-colon is part of an HTML entity. The examples here are short, but my strings can be quite long, with several semi-colons (or none).

JavaScript

I found the following regular expression that does the trick for short strings:

JavaScript

However, if the string is somewhat large, the preg_replace above actually crashes my Apache server (“The connection to the server was reset while the page was loading.”). Add the following to the sample code above to reproduce it:

JavaScript

The code above (with the large string) crashes Apache but works if I run PHP on the command line.

Elsewhere in my program I use preg_replace on much larger strings without problem, so I’m guessing something in the regular expression overwhelms PHP/Apache.

So, is there a way to ‘fix’ the regex so it works on Apache with large strings or is there another, safer, way to do this?

I’m using PHP 5.2.17 with Apache 2.0.64 on Windows XP SP3, if it’s any help. (Unfortunately, upgrading either PHP or Apache is not an option for now.)

Answer

I would suggest this match expression:

JavaScript

…which matches a series of word characters (letters, numbers, and underscore) that is not preceded by an ampersand (or by an ampersand followed by a hash symbol) but is followed by a semicolon.

It breaks down to mean:

JavaScript

Replace with the string '$0 ' (the whole match followed by a space).

Let me know if this doesn’t work for you.
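
For illustration, here is a minimal sketch of how an expression matching that description could be plugged into preg_replace. The pattern is reconstructed from the prose above and is an assumption, not necessarily the exact expression from this answer; the function name add_space_after_semicolons is hypothetical too.

```php
<?php
// Hypothetical reconstruction: the pattern is inferred from the description
// in the text, not copied from the original answer.
function add_space_after_semicolons($text)
{
    // \b        word boundary, so the match must start at the beginning of a word
    // (?<!&)    that word is not directly preceded by "&"  (named entity, e.g. &amp;)
    // (?<!&\#)  nor by "&#"                                (numeric entity, e.g. &#160;)
    // \w+;      one or more word characters followed by a semicolon
    $pattern = '/\b(?<!&)(?<!&\#)\w+;/';

    // $0 is the whole match; the replacement appends a single space to it.
    return preg_replace($pattern, '$0 ', $text);
}

echo add_space_after_semicolons('lorem;ipsum &amp;dolor;sit &#160;amet'), "\n";
// prints: lorem; ipsum &amp;dolor; sit &#160;amet
```

One thing to keep in mind with this sketch: it only fires when the semicolon directly follows a word character, so a semicolon after a closing parenthesis or a quote would be left alone.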

Of course, you could also use [a-zA-Z0-9] instead of \w to avoid matching an underscore, but I don’t think that would ever give you any trouble.

Also, you might need to escape the hash symbol as well (because it starts a regex comment when the x modifier is in use), like so:

JavaScript

EDIT Not sure, but I’m guessing that putting the word boundary at the beginning is going to make it a bit more efficient (and thus less likely to crash your server), so I changed that in the expressions and the break-down…

EDIT 2 … and a bit more info on why your expression might be making your server crash: see “Catastrophic Backtracking”. I think this applies (?), hmmm… good info nonetheless.

FINAL EDIT if you are looking to only add a space after a semicolon if there is not already whitespace after it (i.e. add one in the case of pellentesque;odio but not in the case of pellentesque; odio), then add an additional lookahead at the end, which will prevent extra unnecessary spaces being added:

JavaScript
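
As a rough sketch of that last variant: the pattern below is an assumption based on the description in this answer (a word not preceded by & or &#, followed by a semicolon), with a (?!\s) negative lookahead appended. The function name add_missing_space_after_semicolons is hypothetical.

```php
<?php
// Hypothetical variant: the base pattern is an assumption; the trailing
// (?!\s) makes the match fail when whitespace already follows the semicolon.
function add_missing_space_after_semicolons($text)
{
    $pattern = '/\b(?<!&)(?<!&\#)\w+;(?!\s)/';

    // $0 is the whole match (the lookahead consumes nothing), plus one space.
    return preg_replace($pattern, '$0 ', $text);
}

echo add_missing_space_after_semicolons('pellentesque;odio'), "\n";  // space gets added
echo add_missing_space_after_semicolons('pellentesque; odio'), "\n"; // left unchanged
```

Because the lookahead is zero-width, the character after the semicolon is never part of the match, so nothing other than the added space changes in the string.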
User contributions licensed under: CC BY-SA