
PHP regular expression

What is the purpose of the following code?

preg_replace( '@<(script|style)[^>]*?>.*?</\1>@si', '', $string );

What kind of $string matches this expression?

Why is there an @ character?


Answer

That regular expression matches any <script>...</script> or <style>...</style> (X)HTML block in the string and removes it. The \1 in the closing tag refers back to whichever tag name the opening tag matched, so a block only matches when its opening and closing tags agree. This is most likely done to prevent users from inserting these (potentially harmful) tags into data that you might echo back to the user. If the tags were not removed, malicious users could change your site's appearance, insert JavaScript that rewrites your page content, force your users to visit other websites automatically, and do many other nasty things.
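As a minimal sketch of that behaviour, the snippet below runs the pattern from the question against a made-up input. The sample markup and the variable names $clean, $mismatched and $unchanged are assumptions for illustration, not part of the original code.

```php
<?php
// Hypothetical input: two paragraphs with a script block and a style block in between.
$string = "<p>Hello</p><SCRIPT type=\"text/javascript\">\nalert('x');\n</SCRIPT>"
        . "<style>p { color: red; }</style><p>Bye</p>";

// The pattern from the question: remove whole <script> and <style> blocks.
$clean = preg_replace('@<(script|style)[^>]*?>.*?</\1>@si', '', $string);

echo $clean, PHP_EOL; // <p>Hello</p><p>Bye</p>

// The \1 backreference requires the closing tag to repeat the opening tag's name,
// so a mismatched pair is left untouched.
$mismatched = '<script>alert(1)</style>';
$unchanged  = preg_replace('@<(script|style)[^>]*?>.*?</\1>@si', '', $mismatched);

var_dump($unchanged === $mismatched); // bool(true)
```

The uppercase SCRIPT block and the newlines inside it are removed only because of the i and s flags at the end of the pattern.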

As for the @: regular expressions are traditionally enclosed in slashes, for example:

/regexphere/si

The slashes mark the boundaries of the regular expression, and the characters after the closing slash are flags that tell the regular expression engine to behave a certain way. Here, the i means “case insensitive” and the s means that the . in the expression also matches newline characters, which it does not match by default (tabs and other whitespace are matched by . either way). PHP inherited this format from Perl and from the Unix utilities that predate it.
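The effect of the two flags can be checked with a few small preg_match() calls. This is only an illustrative sketch; the test strings and variable names are made up.

```php
<?php
// The . matches a tab with or without the s flag.
$tabNoFlag = preg_match('@a.b@', "a\tb");         // 1

// The . does not match a newline by default...
$newlineNoFlag = preg_match('@a.b@', "a\nb");     // 0

// ...but it does once the s flag is set.
$newlineWithS = preg_match('@a.b@s', "a\nb");     // 1

// The i flag lets a lowercase pattern match uppercase input.
$caseNoFlag = preg_match('@<style>@', '<STYLE>');  // 0
$caseWithI  = preg_match('@<style>@i', '<STYLE>'); // 1

var_dump($tabNoFlag, $newlineNoFlag, $newlineWithS, $caseNoFlag, $caseWithI);
```

Without the s flag, the pattern in the question would miss any script or style block that spans more than one line.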

Other characters (like @ or | or %) can be used in place of the / around the regular expression, which avoids unnecessary escaping when there are a lot of /s in your pattern. For example, it’s easier and more readable to write @http://@ than /http:\/\//. In your pattern, using @ means the / in the closing tag does not need to be escaped. (| would not suit your pattern, because it already uses | for alternation.)
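To illustrate, the following sketch writes the same pattern with three different delimiters and applies each to a placeholder URL. The URL and variable names are assumptions chosen for the example.

```php
<?php
// Placeholder URL, for illustration only.
$url = 'http://example.com/';

// With / as the delimiter, every / inside the pattern must be escaped.
$withSlash = preg_match('/^http:\/\//', $url);

// With @ or # as the delimiter, the slashes can be written as they are.
$withAt   = preg_match('@^http://@', $url);
$withHash = preg_match('#^http://#', $url);

var_dump($withSlash, $withAt, $withHash); // int(1) three times
```

All three calls describe the same match; only the amount of escaping differs.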

User contributions licensed under: CC BY-SA