
Matching all three kinds of PHP comments with a regular expression

I need to match all three types of comments that PHP might have:

  • # Single line comment

  • // Single line comment

  • /* Multi-line comments */

  • /**
     * And all of its possible variations
     */

Something I should mention: I am doing this so that I can recognize whether a PHP closing tag (?>) is inside a comment or not. If it is, it should be ignored; if not, it should count as a closing tag. This is going to be used inside an XML document to improve Sublime Text's recognition of the closing tag (because it's driving me nuts!). I tried for a couple of hours to achieve this, but couldn't. How can I translate it so that it works within XML?

So if you could also include the if-then-else logic, I would really appreciate it. BTW, I really need it to be a pure regular expression, no language features or anything. 🙂

As Eicon reminded me, I need all of them to match both at the start of a line and at the end of a piece of code, so I also need the following to work with all of them:

<?php
    echo 'something'; # this is a comment
?>


Answer

Parsing a programming language is generally more than regexes can reliably handle. You should probably look for a PHP parser.

But these would be the regexes you are looking for. I assume for all of them that you use the DOTALL or SINGLELINE option (although the first two would work without it as well):

~#[^\r\n]*~
~//[^\r\n]*~
~/\*.*?\*/~s
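A quick way to sanity-check the three patterns is to run them in Python. This is a minimal sketch assuming Python 3's built-in re module, which takes no ~ delimiters; the trailing s modifier becomes the re.S (DOTALL) flag. The PHP sample string is made up for illustration.

```python
import re

# The three patterns above without the ~ delimiters; re.S is Python's DOTALL.
HASH_COMMENT = re.compile(r"#[^\r\n]*")
SLASH_COMMENT = re.compile(r"//[^\r\n]*")
BLOCK_COMMENT = re.compile(r"/\*.*?\*/", re.S)

# Hypothetical PHP snippet containing one comment of each kind.
sample = """<?php
    echo 'something'; # this is a comment
    $x = 1; // another one
    /**
     * A docblock
     */
?>"""

print(HASH_COMMENT.findall(sample))   # line comments starting with #
print(SLASH_COMMENT.findall(sample))  # line comments starting with //
print(BLOCK_COMMENT.findall(sample))  # /* ... */ and /** ... */ blocks
```

The line-comment patterns stop at the first \r or \n, so they never need DOTALL; only the block pattern relies on re.S to let . cross line breaks.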

Note that any of these will cause problems if the comment-delimiting characters appear in a string or anywhere else where they do not actually open a comment.
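To make that caveat concrete, here is a small Python 3 sketch using the two line-comment patterns from above on a made-up PHP fragment that contains no comments at all. Both patterns still report a match, because a regex on its own has no idea that it is inside a string literal.

```python
import re

# The two line-comment patterns from above, without the ~ delimiters.
HASH_COMMENT = re.compile(r"#[^\r\n]*")
SLASH_COMMENT = re.compile(r"//[^\r\n]*")

# Hypothetical PHP code: the "//" and "#" here are inside string literals,
# so neither line actually contains a comment.
code = '$url = "http://example.com";\n$tag = \'#not-a-comment\';\n'

print(SLASH_COMMENT.findall(code))  # false positive from the URL
print(HASH_COMMENT.findall(code))   # false positive from the quoted tag
```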

You can also combine all of these into one regex:

~(?:#|//)[^\r\n]*|/\*.*?\*/~s

If you use a tool or language that does not require delimiters (like Java or C#), remove the ~ delimiters. In that case you will also have to apply the DOTALL option differently, but without knowing where you are going to use this, I cannot tell you how.
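Python is one such delimiter-free language: its re module takes the bare pattern, and the s modifier is passed as the re.DOTALL flag, or written inline as (?s) at the very start of the pattern. A minimal sketch assuming Python 3, with a made-up PHP snippet:

```python
import re

# Combined pattern without the ~ delimiters; the "s" modifier becomes re.DOTALL.
COMMENT = re.compile(r"(?:#|//)[^\r\n]*|/\*.*?\*/", re.DOTALL)

# Same pattern with DOTALL switched on inline instead of via a flag argument.
INLINE = re.compile(r"(?s)(?:#|//)[^\r\n]*|/\*.*?\*/")

# Hypothetical PHP snippet with a #, a multi-line /* */ and a // comment.
php = """<?php
    echo 'something'; # this is a comment
    /* spans
       two lines */
    $y = 2; // trailing
?>"""

for match in COMMENT.finditer(php):
    print(repr(match.group(0)))
```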

If you cannot or do not want to set the DOTALL option, replace . with [\s\S], which matches any character including line breaks. This is equivalent (I also left out the delimiters to give an example):

(?:#|//)[^\r\n]*|/\*[\s\S]*?\*/
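The equivalence is easy to check. This minimal Python 3 sketch compares the [\s\S] version (no flag) with the . version under DOTALL, and shows what happens to a plain . without DOTALL; the sample text is made up.

```python
import re

# No DOTALL flag: [\s\S] (whitespace or non-whitespace) matches any character.
NO_FLAG = re.compile(r"(?:#|//)[^\r\n]*|/\*[\s\S]*?\*/")

# The delimiter-free form of the combined pattern, with DOTALL as a flag.
WITH_FLAG = re.compile(r"(?:#|//)[^\r\n]*|/\*.*?\*/", re.DOTALL)

# For contrast: "." without DOTALL cannot cross the newline, so the
# two-line block comment is silently missed.
PLAIN_DOT = re.compile(r"(?:#|//)[^\r\n]*|/\*.*?\*/")

text = "/* first\n   second */ x = 1; # done"

print(NO_FLAG.findall(text))
print(WITH_FLAG.findall(text))
print(PLAIN_DOT.findall(text))
```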

See here for a working demo.

Now, if you also want to capture the contents of the comments in a group, you could do this:

(?|(?:#|//)([^\r\n]*)|/\*([\s\S]*?)\*/)

The (?|...) branch reset group makes both alternatives number their capturing group as 1, so regardless of the type of comment, the comment's content (without the syntax delimiters) will be found in capture 1.
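Python's built-in re module does not support the (?|...) branch reset group (the third-party regex module does), so a minimal Python 3 sketch has to keep the two capture groups separate and pick whichever one took part in the match. comment_bodies is a hypothetical helper name used only for illustration.

```python
import re

# Same alternation without (?|...): the line-comment body lands in group 1,
# the block-comment body in group 2. Exactly one of them participates in any
# given match, so m.lastindex points at the group that matched.
COMMENT = re.compile(r"(?:#|//)([^\r\n]*)|/\*([\s\S]*?)\*/")


def comment_bodies(source: str) -> list[str]:
    """Hypothetical helper: text of every comment with its delimiters stripped."""
    return [m.group(m.lastindex) for m in COMMENT.finditer(source)]


print(comment_bodies("# one\n/*two*/ //three"))
```

Using m.lastindex rather than "group(1) or group(2)" keeps an empty # comment as an empty string instead of falling through to the other group.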

Another working demo.
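For the question's actual goal of spotting ?> outside comments, one common regex idiom is to match what you want to skip (comments) and capture what you want to keep (?>) in a group, then use only the matches where that group took part. Note that, per the PHP manual, a // or # comment ends at a closing ?>, so only /* */ comments can actually hide one; the one-line alternatives below therefore stop just before ?>. In a pure-regex setting such as a syntax definition you only have the pattern itself; the Python around it is a sketch showing which matches count. closing_tag_offsets is a hypothetical helper name, and strings containing comment markers or ?> will still confuse it.

```python
import re

# Alternation order matters: block comments are tried first and consumed,
# then one-line comments (which stop right before "?>", since PHP ends a
# one-line comment at a closing tag), and only then a bare "?>" in group 1.
PATTERN = re.compile(r"/\*[\s\S]*?\*/|(?:#|//)(?:(?!\?>)[^\r\n])*|(\?>)")


def closing_tag_offsets(source: str) -> list[int]:
    """Hypothetical helper: offsets of "?>" tags not hidden in /* */ comments."""
    return [m.start(1) for m in PATTERN.finditer(source) if m.group(1) is not None]


src = "<?php\n/* ?> is hidden here */\necho 1; # note ?>\n"
print(closing_tag_offsets(src))  # only the tag after "# note" counts
```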

User contributions licensed under: CC BY-SA