
How can I extract values that have opening and closing brackets with a regular expression?

I am trying to extract values from [[String]] with a regular expression. Every opening bracket [ needs a matching closing bracket ], so the expected matches are:

  • [[String]]
  • [String]
  • String

If I use \[[^]]+] it stops at the first closing bracket it comes across, without taking into account that another bracket was opened in between and that the match therefore needs the second closing bracket. Is this possible at all with a regular expression?

Note: This type can either be String, [String] or [[String]] so you don’t know upfront how many brackets there will be.
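To make the problem concrete, here is a minimal sketch of the failing attempt. It assumes the pattern in the question was meant as \[[^]]+] (leading bracket escaped) and that PHP's preg functions are used. These are assumptions, since the question does not name a language.

```php
<?php
// Assumed form of the pattern from the question: "[", one or more non-"]" chars, "]".
$s = '[[String]]';
preg_match_all('~\[[^]]+]~', $s, $m);

// Only one match is found, and it ends at the FIRST closing bracket.
print_r($m[0]); // the single element is "[[String]"
```

The negated character class cannot count how many brackets are open, which is why a recursive pattern is needed.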


Answer

You can use the following PCRE-compliant regex:

(?=((\[(?:\w++|(?2))*])|\b\w+))

See the regex demo. Details:

  • (?= – start of a positive lookahead (necessary to match overlapping strings):
    • ( – start of Capturing group 1 (it will hold the “matches”):
      • (\[(?:\w++|(?2))*]) – Group 2 (technical, used for recursion): a [ char, then zero or more occurrences of either one or more word chars (matched possessively) or the whole Group 2 pattern recursed, and then a ] char
      • | – or
      • \b\w+ – a word boundary (necessary since all overlapping matches are being searched for) and one or more word chars
    • ) – end of Group 1
  • ) – end of the lookahead.
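The recursion step can be looked at in isolation. The sketch below takes only the bracket part of the pattern, anchors it with ^ and $, and lets the group recurse into itself with (?1). The helper name isBalancedGroup is hypothetical and is used here for illustration only.

```php
<?php
// Hypothetical helper: true when $s is exactly one balanced bracket group.
function isBalancedGroup(string $s): bool
{
    // Group 1: "[", then any mix of word-char runs or nested groups via (?1), then "]".
    return preg_match('~^(\[(?:\w++|(?1))*])$~', $s) === 1;
}

var_dump(isBalancedGroup('[[String]]')); // bool(true)
var_dump(isBalancedGroup('[[String]'));  // bool(false), one "]" is missing
```

In the full regex the same group is number 2 because it sits inside the capturing group that collects the matches, hence (?2) there.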

See the PHP demo:

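$s = "[[String]]";
// The lookahead consumes no characters, so overlapping matches are all found;
// the text of each match is stored in capturing group 1.
if (preg_match_all('~(?=((\[(?:\w++|(?2))*])|\b\w+))~', $s, $m)) {
    print_r($m[1]);
}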

Output:

Array
(
    [0] => [[String]]
    [1] => [String]
    [2] => String
)
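
Since the question notes that the nesting depth is not known upfront, here is a small sketch that runs the same regex over all three type forms. The wrapper function nestedBracketMatches is a hypothetical name introduced for illustration.

```php
<?php
// Hypothetical wrapper around the regex from the answer.
function nestedBracketMatches(string $s): array
{
    $pattern = '~(?=((\[(?:\w++|(?2))*])|\b\w+))~';
    // preg_match_all returns false on error; treat that as "no matches".
    if (preg_match_all($pattern, $s, $m) === false) {
        return [];
    }
    return $m[1]; // Group 1 holds the text captured inside the lookahead.
}

foreach (['String', '[String]', '[[String]]'] as $type) {
    echo $type, ' => ', implode(', ', nestedBracketMatches($type)), PHP_EOL;
}
// String => String
// [String] => [String], String
// [[String]] => [[String]], [String], String
```

The same pattern handles every depth because each additional level of nesting is matched by one more recursion into Group 2.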
User contributions licensed under: CC BY-SA