Below is a string I’m trying to explode only on the commas that sit outside the outermost set of brackets.
Wheat Flour (2%) [Wheat Flour, Wheat Gluten, Calcium Carbonate, Iron, Niacin (B3), Thiamin (B1), Ascorbic Acid], Water, Yeast, Salt, Vegetable Oils (Palm, Rapeseed, oils (sunflower, rapeseed)), Soya Flour
1st Attempt
preg_split("/[\[\]|()]+/", "Wheat Flour (2%) [Wheat Flour, Wheat Gluten, Calcium Carbonate, Iron, Niacin (B3), Thiamin (B1), Ascorbic Acid], Water, Yeast, Salt, Vegetable Oils (Palm, Rapeseed, oils (sunflower, rapeseed)), Soya Flour", -1, PREG_SPLIT_NO_EMPTY);
Which returns:
[0] => Wheat Flour
[1] => 2%
[2] => Wheat Flour, Wheat Gluten, Calcium Carbonate, Iron, Niacin
[3] => B3
[4] => , Thiamin
[5] => B1
[6] => , Ascorbic Acid
[7] => , Water, Yeast, Salt, Vegetable Oils
[8] => Palm, Rapeseed
[9] => , Soya Flour
2nd Attempt
preg_split('/\|(?![^(]*\))/', "Wheat Flour (2%) [Wheat Flour, Wheat Gluten, Calcium Carbonate, Iron, Niacin (B3), Thiamin (B1), Ascorbic Acid], Water, Yeast, Salt, Vegetable Oils (Palm, Rapeseed, oils (sunflower, rapeseed)), Soya Flour");
Returns:
[0] => Wheat Flour (2%) [Wheat Flour, Wheat Gluten, Calcium Carbonate, Iron, Niacin (B3), Thiamin (B1), Ascorbic Acid], Water, Yeast, Salt, Vegetable Oils (Palm, Rapeseed), Soya Flour
The first attempt is the closest I’ve come to the output I’m after:
[0] => "Wheat Flour (2%) [Wheat Flour, Wheat Gluten, Calcium Carbonate, Iron, Niacin (B3), Thiamin (B1), Ascorbic Acid]"
[1] => "Water"
[2] => "Yeast"
[3] => "Salt"
[4] => "Vegetable Oils (Palm, Rapeseed, oils (sunflower, rapeseed))"
[5] => "Soya Flour"
Answer
You may use this PCRE regex for splitting:
(?:(\((?:[^()]*|(?-1))*\))|(\[(?:[^][]*|(?-1))*\]))(*SKIP)(*F)|\h*,\h*
Code:
$s = 'Wheat Flour [Wheat Flour, Wheat Gluten, Calcium Carbonate, Iron, Niacin (B3), Thiamin (B1), Ascorbic Acid], Water, Yeast, Salt, Vegetable Oils (Palm, Rapeseed, oils (sunflower, rapeseed)), Soya Flour';
$re = '~(?:(\((?:[^()]*|(?-1))*\))|(\[(?:[^][]*|(?-1))*\]))(*SKIP)(*F)|\h*,\h*~';
print_r(preg_split($re, $s));
Output:
Array
(
[0] => Wheat Flour [Wheat Flour, Wheat Gluten, Calcium Carbonate, Iron, Niacin (B3), Thiamin (B1), Ascorbic Acid]
[1] => Water
[2] => Yeast
[3] => Salt
[4] => Vegetable Oils (Palm, Rapeseed, oils (sunflower, rapeseed))
[5] => Soya Flour
)
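The sample string in the answer leaves out the "(2%)" that appears in the question. As a quick check, here is a minimal sketch that runs the same pattern on the question's original string. The helper name splitTopLevelCommas is hypothetical and not part of the answer, and the sketch assumes the brackets in the input are balanced.

```php
<?php
// Hypothetical helper (not from the original answer): splits on commas that
// sit outside any (...) or [...] group, using the answer's pattern unchanged.
function splitTopLevelCommas(string $s): array
{
    $re = '~(?:(\((?:[^()]*|(?-1))*\))|(\[(?:[^][]*|(?-1))*\]))(*SKIP)(*F)|\h*,\h*~';
    $parts = preg_split($re, $s);
    // preg_split() returns false on failure, e.g. when a PCRE limit is hit.
    return $parts === false ? [] : $parts;
}

// The string from the question, including "(2%)".
$s = 'Wheat Flour (2%) [Wheat Flour, Wheat Gluten, Calcium Carbonate, Iron, '
   . 'Niacin (B3), Thiamin (B1), Ascorbic Acid], Water, Yeast, Salt, '
   . 'Vegetable Oils (Palm, Rapeseed, oils (sunflower, rapeseed)), Soya Flour';

print_r(splitTopLevelCommas($s));
```

On this input the result should be the six items listed as the desired output in the question. If a bracket is left unclosed, the recursive alternative cannot match it, so commas after that point are treated as split points.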
RegEx Explained:
(?:: Start non-capture group
(\((?:[^()]*|(?-1))*\)): Recursive pattern to match a possibly nested (...) substring
|: OR
(\[(?:[^][]*|(?-1))*\]): Recursive pattern to match a possibly nested [...] substring
): End non-capture group
(*SKIP)(*F): Skip and fail this match, i.e. retain this data in the split result
|: OR
\h*,\h*: Match a comma surrounded by 0 or more whitespace characters on either side
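The breakdown above can also be written into the pattern itself. The following is a sketch of the same regex using the x (extended) modifier, which lets PCRE ignore unescaped whitespace and treat # as the start of a comment. The short test string is made up for illustration.

```php
<?php
// Same pattern as in the answer, spread out with the x modifier.
// A nowdoc keeps the backslashes literal; ~ is the delimiter.
$re = <<<'RE'
~
(?:                                  # start non-capture group
  ( \( (?: [^()]* | (?-1) )* \) )    # group 1: possibly nested (...) substring
  |                                  # OR
  ( \[ (?: [^][]* | (?-1) )* \] )    # group 2: possibly nested [...] substring
)                                    # end non-capture group
(*SKIP)(*F)                          # discard what was matched, so it is never a split point
|                                    # OR
\h*,\h*                              # a comma with optional horizontal whitespace around it
~x
RE;

// Illustrative input (an assumption, not taken from the question).
$s = 'Salt, Vegetable Oils (Palm, oils (sunflower, rapeseed)), Soya Flour [a, b]';
$parts = preg_split($re, $s);
print_r($parts);
```

Only the two commas outside the brackets act as split points here, so the bracketed groups stay intact in the result.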