Below is a string I've tried to explode only on commas that sit outside any brackets or parentheses.
Wheat Flour (2%) [Wheat Flour, Wheat Gluten, Calcium Carbonate, Iron, Niacin (B3), Thiamin (B1), Ascorbic Acid], Water, Yeast, Salt, Vegetable Oils (Palm, Rapeseed, oils (sunflower, rapeseed)), Soya Flour
1st Attempt
preg_split("/[\[\]|\(\)]+/", "Wheat Flour (2%) [Wheat Flour, Wheat Gluten, Calcium Carbonate, Iron, Niacin (B3), Thiamin (B1), Ascorbic Acid], Water, Yeast, Salt, Vegetable Oils (Palm, Rapeseed, oils (sunflower, rapeseed)), Soya Flour", -1, PREG_SPLIT_NO_EMPTY);
Which returns:
[0] => Wheat Flour
[1] => 2%
[2] => Wheat Flour, Wheat Gluten, Calcium Carbonate, Iron, Niacin
[3] => B3
[4] => , Thiamin
[5] => B1
[6] => , Ascorbic Acid
[7] => , Water, Yeast, Salt, Vegetable Oils
[8] => Palm, Rapeseed
[9] => , Soya Flour
2nd Attempt
preg_split('/\|(?![^(]*\))/', "Wheat Flour (2%) [Wheat Flour, Wheat Gluten, Calcium Carbonate, Iron, Niacin (B3), Thiamin (B1), Ascorbic Acid], Water, Yeast, Salt, Vegetable Oils (Palm, Rapeseed, oils (sunflower, rapeseed)), Soya Flour");
Returns:
[0] => Wheat Flour (2%) [Wheat Flour, Wheat Gluten, Calcium Carbonate, Iron, Niacin (B3), Thiamin (B1), Ascorbic Acid], Water, Yeast, Salt, Vegetable Oils (Palm, Rapeseed), Soya Flour
The first attempt is the closest I've come to the output I'm trying to get:
[0] => "Wheat Flour (2%) [Wheat Flour, Wheat Gluten, Calcium Carbonate, Iron, Niacin (B3), Thiamin (B1), Ascorbic Acid]"
[1] => "Water"
[2] => "Yeast"
[3] => "Salt"
[4] => "Vegetable Oils (Palm, Rapeseed, oils (sunflower, rapeseed))"
[5] => "Soya Flour"
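For comparison, the lookahead idea from the 2nd attempt can be made to work when brackets are only one level deep. The sketch below is an illustration, not the asker's code. It assumes a pattern that splits on a comma unless the next parenthesis character after it is ")" or the next square-bracket character after it is "]". The variable name $flatRe is made up for this example.

```php
<?php
// Illustration only: split on a comma (plus surrounding horizontal
// whitespace) unless the next ( or ) after it is ")" or the next [ or ]
// after it is "]". A lookahead cannot count depth, so this only handles
// one level of nesting.
$flatRe = '/\h*,\h*(?![^()]*\))(?![^\[\]]*\])/';

$s = 'Wheat Flour (2%) [Wheat Flour, Wheat Gluten, Calcium Carbonate, Iron, Niacin (B3), Thiamin (B1), Ascorbic Acid], Water, Yeast, Salt, Vegetable Oils (Palm, Rapeseed, oils (sunflower, rapeseed)), Soya Flour';

print_r(preg_split($flatRe, $s));
```

The [...] list survives intact, but "Vegetable Oils (Palm, Rapeseed, oils (sunflower, rapeseed))" comes back in three pieces: the commas after "Palm" and "Rapeseed" see the opening parenthesis of "oils (" before any closing one, so the lookahead allows the split. Arbitrary nesting needs recursion, which is what the answer uses.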
Answer
You may use this PCRE regex for splitting:
(?:(\((?:[^()]*|(?-1))*\))|(\[(?:[^][]*|(?-1))*\]))(*SKIP)(*F)|\h*,\h*
Code:
$s = 'Wheat Flour [Wheat Flour, Wheat Gluten, Calcium Carbonate, Iron, Niacin (B3), Thiamin (B1), Ascorbic Acid], Water, Yeast, Salt, Vegetable Oils (Palm, Rapeseed, oils (sunflower, rapeseed)), Soya Flour';
$re = '~(?:(\((?:[^()]*|(?-1))*\))|(\[(?:[^][]*|(?-1))*\]))(*SKIP)(*F)|\h*,\h*~';
print_r(preg_split($re, $s));
Output:
Array
(
    [0] => Wheat Flour [Wheat Flour, Wheat Gluten, Calcium Carbonate, Iron, Niacin (B3), Thiamin (B1), Ascorbic Acid]
    [1] => Water
    [2] => Yeast
    [3] => Salt
    [4] => Vegetable Oils (Palm, Rapeseed, oils (sunflower, rapeseed))
    [5] => Soya Flour
)
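If a regex-free route is acceptable, the same split can be sketched with a depth counter. This is a swapped-in technique, not part of the answer: the helper name splitTopLevel is hypothetical, it treats ( and [ alike as openers and ) and ] alike as closers without checking that they pair up, and it walks the string byte by byte, which is safe for UTF-8 input because multibyte sequences never contain these ASCII bytes.

```php
<?php
// Hypothetical helper: split $s on commas at bracket depth 0,
// treating both (...) and [...] as nesting delimiters.
function splitTopLevel(string $s): array
{
    $parts = [];
    $current = '';
    $depth = 0;
    $len = strlen($s);

    for ($i = 0; $i < $len; $i++) {
        $ch = $s[$i];
        if ($ch === '(' || $ch === '[') {
            $depth++;
        } elseif (($ch === ')' || $ch === ']') && $depth > 0) {
            $depth--; // never go negative on a stray closer
        }

        if ($ch === ',' && $depth === 0) {
            $parts[] = trim($current); // top-level comma: close this piece
            $current = '';
        } else {
            $current .= $ch;
        }
    }
    $parts[] = trim($current); // last piece has no trailing comma

    return $parts;
}

$s = 'Wheat Flour (2%) [Wheat Flour, Wheat Gluten, Calcium Carbonate, Iron, Niacin (B3), Thiamin (B1), Ascorbic Acid], Water, Yeast, Salt, Vegetable Oils (Palm, Rapeseed, oils (sunflower, rapeseed)), Soya Flour';
print_r(splitTopLevel($s));
```

For the question's full string, including "(2%)", this yields the six pieces from the desired output. Unlike \h*, trim() also strips newlines and tabs around each piece.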
RegEx Explained:
(?:
: Start non-capture group
(\((?:[^()]*|(?-1))*\))
: Recursive pattern to match a possibly nested (...) substring; (?-1) recurses into this same capture group
|
: OR
(\[(?:[^][]*|(?-1))*\])
: Recursive pattern to match a possibly nested [...] substring
)
: End non-capture group
(*SKIP)(*F)
: Skip and fail this match, i.e. keep this text intact in the split result
|
: OR
\h*,\h*
: Match a comma surrounded by 0 or more horizontal whitespace characters on either side