I have tried everything I know but still can't figure out how to solve this problem.
I have a string, for example:
"--included-- in selling price: 5 % vat usd 10.00 packaging fees 2 % notifying fees""--not included-- in selling price: us$ 35.00 express fees 2 % notifying fees"
I want to know whether the taxes are "included" or "excluded", and whether each fee is a percentage ("%") or a currency amount. The problem is that my regex doesn't detect the currency "usd" when it directly follows the tax name, as in "vat usd".
How can I separate the currency from the tax name into different groups?
Here is what I tried:
(--excluded--|--included--|--not included--)([a-z ]*)?:?(usd | aed | mad | € | us$ )?([ . 0-9 ]*)(%)?([a-z A-z ?]*) (aed|mad|€|us$)*((aed|mad|€|us$)+)?([. 0-9 ]*)(%)?([a-z A-z]*)(.*)?
And here is what I got:
Match 1
Full match 0-83 --included-- in selling price: 5 % vat usd 10.00 packaging fees 2 % notifying fees
Group 1. 0-12 --included--
Group 2. 12-29 in selling price
Group 4. 30-33 5
Group 5. 33-34 %
Group 6. 34-42 vat usd
Group 10. 43-49 10.00
Group 12. 49-64 packaging fees
Group 13. 64-82 2 % notifying fees
And here is what I want:
Match 1
Full match 0-83 --included-- in selling price: 5 % vat usd 10.00 packaging fees 2 % notifying fees
Group 1. 0-12 --included--
Group 2. 12-29 in selling price
Group 4. 30-33 5
Group 5. 33-34 %
Group 6. 34-38 vat
Group 7. 38-42 usd
Group 10. 43-49 10.00
Group 12. 49-64 packaging fees
Group 13. 64-82 2 % notifying fees
Answer
Here is the solution:
$s = "--included-- in product price: breakfast --excluded--: 5 % vat aed 10.00 destination fee per night 2 % municipality fee 3.5 % packaging fee 10 % warranty service charge";
$results = [];
// First pass: one match per --included-- / --excluded-- / --not included-- section
if (preg_match_all('~(--(?:(?:not )?in|ex)cluded--)(?:\s+([a-zA-Z ]+))?:+\s*((?:(?!--(?:(?:not )?in|ex)cluded--).)*)~su', $s, $m, PREG_SET_ORDER, 0)) {
    foreach ($m as $v) {
        $lastline = array_pop($v); // Remove the last item (Group 3, the raw details)
        // Second pass: split the details into one entry per amount
        if (preg_match_all('~(?:(\b(?:usd|aed|mad|usd)\b|\B€|\bus\$)\s*)?\d+(?:\.\d+)?(?:(?!(?1))\D)*~ui', $lastline, $details)) {
            $results[] = array_merge($v, $details[0]);
        } else {
            $results[] = $v;
        }
    }
}
print_r($results);
See the PHP demo.
Notes:
The first regex extracts each match you need to parse. See the first regex demo. It means:
- (--(?:(?:not )?in|ex)cluded--) – Group 1: a shorter version of (--excluded--|--included--|--not included--): --excluded--, --included-- or --not included--
- (?:\s+([a-zA-Z ]+))? – an optional sequence: 1+ whitespaces and then Group 2: 1+ ASCII letters or spaces
- :+ – 1 or more colons
- \s* – 0+ whitespaces
- ((?:(?!--(?:(?:not )?in|ex)cluded--).)*) – Group 3: any char, 0+ occurrences, as many as possible, not starting any of the three char sequences: --excluded--, --included--, --not included--
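To see the first pattern on its own, here is a minimal sketch that runs it against the string from the question (assuming the two quoted parts are simply concatenated). The variable names $sectionRe and $sections are mine, not part of the answer.

```php
<?php
// Input from the question; single quotes so that "us$" is taken literally.
$s = '--included-- in selling price: 5 % vat usd 10.00 packaging fees 2 % notifying fees'
   . '--not included-- in selling price: us$ 35.00 express fees 2 % notifying fees';

// Same first pattern as in the answer.
$sectionRe = '~(--(?:(?:not )?in|ex)cluded--)(?:\s+([a-zA-Z ]+))?:+\s*((?:(?!--(?:(?:not )?in|ex)cluded--).)*)~su';

preg_match_all($sectionRe, $s, $sections, PREG_SET_ORDER);

foreach ($sections as $sec) {
    // $sec[1] = marker, $sec[2] = label after the marker, $sec[3] = raw details
    printf("%s | %s | %s\n", $sec[1], $sec[2], $sec[3]);
}
```

Each section then carries its own marker in Group 1, so whether the taxes are included or not can be read from $sec[1] before the details in $sec[3] are parsed any further.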
Then, the Group 3 value needs to be further parsed to grab all the details. The second regex is used for that. It matches:
- (?:(\b(?:usd|aed|mad|usd)\b|\B€|\bus\$)\s*)? – an optional occurrence of:
  - (\b(?:usd|aed|mad|usd)\b|\B€|\bus\$) – Group 1:
    - \b(?:usd|aed|mad|usd)\b – usd, aed, mad or usd as whole words
    - \B€ – € not preceded with a word char
    - \bus\$ – us$ not preceded with a word char
  - \s* – 0+ whitespaces
- \d+ – 1+ digits
- (?:\.\d+)? – an optional sequence of . and 1+ digits
- (?:(?!(?1))\D)* – any non-digit char, 0 or more occurrences, as many as possible, not starting the same pattern as in Group 1
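The question also asks whether each fee is a "%" or a currency amount. One possible way to get there, sketched below, is to run the second pattern (with the duplicate usd alternative dropped) and then split each chunk with a small helper. classifyDetail and the 'kind' / 'currency' / 'amount' / 'name' keys are hypothetical names of mine, not something the answer defines.

```php
<?php
// Second pattern from the answer: one chunk per amount, optionally led by a currency.
$detailRe = '~(?:(\b(?:usd|aed|mad)\b|\B€|\bus\$)\s*)?\d+(?:\.\d+)?(?:(?!(?1))\D)*~ui';

// Hypothetical helper: split one chunk into currency, amount, "%" flag and name.
function classifyDetail(string $chunk): array
{
    $partsRe = '~^(?:(\b(?:usd|aed|mad)\b|\B€|\bus\$)\s*)?(\d+(?:\.\d+)?)\s*(%)?\s*(.*)$~sui';
    if (!preg_match($partsRe, $chunk, $p)) {
        return ['kind' => 'unknown', 'currency' => null, 'amount' => null, 'name' => trim($chunk)];
    }
    if ($p[3] === '%') {
        $kind = 'percent';      // amount followed by "%"
    } elseif ($p[1] !== '') {
        $kind = 'currency';     // amount preceded by a currency code or symbol
    } else {
        $kind = 'unknown';      // bare number, neither "%" nor currency
    }
    return [
        'kind'     => $kind,
        'currency' => $p[1] !== '' ? $p[1] : null,
        'amount'   => $p[2],
        'name'     => trim($p[4]),
    ];
}

// Details of the first section of the string from the question.
$details = '5 % vat usd 10.00 packaging fees 2 % notifying fees';
preg_match_all($detailRe, $details, $m);
$parsed = array_map('classifyDetail', $m[0]);
print_r($parsed);
```

With this input the second pattern yields three chunks ("5 % vat ", "usd 10.00 packaging fees " and "2 % notifying fees"), so "vat" and "usd" end up in different entries, which is the separation the question was after.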