
Regex in PHP: separate an exact word from a string into different groups

I have tried everything I know but still can’t figure out how to resolve this problem.

I have strings such as:

  • "--included-- in selling price: 5 % vat usd 10.00 packaging fees 2 % notifying fees"
  • "--not included-- in selling price: us$ 35.00 express fees 2 % notifying fees"

I want to know whether the taxes are “included” or “excluded”, and whether each fee is a percentage (“%”) or a currency amount. The problem is that my regex does not detect the currency “usd” when it directly follows the tax name, as in “vat usd”.

How can I capture the currency and the tax name in separate groups?

Here is what I tried:

(--excluded--|--included--|--not included--)([a-z ]*)?:?(usd | aed | mad | € | us$ )?([ . 0-9 ]*)(%)?([a-z A-z ?]*) (aed|mad|€|us$)*((aed|mad|€|us$)+)?([. 0-9 ]*)(%)?([a-z A-z]*)(.*)?

And here is what I got:

Match 1
Full match  0-83    --included-- in selling price: 5 % vat usd 10.00 packaging fees 2 % notifying fees

Group 1.    0-12    --included--

Group 2.    12-29    in selling price

Group 4.    30-33    5 

Group 5.    33-34   %

Group 6.    34-42    vat usd

Group 10.   43-49   10.00 

Group 12.   49-64   packaging fees 

Group 13.   64-82   2 % notifying fees

And here is what I want:

Match 1
Full match  0-83    --included-- in selling price: 5 % vat usd 10.00 packaging fees 2 % notifying fees

Group 1.    0-12    --included--

Group 2.    12-29    in selling price

Group 4.    30-33    5 

Group 5.    33-34   %

Group 6.    34-38    vat

Group 7.    38-42    usd

Group 10.   43-49   10.00 

Group 12.   49-64   packaging fees 

Group 13.   64-82   2 % notifying fees

Answer

Here is the solution:

$s = "--included-- in product price: breakfast --excluded--: 5 % vat aed 10.00 destination fee per night 2 % municipality fee 3.5 % packaging fee 10 % warranty service charge";
$results = [];
// Stage 1: split the input into sections (marker, optional label, details).
if (preg_match_all('~(--(?:(?:not )?in|ex)cluded--)(?:\s+([a-zA-Z ]+))?:+\s*((?:(?!--(?:(?:not )?in|ex)cluded--).)*)~su', $s, $m, PREG_SET_ORDER, 0)) {
    foreach ($m as $v) {
        $lastline = array_pop($v); // Take Group 3 (the details string) off the match
        // Stage 2: split the details into entries, each starting with an optional currency and a number.
        if (preg_match_all('~(?:(\b(?:usd|aed|mad|usd)\b|\B€|\bus\$)\s*)?\d+(?:\.\d+)?(?:(?!(?1))\D)*~ui', $lastline, $details)) {
            $results[] = array_merge($v, $details[0]);
        } else {
            $results[] = $v;
        }
    }
}
print_r($results);

See the PHP demo.

Notes:

The first regex extracts each match you need to parse. See the first regex demo. It means:

  • (--(?:(?:not )?in|ex)cluded--) – Group 1: a shorter version of (--excluded--|--included--|--not included--): --excluded--, --included-- or --not included--
  • (?:\s+([a-zA-Z ]+))? – an optional sequence: 1+ whitespaces and then Group 2: 1+ ASCII letters or spaces
  • :+ – 1 or more colons
  • \s* – 0+ whitespaces
  • ((?:(?!--(?:(?:not )?in|ex)cluded--).)*) – Group 3: any char, 0+ occurrences, as many as possible, that does not start any of the three char sequences: --excluded--, --included--, --not included--
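
To see the three groups in isolation, here is a minimal sketch that runs only the first regex against one of the sample strings from the question. The variable names ($stage1, $input, $marker, $label, $body) are illustrative and not part of the original answer.

```php
<?php
// First-stage pattern from the answer: marker, optional label, colon(s), details.
$stage1 = '~(--(?:(?:not )?in|ex)cluded--)(?:\s+([a-zA-Z ]+))?:+\s*((?:(?!--(?:(?:not )?in|ex)cluded--).)*)~su';

// Sample input taken from the question (single quotes keep "$" literal).
$input = '--not included-- in selling price: us$ 35.00 express fees 2 % notifying fees';

if (preg_match($stage1, $input, $m)) {
    // $m[0] is the full match; Groups 1-3 follow in order.
    [, $marker, $label, $body] = $m;
    echo "marker: $marker\n"; // marker: --not included--
    echo "label:  $label\n";  // label:  in selling price
    echo "body:   $body\n";   // body:   us$ 35.00 express fees 2 % notifying fees
}
```

Group 1 alone is enough to decide between "included" and "excluded" (treating --not included-- as excluded); Group 3 is the part that still needs to be split into individual fees.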

Then, the Group 3 value needs to be further parsed to grab all the details. The second regex is used here to match

  • (?:(\b(?:usd|aed|mad|usd)\b|\B€|\bus\$)\s*)? – an optional occurrence of
    • (\b(?:usd|aed|mad|usd)\b|\B€|\bus\$) – Group 1:
      • \b(?:usd|aed|mad|usd)\b – usd, aed or mad as whole words (usd is listed twice in the pattern, which is harmless)
      • \B€ – € not preceded with a word char
      • \bus\$ – us$ not preceded with a word char
    • \s* – 0+ whitespaces
  • \d+ – 1+ digits
  • (?:\.\d+)? – an optional sequence of . and 1+ digits
  • (?:(?!(?1))\D)* – any non-digit char, 0 or more occurrences, as many as possible, that does not start the same pattern as in Group 1
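
The question also asks whether each fee is a percentage or a currency amount. The answer's code does not label the entries, so the following is only a sketch of one way to do it on top of the second regex: classifyFees() is a hypothetical helper, and the rule "a number followed by % is a percentage" is an assumption about the input format.

```php
<?php
// Second-stage pattern from the answer, unchanged.
$stage2 = '~(?:(\b(?:usd|aed|mad|usd)\b|\B€|\bus\$)\s*)?\d+(?:\.\d+)?(?:(?!(?1))\D)*~ui';

/**
 * Hypothetical helper (not part of the answer): split a details string into
 * fee entries and label each one as "currency", "percent" or "unknown".
 */
function classifyFees(string $body, string $pattern): array
{
    $fees = [];
    preg_match_all($pattern, $body, $matches, PREG_SET_ORDER);
    foreach ($matches as $match) {
        // With PREG_SET_ORDER, Group 1 is missing when no currency preceded the number.
        $currency = $match[1] ?? '';
        // Assumption: a percentage is a leading number followed by optional spaces and "%".
        $isPercent = preg_match('~^\d+(?:\.\d+)?\s*%~', $match[0]) === 1;
        $fees[] = [
            'text'     => trim($match[0]),
            'currency' => $currency,
            'type'     => $currency !== '' ? 'currency' : ($isPercent ? 'percent' : 'unknown'),
        ];
    }
    return $fees;
}

// Details part of the first sample string from the question.
$fees = classifyFees('5 % vat usd 10.00 packaging fees 2 % notifying fees', $stage2);
foreach ($fees as $fee) {
    echo $fee['type'], ': ', $fee['text'], "\n";
}
```

For this input the loop prints "percent: 5 % vat", "currency: usd 10.00 packaging fees" and "percent: 2 % notifying fees", so the tax name "vat" stays with its percentage and "usd" starts the next entry.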
User contributions licensed under: CC BY-SA