
Regex for splitting apparel sizes

I have the following input (this is only an example; the real input contains much messier data):

$values = [
    '32/34, 36/38, 40/42, 44/46',
    '40/42/44/46/48',
    '58/60',
    '39-42',
    '40-50-60',
    '24-25,26,28,30',
    '36 40,5 44',
];

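I want to split it on separators such as /, -, commas or spaces, but keep pairs of values together. Two values joined by / or - should stay a pair only if that separator does not chain further values (40/42 is a pair, 40/42/44 is a list of three sizes). The result should look like this: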

'32/34, 36/38, 40/42, 44/46'
    => [ '32/34', '36/38', '40/42', '44/46' ]
'40/42/44/46/48'
    => [ '40', '42', '44', '46', '48' ]
'58/60'
    => [ '58/60' ]
'39-42'
    => [ '39-42' ]
'40-50-60'
    => [ '40', '50', '60' ]
'24-25,26,28,30'
    => [ '24-25', '26', '28', '30' ]
'36 40,5 44'
    => [ '36', '40,5', '44' ]
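Stated procedurally, the rules are: find each run of values joined by / or -, keep a run of exactly two values as one pair, and split any longer run into single values. The sketch below implements that two-step idea as a cross-check for the expected output. It is an alternative to the single-regex approach asked about, splitSizesTwoStep is a hypothetical name, and it assumes every value is an integer with an optional ,0 / ,5 / .0 / .5 part:

```php
<?php
// Hypothetical two-step splitter (not the regex-only solution):
// 1) collect runs of values joined by / or -, 2) keep two-value runs as pairs.
function splitSizesTwoStep(string $value): array
{
    // One value: digits plus an optional ,0 ,5 .0 or .5 part
    $number = '\d+(?:[,.][05])?';
    preg_match_all('~' . $number . '(?:[/-]' . $number . ')*~', $value, $runs);

    $result = [];
    foreach ($runs[0] as $run) {
        $parts = preg_split('~[/-]~', $run);
        if (count($parts) === 2) {
            $result[] = $run;                // exactly two values: keep the pair, e.g. "58/60"
        } else {
            array_push($result, ...$parts);  // a single value, or a chain like "40-50-60"
        }
    }
    return $result;
}

print_r(splitSizesTwoStep('24-25,26,28,30')); // prints 24-25, 26, 28 and 30 as separate entries
```

Because the runs are found first, a comma or space never needs to be told apart from a pair separator inside the same regex; the trade-off is that the decimal format is hard-coded in $number.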

What I have so far:

$separator = '^|$|[\s,\/-]';     // start, end, whitespace, comma, / or - (\/ because / is the delimiter)
$decimals = '\d+(?:[,.][05])?';  // e.g. 40, 40,5 or 40.0
foreach ($values as $value) {
    preg_match_all('/' .
        '(?<=' . $separator . ')' .
        '(?:' .
            '(?P<var1>(' . $decimals . ')[\/-](?-1)|(?-1))' .  // a pair or a single value
        ')(?=' . $separator . ')' .
    '/ui', $value, $matches);
    print_r($matches);
}
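In the pattern above, (?-1) is a relative subroutine call: it re-applies the pattern of the most recently opened capturing group (here the $decimals group), so the second value of a pair may differ from the first. A backreference such as \1 would instead require the identical text. A minimal check of that difference (matchesPair is a hypothetical helper, and ~ is used as the delimiter so / needs no escaping):

```php
<?php
// (?-1) re-runs the *pattern* of the previous capturing group (a subroutine call),
// while \1 must match the *same text* that group captured (a backreference).
function matchesPair(string $pattern, string $value): bool
{
    return preg_match($pattern, $value) === 1;
}

var_dump(matchesPair('~^(\d+)[/-](?-1)$~', '32/34')); // bool(true)
var_dump(matchesPair('~^(\d+)[/-]\1$~', '32/34'));    // bool(false)
var_dump(matchesPair('~^(\d+)[/-]\1$~', '32/32'));    // bool(true)
```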

But this fails for '40/42/44/46/48', which returns:

[var1] => Array
    (
        [0] => 40/42
        [1] => 44/46
        [2] => 48
    )

However, each number should be returned separately. Changing the var1 part of the regex to '(?P<var1>(' . $decimals . ')([\/-])(?-2)|(?-2))(?!\3)' is better, but still returns the wrong result:

[var1] => Array
    (
        [0] => 40
        [1] => 42
        [2] => 44
        [3] => 46/48
    )
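The leftover pair comes from where (?!\3) looks: it only rejects a pair when the same separator follows it. For 46/48 nothing follows, so the last pair of the chain passes. A minimal reproduction of that attempt, assuming the question's $separator and $decimals values (splitWithLookahead is a hypothetical name, and ~ replaces / as the delimiter):

```php
<?php
// Reproduces the second attempt: (?!\3) rejects a pair only when the same
// separator comes right after it, so the final pair of a chain slips through.
function splitWithLookahead(string $value): array
{
    $separator = '^|$|[\s,/-]';
    $decimals  = '\d+(?:[,.][05])?';
    $pattern = '~(?<=' . $separator . ')'
        . '(?:(?P<var1>(' . $decimals . ')([/-])(?-2)|(?-2))(?!\3))'
        . '(?=' . $separator . ')~u';
    preg_match_all($pattern, $value, $matches);
    return $matches['var1'];
}

print_r(splitWithLookahead('40/42/44/46/48')); // prints 40, 42, 44 and 46/48 as separate entries
```

Any fix therefore also has to look backwards, so that a pair starting right after a / or - is rejected.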

What should the correct regex look like?


Answer

As stated in the comments above, I know a 100% match is not possible because the data is user input. However, I've found a regex that covers most of my use cases:

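(?<=^|$|[\s,/-])(?:(?P<var1>(?<![/-])(?!(?:(\d+(?:[,.][05])?)[/-]){2}(?-1))(\d+(?:[,.][05])?)[/-](?-1)|(?-1)))(?=^|$|[\s,/-])

Compared with the first attempt, only the pair alternative changes. (?<![/-]) rejects a pair whose first value directly follows a / or -, and the negative lookahead rejects a pair that would start a chain of three or more values. In 40/42/44/46/48 the first position fails the lookahead and every later position fails the lookbehind, so each number falls through to the single-value alternative.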

See https://regex101.com/r/q3YSa7/1
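To apply the answer's pattern from PHP, a minimal sketch follows. splitSizes is a hypothetical helper, and the pattern is the one above written with ~ as the delimiter so the slashes need no escaping. Like the regex itself, it is a heuristic tuned to the sample data rather than a complete parser:

```php
<?php
// Applies the answer's regex and returns the captured var1 values.
function splitSizes(string $value): array
{
    $pattern = '~(?<=^|$|[\s,/-])'
        . '(?:(?P<var1>'
        . '(?<![/-])'                                  // a pair may not start right after / or -
        . '(?!(?:(\d+(?:[,.][05])?)[/-]){2}(?-1))'     // ...nor start a chain of 3+ values
        . '(\d+(?:[,.][05])?)[/-](?-1)'                // a pair such as 32/34 or 39-42
        . '|(?-1)))'                                   // otherwise: a single value
        . '(?=^|$|[\s,/-])~u';
    preg_match_all($pattern, $value, $matches);
    return $matches['var1'];
}

foreach (['32/34, 36/38, 40/42, 44/46', '40/42/44/46/48', '36 40,5 44'] as $value) {
    echo $value, ' => ', implode(' | ', splitSizes($value)), PHP_EOL;
}
```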

User contributions licensed under: CC BY-SA