I have the following list of strings:
$list = array(
'c1' => '{sometext...} 1tb hdd 1tb hdd {sometext...}',
'c2' => '{sometext...} 1tb hdd 1tb {sometext...}',
'c3' => '{sometext...} hdd 1tb hdd 1tb {sometext...}',
'c4' => '{sometext...} hdd 1tb hdd 1tb hdd {sometext...}'
);
and the following regular expression, which should run on every string and return true if a match is found, otherwise false:
/(?<!hdd\s)(\dtb hdd \dtb){1,}(?!\shdd)/
As of now, my result set looks something like this:
'c1' => false, 'c2' => true, 'c3' => false, 'c4' => false
However, the correct result would mark c4 as true as well. How can I change my regex to achieve this?
USE CASE: the goal is to correctly identify ambiguous attributes in product titles. In c1 and c3, it is easy to decide which capacity belongs to which storage device. In the other two cases (c2 and c4), it is not programmatically decidable, because there is an hdd without a capacity value.
NOTE: Counting the number of hdd instances in the string is not a good solution, because the {sometext...} parts of the string may contain other instances of the text as noise.
Answer
You can use either of the following patterns:
(?<=(hdd\s)|)\dtb hdd \dtb(?(1)(?=\shdd)|(?!\shdd))
(?:hdd\s+\dtb hdd \dtb(?!\s+hdd)|(?<!hdd\s)\dtb hdd \dtb\s+hdd)(*SKIP)(*F)|\dtb hdd \dtb
See the regex demo #1 and regex demo #2.
Details #1:
(?<=(hdd\s)|) - checks that either hdd + whitespace (captured into Group 1) or an empty string is immediately to the left of the current location
\dtb hdd \dtb - matches a digit + tb hdd + space + a digit + tb
(?(1)(?=\shdd)|(?!\shdd)) - if Group 1 participated in the match, makes sure there is a whitespace and hdd immediately to the right of the current location; otherwise, makes sure this pattern cannot be found at that location.
Details #2:
(?:hdd\s+\dtb hdd \dtb(?!\s+hdd)|(?<!hdd\s)\dtb hdd \dtb\s+hdd)(*SKIP)(*F) - matches either an hdd\s+\dtb hdd \dtb pattern that is not immediately followed by 1+ whitespaces + hdd, or a \dtb hdd \dtb\s+hdd pattern that is not immediately preceded by hdd + whitespace, then fails these matches and goes on to search for the next match from the failure location
| - or
\dtb hdd \dtb - matches a digit, tb hdd, a digit, tb.
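The second pattern can be tried the same way. This is only a minimal sketch, assuming the sample list from the question, with the {sometext...} placeholders kept as literal text; the variable names $pattern and $ambiguous are illustrative.

```php
<?php
// Sample titles from the question; {sometext...} is kept as a literal placeholder.
$list = array(
    'c1' => '{sometext...} 1tb hdd 1tb hdd {sometext...}',
    'c2' => '{sometext...} 1tb hdd 1tb {sometext...}',
    'c3' => '{sometext...} hdd 1tb hdd 1tb {sometext...}',
    'c4' => '{sometext...} hdd 1tb hdd 1tb hdd {sometext...}'
);

// Pattern #2: the first alternative consumes the unambiguous layouts and
// discards them with (*SKIP)(*F); whatever "\dtb hdd \dtb" remains is ambiguous.
$pattern = '~(?:hdd\s+\dtb hdd \dtb(?!\s+hdd)|(?<!hdd\s)\dtb hdd \dtb\s+hdd)(*SKIP)(*F)|\dtb hdd \dtb~';

// preg_grep() keeps the original keys of the entries that match.
$ambiguous = preg_grep($pattern, $list);
print_r(array_keys($ambiguous)); // keys c2 and c4 for this sample list
```

Note that (*SKIP)(*F) are PCRE backtracking control verbs, so this variant is tied to PCRE-based engines such as PHP's preg_* functions.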
See the PHP demo:
$list = array(
'c1' => '{sometext...} 1tb hdd 1tb hdd {sometext...}',
'c2' => '{sometext...} 1tb hdd 1tb {sometext...}',
'c3' => '{sometext...} hdd 1tb hdd 1tb {sometext...}',
'c4' => '{sometext...} hdd 1tb hdd 1tb hdd {sometext...}'
);
print_r(preg_grep('~(?<=(hdd\s)|)\dtb hdd \dtb(?(1)(?=\shdd)|(?!\shdd))~', $list));
// => Array
// (
// [c2] => {sometext...} 1tb hdd 1tb {sometext...}
// [c4] => {sometext...} hdd 1tb hdd 1tb hdd {sometext...}
// )
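
If you need the true/false flag per key, as in the question, rather than the filtered list, something along these lines should work. It is a sketch, not the only way to do it: the function name flagAmbiguous is hypothetical, and the arrow function assumes PHP 7.4 or later.

```php
<?php
// Hypothetical helper: maps each title to true when the "\dtb hdd \dtb"
// sequence is ambiguous according to pattern #1, false otherwise.
function flagAmbiguous(array $titles): array
{
    $pattern = '~(?<=(hdd\s)|)\dtb hdd \dtb(?(1)(?=\shdd)|(?!\shdd))~';

    // array_map() with a single array preserves the string keys.
    return array_map(
        fn(string $title): bool => preg_match($pattern, $title) === 1,
        $titles
    );
}

$list = array(
    'c1' => '{sometext...} 1tb hdd 1tb hdd {sometext...}',
    'c2' => '{sometext...} 1tb hdd 1tb {sometext...}',
    'c3' => '{sometext...} hdd 1tb hdd 1tb {sometext...}',
    'c4' => '{sometext...} hdd 1tb hdd 1tb hdd {sometext...}'
);

$flags = flagAmbiguous($list);
var_export($flags); // c1 => false, c2 => true, c3 => false, c4 => true
```

Comparing the preg_match() result with 1 also treats a regex error (which returns false) as "no match"; check preg_last_error() if you need to tell those cases apart.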