I have the following list of strings:
$list = array(
    'c1' => '{sometext...} 1tb hdd 1tb hdd {sometext...}',
    'c2' => '{sometext...} 1tb hdd 1tb {sometext...}',
    'c3' => '{sometext...} hdd 1tb hdd 1tb {sometext...}',
    'c4' => '{sometext...} hdd 1tb hdd 1tb hdd {sometext...}'
);
and the following regular expression, which should run on every string and return true if a match is found and false otherwise:
/(?<!hdd\s)(\dtb hdd \dtb){1,}(?!\shdd)/
As of now, my result set looks something like this:
'c1' => false, 'c2' => true, 'c3' => false, 'c4' => false
However, the correct result would mark c4 as true as well. How can I change my regex to achieve the desired result?
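The behaviour described above can be reproduced with a short script. This is only a sketch: it assumes the `\d` and `\s` escapes in the pattern, uses placeholder titles, and needs PHP 7.4+ for the arrow function.

```php
<?php
// Sample titles from the question; {sometext...} stands for arbitrary noise.
$list = [
    'c1' => '{sometext...} 1tb hdd 1tb hdd {sometext...}',
    'c2' => '{sometext...} 1tb hdd 1tb {sometext...}',
    'c3' => '{sometext...} hdd 1tb hdd 1tb {sometext...}',
    'c4' => '{sometext...} hdd 1tb hdd 1tb hdd {sometext...}',
];

// The question's pattern: "hdd" is rejected on EITHER side of the match,
// so c4 (hdd on both sides) can never match.
$pattern = '/(?<!hdd\s)(\dtb hdd \dtb){1,}(?!\shdd)/';

// array_map() keeps the string keys when it is given a single array.
$result = array_map(fn(string $s): bool => preg_match($pattern, $s) === 1, $list);

var_export($result);
```

The two lookarounds are independent of each other, which is why the case with `hdd` on both sides is rejected just like the cases with `hdd` on one side only.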
USE CASE: The goal is to correctly identify ambiguous attributes in product titles. In cases c1 and c3, it is easy to decide which capacity belongs to which storage device. In the other two cases, it cannot be decided programmatically, because there is an hdd without a capacity value.
NOTE: Counting the number of hdd instances in the string is not a good solution, because the {sometext...} part of the string may contain other instances of the text as noise.
Answer
You can use either of the following patterns:
(?<=(hdd\s)|)\dtb hdd \dtb(?(1)(?=\shdd)|(?!\shdd))
(?:hdd\s+\dtb hdd \dtb(?!\s+hdd)|(?<!hdd\s)\dtb hdd \dtb\s+hdd)(*SKIP)(*F)|\dtb hdd \dtb
See the regex demo #1 and regex demo #2.
Details #1:
(?<=(hdd\s)|) - checks if there is hdd + whitespace (captured into Group 1) or an empty string immediately to the left of the current location
\dtb hdd \dtb - matches a digit + tb hdd + a space + a digit + tb
(?(1)(?=\shdd)|(?!\shdd)) - if Group 1 matched, makes sure there is a whitespace and hdd immediately to the right of the current location; otherwise, makes sure this pattern cannot be found at the same location.
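To see which branch of the conditional fires, one can inspect Group 1 after a match. The sketch below assumes the `\d` and `\s` escapes shown in the pattern. `describeMatch` is a hypothetical helper introduced only for illustration, and the subjects are shortened versions of the sample titles.

```php
<?php
// Pattern #1, with the \d and \s escapes written out.
$pattern = '~(?<=(hdd\s)|)\dtb hdd \dtb(?(1)(?=\shdd)|(?!\shdd))~';

// Hypothetical helper (not part of the answer): reports which branch of the
// conditional (?(1)...|...) allowed the match.
function describeMatch(string $pattern, string $subject): string
{
    if (preg_match($pattern, $subject, $m) !== 1) {
        return 'no match';
    }
    // Group 1 is non-empty only when "hdd" + whitespace preceded the match.
    return ($m[1] ?? '') !== ''
        ? 'match: hdd on both sides'
        : 'match: hdd on neither side';
}

echo describeMatch($pattern, 'hdd 1tb hdd 1tb hdd'), PHP_EOL; // like c4
echo describeMatch($pattern, '1tb hdd 1tb'), PHP_EOL;         // like c2
echo describeMatch($pattern, '1tb hdd 1tb hdd'), PHP_EOL;     // like c1
echo describeMatch($pattern, 'hdd 1tb hdd 1tb'), PHP_EOL;     // like c3
```

The pattern relies on the lookbehind not being re-entered once it has succeeded: if `hdd ` was found on the left, the empty alternative is not tried again when the right-hand check fails.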
Details #2:
(?:hdd\s+\dtb hdd \dtb(?!\s+hdd)|(?<!hdd\s)\dtb hdd \dtb\s+hdd)(*SKIP)(*F) - matches either hdd + 1+ whitespaces + \dtb hdd \dtb that is not immediately followed by 1+ whitespaces + hdd, or \dtb hdd \dtb + 1+ whitespaces + hdd that is not immediately preceded by hdd + whitespace; it then fails these matches and goes on to search for the next match from the failure location
| - or
\dtb hdd \dtb - matches a digit, tb hdd followed by a space, a digit, and tb.
See the PHP demo:
$list = array(
    'c1' => '{sometext...} 1tb hdd 1tb hdd {sometext...}',
    'c2' => '{sometext...} 1tb hdd 1tb {sometext...}',
    'c3' => '{sometext...} hdd 1tb hdd 1tb {sometext...}',
    'c4' => '{sometext...} hdd 1tb hdd 1tb hdd {sometext...}'
);
print_r(preg_grep('~(?<=(hdd\s)|)\dtb hdd \dtb(?(1)(?=\shdd)|(?!\shdd))~', $list));
// => Array
// (
//     [c2] => {sometext...} 1tb hdd 1tb {sometext...}
//     [c4] => {sometext...} hdd 1tb hdd 1tb hdd {sometext...}
// )
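The second pattern can be tried the same way. The sketch below is an illustration rather than part of the original demo. It assumes the `\d` and `\s` escapes shown in the pattern and builds the true/false map the question asks for, using a hypothetical `$flags` variable.

```php
<?php
$list = [
    'c1' => '{sometext...} 1tb hdd 1tb hdd {sometext...}',
    'c2' => '{sometext...} 1tb hdd 1tb {sometext...}',
    'c3' => '{sometext...} hdd 1tb hdd 1tb {sometext...}',
    'c4' => '{sometext...} hdd 1tb hdd 1tb hdd {sometext...}',
];

// Pattern #2: first consume and discard (via (*SKIP)(*F)) the one-sided
// contexts, then accept any remaining "\dtb hdd \dtb" occurrence.
$pattern = '~(?:hdd\s+\dtb hdd \dtb(?!\s+hdd)|(?<!hdd\s)\dtb hdd \dtb\s+hdd)'
         . '(*SKIP)(*F)|\dtb hdd \dtb~';

// Map each key to true (ambiguous title) or false (decidable title).
$flags = [];
foreach ($list as $key => $title) {
    $flags[$key] = preg_match($pattern, $title) === 1;
}

var_export($flags);
```

Unlike the first pattern, this one allows one or more whitespace characters (`\s+`) between `hdd` and the capacity on the outer sides, so it may be the better fit if spacing in the titles is irregular.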