The problem is that bare numbers in the text get confused with the numbers inside the tags.
Given the text:
<0><1><2><3><4><5><6><7><8><9><10><11><12><13><14><15><16><17><18>The next 11 keys can change the SWING from OFF (50%) to <19><20><21><22><23><24><25>80<26><27><28><29><30><31><32>% during arpeggiator or sequencer operation.<33><34>
I need to extract the following four groups:
<0><1><2><3><4><5><6><7><8><9><10><11><12><13><14><15><16><17><18> <19><20><21><22><23><24><25> <26><27><28><29><30><31><32> <33><34>
Reason: we want to display this in a much more user-friendly way as…
[1]The next 11 keys can change the SWING from OFF (50%) to [2]80[3]% during arpeggiator or sequencer operation.[4]
Current code:
$pattern = '<[\d<>' . REGSTART . REGEND . REGSTARTSQ . REGENDSQ . '{}]+>';
$numberofsupertags = preg_match_all('/(' . $pattern . ')/', $source, $superchunks);
echo '<pre>';
print_r($superchunks);
echo '</pre><br>';
(REGSTART/REGEND/REGSTARTSQ/REGENDSQ refer to other possible pairs of symbols, like 【】 or 〖〗 etc.)
gives three groups:
<0><1><2><3><4><5><6><7><8><9><10><11><12><13><14><15><16><17><18> <19><20><21><22><23><24><25>80<26><27><28><29><30><31><32> <33><34>
As you can see, the regex does not treat a run of bare digits between tags (the 80 here) as a separator, so the second and third groups are merged into one.
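The merge can be reproduced in isolation. This is a minimal sketch that assumes a shortened sample string and leaves out the REGSTART/REGEND symbols, so the character class holds only digits, `<`, `>`, `{` and `}`:

```php
<?php
// Shortened sample (an assumption for illustration, not the full source text).
$source = '<24><25>80<26><27>% during operation.<33><34>';

// Any run of class characters that starts with '<' and ends with '>' matches,
// so the bare "80" is absorbed into the middle of the run.
preg_match_all('/<[\d<>{}]+>/', $source, $m);

print_r($m[0]);
// Two matches instead of three: "<24><25>80<26><27>" and "<33><34>"
```

Because `\d` sits in the same class as `<` and `>`, the class cannot tell a digit inside a tag from a digit between tags.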
I’ve tried lots of things:
$pattern = '([<|' . REGSTART . REGSTARTSQ . '|{]\d+?[>|' . REGEND . REGENDSQ . '|}])+';
$pattern = '<[\d<>' . REGSTART . REGEND . REGSTARTSQ . REGENDSQ . '{}]+[>(?=\d)|>]';
…but to no avail.
What is the correct solution and where do I go wrong? This looks really simple, but apparently it isn’t.
Answer
You can use
(?:<(?:\{\d+\}|【\d+】|〖\d+〗|\d+)>)+
See the regex demo. Details:
(?: - start of a non-capturing group:
    < - a < char
    (?:\{\d+\}|【\d+】|〖\d+〗|\d+) - one of the alternatives: { + one or more digits + }, 【 + one or more digits + 】, 〖 + one or more digits + 〗, or one or more digits
    > - a > char
)+ - end of the group, repeated one or more times.
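The same pattern can also pull out the tag groups themselves, which is what the question asks for. This is a minimal sketch using `preg_match_all` on a shortened sample string (the string and variable names are assumptions for illustration):

```php
<?php
// Shortened sample with four tag runs and a bare "80" between two of them.
$source = '<0><1>The next 11 keys change SWING to <19><20>80<26><27>% during operation.<33><34>';

// Each tag must be a complete <...> unit, so bare digits end the run.
// The u flag makes PCRE treat the pattern and subject as UTF-8.
$pattern = '~(?:<(?:\{\d+\}|【\d+】|〖\d+〗|\d+)>)+~u';

$count = preg_match_all($pattern, $source, $groups);

print_r($groups[0]);
// Four matches: "<0><1>", "<19><20>", "<26><27>", "<33><34>"
```

Since every repetition has to start with `<` and end with `>`, the `80` between `<20>` and `<26>` cannot be consumed and the run stops there.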
See the PHP demo:
$source = '<0><1><2><3><4><5><6><7><8><9><10><11><12><13><14><15><16><17><18>The next 11 keys can change the SWING from OFF (50%) to <19><20><21><22><23><24><25>80<26><27><28><29><30><31><32>% during arpeggiator or sequencer operation.<33><34>';
$cnt = 0;
echo preg_replace_callback('~(?:<(?:\{\d+\}|【\d+】|〖\d+〗|\d+)>)+~u', function($m) use (&$cnt) {
    return '[' . ++$cnt . ']';
}, $source);
// => [1]The next 11 keys can change the SWING from OFF (50%) to [2]80[3]% during arpeggiator or sequencer operation.[4]
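If the inner symbol pairs are configurable, as the REGSTART/REGEND constants in the question suggest, the alternation can be built from a list instead of being hard-coded. This is a sketch only: the `$pairs` array and the sample string are hypothetical, and it keeps this answer's reading that the symbols sit inside the angle brackets. `preg_quote` escapes any regex metacharacters in the delimiters.

```php
<?php
// Hypothetical delimiter pairs; the question's constants would go here.
$pairs = [['{', '}'], ['【', '】'], ['〖', '〗']];

// Start with the bare-number form, as in <12>.
$alternatives = ['\d+'];
foreach ($pairs as [$open, $close]) {
    // Escape each delimiter for use inside a ~...~ pattern.
    $alternatives[] = preg_quote($open, '~') . '\d+' . preg_quote($close, '~');
}
$pattern = '~(?:<(?:' . implode('|', $alternatives) . ')>)+~u';

$cnt = 0;
$result = preg_replace_callback($pattern, function ($m) use (&$cnt) {
    return '[' . ++$cnt . ']';
}, '<0><{1}>text<【2】>80<3>');

echo $result;
// => [1]text[2]80[3]
```

Each alternative is a complete opening symbol, digits, closing symbol sequence, so a mismatched pair such as `<{1】>` does not match.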