
RegEx (preg_match_all in PHP) to capture series of up to the first alphanumeric character

The difficulty here is that the text between two series of tags can itself consist only of digits, which conflicts with the idea of capturing "up to the first alphanumeric character".

Given the text:

<0><1><2><3><4><5><6><7><8><9><10><11><12><13><14><15><16><17><18>The next 11 keys can change the SWING from OFF (50%) to <19><20><21><22><23><24><25>80<26><27><28><29><30><31><32>% during arpeggiator or sequencer operation.<33><34>

I need to extract the following four groups:

JavaScript

Reason: we want to display this in a much more user-friendly way as…

[1]The next 11 keys can change the SWING from OFF (50%) to [2]80[3]% during arpeggiator or sequencer operation.[4]

Current code:

JavaScript

(REGSTART/REGEND/REGSTARTSQ/REGENDSQ refer to other possible pairs of symbols, like 【】 or 〖〗 etc.)

gives three groups:

JavaScript

As you can see, the RegEx fails to take into account sequences of only numbers between tags.

I’ve tried lots of things:

JavaScript

…but to no avail.

What is the correct solution and where do I go wrong? This looks really simple, but apparently it isn’t.


Answer

You can use

JavaScript

See the regex demo. Details:

  • (?: – start of a non-capturing group:
    • < – a < char
    • (?:\{\d+}|【\d+】|〖\d+〗|\d+) – one of the alternatives: { + one or more digits + }, 【 + one or more digits + 】, 〖 + one or more digits + 〗, or one or more digits
    • > – a > char
  • )+ – end of the group, repeated one or more times.
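
As a rough illustration, the pattern described above could be applied along these lines. This is only a sketch: the < and > delimiters are inferred from the sample text, and the variable names ($pattern, $text, $display, $counter) are assumptions rather than the original code. It collects each run of consecutive tags with preg_match_all and then replaces every run with a running counter via preg_replace_callback.

```php
<?php
// Assumed pattern: one or more consecutive tags of the form <N>, <{N}>, <【N】> or <〖N〗>.
// The "u" modifier makes PCRE treat the pattern and subject as UTF-8.
$pattern = '/(?:<(?:\{\d+}|【\d+】|〖\d+〗|\d+)>)+/u';

// Sample text from the question.
$text = '<0><1><2><3><4><5><6><7><8><9><10><11><12><13><14><15><16><17><18>'
      . 'The next 11 keys can change the SWING from OFF (50%) to '
      . '<19><20><21><22><23><24><25>80<26><27><28><29><30><31><32>'
      . '% during arpeggiator or sequencer operation.<33><34>';

// Each run of adjacent tags is returned as a single match in $matches[0].
preg_match_all($pattern, $text, $matches);

// Replace every run with a bracketed counter for a friendlier display.
$counter = 0;
$display = preg_replace_callback(
    $pattern,
    function (array $m) use (&$counter): string {
        $counter++;
        return '[' . $counter . ']';
    },
    $text
);

echo count($matches[0]), PHP_EOL;
echo $display, PHP_EOL;
```

Because digits only count as part of a tag when they sit between < and >, plain numbers such as 11, 50 and 80 stay in the text, and the 80 splits the tags around it into two separate runs.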

See the PHP demo:

JavaScript
User contributions licensed under: CC BY-SA