
Decomposing a string into words separated by spaces, ignoring spaces within quoted strings, and treating ( and ) as words

How can I explode the following string:

+test +word any -sample (+toto +titi "generic test") -column:"test this" (+data id:1234)

into

Array('+test', '+word', 'any', '-sample', '(', '+toto', '+titi', '"generic test"', ')', '-column:"test this"', '(', '+data', 'id:1234', ')')

I would like to extend a boolean full-text search SQL query with the ability to target specific columns, using the notation column:value or column:"valueA value B".

How can I do this using preg_match_all($regexp, $query, $result), i.e., what is the correct regular expression to use?

Or more generally, what would be the most appropriate regular expression to decompose a string into words that contain no spaces? Spaces inside quoted text should not count as separators when defining a word, and ( and ) should be words of their own whether or not they are surrounded by spaces. For example, xxx"yyy zzz" should be a single word, and (aaa) should be three words: (, aaa and ).

I have tried something like /"(?:\\.|[^\\"])*"|\S+/, but with limited or no success.

Can anybody help?
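Since the question asks specifically for a preg_match_all pattern, here is a minimal sketch of one. It is an alternative to the preg_split approach in the answer, not part of it, and the helper name tokenize() is hypothetical. The attempted pattern falls short for two reasons. Its catch-all alternative (assumed here to be \S+, since the backslash seems to have been lost) keeps ( glued to the next word, as in (+toto. It also stops at the space inside -column:"test this", producing -column:"test and this". The variant below makes ( and ) tokens of their own. Any other token is a run of quoted strings and of characters that are not whitespace, parentheses or double quotes. It keeps the attempt's backslash-escape handling inside quotes.

```php
<?php
// Hypothetical helper: tokenize a search string with preg_match_all.
// "(" and ")" are single-character tokens; any other token is a run of
// quoted strings (backslash escapes allowed) and characters that are not
// whitespace, parentheses or double quotes, so -column:"test this" stays whole.
function tokenize(string $query): array
{
    preg_match_all('/[()]|(?:"(?:\\\\.|[^\\\\"])*"|[^\s()"])+/', $query, $m);
    return $m[0];
}

$query = '+test +word any -sample (+toto +titi "generic test") -column:"test this" (+data id:1234)';

// The question's attempt, assuming its last alternative was meant as \S+.
// Its result contains (+toto, -column:"test and this" instead of the wanted tokens.
preg_match_all('/"(?:\\\\.|[^\\\\"])*"|\S+/', $query, $m);
$attempt = $m[0];

print_r(tokenize($query));
```

With this pattern, xxx"yyy zzz" comes out as a single token and (aaa) as (, aaa and ). An unmatched double quote is silently skipped rather than reported, which may or may not suit a search parser.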


Answer

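PCRE verbs can be used to achieve your goal with preg_split. The pattern has three alternatives. ".*?"(*SKIP)(*FAIL) matches a quoted string and then throws the match away, so spaces inside quotes are never used as split points. (\(|\)) captures a parenthesis, which PREG_SPLIT_DELIM_CAPTURE keeps as a token of its own. A plain space is a delimiter that is dropped. PREG_SPLIT_NO_EMPTY removes the empty pieces left between adjacent delimiters, such as a space followed by (: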

$tokens = preg_split('/".*?"(*SKIP)(*FAIL)|(\(|\))| /', '+test +word any -sample (+toto +titi "generic test") -column:"test this" (+data id:1234)', -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
print_r($tokens);

https://3v4l.org/QnpB9
https://regex101.com/r/pw1mEd/1
https://3v4l.org/dNMkf (with test data)
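To try the preg_split approach locally, here is a minimal self-contained script. It wraps the answer's call in a function (the name split_query() is hypothetical) and runs it on the question's input and on the two smaller examples from the question. The backslashes in (\(|\)) are assumed, since ( and ) must be escaped to match literal parentheses.

```php
<?php
// Hypothetical wrapper around the answer's preg_split call.
// Quoted strings are matched and skipped, parentheses are captured and kept,
// plain spaces are dropped, and empty pieces are removed.
function split_query(string $query): array
{
    return preg_split(
        '/".*?"(*SKIP)(*FAIL)|(\(|\))| /',
        $query,
        -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
    );
}

print_r(split_query('+test +word any -sample (+toto +titi "generic test") -column:"test this" (+data id:1234)'));
print_r(split_query('xxx"yyy zzz" (aaa)'));
```

Note that ".*?" does not handle escaped quotes inside a quoted string, such as "a \"b\" c". If that matters, the quoted-string alternative from the question's own attempt could presumably be substituted in front of (*SKIP)(*FAIL).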

User contributions licensed under: CC BY-SA