I need to split a Portuguese law text into three parts (prefix, body, and meta), for example:
art. 3º Esta Consolidação estatui (teste 123) as normas que regulam as relações individuais. (abc 123)
PREFIX: "art. 3º"
BODY: "Esta Consolidação estatui (teste 123) as normas que regulam as relações individuais."
META: "(abc 123)"
I suspect I need some kind of lookahead, but I can't figure it out.
Here is my regex:
^([aA]rt\. \d+º?)(.*(?=\(.*\)))(\(.*\))?$
Here are the lines that should match:
art. 3º Esta Consolidação estatui as normas que regulam as relações individuais. (modificado pela lei 234/98)
art. 3º Esta Consolidação estatui as normas que regulam as relações individuais.
art. 3º Esta Consolidação estatui (teste 123) as normas que regulam as relações individuais.
art. 3º Esta Consolidação estatui (teste 123) as normas que regulam as relações individuais. (abc 123)
My attempt is here: https://regex101.com/r/pPlOkn/3
I need to match all four variations.
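To see which of the four lines the pattern above actually accepts, here is a minimal Python sketch. It assumes the pattern lost its backslashes when it was posted, so it uses `^([aA]rt\. \d+º?)(.*(?=\(.*\)))(\(.*\))?$`; the names `ORIGINAL`, `LINES` and `matched` are illustrative only.

```python
import re

# Assumed reconstruction of the question's pattern (backslashes restored).
ORIGINAL = re.compile(r"^([aA]rt\. \d+º?)(.*(?=\(.*\)))(\(.*\))?$")

# The four sample lines from the question.
LINES = [
    "art. 3º Esta Consolidação estatui as normas que regulam as relações individuais. (modificado pela lei 234/98)",
    "art. 3º Esta Consolidação estatui as normas que regulam as relações individuais.",
    "art. 3º Esta Consolidação estatui (teste 123) as normas que regulam as relações individuais.",
    "art. 3º Esta Consolidação estatui (teste 123) as normas que regulam as relações individuais. (abc 123)",
]

# True where the pattern matches the line, False otherwise.
matched = [ORIGINAL.match(line) is not None for line in LINES]

for line, ok in zip(LINES, matched):
    print("MATCH   " if ok else "NO MATCH", line)
```

With this reconstruction only the first and last lines match. Both end in a parenthesized group; the second line has no `(` at all, and the third has one only in the middle.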
Answer
The problem with your regex is that the lookahead in the middle effectively requires the line to end with a parenthesized (...) group. If you remove that lookahead and change the optional group at the end so that it can only match a (...) with no ) inside it, the regex should do what you want:
^([aA]rt\. \d+º?)\s*(.*?)\s*(\([^)]*\))?$
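Here is a minimal Python sketch of this pattern applied to the four sample lines, assuming the backslashes were lost when the pattern was posted (so `s*` is `\s*`, `d+` is `\d+`, and so on). `LAW_LINE` and `split_law_line` are hypothetical names used only for illustration.

```python
import re
from typing import Optional, Tuple

# Pattern from the answer, with backslashes restored:
#   group 1: prefix such as "art. 3º"
#   group 2: body, matched lazily so it stops before the optional trailing group
#   group 3: optional final "(...)" that contains no ")" inside it
LAW_LINE = re.compile(r"^([aA]rt\. \d+º?)\s*(.*?)\s*(\([^)]*\))?$")


def split_law_line(line: str) -> Optional[Tuple[str, str, Optional[str]]]:
    """Hypothetical helper: return (prefix, body, meta), or None if no match.

    meta is None when the line has no trailing parenthesized group.
    """
    m = LAW_LINE.match(line)
    if m is None:
        return None
    return m.group(1), m.group(2), m.group(3)


LINES = [
    "art. 3º Esta Consolidação estatui as normas que regulam as relações individuais. (modificado pela lei 234/98)",
    "art. 3º Esta Consolidação estatui as normas que regulam as relações individuais.",
    "art. 3º Esta Consolidação estatui (teste 123) as normas que regulam as relações individuais.",
    "art. 3º Esta Consolidação estatui (teste 123) as normas que regulam as relações individuais. (abc 123)",
]

for line in LINES:
    parts = split_law_line(line)
    if parts is None:
        print("NO MATCH", line)
        continue
    prefix, body, meta = parts
    print(f"PREFIX: {prefix!r} BODY: {body!r} META: {meta!r}")
```

The `[^)]*` is what keeps `(teste 123)` in the body on the last line: with `\(.*\)` instead, the lazy body would stop at the first `(`, and the final group would swallow everything from `(teste 123)` through `(abc 123)`.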