Please help with a regex; any language is fine. I’ll translate it to Python later.
I’m trying to build a regex to capture the tag below:
#Facilitator:"Full Name <mail@mail.domain>"
- The full name can contain accented characters, such as José or Pâmela, in addition to plain ASCII.
- The full name can have 1, 2 or n family names, and may or may not end with a ‘(company name)’, like:
#Facilitator:"Name1 Name2 Name3 (Company Inc) <mail@domain>"
- The tag can appear 0, 1 or n times in strings.
- The tag can appear in any place of the string.
So far I have tried this (Python), without success:
import re
notes = 'Verbal confirmation #Facilitator:"Fernas P. Loyola (YARDA LTDA) <ope@yahoo.com>"from ATUX with Melanie. Waiting for scheduling#Facilitator:"Fernandes <v-rrlo@stttr.de>" #Facilitator:"Pablito Ferdinandes <papa@gmail.com>"'
facilitator_regex = '^.*((#Facilitator:".*"){1,}).*$'
regex_replace = '\1'
print(re.sub(facilitator_regex, regex_replace, notes))
The output I expect is a list of 0, 1 or more #tags separated by a space.
Any help in any language? I mostly need help with the regex itself. Thanks so much.
Answer
You can find all the facilitators using re.findall with this regex:
'#Facilitator:"[^"]*"'
e.g.
facilitator_regex = '#Facilitator:"[^"]*"'
facilitators = re.findall(facilitator_regex, notes)
For your sample data this gives
[
    '#Facilitator:"Fernas P. Loyola (YARDA LTDA) <ope@yahoo.com>"',
    '#Facilitator:"Fernandes <v-rrlo@stttr.de>"',
    '#Facilitator:"Pablito Ferdinandes <papa@gmail.com>"'
]
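As a side note on why the attempt in the question does not work: `.*` is greedy, so `".*"` runs from the first opening quote to the last closing quote in the string, swallowing every tag in between. The negated class `[^"]*` cannot cross a quote, so each tag is matched on its own. A small sketch comparing the two (the variable names `greedy`, `bounded` and `lazy` are only for illustration):

```python
import re

notes = ('Verbal confirmation #Facilitator:"Fernas P. Loyola (YARDA LTDA) <ope@yahoo.com>"'
         'from ATUX with Melanie. Waiting for scheduling'
         '#Facilitator:"Fernandes <v-rrlo@stttr.de>" '
         '#Facilitator:"Pablito Ferdinandes <papa@gmail.com>"')

# Greedy: ".*" extends to the LAST double quote in the string.
greedy = re.findall(r'#Facilitator:".*"', notes)

# Negated class: stops at the first closing quote of each tag.
bounded = re.findall(r'#Facilitator:"[^"]*"', notes)

# Lazy quantifier: an alternative that gives the same result on this data.
lazy = re.findall(r'#Facilitator:".*?"', notes)

print(len(greedy))   # 1 (one long match covering all three tags)
print(len(bounded))  # 3
print(lazy == bounded)  # True
```

The negated class is usually the safer choice of the two, since it also works if the tag ever spans a line break, which `.` does not match by default.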
You could then use str.join to make a space-separated list:
print(' '.join(facilitators))
Output:
#Facilitator:"Fernas P. Loyola (YARDA LTDA) <ope@yahoo.com>" #Facilitator:"Fernandes <v-rrlo@stttr.de>" #Facilitator:"Pablito Ferdinandes <papa@gmail.com>"
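If you later need the pieces of each tag (name, optional company, e-mail) rather than the whole tag, the same idea extends to named groups. This is only a sketch: the name `FACILITATOR_RE` and the group names are made up for illustration, and it assumes that names contain no double quotes, parentheses or angle brackets and that the e-mail address needs no validation.

```python
import re

# Hypothetical pattern: splits one tag into name, optional company, and e-mail.
FACILITATOR_RE = re.compile(
    r'#Facilitator:"'
    r'(?P<name>[^"(<]+?)'                   # name: shortest run before "(" or "<"
    r'\s*(?:\((?P<company>[^")]*)\)\s*)?'   # optional "(Company Name)"
    r'<(?P<email>[^">]+)>'                  # e-mail between angle brackets
    r'"'
)

notes = ('Verbal confirmation #Facilitator:"Fernas P. Loyola (YARDA LTDA) <ope@yahoo.com>"'
         'from ATUX with Melanie. Waiting for scheduling'
         '#Facilitator:"Fernandes <v-rrlo@stttr.de>" '
         '#Facilitator:"Pablito Ferdinandes <papa@gmail.com>"')

# One dict per tag; "company" is None when the tag has no "(...)" part.
parsed = [m.groupdict() for m in FACILITATOR_RE.finditer(notes)]

for entry in parsed:
    print(entry)
```

Because the character classes only exclude the delimiters, accented names such as José or Pâmela are matched without any special handling.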