Just starting to explore the ‘wonders’ of regex. Being someone who learns from trial and error, I’m really struggling because my trials are throwing up a disproportionate amount of errors… My experiments are in PHP using ereg().
Anyway, I work with first and last names separately, but for now I’m using the same regex for both. So far I have:
^[A-Z][a-zA-Z]+$
Any string of two or more characters that starts with a capital and has only letters (capital or not) for the rest. But where I fall apart is dealing with the special situations that can occur pretty much anywhere.
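A quick way to see what the starting pattern accepts is to run it against a few sample names. This sketch uses Python's re module instead of PHP's ereg(). For a pattern this simple, POSIX ERE and Python syntax should agree. The sample names are made up for illustration.

```python
import re

# The starting pattern from the question: one capital, then one or more letters.
BASE_PATTERN = r"^[A-Z][a-zA-Z]+$"


def matches_base(name: str) -> bool:
    """Return True if the whole string matches the starting pattern."""
    return re.search(BASE_PATTERN, name) is not None


# Hypothetical sample names, chosen to probe the edges of the pattern.
for sample in ["Smith", "smith", "A", "Worthington-Smythe", "Van der Humpton"]:
    print(f"{sample!r}: {matches_base(sample)}")
```

Only "Smith" is accepted. "A" is rejected because + demands at least one letter after the capital, so [a-zA-Z]* would be needed to allow single-letter names. The last two fail on the hyphen and the spaces, which are the cases listed below.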
- Hyphenated Names (Worthington-Smythe)
- Names with Apostrophes (D’Angelo)
- Names with Spaces (Van der Humpton): capitals in the middle, which may or may not be required, are way beyond my interest at this stage.
- Joint Names (Ben & Jerry)
Maybe there’s some other form a name can take that I’m not thinking of, but I suspect that if I can get my head around this, I can add to it. I’m pretty sure there will be instances where more than one of these situations comes up in one name.
So, I think the bottom line is to have my regex also accept spaces, hyphens, ampersands and apostrophes, but (to be technically correct) not at the start or end of the name.
Answer
- Hyphenated Names (Worthington-Smythe)
Add a hyphen (-) to the second character class. The easiest way to do that is to put it at the start of the class, so that it can’t possibly be interpreted as a range operator (as in a-z).
^[A-Z][-a-zA-Z]+$
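To check the hyphen-first class, here is a small sketch using Python's re module in place of PHP's ereg(). The sample strings are hypothetical and were picked to show what the class does and does not prevent.

```python
import re

# A '-' placed first in the class is a literal hyphen, not a range like a-z.
HYPHEN_PATTERN = r"^[A-Z][-a-zA-Z]+$"


def matches_hyphenated(name: str) -> bool:
    """Return True if the name matches the hyphen-aware pattern."""
    return re.search(HYPHEN_PATTERN, name) is not None


for sample in ["Worthington-Smythe", "Smith", "Smith-", "Smith--Jones"]:
    print(f"{sample!r}: {matches_hyphenated(sample)}")
```

All four are accepted, including "Smith-" and "Smith--Jones". A character class only says which characters may appear, not where, so on its own it cannot stop a trailing hyphen or two hyphens in a row.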
- Names with Apostrophes (D’Angelo)
A naive way of doing this would be as above, giving:
^[A-Z][-'a-zA-Z]+$
Don’t forget you may need to escape the apostrophe inside the string literal! A ‘better’ way, given your example, might be:
^[A-Z]'?[-a-zA-Z]+$
Which allows a single optional apostrophe, and only in the second position.
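The difference between the naive and the "better" apostrophe patterns shows up on a few made-up inputs. This sketch uses Python's re module instead of ereg(). It assumes the straight ASCII apostrophe ('). A typographic apostrophe (’, U+2019) is a different character and would have to be added to the pattern explicitly.

```python
import re

# Naive: the apostrophe is just another allowed character after the first letter.
NAIVE_APOSTROPHE = r"^[A-Z][-'a-zA-Z]+$"
# "Better": at most one apostrophe, and only directly after the leading capital.
SECOND_POSITION_APOSTROPHE = r"^[A-Z]'?[-a-zA-Z]+$"


def matches(pattern: str, name: str) -> bool:
    """Return True if the whole name matches the given pattern."""
    return re.search(pattern, name) is not None


for sample in ["D'Angelo", "Da'ngelo", "D''Angelo", "D\u2019Angelo"]:
    print(f"{sample!r}: naive={matches(NAIVE_APOSTROPHE, sample)} "
          f"second-position={matches(SECOND_POSITION_APOSTROPHE, sample)}")
```

Both patterns accept "D'Angelo". Only the naive one accepts "Da'ngelo" and "D''Angelo". Neither accepts the curly-apostrophe version. Here the patterns sit in double-quoted raw strings, so the ' needs no escaping. In a single-quoted PHP string it would have to be written as \'.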
- Names with Spaces (Van der Humpton): capitals in the middle, which may or may not be required, are way beyond my interest at this stage.
Here I’d be tempted to just use the naive approach again:
^[A-Z]'?[- a-zA-Z]+$
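To see why the naive space pattern is only a start, this sketch runs it over a few hypothetical inputs. It uses Python's re module rather than ereg().

```python
import re

# Naive: a space is simply added to the allowed characters.
NAIVE_SPACES = r"^[A-Z]'?[- a-zA-Z]+$"


def matches_naive_spaces(name: str) -> bool:
    """Return True if the whole name matches the naive space pattern."""
    return re.search(NAIVE_SPACES, name) is not None


for sample in ["Van der Humpton", "Van  der", "Van ", "O' Brien"]:
    print(f"{sample!r}: {matches_naive_spaces(sample)}")
```

Every sample passes, including the doubled space, the trailing space and the space after the apostrophe. The pattern says nothing about where the space may go.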
A potentially better way might be:
^[A-Z]'?[-a-zA-Z]+( [a-zA-Z]+)*$
Which looks for extra space-separated words at the end. This probably isn’t a good idea if you’re trying to match names within a larger body of text, but then again, the original wouldn’t have handled that well either.
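Here is a sketch of the extra-words approach, with + on both the first word and each later word so that every word can be longer than one letter. It uses Python's re module instead of ereg(), and the sample names are made up.

```python
import re

# First word: capital, optional apostrophe, then letters/hyphens.
# Further words: exactly one space followed by one or more letters.
MULTI_WORD = r"^[A-Z]'?[-a-zA-Z]+( [a-zA-Z]+)*$"


def matches_multi_word(name: str) -> bool:
    """Return True if the whole name matches the multi-word pattern."""
    return re.search(MULTI_WORD, name) is not None


for sample in ["Van der Humpton", "D'Angelo", "Van  der", "Van ", "Van der-Humpton"]:
    print(f"{sample!r}: {matches_multi_word(sample)}")
```

The first two are accepted. Doubled and trailing spaces are now rejected, because every space must be followed by a letter. "Van der-Humpton" is also rejected, since only the first word may contain hyphens. Widening the later words' class to [-a-zA-Z] would allow it, but it would also let trailing hyphens back in.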
- Joint Names (Ben & Jerry)
At this point you’re not looking at single names anymore?
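If joint names do have to pass, one option (not something the answer proposes) is to stop treating them as a single name. Split on the joining " & " and check each part with the single-name pattern. The assumption that names are joined by exactly " & ", with one space on either side, is illustrative only. The sketch uses Python's re module rather than ereg().

```python
import re

# Single-name pattern: first word, then optional space-separated words.
SINGLE_NAME = r"^[A-Z]'?[-a-zA-Z]+( [a-zA-Z]+)*$"


def matches_joint_name(text: str) -> bool:
    """Accept one name, or several names joined by ' & ' (an assumed convention)."""
    parts = text.split(" & ")
    # An empty or malformed part (e.g. from a trailing " & ") fails the pattern.
    return all(re.search(SINGLE_NAME, part) is not None for part in parts)


for sample in ["Ben & Jerry", "Ben", "Ben &", "Ben & ", "Ben&Jerry"]:
    print(f"{sample!r}: {matches_joint_name(sample)}")
```

Only "Ben & Jerry" and "Ben" are accepted. A dangling ampersand leaves a part that fails the pattern, and "Ben&Jerry" without spaces is rejected by this convention.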
Anyway, as you can see, regexes have a habit of growing very quickly…
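One way to keep a growing pattern manageable is to write it in verbose mode, which lets you comment each piece. This sketch uses Python's re.VERBOSE. PHP's ereg() has no comparable mode and was removed in PHP 7. The PCRE-based preg_match() accepts the x modifier for the same effect. The combined pattern simply stacks the pieces from this answer and is not a complete name validator.

```python
import re

# The pieces from this answer, laid out with re.VERBOSE so each can carry a
# comment. In verbose mode, whitespace outside a character class is ignored,
# so the literal space is written as the class [ ].
NAME_PATTERN = re.compile(
    r"""
    ^
    [A-Z]               # leading capital
    '?                  # optional apostrophe in second position (D'Angelo)
    [-a-zA-Z]+          # rest of the first word; hyphens allowed (Worthington-Smythe)
    (?:[ ][a-zA-Z]+)*   # further single-space-separated words (Van der Humpton)
    $
    """,
    re.VERBOSE,
)


def is_name(name: str) -> bool:
    """Return True if the whole string matches the commented pattern."""
    return NAME_PATTERN.search(name) is not None


for sample in ["Worthington-Smythe", "D'Angelo", "Van der Humpton", "van der Humpton"]:
    print(f"{sample!r}: {is_name(sample)}")
```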