I’m building a site that must be fully Unicode. The database etc. are working; I only have a small logic error. I’m validating the fields of my registration form via Ajax, and the email field is checked with a regular expression.
However, if a user has an email address like 日本人@日人日本人.com, it doesn’t get through.
- Do email addresses of this type exist?
- Are email addresses always of this form? (a-z A-Z 0-9) @ (a-z A-Z 0-9).(a-z A-Z 0-9)
Answer
As per RFC 5322 (“Internet Message Format”), section 3.4.1 (“Addr-Spec Specification”), you can’t use non-US-ASCII characters such as those you’ve listed. However, besides letters and digits, characters such as…
! # $ % & ' * + - / = ? ^ _ { | } ~
…are legal, as is the full stop/period character, as long as it is not the first or last character of the local part and never appears twice in a row.
For more information see the above RFC and indeed the Wikipedia article on email addresses, specifically the “syntax” section.
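As a rough illustration of the rules above, here is a minimal Python sketch of an ASCII-only check. The names `ATEXT`, `DOT_ATOM`, `ADDR_SPEC` and `is_ascii_addr_spec` are made up for this example, and it deliberately ignores the quoted-string and domain-literal forms that RFC 5322 also permits, so treat it as a starting point rather than a complete validator.

```python
import re

# Characters allowed in an unquoted local part (RFC 5322 "atext"):
# letters, digits and the listed specials.
ATEXT = r"A-Za-z0-9!#$%&'*+\-/=?^_`{|}~"

# "dot-atom": runs of atext separated by single dots, which rules out
# leading, trailing and consecutive dots.
DOT_ATOM = rf"[{ATEXT}]+(?:\.[{ATEXT}]+)*"

# Assumption for this sketch: the domain is checked with the same
# dot-atom pattern; real hostname rules are stricter.
ADDR_SPEC = re.compile(rf"{DOT_ATOM}@{DOT_ATOM}")


def is_ascii_addr_spec(address: str) -> bool:
    """Return True if the address matches the simplified ASCII pattern."""
    return ADDR_SPEC.fullmatch(address) is not None
```

With this pattern, an address like 日本人@日人日本人.com is rejected, as are addresses with a leading dot or two dots in a row in the local part.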
UPDATE
There’s also the newer RFC 6531, which obsoletes the experimental RFC 5336 and allows UTF-8 characters in email addresses, including the now legitimate internationalized domain names.
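If you want the form to accept such addresses, one possible approach is to widen the allowed character set to include non-ASCII characters. The Python sketch below does this by treating every code point above U+007F as allowed, which is a simplification. The names `is_internationalized_addr` and `domain_to_ascii` are invented for this example. The second helper uses Python’s built-in `idna` codec (which implements the older IDNA 2003 rules) to show how a Unicode domain maps to its ASCII “xn--” form.

```python
import re

# ASCII "atext" characters plus, as an approximation of the UTF-8
# extension, any code point above U+007F.
UTF8_ATEXT = r"A-Za-z0-9!#$%&'*+\-/=?^_`{|}~" + r"\u0080-\U0010FFFF"

# Runs of allowed characters separated by single dots.
DOT_ATOM = rf"[{UTF8_ATEXT}]+(?:\.[{UTF8_ATEXT}]+)*"

# Assumption for this sketch: the domain uses the same pattern.
ADDR = re.compile(rf"{DOT_ATOM}@{DOT_ATOM}")


def is_internationalized_addr(address: str) -> bool:
    """Return True if the address matches the simplified Unicode pattern."""
    return ADDR.fullmatch(address) is not None


def domain_to_ascii(domain: str) -> str:
    """Convert a Unicode domain to its ASCII (Punycode) form.

    Raises UnicodeError if a label is empty or too long.
    """
    return domain.encode("idna").decode("ascii")
```

Passing the syntax check only means the address is well formed; whether mail to it can actually be delivered depends on the mail servers involved supporting internationalized addresses.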