So I’ve got a fairly simple function in PHP that generates 10-character order IDs:
    function createReference($length = 10)
    {
        $characters = 'ABCDEFGHIJKLMNPQRSTUVWXYZ123456789';
        $string = '';
        for ($i = 0; $i < $length; $i++) {
            $string .= $characters[rand(0, strlen($characters) - 1)];
        }
        return $string;
    }
However, today, for the 154,020th record in the table, it generated the same 10-character ID as an earlier order (the 144,258th record) and tried to insert it. Because the column has a UNIQUE constraint, the insert failed and I received an error notification.
According to my calculations, the function above can produce 34^10 = 2,064,377,754,059,776 different IDs.
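A quick check of that arithmetic in PHP, assuming a 64-bit build (so the result still fits in a native integer). The alphabet is copied from createReference(); it omits the letter O and the digit 0, which is why it has 34 characters rather than 36.

```php
<?php
// Alphabet copied from createReference(): A-Z without 'O', digits 1-9 (no '0').
$characters = 'ABCDEFGHIJKLMNPQRSTUVWXYZ123456789';

$alphabetSize = strlen($characters);   // 25 letters + 9 digits
$idLength = 10;

// Number of distinct IDs = alphabetSize ^ idLength.
// 34 ** 10 is about 2.06e15, well below PHP_INT_MAX on 64-bit builds.
$possibleIds = $alphabetSize ** $idLength;

echo $alphabetSize, PHP_EOL;
echo $possibleIds, PHP_EOL;
```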
I’ve read that rand() and mt_rand() used to behave differently, but that shouldn’t matter since PHP 7.1, and the script runs on PHP 7.3.
So should I buy a lottery ticket right now, or is there something predictable about the pseudo-randomness used here? If so, how can I get a better distribution?
Answer
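Assuming rand() were a true RNG, the probability of generating a duplicate reaches 50% once the number of generated IDs is a little more than the square root of the number of possibilities (see “Birthday problem” for a more precise statement and formulas). The square root of 34^10 is 34^5 = 45,435,424, far above the 154,020 IDs generated so far; with n = 154,020 draws from N = 34^10 possibilities, an ideal RNG would produce a duplicate with a probability of only about n^2/(2N) ≈ 6 × 10^-6. But of course, rand() is far from being a perfect or “true” RNG.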
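The birthday-problem estimate can be checked numerically. This sketch uses the standard approximation p ≈ 1 - exp(-n(n-1)/(2N)) for n independent, uniform draws from N values, so it only describes an ideal RNG, not rand(). The helper names collisionProbability() and drawsForProbability() are hypothetical, made up for this example.

```php
<?php
/**
 * Birthday-problem approximation: probability that $n IDs drawn uniformly
 * and independently from $space possible values contain at least one duplicate.
 * p ≈ 1 - exp(-n(n-1) / (2 * space))
 */
function collisionProbability(float $n, float $space): float
{
    // -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
    return -expm1(-$n * ($n - 1) / (2 * $space));
}

/**
 * Approximate number of draws at which the collision probability reaches $p,
 * obtained by inverting the formula above with n(n-1) ≈ n^2:
 * n ≈ sqrt(2 * space * ln(1 / (1 - p)))
 */
function drawsForProbability(float $p, float $space): float
{
    return sqrt(2 * $space * log(1 / (1 - $p)));
}

$space = 34 ** 10;  // number of possible 10-character IDs

// Chance of any duplicate among 154,020 IDs from an ideal RNG.
printf("%.2e\n", collisionProbability(154020, $space));

// Draws needed for a 50% chance of a duplicate (about 1.18 * sqrt($space)).
printf("%.0f\n", drawsForProbability(0.5, $space));
```

The 50% point comes out at roughly 53.5 million IDs, which is the "a little more than the square root" mentioned above: sqrt(2 ln 2) ≈ 1.18 times 45,435,424.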
Either way, generating a unique random identifier with rand() or mt_rand() (rather than a cryptographic RNG such as random_int()) is a bad idea. Depending on whether the IDs must be hard to guess, and whether the ID alone is enough to grant access to the resource, it may be better to use auto-incrementing record numbers instead of random numbers. See my section “Unique Random Identifiers” for further considerations.
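A minimal sketch of the question's function with rand() swapped for random_int(), which is available since PHP 7.0 and draws from the operating system's cryptographically secure source. It keeps the original alphabet and default length. Note that this removes the weak-PRNG concern but not the birthday bound: duplicates are still possible (just as improbable as the estimate predicts), so keeping the UNIQUE constraint and handling its error still makes sense.

```php
<?php
/**
 * createReference() rewritten to use random_int() instead of rand().
 * Same alphabet as the original: A-Z without 'O', digits 1-9.
 */
function createReference(int $length = 10): string
{
    $characters = 'ABCDEFGHIJKLMNPQRSTUVWXYZ123456789';
    $max = strlen($characters) - 1;
    $string = '';
    for ($i = 0; $i < $length; $i++) {
        // random_int() returns a uniformly distributed integer in [0, $max]
        // and throws an Exception if no secure randomness source is available.
        $string .= $characters[random_int(0, $max)];
    }
    return $string;
}

// Prints a random 10-character reference.
echo createReference(), PHP_EOL;
```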
See also this question.