
Is it wrong to use a hash for a unique ID?

I want to use a unique ID generated by PHP in a database table that will likely never have more than 10,000 records. I don’t want the time of creation to be visible or use a purely numeric value so I am using:

sha1(uniqid(mt_rand(), true))

Is it wrong to use a hash for a unique ID? Don’t all hashes lead to collisions or are the chances so remote that they should not be considered in this case?

A further point: if the number of characters to be hashed is less than the number of characters in a sha1 hash, won’t it always be unique?


Answer

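If you have 2 keys, the theoretical best-case probability of a collision is 1 in 2^X, where X is the number of bits in your hash output (160 for sha1). With n keys, the birthday bound puts it at roughly n^2 / 2^(X+1), which for 10,000 records and sha1 is on the order of 10^-41. It is the ‘best case’ because the input is usually ASCII, which doesn’t use the full range of byte values, and because hash functions do not distribute perfectly, so in real life they will collide somewhat more often than this theoretical figure. For a table of your size the chance is still remote enough to ignore.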
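
To put a number on that, here is a minimal sketch of the birthday-bound estimate, assuming the hash behaves like a uniform random function. The function name collisionProbability is made up for this example and is not part of PHP.

```php
<?php
// Approximate probability that at least two of $n uniformly random
// $bits-bit values collide (birthday bound):
//   p ≈ 1 - exp(-n(n-1) / 2^(bits+1))
// Assumption: the hash output is uniformly distributed.
function collisionProbability(int $n, int $bits): float
{
    $pairs = $n * ($n - 1) / 2;   // number of distinct pairs of keys
    $space = pow(2.0, $bits);     // number of possible hash values
    // -expm1(-x) equals 1 - exp(-x) but stays accurate for very small x.
    return -expm1(-$pairs / $space);
}

printf("%.3e\n", collisionProbability(10000, 160)); // sha1, 10,000 rows
printf("%.3e\n", collisionProbability(10000, 32));  // 32-bit checksum, for contrast
```

For 10,000 rows the 160-bit figure comes out around 10^-41, while a 32-bit checksum would already collide roughly 1% of the time, which is why the digest size matters more than the choice of input.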

To answer your final question:

A further point: if the number of characters to be hashed is less than the number of characters in a sha1 hash, won’t it always be unique?

Yeah, that’s sort of true: it isn’t guaranteed, but for inputs that short a collision is extremely unlikely. But then you would have another problem, namely generating unique keys of that size in the first place. The easiest way is usually a hash anyway, so just choose a digest large enough that the chance of a collision is small enough for your comfort.

As @wayne suggests, a popular approach is to concatenate microtime() to your random salt before hashing. (Running the result through base64_encode only gives a more compact string; it does not raise the entropy.)
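
A rough sketch of that approach follows, assuming PHP 7 or later so that random_bytes() is available. The helper names makeHashedId and makeRandomId are invented for illustration. The second helper is a swapped-in alternative, not what the answer describes: it skips the hash and uses the random bytes directly.

```php
<?php
// Hypothetical helper: hash a random salt concatenated with microtime().
// Only the digest is stored, so the creation time is not visible in the ID.
function makeHashedId(): string
{
    $salt = random_bytes(16);          // 128 bits from the OS random source
    return sha1($salt . microtime());  // 40 hexadecimal characters
}

// Alternative sketch (an assumption, not from the answer): no hash at all,
// just hex-encoded random bytes. 16 bytes gives a 32-character ID.
function makeRandomId(int $bytes = 16): string
{
    return bin2hex(random_bytes($bytes));
}

echo makeHashedId(), "\n";
echo makeRandomId(), "\n";
```

Either way, a unique constraint on the column would catch the unlikely duplicate, so the insert can simply be retried with a fresh ID.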

User contributions licensed under: CC BY-SA