Skip to content
Advertisement

Where can I find xxhash64 and md5 collision probability statistics?

I dont find any info about % of collisions for xxhash64.

I’m going to use it for cache system (to generate hash keys which need to be unique, about a hundreds millions). Now i use md5, but i don’t need cryptographic property.

So i need some info, to decide does is it a good decision for my task. In best case – comparison of the number of collisions between md5 and xxHash64.

Advertisement

Answer

You can calculate yourself by using the birthday problem.

In general the mathematical expression that gives you the probability of hash function is :

p(k) = 1 – exp(-k(k-1)/2N, k (number of hashes) randomly generated values, where each value is a non-negative integer less than N (number of possible hashes):

N = 2^(number of bit), example for md5 it is 2^128, or 2^32 for 32 bit-hash

If you use md5

will produce a 128-bit hash value, by applying this formula you get this ‘S’ graph. This graph explains, for example, in order to get a collison probability of 50% (0.5), you need at least 2100000 trillion of hashes!!!! If you we use less than, for instance 1 billion of hashes, the probability of collision is negligible.

enter image description here

If you are using hundred millions of hashed keys, the probability of collision is 0% using md5.

If you use xxhash64,

Assuming that xxhash64 produce a 64-bit hash. You will get this graph. enter image description here

According to this picture, you can see that if the collision percentage is 50%, you need at least 5 billion of hashes. Two of the 5 billion of hashes can have an odd of 1/2 to have the same hashes!!! If you have around 12 billion of hashes there is 100% of chance that the hashes collide.

If you are using hundred millions of hashed keys, the probability of collision is 0.033% using xxhash64.

This link explains why md5 or fast hash method are not secure.

User contributions licensed under: CC BY-SA
9 People found this is helpful
Advertisement