I dont find any info about % of collisions for xxhash64.
I’m going to use it for cache system (to generate hash keys which need to be unique, about a hundreds millions). Now i use md5, but i don’t need cryptographic property.
So i need some info, to decide does is it a good decision for my task. In best case – comparison of the number of collisions between md5 and xxHash64.
Advertisement
Answer
You can calculate yourself by using the birthday problem.
In general the mathematical expression that gives you the probability of hash function is :
p(k) = 1 – exp(-k(k-1)/2N, k (number of hashes) randomly generated values, where each value is a non-negative integer less than N (number of possible hashes):N = 2^(number of bit), example for md5 it is 2^128, or 2^32 for 32 bit-hash
If you use md5
will produce a 128-bit hash value, by applying this formula you get this ‘S’ graph. This graph explains, for example, in order to get a collison probability of 50% (0.5), you need at least 2100000 trillion of hashes!!!! If you we use less than, for instance 1 billion of hashes, the probability of collision is negligible.
If you are using hundred millions of hashed keys, the probability of collision is 0% using md5.
If you use xxhash64,
Assuming that xxhash64 produce a 64-bit hash. You will get this graph.
According to this picture, you can see that if the collision percentage is 50%, you need at least 5 billion of hashes. Two of the 5 billion of hashes can have an odd of 1/2 to have the same hashes!!! If you have around 12 billion of hashes there is 100% of chance that the hashes collide.
If you are using hundred millions of hashed keys, the probability of collision is 0.033% using xxhash64.
This link explains why md5 or fast hash method are not secure.