I have some data that I want to store in memcached (using the PHP libmemcached client: https://www.php.net/manual/en/intro.memcached.php). It’s something that’s hit extremely frequently across my web app.
In order to reduce the amount of traffic to a single memcached node I append a random number between 1 and 10 to the end of the key, in hopes that the client will not store all of the keys on a single node.
I had assumed that the process of assigning a key was random, but across 15 nodes at least half of the keys went to the same node. That makes me think there is something more deterministic about how it decides which node to use for a given key.
Does anyone know how it’s done?
Advertisement
Answer
It uses a hash. In the simplest form, imagine if you run a hash function like MD5 on a key, you can use the first byte to figure out which server it should go to.
This is important, because if 2 servers talk to multiple memcached servers they need to reliable all select the same server for the same key. Random would be bad because it means that a client might try to get()
from a different server where the item was stored.
If you have 15 nodes and more than half of the items were stored in 1 node, you are either: 1. Extremely unlucky or 2. Something is not configured right and some of your servers are marked offline.
The underlying hash is more complex than a simple ‘md5’, it uses a ‘consistent hashing’ algorithm. What this means is if you have 15 nodes, and lose 1, most keys will still resolve to the same server. There’s lengthy articles about ‘consistent hashing’ so it should be easy to get technical details.