Consistent hashing
- Context and problem
Suppose we have n computers and want to distribute some data on those. By using naive hashing method, we may come up with a very simple solution hash(data) mode n to determine on which computer we should select for storing.
In case we add an extra computer, then nearly all of stored data n/(n+1) have to be reallocated => very expensive
- Solution - Consisten Hash
- Hash the number of computer and the data onto a circle.
- Start moving clockwise in the circle, put the data to the next computer
- In case a computer is down, just move those data to next one => only a fraction roughly 1/n of all of data need to be moved
Rendezvous hashing
- Context and problem
How can a set of clients agree on where in a set of sites (servers) to place a data?
- Solution - Highest Random Weight (HRW) Hashing
Each client computes the hash weights w1=hash(site1, data), w2=hash(site2, data), ...., wn=hash(siten, data) and select the site which yields the highest weight. If a site fails, the client just picks the site which yields the next highest value.
Note: In consistent hash method, if a server fails, all of its data will be moved to the next one => overload. Meanwhile, the HRW does not suffer this issue since its mechanism (using hash to determine the site) is random.
Tree of Caches - Harvest Cache
Cache is organized hierarchically. A user obtains a page by asking a nearby leaf cache. If neither this cache nor its siblings have the page, the request is forwarded to the cache's parent. If a page is stored by no cache in the tree, the request eventually reaches the root and is forwarded to the home site of the page.
Chord - P2P
In spite of loving to research the topics related to P2P area, I have no deep understanding the Chord algorithm. So I decided to learn it naturally, and beginning with a very short sentence: Chord specifies how keys are assigned to nodes and how a node can discover the value for a given key by first locating the node responsible for that key.