I think there are two major causes of confusion when it comes to "beating CAP." Number one is the fault model that Gilbert and Lynch assume. (Number two is the confusion over "application-level" versus "CAP" consistency, as in #4.)
I've recently seen several discussions of CAP (even in academic publications) that redefine the availability requirement. Neither of the following is actually Gilbert and Lynch HA, but each guarantees a response within its own, relaxed failure domain. Here are two examples:
- "up to F server faults": If you can contact a majority of servers, you can get a response. This is not HA, because servers on the minority side of a partition cannot return a response. The HyperDex paper, Section 8, states this assumption rather clearly, noting that the system is "thus able to provide seemingly impossible guarantees."
- "for specific fault model[s]": If we provision networks appropriately, and partitions never happen, there may be no partitions! The Windows Azure Storage paper, Section 8, discusses this. It assumes a stronger model than the asynchronous one and is not HA. (I won't speculate as to how realistic this is, but the paper is fairly adamant that the system circumvents CAP.)
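To make the gap in the first example concrete, here is a minimal Python sketch under simplifying assumptions: a partition is modeled as a list of disjoint groups of replica IDs, and a quorum-based server answers only when its side of the partition holds a strict majority (with n = 2F + 1 replicas, a majority survives up to F faults). The function names (`quorum_responds`, `is_ha`, `some_majority_responds`) are hypothetical and not taken from HyperDex or any other system.

```python
def side_of(node, groups):
    # Return the group of replicas that `node` can still communicate with.
    for g in groups:
        if node in g:
            return g
    raise ValueError(f"node {node} is not in any group")


def quorum_responds(node, n, groups):
    # A majority-quorum protocol can answer only if the node's side of the
    # partition contains a strict majority of all n replicas.
    return len(side_of(node, groups)) > n // 2


def is_ha(n, groups, responds):
    # Gilbert and Lynch availability: every non-failed node must return a
    # response, including nodes stranded in a minority partition.
    return all(responds(node, n, groups) for node in range(n))


def some_majority_responds(n, groups, responds):
    # The weaker "up to F faults" guarantee: a client that can reach the
    # majority side still gets an answer.
    return any(responds(node, n, groups) for node in range(n))


# Five replicas (F = 2), partitioned into {0, 1, 2} and {3, 4}.
groups = [{0, 1, 2}, {3, 4}]
print(some_majority_responds(5, groups, quorum_responds))  # True
print(is_ha(5, groups, quorum_responds))                   # False
```

The majority side keeps serving requests, so the "up to F server faults" guarantee holds, but replicas 3 and 4 cannot respond, so the system fails the Gilbert and Lynch definition of availability.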
I'm not quite sure how best to address these in the text, but doing so might be useful.
Two concrete suggestions:
Under "15. Is a failed machine the same as a partitioned one?", the FAQ could mention that, in an HA system, a server in a minority partition still needs to guarantee a response.
Under "12. Is my network really asynchronous?", the FAQ could mention that, in the limit, failures can render any communication network asynchronous.
Alternatively, (at the risk of starting a "list of shame"), the FAQ might expand "17. Have I 'got around' or 'beaten' the CAP theorem?" into a "list of common fallacies" like those above.
I'm curious what you think and am happy to drop a pull request if there's interest.