-
Notifications
You must be signed in to change notification settings - Fork 7.3k
ZOOKEEPER-4925: Fix data loss due to propagation of discontinuous committedLog #2254
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
…mittedLog There are two variants of `ZooKeeperServer::processTxn`. Those two variants diverge significantly since ZOOKEEPER-3484. `processTxn(Request request)` pops outstanding change from `outstandingChanges` and adds txn to `committedLog` for follower to sync in addition to what `processTxn(TxnHeader hdr, Record txn)` does. The `Learner` uses `processTxn(TxnHeader hdr, Record txn)` to commit txn to memory after ZOOKEEPER-4394, which means it leaves `committedLog` untouched in `SYNCHRONIZATION` phase. This way, a stale follower will have hole in its `committedLog` after joining cluster. The stale follower will propagate the in memory hole to other stale nodes after becoming leader. This causes data loss. The test case fails on master and 3.9.3, and passes on 3.9.2. So only 3.9.3 is affected. This commit drops `processTxn(TxnHeader hdr, Record txn)` as `processTxn(Request request)` is capable in `SYNCHRONIZATION` phase too. Also, this commit rejects discontinuous proposals in `syncWithLeader` and `committedLog`, so to avoid possible data loss. Refs: ZOOKEEPER-4925, ZOOKEEPER-4394, ZOOKEEPER-3484
assertNotNull(zk.exists(path, false), path + " not found"); | ||
} | ||
|
||
for (int i = 0; i < 10; i++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about adding the JIRA ticket to help people understand the issue and fix? Basically all the /boo related txns are synced via DIFF and added to the committed logs with the fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fzk.appendRequest(pif.hdr, pif.rec, pif.digest); | ||
fzk.processTxn(pif.hdr, pif.rec); | ||
fzk.appendRequest(pif.toRequest()); | ||
fzk.processTxn(pif.toRequest()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick
How about create the request object and reuse it instead of converting twice?
Request request = pif.toRequest()
fzk.appendRequest(request)
fzk.processTxn(request)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
// log at warn should not be too verbose. | ||
LOG.warn("DIFF sync got new epoch proposal 0x{}, last proposal 0x{}", | ||
Long.toHexString(packet.hdr.getZxid()), | ||
Long.toHexString(lastQueued)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the same code as line 637-651. How about creating a common method to check if proposal is out of order so it can be used in both places?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move them to enforceContinuousProposal
.
} | ||
// We can't tell whether it is a data loss. Given that new epoch is rare, | ||
// log at warn should not be too verbose. | ||
LOG.warn("DIFF sync got new epoch proposal 0x{}, last proposal 0x{}", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about keeping the original WARN statement, so it's very clear what is the actual and what is the expected?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add expected zxid in newly introduced method enforceContinuousProposal
.
ZxidUtils.zxidToString(request.zxid), | ||
ZxidUtils.zxidToString(maxCommittedLog)); | ||
LOG.error(msg); | ||
throw new IllegalStateException(msg); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have a test case for it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is already covered by existing tests. allowDiscontinuousProposals
(used by ZxidRolloverTest
, TxnLogDigestTest
, both introduce discontinuous proposals on purpose) was introduced to bypass this restriction.
Also this path will fail newly introduced test if we rollback changes involed in sync phase.
git checkout HEAD^ -- zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/
git checkout HEAD^ -- zookeeper-server/src/main/java/org/apache/zookeeper/server/ZooKeeperServer.java
So, I think it is covered, though not for successful tests.
Besides, this is what I state in commit message.
Also, this commit rejects discontinuous proposals in syncWithLeader and committedLog, so to avoid possible data loss.
if (committedLog.isEmpty()) { | ||
minCommittedLog = request.zxid; | ||
maxCommittedLog = request.zxid; | ||
} else if (request.zxid <= maxCommittedLog) { | ||
// This could happen if lastProcessedZxid is rewinded and database is re-synced. | ||
// Currently, it only happens in test codes, but it should also be safe for production path. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about adding a test case for the scenario?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LoadFromLogTest::testRestore
fails without this. Here is the javadoc of the test.
/**
* Test we can restore the snapshot that has data ahead of the zxid
* of the snapshot file.
*/
zookeeper/zookeeper-server/src/test/java/org/apache/zookeeper/test/LoadFromLogTest.java
Lines 178 to 182 in aeadf57
ZooKeeperServer zks = serverFactory.getZooKeeperServer(); | |
long eZxid = zks.getZKDatabase().getDataTreeLastProcessedZxid(); | |
// force the zxid to be behind the content | |
zks.getZKDatabase().setlastProcessedZxid(zks.getZKDatabase().getDataTreeLastProcessedZxid() - 10); | |
LOG.info("Set lastProcessedZxid to {}", zks.getZKDatabase().getDataTreeLastProcessedZxid()); |
The changes for fixing the data loss issue looks good. It's a very critical issue. Thanks for fixing it. The changes for blocking proposal and commit requests being out of order looks good in general. We need to make sure it doesn't introduce any unexpected issue. |
Hi @li4wang, thank for for reviewing this! I have pushed a new commit. Please take another to look and let me know whether it solves your concerns. |
There are two variants of
ZooKeeperServer::processTxn
. Those two variants diverge significantly since ZOOKEEPER-3484.processTxn(Request request)
pops outstanding change fromoutstandingChanges
and adds txn tocommittedLog
for follower to sync in addition to whatprocessTxn(TxnHeader hdr, Record txn)
does. TheLearner
usesprocessTxn(TxnHeader hdr, Record txn)
to commit txn to memory after ZOOKEEPER-4394, which means it leavescommittedLog
untouched inSYNCHRONIZATION
phase.This way, a stale follower will have hole in its
committedLog
after joining cluster. The stale follower will propagate the in memory hole to other stale nodes after becoming leader. This causes data loss.The test case fails on master and 3.9.3, and passes on 3.9.2. So only 3.9.3 is affected.
This commit drops
processTxn(TxnHeader hdr, Record txn)
asprocessTxn(Request request)
is capable inSYNCHRONIZATION
phase too.Also, this commit rejects discontinuous proposals in
syncWithLeader
andcommittedLog
, so to avoid possible data loss.Refs: ZOOKEEPER-4925, ZOOKEEPER-4394, ZOOKEEPER-3484