Skip to content

ZOOKEEPER-4925: Fix data loss due to propagation of discontinuous committedLog #2254

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

kezhuw
Copy link
Member

@kezhuw kezhuw commented May 4, 2025

There are two variants of ZooKeeperServer::processTxn. Those two variants diverge significantly since ZOOKEEPER-3484. processTxn(Request request) pops outstanding change from outstandingChanges and adds txn to committedLog for follower to sync in addition to what processTxn(TxnHeader hdr, Record txn) does. The Learner uses processTxn(TxnHeader hdr, Record txn) to commit txn to memory after ZOOKEEPER-4394, which means it leaves committedLog untouched in SYNCHRONIZATION phase.

This way, a stale follower will have hole in its committedLog after joining cluster. The stale follower will propagate the in memory hole to other stale nodes after becoming leader. This causes data loss.

The test case fails on master and 3.9.3, and passes on 3.9.2. So only 3.9.3 is affected.

This commit drops processTxn(TxnHeader hdr, Record txn) as processTxn(Request request) is capable in SYNCHRONIZATION phase too.

Also, this commit rejects discontinuous proposals in syncWithLeader and committedLog, so to avoid possible data loss.

Refs: ZOOKEEPER-4925, ZOOKEEPER-4394, ZOOKEEPER-3484

…mittedLog

There are two variants of `ZooKeeperServer::processTxn`. Those two
variants diverge significantly since ZOOKEEPER-3484.
`processTxn(Request request)` pops outstanding change from
`outstandingChanges` and adds txn to `committedLog` for follower to sync
in addition to what `processTxn(TxnHeader hdr, Record txn)` does. The
`Learner` uses `processTxn(TxnHeader hdr, Record txn)` to commit txn to
memory after ZOOKEEPER-4394, which means it leaves `committedLog`
untouched in `SYNCHRONIZATION` phase.

This way, a stale follower will have hole in its `committedLog` after
joining cluster. The stale follower will propagate the in memory hole
to other stale nodes after becoming leader. This causes data loss.

The test case fails on master and 3.9.3, and passes on 3.9.2. So only
3.9.3 is affected.

This commit drops `processTxn(TxnHeader hdr, Record txn)` as
`processTxn(Request request)` is capable in `SYNCHRONIZATION` phase too.

Also, this commit rejects discontinuous proposals in `syncWithLeader`
and `committedLog`, so to avoid possible data loss.

Refs: ZOOKEEPER-4925, ZOOKEEPER-4394, ZOOKEEPER-3484
assertNotNull(zk.exists(path, false), path + " not found");
}

for (int i = 0; i < 10; i++) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about adding the JIRA ticket to help people understand the issue and fix? Basically all the /boo related txns are synced via DIFF and added to the committed logs with the fix.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fzk.appendRequest(pif.hdr, pif.rec, pif.digest);
fzk.processTxn(pif.hdr, pif.rec);
fzk.appendRequest(pif.toRequest());
fzk.processTxn(pif.toRequest());
Copy link
Contributor

@li4wang li4wang Jun 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick

How about create the request object and reuse it instead of converting twice?

Request request = pif.toRequest()
fzk.appendRequest(request)
fzk.processTxn(request)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

// log at warn should not be too verbose.
LOG.warn("DIFF sync got new epoch proposal 0x{}, last proposal 0x{}",
Long.toHexString(packet.hdr.getZxid()),
Long.toHexString(lastQueued));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the same code as line 637-651. How about creating a common method to check if proposal is out of order so it can be used in both places?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move them to enforceContinuousProposal.

}
// We can't tell whether it is a data loss. Given that new epoch is rare,
// log at warn should not be too verbose.
LOG.warn("DIFF sync got new epoch proposal 0x{}, last proposal 0x{}",
Copy link
Contributor

@li4wang li4wang Jun 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about keeping the original WARN statement, so it's very clear what is the actual and what is the expected?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add expected zxid in newly introduced method enforceContinuousProposal.

ZxidUtils.zxidToString(request.zxid),
ZxidUtils.zxidToString(maxCommittedLog));
LOG.error(msg);
throw new IllegalStateException(msg);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have a test case for it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is already covered by existing tests. allowDiscontinuousProposals(used by ZxidRolloverTest, TxnLogDigestTest, both introduce discontinuous proposals on purpose) was introduced to bypass this restriction.

Also this path will fail newly introduced test if we rollback changes involed in sync phase.

git checkout HEAD^ -- zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/
git checkout HEAD^ -- zookeeper-server/src/main/java/org/apache/zookeeper/server/ZooKeeperServer.java

So, I think it is covered, though not for successful tests.

Besides, this is what I state in commit message.

Also, this commit rejects discontinuous proposals in syncWithLeader and committedLog, so to avoid possible data loss.

if (committedLog.isEmpty()) {
minCommittedLog = request.zxid;
maxCommittedLog = request.zxid;
} else if (request.zxid <= maxCommittedLog) {
// This could happen if lastProcessedZxid is rewinded and database is re-synced.
// Currently, it only happens in test codes, but it should also be safe for production path.
Copy link
Contributor

@li4wang li4wang Jun 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about adding a test case for the scenario?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LoadFromLogTest::testRestore fails without this. Here is the javadoc of the test.

    /**
     * Test we can restore the snapshot that has data ahead of the zxid
     * of the snapshot file.
     */

ZooKeeperServer zks = serverFactory.getZooKeeperServer();
long eZxid = zks.getZKDatabase().getDataTreeLastProcessedZxid();
// force the zxid to be behind the content
zks.getZKDatabase().setlastProcessedZxid(zks.getZKDatabase().getDataTreeLastProcessedZxid() - 10);
LOG.info("Set lastProcessedZxid to {}", zks.getZKDatabase().getDataTreeLastProcessedZxid());

@li4wang
Copy link
Contributor

li4wang commented Jun 2, 2025

The changes for fixing the data loss issue looks good. It's a very critical issue. Thanks for fixing it.

The changes for blocking proposal and commit requests being out of order looks good in general. We need to make sure it doesn't introduce any unexpected issue.

@kezhuw
Copy link
Member Author

kezhuw commented Jun 3, 2025

Hi @li4wang, thank for for reviewing this! I have pushed a new commit. Please take another to look and let me know whether it solves your concerns.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants