Description
As @dougfales put it:
We would prefer to manage the readable/writable state of mysql instances outside of the operator. On the rare occasion (for us) that the entire cluster must be set to RO, we will do this ourselves. In all other scenarios, we are used to trusting Orchestrator to do the right thing.
When we introduced the readOnly attribute in #100, we wanted a guarantee that when readOnly is True the cluster would not become writable again. However, in case of a failover, Orchestrator sets the new master writable without knowing about readOnly: True. So in #222 we made Orchestrator perform the failover without applying the promotion, so that the new master is not set writable.
There are a few issues with this approach:
- The nodes are left in a bad configuration (detached replication), resulting in error messages such as:
2020-10-15T21:40:51.873465Z 576 [ERROR] Slave I/O for channel '': error connecting to master 'sys_replication@//$release-mysqlcluster-db-mysql-0.mysql.$namespace:3306' - retry-time: 1 retries: 1755, Error_code: 2005
See the other replication issues related to this: #613, #608, #588, #565
- This process of setting the cluster writable breaks the Orchestrator failover process; see "markReadOnlyNodesInOrc broke gracefulfailover process" #566
- Poor documentation; see "Q: orchestrator failover defaults" #482
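The master host in the log line above starts with "//", which is how a detached replica's master host appears: the host name is mangled so the replica can no longer connect, and it keeps retrying (the "retries: 1755" counter). The sketch below models that mangling under the assumption that detaching simply prefixes the host with "//" and reattaching strips it, as the log suggests. The function names are hypothetical and are not Orchestrator's API.

```go
package main

import (
	"fmt"
	"strings"
)

// detachHint is the prefix assumed to mark a detached master host
// (hypothetical name; the "//" value is taken from the log line).
const detachHint = "//"

// isDetached reports whether a master host has been mangled by a detach.
func isDetached(host string) bool {
	return strings.HasPrefix(host, detachHint)
}

// detachedHost returns the mangled host a detached replica would point at.
// An already detached host is returned unchanged (no double prefix).
func detachedHost(host string) string {
	if isDetached(host) {
		return host
	}
	return detachHint + host
}

// reattachedHost restores the original, connectable host name.
func reattachedHost(host string) string {
	return strings.TrimPrefix(host, detachHint)
}

func main() {
	// Hypothetical host name in the style of the log above.
	host := "db-mysql-0.mysql.default"
	d := detachedHost(host)
	fmt.Println(d)                                       // //db-mysql-0.mysql.default
	fmt.Println(isDetached(d), reattachedHost(d) == host) // true true
}
```

A replica left in the detached state keeps failing with error 2005 (unknown host) until someone reattaches it, which is why the cluster ends up in the bad configuration described above.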
The fix I propose here is to weaken the guarantees of the readOnly attribute (and document this) and let Orchestrator handle read-only/writable nodes by configuring it as follows:
ApplyMySQLPromotionAfterMasterFailover: true
MasterFailoverDetachReplicaMasterHost: false
For this we need to "revert" #222 and refactor #100 in a less invasive way. An implementation has already been started in #522.