[7.912][BUG][FIXED] Up2date process creates ungraceful handover for active/active cluster

Hi,

I just applied 7.912 to the cluster. This bug may have been present for a while, but I only just noticed it.

I started the Up2date process. It applies the update to Node 2 (slave) first, with Node 1 remaining Master.
The update is fully applied to Node 2 and the node is rebooted.
Node 2 starts up; however, before it has finished coming online, the update process takes Node 1 down and tries to promote Node 2 to Master:

2010:05:20-17:36:15 mercury ha_daemon[4552]: id="38A0" severity="info" sys="System" sub="ha" name="--- Node is disabled ---"
2010:05:20-17:36:16 mercury ha_daemon[4552]: id="38A0" severity="info" sys="System" sub="ha" name="Node 1 changed state: ACTIVE -> UP2DATE"
2010:05:20-17:36:16 mercury ha_daemon[4552]: id="38A0" severity="info" sys="System" sub="ha" name="Node 1 changed mode: MASTER -> SLAVE"
2010:05:20-17:36:16 mercury ha_daemon[4552]: id="38A0" severity="info" sys="System" sub="ha" name="Taking over after up2date!"
2010:05:20-17:36:16 mercury ha_daemon[4552]: id="38B0" severity="info" sys="System" sub="ha" name="Switching to Master mode"
2010:05:20-17:36:17 mercury slon_control[4737]: Dropping privileges to user postgres with user ID 999 and group ID 999
2010:05:20-17:36:17 mercury slon_control[4742]: Starting ASG slon control on Node 2
2010:05:20-17:36:13 mercury-1 ha_daemon[4160]: id="38A0" severity="info" sys="System" sub="ha" name="Access granted to remote node 2!"
2010:05:20-17:36:16 mercury-1 ha_daemon[4160]: id="38A0" severity="info" sys="System" sub="ha" name="Node 2 changed version! 7.911 -> 7.912"
2010:05:20-17:36:16 mercury-1 ha_daemon[4160]: id="38A0" severity="info" sys="System" sub="ha" name="Node 2 changed state: UP2DATE -> ACTIVE"
2010:05:20-17:36:16 mercury-1 ha_daemon[4160]: id="38A0" severity="info" sys="System" sub="ha" name="Node 2 upgraded to version 7.912 successfully"
2010:05:20-17:36:16 mercury-1 ha_daemon[4160]: id="38Ba" severity="info" sys="System" sub="ha" name="Cluster up2date successful, initiating graceful takeover"

All connectivity is lost until Node 2 is actually ready (ungraceful handover).
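
The excerpt above concatenates lines from two nodes (hosts `mercury` and `mercury-1`), so the timestamps are not in order. A minimal Python sketch for merging such lines into one timeline follows. It assumes the line layout seen in this post (timestamp, host, process[pid], then key="value" pairs); the names `parse_ha_line` and `merge_timeline` are hypothetical helpers for illustration, not part of any Astaro tooling.

```python
import re
from datetime import datetime

# Layout assumed from the excerpts in this post:
# <YYYY:MM:DD-HH:MM:SS> <host> <process>[<pid>]: <rest>
LINE_RE = re.compile(
    r'^(?P<ts>\d{4}:\d{2}:\d{2}-\d{2}:\d{2}:\d{2}) '
    r'(?P<host>\S+) (?P<proc>[\w-]+)\[(?P<pid>\d+)\]: (?P<rest>.*)$'
)
KV_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_ha_line(line):
    """Return a dict for one log line, or None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    entry = m.groupdict()
    entry["ts"] = datetime.strptime(entry["ts"], "%Y:%m:%d-%H:%M:%S")
    # key="value" pairs such as id, severity, sub and name
    entry["fields"] = dict(KV_RE.findall(entry["rest"]))
    return entry


def merge_timeline(lines):
    """Parse all lines and sort them by timestamp (stable for ties)."""
    entries = [e for e in map(parse_ha_line, lines) if e is not None]
    return sorted(entries, key=lambda e: e["ts"])


# Two lines copied from the excerpt above, in their original order.
sample = [
    '2010:05:20-17:36:16 mercury ha_daemon[4552]: id="38A0" severity="info" '
    'sys="System" sub="ha" name="Node 1 changed state: ACTIVE -> UP2DATE"',
    '2010:05:20-17:36:13 mercury-1 ha_daemon[4160]: id="38A0" severity="info" '
    'sys="System" sub="ha" name="Access granted to remote node 2!"',
]

for e in merge_timeline(sample):
    print(e["ts"].time(), e["host"], e["fields"].get("name"))
```

Sorting by timestamp only gives one-second resolution, so events logged in the same second keep their input order.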

Node 1 then receives the update, reboots, and isn't promoted until it has finished syncing and is ready to be Master (graceful handover):

2010:05:20-17:44:52 mercury-2 ha_daemon[4552]: id="38A0" severity="info" sys="System" sub="ha" name="Deactivating sync process for database on node 1"
2010:05:20-17:44:55 mercury-2 ha_daemon[4552]: id="38A0" severity="info" sys="System" sub="ha" name="Monitoring interfaces for link beat: eth0 eth1 eth2 "
2010:05:20-17:45:08 mercury-1 ha_daemon[4167]: id="38A0" severity="info" sys="System" sub="ha" name="Initial synchronization finished!"
2010:05:20-17:45:09 mercury-2 ha_daemon[4552]: id="38A0" severity="info" sys="System" sub="ha" name="Node 1 changed state: SYNCING -> ACTIVE"
2010:05:20-17:45:09 mercury-2 ha_daemon[4552]: id="38C2" severity="info" sys="System" sub="ha" name="Preempt Slave 0, initiating graceful takeover!"
2010:05:20-17:45:09 mercury-2 ha_daemon[4552]: id="38B1" severity="info" sys="System" sub="ha" name="Switching to Slave mode"
2010:05:20-17:45:10 mercury-2 ha_daemon[4552]: id="38A0" severity="info" sys="System" sub="ha" name="Activating sync process for database on node 1"
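
Comparing the two excerpts, Node 1 goes SYNCING -> ACTIVE before the takeover, while Node 2 goes straight from UP2DATE to ACTIVE. The Python sketch below flags nodes that became ACTIVE from any state other than SYNCING. It is a heuristic under stated assumptions: the message wording is taken from these excerpts, and treating SYNCING -> ACTIVE as the readiness criterion is a guess from this thread, not confirmed HA daemon behaviour. The function names are hypothetical.

```python
import re

# Message wording assumed from the excerpts in this post.
STATE_RE = re.compile(r'name="Node (\d+) changed state: (\w+) -> (\w+)"')


def state_transitions(lines):
    """Yield (node, old_state, new_state) for each state-change message."""
    for line in lines:
        m = STATE_RE.search(line)
        if m:
            yield int(m.group(1)), m.group(2), m.group(3)


def active_without_sync(lines):
    """Return nodes that became ACTIVE from a state other than SYNCING.

    Heuristic only: it assumes SYNCING -> ACTIVE marks a node that is
    ready to take over, which this thread suggests but does not confirm.
    """
    return sorted({node for node, old, new in state_transitions(lines)
                   if new == "ACTIVE" and old != "SYNCING"})


# Lines copied from the first (ungraceful) handover.
first_handover = [
    '2010:05:20-17:36:16 mercury-1 ha_daemon[4160]: id="38A0" severity="info" '
    'sys="System" sub="ha" name="Node 2 changed state: UP2DATE -> ACTIVE"',
    '2010:05:20-17:36:16 mercury-1 ha_daemon[4160]: id="38Ba" severity="info" '
    'sys="System" sub="ha" name="Cluster up2date successful, initiating graceful takeover"',
]

# Lines copied from the second (graceful) handover.
second_handover = [
    '2010:05:20-17:45:08 mercury-1 ha_daemon[4167]: id="38A0" severity="info" '
    'sys="System" sub="ha" name="Initial synchronization finished!"',
    '2010:05:20-17:45:09 mercury-2 ha_daemon[4552]: id="38A0" severity="info" '
    'sys="System" sub="ha" name="Node 1 changed state: SYNCING -> ACTIVE"',
]

print(active_without_sync(first_handover))
print(active_without_sync(second_handover))
```

On these excerpts the check flags Node 2 in the first handover and nothing in the second, which matches the behaviour described in this post.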

Cheers,

Darren

kbr, over 15 years ago

Just applied 7.912 to the cluster, this bug may have been present for a while but I just noticed it.

Yes, that bug has been known since last year. I reused its Mantis ID and added your information to it.