
[BUG 9.207] HA cluster killed by newer node

We had a problem with our cluster on version 9.205-12 (it could not be updated because the second node had been hanging in the up2date cycle for weeks).

Support told us to reimage the second node, so we did.

After disconnecting the node's cables, we reimaged the box with the current ISO, 9.207019.

We cleared the cluster config from the old node and set the flag to "autoconfig for new nodes". Then we shut down the reimaged node, reconnected the cables and switched it on.

Suddenly the firewall was not reachable anymore!


What happened?

- the node joined the cluster
- up2date started
- up2date finished immediately (because the node already had a newer version)
- Node starts takeover!

What was the problem?

- sync was missing!
- the working master node switched to slave
- the "new master" node, without any config, promoted itself to MASTER

- the requirement for a takeover should be "sync ready", not only up2date!
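
To make the last point concrete, here is a minimal Python sketch of the two takeover conditions. It is not the real ha_daemon logic: NodeStatus and both functions are hypothetical names, and the "observed" rule is only inferred from the log, where the takeover starts as soon as up2date succeeds.

```python
from dataclasses import dataclass


@dataclass
class NodeStatus:
    """Hypothetical view of a joining node; not the real ha_daemon data model."""
    node_id: int
    up2date_done: bool  # firmware is at (or above) the master's version
    sync_ready: bool    # configuration has been synchronized from the master


def takeover_allowed_observed(node: NodeStatus) -> bool:
    """Behaviour inferred from the log: up2date alone triggers the takeover."""
    return node.up2date_done


def takeover_allowed_proposed(node: NodeStatus) -> bool:
    """Proposed rule: the node must be up to date AND fully synchronized."""
    return node.up2date_done and node.sync_ready


# The freshly reimaged node: newer firmware, but no configuration yet.
reimaged = NodeStatus(node_id=2, up2date_done=True, sync_ready=False)
```

With the proposed rule, the reimaged node would have stayed slave until the sync finished, and the configured master would have kept running.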



Here is what happened, as shown in the log file:


2014:10:29-07:48:18 IronGate-1 ha_daemon[3967]: id="38A0" severity="info" sys="System" sub="ha" name="Monitoring interfaces for link beat: eth0 eth1 eth2 eth4 eth5 "
2014:10:29-07:49:33 IronGate-1 ha_daemon[3967]: id="38A0" severity="info" sys="System" sub="ha" name="Monitoring interfaces for link beat: eth0 eth1 eth2 eth4 eth5 "
2014:10:29-11:00:59 IronGate-1 ha_daemon[3967]: id="38A0" severity="info" sys="System" sub="ha" name="Monitoring interfaces for link beat: eth0 eth1 eth2 eth4 eth5 "
2014:10:29-11:51:01 IronGate-1 ha_daemon[3967]: id="38A0" severity="info" sys="System" sub="ha" name="Autojoin of 198.19.250.103 granted! Seaching for unused node ID..."
2014:10:29-11:51:01 IronGate-1 ha_daemon[3967]: id="38A0" severity="info" sys="System" sub="ha" name="Found unused node id 2!"
2014:10:29-11:51:44 IronGate-1 ha_daemon[3967]: id="38A0" severity="info" sys="System" sub="ha" name="Access granted to remote node 2!"
2014:10:29-11:51:44 IronGate-1 ha_daemon[3967]: id="38A0" severity="info" sys="System" sub="ha" name="Access granted to remote node 2!"
2014:10:29-11:52:18 IronGate-1 ha_daemon[3967]: id="38A0" severity="info" sys="System" sub="ha" name="Node 2 joined with version 9.207019"
2014:10:29-11:52:18 IronGate-1 ha_daemon[3967]: id="38C0" severity="info" sys="System" sub="ha" name="Node 2 is alive!"
2014:10:29-11:52:18 IronGate-1 ha_daemon[3967]: id="38A0" severity="info" sys="System" sub="ha" name="Node 2 changed state: DEAD -> UP2DATE"
2014:10:29-11:52:18 IronGate-1 ha_daemon[3967]: id="38A0" severity="info" sys="System" sub="ha" name="Node 2 ignored until up2date starts"
2014:10:29-11:52:18 IronGate-1 ha_daemon[3967]: id="38B9" severity="info" sys="System" sub="ha" name="HA up2date successful, initiating graceful takeover"
2014:10:29-11:52:18 IronGate-1 ha_daemon[3967]: id="38A0" severity="info" sys="System" sub="ha" name="--- Node is disabled ---"
2014:10:29-11:52:18 IronGate-1 ha_daemon[3967]: id="38A0" severity="info" sys="System" sub="ha" name="start/reset initial synchronization timer = 0"
2014:10:29-11:52:18 IronGate-1 ha_daemon[3967]: id="38B1" severity="info" sys="System" sub="ha" name="Switching to Slave mode"
2014:10:29-11:52:20 IronGate-1 conntrack-tools[4617]: flushing conntrack table in 60 secs
2014:10:29-11:52:20 IronGate-1 conntrack-tools[4617]: request resync
2014:10:29-11:52:22 IronGate-1 ha_proxy[8644]: Shutting down.
2014:10:29-11:52:22 IronGate-1 ha_daemon[3967]: id="38A0" severity="info" sys="System" sub="ha" name="Node 2 changed state: UP2DATE -> ACTIVE"
2014:10:29-11:52:22 IronGate-1 ha_daemon[3967]: id="38A0" severity="info" sys="System" sub="ha" name="Node 2 changed mode: SLAVE -> MASTER"
2014:10:29-11:52:22 IronGate-1 ha_daemon[3967]: id="38A0" severity="info" sys="System" sub="ha" name="start/reset initial synchronization timer = 0"
2014:10:29-11:52:22 IronGate-1 ha_daemon[3967]: id="38A0" severity="info" sys="System" sub="ha" name="cluster mode: set master id to 2"
2014:10:29-11:52:22 IronGate-1 ha_daemon[3967]: id="38A0" severity="info" sys="System" sub="ha" name="Reading cluster configuration"
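
If you want to check your own HA log for the same sequence, a small Python sketch like the following pulls out the node state and mode changes. It assumes the line format shown above (timestamp first, message in the name="..." field); the function name extract_transitions is made up for this example and is not part of any UTM tool.

```python
import re

# Matches the name="..." payload of an ha_daemon line as formatted in the log above.
NAME_RE = re.compile(r'name="([^"]*)"')
# Matches messages such as: Node 2 changed state: DEAD -> UP2DATE
CHANGE_RE = re.compile(r'Node (\d+) changed (state|mode): (\S+) -> (\S+)')


def extract_transitions(lines):
    """Return (timestamp, node_id, kind, old, new) for every state or mode change."""
    transitions = []
    for line in lines:
        name = NAME_RE.search(line)
        if not name:
            continue  # not an ha_daemon message line (e.g. conntrack-tools)
        change = CHANGE_RE.search(name.group(1))
        if change:
            timestamp = line.split(" ", 1)[0]
            node_id, kind, old, new = change.groups()
            transitions.append((timestamp, int(node_id), kind, old, new))
    return transitions
```

Run against the log above, this yields three transitions for node 2: DEAD -> UP2DATE at 11:52:18, then UP2DATE -> ACTIVE and SLAVE -> MASTER at 11:52:22, only four seconds later and with no sync step in between.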

