All up2date packages were downloaded, and the Master node updated to version 7.006 without issue in less than 5 minutes. The Slave node then went into its up2date process after seeing that the Master was on a different version. The Slave has now been stuck in the up2date state for well over 48 hours.
We're running ASL Software in Active/Passive mode. The hardware is two identical Silicon Mechanics servers, each with 1 GB of RAM, a single AMD Opteron processor, an 80 GB HDD, and quad-port Intel NICs.
We've restarted the Slave node 3 times during this issue, and it does reboot normally each time:
2007:11:10-00:50:28 (none) ha_daemon[3181]: id="38A3" severity="debug" sys="System" sub="ha" name="Netlink: Found link beat on eth1 again!"
2007:11:10-00:50:36 (none) ha_daemon[3181]: id="38A0" severity="info" sys="System" sub="ha" name="Access granted to remote node 2!"
2007:11:10-00:50:39 (none) ha_daemon[3181]: id="38C0" severity="info" sys="System" sub="ha" name="Node 2 is alive!"
2007:11:10-00:50:39 (none) ha_daemon[3181]: id="38A0" severity="info" sys="System" sub="ha" name="Node 2 changed state: DEAD -> UP2DATE"
2007:11:10-00:50:39 (none) ha_daemon[3181]: id="38A0" severity="info" sys="System" sub="ha" name="Node 2 joined with version 7.005"
2007:11:10-00:50:39 (none) ha_daemon[3181]: id="38A0" severity="info" sys="System" sub="ha" name="Waiting for up2date process on unconfigured node 2"
But it goes straight back into its up2date state, and we do not know how to get this process to finish. Has anyone experienced a similar issue? Is there any way of recovering this H/A cluster without breaking it apart and updating each node separately?
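In case it helps anyone checking for the same loop in their own logs, here is a minimal Python sketch that pulls the node state transitions out of ha_daemon lines like the ones pasted above. It assumes the line layout is exactly as in that excerpt (timestamp, host, `ha_daemon[pid]:`, then `key="value"` pairs). The function names `parse_ha_line` and `state_changes` are made up for illustration and are not part of any Astaro tool.

```python
import re
from datetime import datetime

# Layout assumed from the excerpt above: "<timestamp> <host> ha_daemon[<pid>]: key="value" ..."
LINE_RE = re.compile(
    r'^(?P<ts>\d{4}:\d{2}:\d{2}-\d{2}:\d{2}:\d{2}) \S+ ha_daemon\[\d+\]: (?P<fields>.*)$'
)
FIELD_RE = re.compile(r'(\w+)="([^"]*)"')
STATE_RE = re.compile(r'Node (\d+) changed state: (\w+) -> (\w+)')


def parse_ha_line(line):
    """Return a dict of the key="value" fields plus a parsed timestamp, or None."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    record = dict(FIELD_RE.findall(match.group("fields")))
    record["timestamp"] = datetime.strptime(match.group("ts"), "%Y:%m:%d-%H:%M:%S")
    return record


def state_changes(lines):
    """Collect (timestamp, node, old_state, new_state) for every state-change message."""
    changes = []
    for line in lines:
        record = parse_ha_line(line)
        if record is None:
            continue
        match = STATE_RE.search(record.get("name", ""))
        if match:
            changes.append(
                (record["timestamp"], int(match.group(1)), match.group(2), match.group(3))
            )
    return changes
```

Feeding it the whole HA log and looking for repeated `DEAD -> UP2DATE` entries for node 2 would show how often the Slave re-enters the up2date state after each reboot.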