Over the last two days, a pair of ASG120s I manage has started flip-flopping. I get reports that the slave has taken over, and then 30 minutes to an hour later the master takes over again. During this process I get several emails saying "HA ctsync daemon not running - restarted" and a couple of reboot notifications (one for each node). The system then stays quiet for about 24 hours and the cycle repeats.
Also, while the nodes show as synced and active, I keep getting this error:
2010:10:30-09:29:22 dclfw1-2 slon[8481]: [1-1] CONFIG main: slon version 1.2.20 starting up
2010:10:30-09:29:22 dclfw1-2 slon[8801]: [2-1] ERROR cannot get sl_local_node_id - ERROR: schema "_asg_cluster" does not exist
2010:10:30-09:29:22 dclfw1-2 slon[8801]: [3-1] FATAL main: Node is not initialized properly - sleep 10s
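For anyone who wants to line these entries up with the failover emails, here is a rough Python sketch that parses log lines in the format shown above and counts them per host and severity. It assumes every line follows the same `timestamp host process[pid]: [seq] LEVEL message` layout as the three lines I pasted; the names `parse_line` and `count_by_level` are just my own hypothetical helpers, not anything that ships with the ASG.

```python
import re
from datetime import datetime

# Matches lines such as:
# 2010:10:30-09:29:22 dclfw1-2 slon[8801]: [3-1] FATAL main: Node is not ...
# The layout is assumed from the excerpt above, not from any vendor spec.
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}:\d{2}:\d{2}-\d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<proc>[^\[\s]+)\[(?P<pid>\d+)\]: "
    r"\[(?P<seq>\d+-\d+)\] "
    r"(?P<level>[A-Z]+) "
    r"(?P<msg>.*)$"
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    rec = m.groupdict()
    # Timestamps use colons in the date part: YYYY:MM:DD-HH:MM:SS
    rec["ts"] = datetime.strptime(rec["ts"], "%Y:%m:%d-%H:%M:%S")
    rec["pid"] = int(rec["pid"])
    return rec


def count_by_level(lines):
    """Count parsed lines per (host, level); unparseable lines are skipped."""
    counts = {}
    for line in lines:
        rec = parse_line(line)
        if rec is None:
            continue
        key = (rec["host"], rec["level"])
        counts[key] = counts.get(key, 0) + 1
    return counts
```

Feeding the exported log file through `count_by_level` and comparing the `ERROR`/`FATAL` timestamps against the takeover emails should show whether the slon errors cluster around the failovers or occur continuously.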
The nodes were set up per the instructions in the HA guide, with some help from threads on here. The only other thing I've noticed during these events is that the system load on the node that dies (per the email) is typically above 0.80.
Everything works fine from the user standpoint, but getting emails about these nodes is a bit annoying. Any ideas?