Hi All,
Following on from the database sync issue, I rebuilt both halves of the active/active cluster, and it has been running OK for the last few days. This afternoon the master node (1) decided to burp and a failover to the slave node (3) occurred, with node 3 being promoted to master. The failover was far from clean or seamless: connectivity to the internet was lost for about 3-4 minutes.
Once the original master node (1) came back online, it synced and then various routing issues started to occur within the network. The only way I could get back to a stable environment was to shut down the slave node (3).
I came across this in the HA log:
2010:02:18-14:30:36 servername-3 slon[5525]: [109-2] "_asg_cluster" does not exist
2010:02:18-14:30:37 servername-1 slon[26849]: [1-1] CONFIG main: slon version 1.2.15 starting up
2010:02:18-14:30:37 servername-1 slon[30034]: [2-1] ERROR cannot get sl_local_node_id - ERROR: schema "_asg_cluster" does not exist
2010:02:18-14:30:37 servername-1 slon[30034]: [3-1] FATAL main: Node is not initialized properly - sleep 10s
2010:02:18-14:30:46 servername-3 slon[5525]: [110-1] ERROR remoteListenThread_1: "select "_asg_cluster".registerNodeConnection(3); listen "_asg_cluster_Event"; " - ERROR: schema
2010:02:18-14:30:46 servername-3 slon[5525]: [110-2] "_asg_cluster" does not exist
2010:02:18-14:30:47 servername-1 slon[26849]: [1-1] CONFIG main: slon version 1.2.15 starting up
2010:02:18-14:30:47 servername-1 slon[30038]: [2-1] ERROR cannot get sl_local_node_id - ERROR: schema "_asg_cluster" does not exist
2010:02:18-14:30:47 servername-1 slon[30038]: [3-1] FATAL main: Node is not initialized properly - sleep 10s
This repeats every 10 seconds or so.
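For anyone wanting to check their own HA log for the same pattern, here is a rough Python sketch that counts slon ERROR/FATAL messages per node. The function name `tally_slon_errors` and the regular expression are only my own illustration, and it assumes the log lines are shaped exactly like the excerpt above; it is not part of the product or of Slony.

```python
import re
from collections import Counter

# Assumed line shape, taken from the excerpt above:
# "2010:02:18-14:30:37 servername-1 slon[30034]: [3-1] FATAL main: ..."
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}:\d{2}:\d{2}-\d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) slon\[(?P<pid>\d+)\]: "
    r"\[(?P<seq>\d+)-(?P<part>\d+)\] (?P<msg>.*)$"
)


def tally_slon_errors(lines):
    """Count ERROR/FATAL slon messages per (host, level).

    Continuation parts such as [110-2] are skipped so that a message
    split across two log lines is only counted once.
    """
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m or m.group("part") != "1":
            continue
        level = m.group("msg").split(" ", 1)[0]
        if level in ("ERROR", "FATAL"):
            counts[(m.group("host"), level)] += 1
    return counts


# A few lines copied from the log excerpt above.
SAMPLE = [
    '2010:02:18-14:30:37 servername-1 slon[26849]: [1-1] CONFIG main: slon version 1.2.15 starting up',
    '2010:02:18-14:30:37 servername-1 slon[30034]: [2-1] ERROR cannot get sl_local_node_id - ERROR: schema "_asg_cluster" does not exist',
    '2010:02:18-14:30:37 servername-1 slon[30034]: [3-1] FATAL main: Node is not initialized properly - sleep 10s',
    '2010:02:18-14:30:46 servername-3 slon[5525]: [110-1] ERROR remoteListenThread_1: "select "_asg_cluster".registerNodeConnection(3); listen "_asg_cluster_Event"; " - ERROR: schema',
    '2010:02:18-14:30:46 servername-3 slon[5525]: [110-2] "_asg_cluster" does not exist',
]

if __name__ == "__main__":
    for (host, level), n in sorted(tally_slon_errors(SAMPLE).items()):
        print(host, level, n)
```

Feeding it the full HA log (for example via `open(path)` with whatever path your log lives at) should make it obvious whether both nodes are stuck in the same 10-second retry loop.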
Any ideas? Clustering seems to be the bane of my existence at the moment.
Many thanks,
Darren