[7.904][BUG][FIXED] Cluster config issue

Hi,

This seems to happen whenever I have to shut down both servers in the cluster. Both nodes were shut down cleanly via the shutdown command in the management/high availability menu.

I brought both nodes back online; however, the slave node comes back with the config error shown below.

There is a process that runs at 1.30am BST that appears to clean up the databases and fix the issue. Can I run it on an ad-hoc basis to resolve this?

2010:05:06-10:01:23 mercury-2 slon[14837]: [32-1] 2010-05-06 10:01:23 BSTFATAL main: Node is not initialized properly - sleep 10s
2010:05:06-10:01:24 mercury-2 slon[5491]: [902-1] 2010-05-06 10:01:24 BSTINFO remoteWorkerThread_1: syncing set 1 with 9 table(s) from provider 1
2010:05:06-10:01:24 mercury-2 slon[5491]: [903-1] 2010-05-06 10:01:24 BSTINFO remoteWorkerThread_1: SYNC 5000000404 done in 0.009 seconds
2010:05:06-10:01:28 mercury-1 slon[5264]: [832-1] 2010-05-06 10:01:28 BSTCONFIG version for "dbname=reporting host=X.X.X.X user=ha_sync password=XX" is 80403
2010:05:06-10:01:28 mercury-1 slon[5264]: [833-1] 2010-05-06 10:01:28 BSTERROR remoteListenThread_2: "select "_asg_cluster".registerNodeConnection(1); " - ERROR: schema
2010:05:06-10:01:28 mercury-1 slon[5264]: [833-2] "_asg_cluster" does not exist

Many thanks,

Darren

Astaro Beta Bot over 15 years ago

Astaro Beta Report

--------------------------------

Version: 7.904

Type: BUG

State: MERGED/FIXED

Reporter: darrenl+++

Contributor: 

MantisID: 13639

Target version: 7.911

Fixed in version: 7.911

--------------------------------

da_merlin over 15 years ago

Hi Darren,

The database cleanup at 1.30am is not related to the database synchronization.
Is this error permanent, or is the database synchronization working after 15 minutes?

Cheers
Ulrich

darrenl over 15 years ago

It keeps repeating the error in a loop; the status according to the menu is 'Syncing'.
I've seen this several times now, and it seems to clear during the database rebuild/cleanup/sync performed in the early hours of the morning.

darrenl over 15 years ago

I just got an automated email with the following (the attachment was missing):

HA selfcheck: Please see attached logfile

--
HA Status          : CLUSTER MASTER (node id: 1)
System Uptime      : 0 days 3 hours 22 minutes
System Load        : 0.78
System Version     : Astaro Security Gateway Software 7.904

Please refer to the manual for detailed instructions.
HA SELFMON WARN: slon_control not running, restarting...

In HA log file:
2010:05:06-12:12:23 mercury-1 slon_control[3830]: Slonik error, process exited with value 255
2010:05:06-12:12:23 mercury-1 slon_control[3830]: Failed to drop slony schemas for reporting, process exited with value 1!
2010:05:06-12:12:23 mercury-2 ha_daemon[4372]: id="38A0" severity="info" sys="System" sub="ha" name="Initial synchronization finished!"
2010:05:06-12:12:24 mercury-1 ha_daemon[4171]: id="38A0" severity="info" sys="System" sub="ha" name="Node 2 changed state: SYNCING -> ACTIVE"
2010:05:06-12:12:26 mercury-1 slon_control[3890]: Slonik error, process exited with value 255
2010:05:06-12:12:26 mercury-1 slon_control[3890]: Failed to drop slony schemas for reporting, process exited with value 1!
2010:05:06-12:12:26 mercury-1 slon_control[3890]: Slonik error, process exited with value 255
2010:05:06-12:12:34 mercury-2 slon_control[24990]: Slonik error, process exited with value 255
2010:05:06-12:12:37 mercury-2 slon_control[25028]: Slonik error, process exited with value 255
2010:05:06-12:12:37 mercury-2 slon_control[25028]: Slonik error, process exited with value 255
2010:05:06-12:24:27 mercury-1 slon_control[11641]: Slonik error, process exited with value 255
2010:05:06-12:24:27 mercury-1 slon_control[11641]: Failed to drop slony schemas for reporting, process exited with value 1!
2010:05:06-12:24:27 mercury-1 slon_control[11641]: Slonik error, process exited with value 255
2010:05:06-12:24:31 mercury-1 slon_control[11671]: Slonik error, process exited with value 255
2010:05:06-12:24:31 mercury-1 slon_control[11671]: Failed to drop slony schemas for reporting, process exited with value 1!
2010:05:06-12:24:31 mercury-1 slon_control[11671]: Slonik error, process exited with value 255

da_merlin over 15 years ago

That was me, playing around on your system :)

Database synchronization should run fine now.
I found the error: PostgreSQL 8.4 replaced the integer column reltriggers in pg_class with the boolean relhastriggers,
which slon_control was not aware of.
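To illustrate the kind of version check such a fix needs, here is a minimal sketch. It is not slon_control's code (that isn't shown in this thread), and the function names are hypothetical. It assumes the numeric server version format seen in the slon log above (80403 for 8.4.3) and picks the pg_class trigger column accordingly: the integer reltriggers count before 8.4, the boolean relhastriggers from 8.4 on.

```python
# Hypothetical sketch, not slon_control's actual implementation.
from typing import Tuple


def decode_version_num(version_num: int) -> Tuple[int, int, int]:
    """Split a pre-10 PostgreSQL version number such as 80403 into (8, 4, 3)."""
    return version_num // 10000, (version_num // 100) % 100, version_num % 100


def tables_with_triggers_sql(version_num: int) -> str:
    """Build a query listing ordinary tables that have triggers.

    Before 8.4, pg_class.reltriggers is an integer trigger count;
    8.4 replaced it with the boolean pg_class.relhastriggers.
    """
    if version_num >= 80400:
        condition = "c.relhastriggers"
    else:
        condition = "c.reltriggers > 0"
    return (
        "SELECT n.nspname, c.relname "
        "FROM pg_catalog.pg_class c "
        "JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace "
        f"WHERE c.relkind = 'r' AND {condition}"
    )


print(decode_version_num(80403))  # (8, 4, 3)
print(tables_with_triggers_sql(80403))
```

Code that always queries reltriggers would fail on an 8.4 server, because that column no longer exists there.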

darrenl over 15 years ago

Aha! Many thanks, Ulrich. :D

So is something specific causing this synchronization issue when I reboot both servers (i.e. is the defect down to PostgreSQL changing data)? It seems relatively easy to trigger.

Cheers,

Darren