Errors in HA Live Log after 7.504 patch

We have a 3-node HA cluster that was running 7.502. On Tuesday I applied the 7.503 patch, and that went smoothly. After all nodes were synced, I applied 7.504. The master and worker report active, but the slave has been in the "syncing" phase for two days now. The HA Live Log reports the errors below:
--------------------------------------------------------------
2010:03:18-08:49:15 secgate-an-2 slon[13662]: [3-1] FATAL main: Node is not initialized properly - sleep 10s
2010:03:18-08:49:16 secgate-an-3 slon[7669]: [25070-1] ERROR slon_connectdb: PQconnectdb("dbname=pop3 host=198.19.250.2 user=ha_sync password=slony") failed - could not create
2010:03:18-08:49:16 secgate-an-3 slon[7669]: [25070-2] socket: Too many open files
2010:03:18-08:49:16 secgate-an-3 slon[7669]: [25071-1] WARN remoteListenThread_2: DB connection failed - sleep 10 seconds
2010:03:18-08:49:25 secgate-an-2 slon[11691]: [1-1] CONFIG main: slon version 1.2.20 starting up
2010:03:18-08:49:25 secgate-an-2 slon[13904]: [2-1] ERROR cannot get sl_local_node_id - ERROR: schema "_asg_cluster" does not exist
--------------------------------------------------------------
Any recommendations for troubleshooting?

scmiles over 16 years ago

Just an update: I rebooted the node that was stuck in syncing mode, and it came up and synced right away. The errors are gone.

BAlfson over 16 years ago

Two days seems long unless you have a substantial email quarantine and lots of logfile history. I would submit a trouble ticket and give it more time. Does the Content Filter log show that all three nodes are being used?

Cheers - Bob

Sophos UTM Community Moderator
Sophos Certified Architect - UTM
Sophos Certified Engineer - XG
Gold Solution Partner since 2005

MediaSoft, Inc. USA

scmiles over 16 years ago in reply to BAlfson

The errors cleared up after rebooting the slave, and all 3 nodes are synced and active now.

Yes, the Content Filter log shows all 3 nodes being used.

The Mail Manager page takes a while to load, close to a minute. Before the recent reboots it would not load at all, and it had been that way for about a week. We always seem to have trouble loading the Mail Manager page.

We currently have close to 42,000 quarantined items.

BAlfson over 16 years ago

42,000? That might take a while. When you rebooted the Slave, wasn't the Worker promoted to Slave? Did you notice the transfer throughput on the HA link while that sync was taking place?

Have you rolled out the Enduser Portal so users can manage their own quarantines and whitelists?

BrucekConvergent over 15 years ago in reply to scmiles

>>

CTO, Convergent Information Security Solutions, LLC

https://www.convergesecurity.com

Advice given as posted on this forum does not construe a support relationship or other relationship with Convergent Information Security Solutions, LLC or its subsidiaries. Use the advice given at your own risk.

kbr over 15 years ago

Sorry Bruce, this was a configuration mistake in the betabot. I mistyped the number of the post where the betabot should put the Known Issues List. I hope I didn't delete too much of your work.

BrucekConvergent over 15 years ago in reply to kbr

That explains it... For a minute I thought someone had nabbed my credentials. No worries.