Guest User!

You are not Sophos Staff.

This discussion has been locked.
You can no longer post new replies to this discussion. If you have a question you can start a new discussion

Probleme mit dem HA Cluster

Hallo,

seit ca. 3 Wochen haben wir Probleme mit dem HA Cluster. Über Nacht ist Node-1 auf UNLINKED gesprungen.

Was ich bis jetzt probiert habe:

- Node-1 neu gestartet
- Kabel zwischen beiden Nodes getauscht
- Operation mode auf Node-2 einmal auf Off gestellt und wieder aktiviert
- Node-1 komplett per USB Stick neu installiert

Als letzte Möglichkeit würde mir noch einfallen, Node-2 einmal neu zu installieren. Aber vielleicht hat einer eine Idee, bevor ich das mache bzw. den Sophos Support einschalte.

Beide haben die Version 9.705-3 und es handelt sich um eine SG 230.

Hier die Logs, während sich Node-1 nach der Neuinstallation wieder verbindet. Um 16:40 Uhr habe ich Node-1 einmal neu gestartet.

2021:03:04-14:48:41 fw01-2 conntrack-tools[13140]: no dedicated links available!
2021:03:04-14:48:41 fw01-2 ha_daemon[11339]: id="38A3" severity="debug" sys="System" sub="ha" seq="M:   60 41.621" name="Netlink: Lost link beat on eth3!"
2021:03:04-14:48:57 fw01-2 ha_daemon[11339]: id="38A3" severity="debug" sys="System" sub="ha" seq="M:   61 57.126" name="Netlink: Found link beat on eth3 again!"
2021:03:04-14:49:00 fw01-2 ha_daemon[11339]: id="38A1" severity="warn" sys="System" sub="ha" seq="M:   62 00.207" name="Got misformed HA message type = 9 len = 64, msg.len = 28"
2021:03:04-14:49:00 fw01-2 ha_daemon[11339]: id="38A0" severity="info" sys="System" sub="ha" seq="M:   63 00.207" name="Autojoin of 198.19.250.123 granted! Searching for unused node ID..."
2021:03:04-14:49:00 fw01-2 ha_daemon[11339]: id="38A0" severity="info" sys="System" sub="ha" seq="M:   64 00.207" name="Found unused node id 1!"
2021:03:04-14:49:00 fw01-2 ha_daemon[11339]: id="38A0" severity="info" sys="System" sub="ha" seq="M:   65 00.207" name="New node 1"
2021:03:04-14:49:00 fw01-2 conntrack-tools[13140]: no dedicated links available!
2021:03:04-14:49:00 fw01-2 ha_daemon[11339]: id="38A3" severity="debug" sys="System" sub="ha" seq="M:   66 00.609" name="Netlink: Lost link beat on eth3!"
2021:03:04-14:49:04 fw01-2 ha_daemon[11339]: id="38A3" severity="debug" sys="System" sub="ha" seq="M:   67 04.938" name="Netlink: Found link beat on eth3 again!"
2021:03:04-14:49:07 fw01-2 ha_daemon[11339]: id="38A3" severity="debug" sys="System" sub="ha" seq="M:   68 07.593" name="Netlink: Lost link beat on eth3!"
2021:03:04-14:49:07 fw01-2 conntrack-tools[13140]: no dedicated links available!
2021:03:04-14:49:10 fw01-2 ha_daemon[11339]: id="38A3" severity="debug" sys="System" sub="ha" seq="M:   69 10.407" name="Netlink: Found link beat on eth3 again!"
2021:03:04-14:49:14 fw01-2 ha_daemon[11339]: id="38A0" severity="info" sys="System" sub="ha" seq="M:   70 14.677" name="Access granted to remote node 1!"
2021:03:04-14:49:14 fw01-2 ha_daemon[11339]: id="38A0" severity="info" sys="System" sub="ha" seq="M:   71 14.744" name="Node 1 joined with version 9.705003"
2021:03:04-14:49:14 fw01-2 ha_daemon[11339]: id="38C0" severity="info" sys="System" sub="ha" seq="M:   72 14.744" name="Node 1 is alive"
2021:03:04-14:49:14 fw01-2 ha_daemon[11339]: id="38A0" severity="info" sys="System" sub="ha" seq="M:   73 14.744" name="Node 1 changed state: DEAD(2048) -> SYNCING(2)"
2021:03:04-14:49:14 fw01-2 ha_daemon[11339]: id="38A0" severity="info" sys="System" sub="ha" seq="M:   74 14.744" name="Node 1 changed role: UNKNOWN -> SLAVE"
2021:03:04-14:49:14 fw01-2 ha_daemon[11339]: id="38A0" severity="info" sys="System" sub="ha" seq="M:   75 14.744" name="Executing (wait) /usr/local/bin/confd-setha mode master master_ip 198.19.250.2 slave_ip 198.19.250.1"
2021:03:04-14:49:14 fw01-2 ha_daemon[11339]: id="38A0" severity="info" sys="System" sub="ha" seq="M:   76 14.856" name="Executing (nowait) /etc/init.d/ha_mode topology_changed"
2021:03:04-14:49:14 fw01-2 ha_mode[21345]: calling topology_changed
2021:03:04-14:49:14 fw01-2 ha_mode[21345]: topology_changed: waiting for last ha_mode done
2021:03:04-14:49:15 fw01-2 ha_mode[21345]: repctl[21362]: [i] daemonize_check(1480): daemonized, see syslog for further messages
2021:03:04-14:49:15 fw01-2 repctl[21362]: [i] daemonize_check(1480): daemonized, see syslog for further messages
2021:03:04-14:49:15 fw01-2 repctl[21362]: [i] daemonize_check(1497): trying to signal daemon and exit
2021:03:04-14:49:15 fw01-2 repctl[11559]: [i] recheck(1057): got HUP: replication recheck triggered Setup_replication_done = 1
2021:03:04-14:49:15 fw01-2 ha_mode[21345]: topology_changed done (started at 14:49:14)
2021:03:04-14:49:15 fw01-2 ha_daemon[11339]: id="38A0" severity="info" sys="System" sub="ha" seq="M:   77 15.551" name="Reading cluster configuration"
2021:03:04-14:49:20 fw01-2 ha_daemon[11339]: id="38A0" severity="info" sys="System" sub="ha" seq="M:   78 20.062" name="Set syncing.files for node 1"
2021:03:04-14:49:24 fw01-2 ha_daemon[11339]: id="38C1" severity="error" sys="System" sub="ha" seq="M:   79 24.063" name="Node 1 is dead, received no heart beats"
2021:03:04-14:49:24 fw01-2 ha_daemon[11339]: id="38A0" severity="info" sys="System" sub="ha" seq="M:   80 24.063" name="Executing (wait) /usr/local/bin/confd-setha mode master master_ip 198.19.250.2 slave_ip ''"
2021:03:04-14:49:24 fw01-2 ha_daemon[11339]: id="38A0" severity="info" sys="System" sub="ha" seq="M:   81 24.168" name="Executing (nowait) /etc/init.d/ha_mode topology_changed"
2021:03:04-14:49:24 fw01-2 ha_mode[21661]: calling topology_changed
2021:03:04-14:49:24 fw01-2 ha_mode[21661]: topology_changed: waiting for last ha_mode done
2021:03:04-14:49:24 fw01-2 ha_mode[21661]: repctl[21677]: [i] daemonize_check(1480): daemonized, see syslog for further messages
2021:03:04-14:49:24 fw01-2 repctl[21677]: [i] daemonize_check(1480): daemonized, see syslog for further messages
2021:03:04-14:49:24 fw01-2 repctl[21677]: [i] daemonize_check(1497): trying to signal daemon and exit
2021:03:04-14:49:24 fw01-2 repctl[11559]: [i] recheck(1057): got HUP: replication recheck triggered Setup_replication_done = 1
2021:03:04-14:49:24 fw01-2 ha_mode[21661]: topology_changed done (started at 14:49:24)
2021:03:04-14:49:24 fw01-2 ha_daemon[11339]: id="38A0" severity="info" sys="System" sub="ha" seq="M:   82 24.621" name="Reading cluster configuration"
2021:03:04-14:49:30 fw01-2 ha_daemon[11339]: id="38A0" severity="info" sys="System" sub="ha" seq="M:   83 30.731" name="Monitoring interfaces for link beat: eth4 eth2 eth1 eth0"
2021:03:04-14:49:50 fw01-2 conntrack-tools[13140]: no dedicated links available!
2021:03:04-14:49:50 fw01-2 ha_daemon[11339]: id="38A3" severity="debug" sys="System" sub="ha" seq="M:   84 50.811" name="Netlink: Lost link beat on eth3!"
2021:03:04-14:54:24 fw01-2 repctl[11559]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1
2021:03:04-14:55:48 fw01-2 ha_daemon[11339]: id="38A1" severity="warn" sys="System" sub="ha" seq="M:   85 48.567" name="filesync_end(): initial sync failed, status = 0x500"
2021:03:04-14:59:24 fw01-2 repctl[11559]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1
2021:03:04-15:04:25 fw01-2 repctl[11559]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1
2021:03:04-15:09:25 fw01-2 repctl[11559]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1
2021:03:04-15:14:25 fw01-2 repctl[11559]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1
2021:03:04-15:19:25 fw01-2 repctl[11559]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1
2021:03:04-15:24:25 fw01-2 repctl[11559]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1
2021:03:04-15:29:25 fw01-2 repctl[11559]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1
2021:03:04-15:34:25 fw01-2 repctl[11559]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1
2021:03:04-15:39:24 fw01-2 ha_daemon[11339]: id="38A0" severity="info" sys="System" sub="ha" seq="M:   86 24.063" name="Executing (wait) /usr/local/bin/confd-setha mode master master_ip 198.19.250.2 slave_ip ''"
2021:03:04-15:39:24 fw01-2 ha_daemon[11339]: id="38A0" severity="info" sys="System" sub="ha" seq="M:   87 24.179" name="Executing (nowait) /etc/init.d/ha_mode check"
2021:03:04-15:39:24 fw01-2 ha_mode[8628]: calling check
2021:03:04-15:39:24 fw01-2 ha_mode[8628]: check: waiting for last ha_mode done
2021:03:04-15:39:24 fw01-2 ha_mode[8628]: check_ha() role=MASTER, status=ACTIVE
2021:03:04-15:39:24 fw01-2 ha_mode[8628]: check done (started at 15:39:24)
2021:03:04-15:39:25 fw01-2 repctl[11559]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1
2021:03:04-15:44:25 fw01-2 repctl[11559]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1
2021:03:04-15:49:26 fw01-2 repctl[11559]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1
2021:03:04-15:54:26 fw01-2 repctl[11559]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1
2021:03:04-15:59:26 fw01-2 repctl[11559]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1
2021:03:04-16:04:26 fw01-2 repctl[11559]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1
2021:03:04-16:09:26 fw01-2 repctl[11559]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1
2021:03:04-16:14:26 fw01-2 repctl[11559]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1
2021:03:04-16:19:27 fw01-2 repctl[11559]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1
2021:03:04-16:23:59 fw01-2 ha_daemon[11339]: id="38A0" severity="info" sys="System" sub="ha" seq="M:   88 59.714" name="HA control: cmd = 'remove 1'"
2021:03:04-16:24:27 fw01-2 repctl[11559]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1
2021:03:04-16:24:49 fw01-2 ha_daemon[11339]: id="38A3" severity="debug" sys="System" sub="ha" seq="M:   89 49.902" name="Netlink: Found link beat on eth3 again!"
2021:03:04-16:25:04 fw01-2 ha_daemon[11339]: id="38A3" severity="debug" sys="System" sub="ha" seq="M:   90 04.909" name="Netlink: Lost link beat on eth3!"
2021:03:04-16:25:04 fw01-2 conntrack-tools[13140]: no dedicated links available!
2021:03:04-16:25:19 fw01-2 ha_daemon[11339]: id="38A3" severity="debug" sys="System" sub="ha" seq="M:   91 19.335" name="Netlink: Found link beat on eth3 again!"
2021:03:04-16:25:22 fw01-2 ha_daemon[11339]: id="38A1" severity="warn" sys="System" sub="ha" seq="M:   92 22.365" name="Got misformed HA message type = 9 len = 64, msg.len = 28"
2021:03:04-16:25:22 fw01-2 ha_daemon[11339]: id="38A0" severity="info" sys="System" sub="ha" seq="M:   93 22.365" name="Autojoin of 198.19.250.115 granted! Searching for unused node ID..."
2021:03:04-16:25:22 fw01-2 ha_daemon[11339]: id="38A0" severity="info" sys="System" sub="ha" seq="M:   94 22.365" name="Found unused node id 1!"
2021:03:04-16:25:22 fw01-2 ha_daemon[11339]: id="38A0" severity="info" sys="System" sub="ha" seq="M:   95 22.365" name="New node 1"
2021:03:04-16:25:23 fw01-2 conntrack-tools[13140]: no dedicated links available!
2021:03:04-16:25:23 fw01-2 ha_daemon[11339]: id="38A3" severity="debug" sys="System" sub="ha" seq="M:   96 23.165" name="Netlink: Lost link beat on eth3!"
2021:03:04-16:25:26 fw01-2 ha_daemon[11339]: id="38A3" severity="debug" sys="System" sub="ha" seq="M:   97 26.767" name="Netlink: Found link beat on eth3 again!"
2021:03:04-16:25:29 fw01-2 ha_daemon[11339]: id="38A3" severity="debug" sys="System" sub="ha" seq="M:   98 29.729" name="Netlink: Lost link beat on eth3!"
2021:03:04-16:25:29 fw01-2 conntrack-tools[13140]: no dedicated links available!
2021:03:04-16:25:32 fw01-2 ha_daemon[11339]: id="38A3" severity="debug" sys="System" sub="ha" seq="M:   99 32.587" name="Netlink: Found link beat on eth3 again!"
2021:03:04-16:25:37 fw01-2 ha_daemon[11339]: id="38A0" severity="info" sys="System" sub="ha" seq="M:  100 37.018" name="Access granted to remote node 1!"
2021:03:04-16:25:37 fw01-2 ha_daemon[11339]: id="38A0" severity="info" sys="System" sub="ha" seq="M:  101 37.087" name="Node 1 joined with version 9.705003"
2021:03:04-16:25:37 fw01-2 ha_daemon[11339]: id="38C0" severity="info" sys="System" sub="ha" seq="M:  102 37.087" name="Node 1 is alive"
2021:03:04-16:25:37 fw01-2 ha_daemon[11339]: id="38A0" severity="info" sys="System" sub="ha" seq="M:  103 37.087" name="Node 1 changed state: DEAD(2048) -> SYNCING(2)"
2021:03:04-16:25:37 fw01-2 ha_daemon[11339]: id="38A0" severity="info" sys="System" sub="ha" seq="M:  104 37.087" name="Node 1 changed role: UNKNOWN -> SLAVE"
2021:03:04-16:25:37 fw01-2 ha_daemon[11339]: id="38A0" severity="info" sys="System" sub="ha" seq="M:  105 37.088" name="Executing (wait) /usr/local/bin/confd-setha mode master master_ip 198.19.250.2 slave_ip 198.19.250.1"
2021:03:04-16:25:37 fw01-2 ha_daemon[11339]: id="38A0" severity="info" sys="System" sub="ha" seq="M:  106 37.203" name="Executing (nowait) /etc/init.d/ha_mode topology_changed"
2021:03:04-16:25:37 fw01-2 ha_mode[19726]: calling topology_changed
2021:03:04-16:25:37 fw01-2 ha_mode[19726]: topology_changed: waiting for last ha_mode done
2021:03:04-16:25:37 fw01-2 ha_mode[19726]: repctl[19744]: [i] daemonize_check(1480): daemonized, see syslog for further messages
2021:03:04-16:25:37 fw01-2 repctl[19744]: [i] daemonize_check(1480): daemonized, see syslog for further messages
2021:03:04-16:25:37 fw01-2 repctl[19744]: [i] daemonize_check(1497): trying to signal daemon and exit
2021:03:04-16:25:37 fw01-2 repctl[11559]: [i] recheck(1057): got HUP: replication recheck triggered Setup_replication_done = 1
2021:03:04-16:25:37 fw01-2 ha_mode[19726]: topology_changed done (started at 16:25:37)
2021:03:04-16:25:37 fw01-2 ha_daemon[11339]: id="38A0" severity="info" sys="System" sub="ha" seq="M:  107 37.762" name="Reading cluster configuration"
2021:03:04-16:25:43 fw01-2 ha_daemon[11339]: id="38A0" severity="info" sys="System" sub="ha" seq="M:  108 43.062" name="Set syncing.files for node 1"
2021:03:04-16:25:46 fw01-2 ha_daemon[11339]: id="38A0" severity="info" sys="System" sub="ha" seq="M:  109 46.096" name="Node 1 changed state: SYNCING(2) -> SYNCING(3)"
2021:03:04-16:25:52 fw01-2 ha_daemon[11339]: id="38A0" severity="info" sys="System" sub="ha" seq="M:  110 52.944" name="Monitoring interfaces for link beat: eth4 eth2 eth1 eth0"
2021:03:04-16:29:04 fw01-1 ha_daemon[5998]: id="38A0" severity="info" sys="System" sub="ha" seq="S:   53 04.593" name="Monitoring interfaces for link beat: eth4 eth2 eth1 eth0"
2021:03:04-16:30:37 fw01-2 repctl[11559]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1
2021:03:04-16:32:54 fw01-1 repctl[6105]: [i] stop_backup_mode(765): stopped backup mode at 000000010000028500000098
2021:03:04-16:32:58 fw01-1 repctl[6105]: [i] execute(1768): waiting for server to start....
2021:03:04-16:32:59 fw01-1 repctl[6105]: [i] execute(1768): .
2021:03:04-16:33:00 fw01-1 repctl[6105]: [i] execute(1768): .
2021:03:04-16:33:01 fw01-1 repctl[6105]: [i] execute(1768): .
2021:03:04-16:33:02 fw01-1 repctl[6105]: [i] execute(1768): .
2021:03:04-16:33:03 fw01-1 repctl[6105]: [i] execute(1768): .
2021:03:04-16:33:04 fw01-1 repctl[6105]: [i] execute(1768): .
2021:03:04-16:33:05 fw01-1 repctl[6105]: [i] execute(1768): .
2021:03:04-16:33:06 fw01-1 repctl[6105]: [i] execute(1768): .
2021:03:04-16:33:07 fw01-1 repctl[6105]: [i] execute(1768): .
2021:03:04-16:33:08 fw01-1 repctl[6105]: [i] execute(1768): .
2021:03:04-16:33:09 fw01-1 repctl[6105]: [i] execute(1768): .
2021:03:04-16:33:10 fw01-1 repctl[6105]: [i] execute(1768): .
2021:03:04-16:33:11 fw01-1 repctl[6105]: [i] execute(1768): .
2021:03:04-16:33:12 fw01-1 repctl[6105]: [i] execute(1768): .
2021:03:04-16:33:13 fw01-1 repctl[6105]: [i] execute(1768): .
2021:03:04-16:33:14 fw01-1 repctl[6105]: [i] execute(1768): .
2021:03:04-16:33:15 fw01-1 repctl[6105]: [i] execute(1768): .
2021:03:04-16:33:16 fw01-1 repctl[6105]: [i] execute(1768): .
2021:03:04-16:33:17 fw01-1 repctl[6105]: [i] execute(1768): .
2021:03:04-16:33:18 fw01-1 repctl[6105]: [i] execute(1768): .
2021:03:04-16:33:20 fw01-1 repctl[6105]: [i] execute(1768): .
2021:03:04-16:33:21 fw01-1 repctl[6105]: [i] execute(1768): .
2021:03:04-16:33:22 fw01-1 repctl[6105]: [i] execute(1768): .
2021:03:04-16:33:23 fw01-1 repctl[6105]: [i] execute(1768): .
2021:03:04-16:33:24 fw01-1 repctl[6105]: [i] execute(1768): .
2021:03:04-16:33:25 fw01-1 repctl[6105]: [i] execute(1768): .
2021:03:04-16:33:26 fw01-1 repctl[6105]: [i] execute(1768): .
2021:03:04-16:33:27 fw01-1 repctl[6105]: [i] execute(1768): .
2021:03:04-16:33:28 fw01-1 repctl[6105]: [i] execute(1768): .
2021:03:04-16:33:29 fw01-1 repctl[6105]: [i] execute(1768):  stopped waiting
2021:03:04-16:33:29 fw01-1 repctl[6105]: [i] execute(1768): pg_ctl: could not start server
2021:03:04-16:33:29 fw01-1 repctl[6105]: [i] execute(1768): Examine the log output.
2021:03:04-16:33:29 fw01-1 repctl[6105]: [e] prepare_secondary(352): prepare_secondary: Local db(pg_ctl) start failed
2021:03:04-16:33:36 fw01-1 repctl[6105]: [i] start_backup_mode(744): starting backup mode at 00000001000002850000009A
2021:03:04-16:33:36 fw01-1 ha_daemon[5998]: id="38A0" severity="info" sys="System" sub="ha" seq="S:   54 36.853" name="HA control: cmd = 'sync start 2 database'"
2021:03:04-16:33:36 fw01-1 ha_daemon[5998]: id="38A0" severity="info" sys="System" sub="ha" seq="S:   55 36.853" name="Activating sync process for database on node 2"
2021:03:04-16:35:36 fw01-2 ha_daemon[11339]: id="38A0" severity="info" sys="System" sub="ha" seq="M:  111 36.435" name="Clear syncing.files for node 1"
2021:03:04-16:35:37 fw01-2 repctl[11559]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1
2021:03:04-16:35:43 fw01-2 ha_daemon[11339]: id="38A0" severity="info" sys="System" sub="ha" seq="M:  112 43.062" name="Set syncing.files for node 1"
2021:03:04-16:35:55 fw01-2 ha_daemon[11339]: id="38A0" severity="info" sys="System" sub="ha" seq="M:  113 55.172" name="Clear syncing.files for node 1"
2021:03:04-16:37:29 fw01-1 repctl[6105]: [i] stop_backup_mode(765): stopped backup mode at 00000001000002850000009B
2021:03:04-16:37:31 fw01-1 repctl[6105]: [i] execute(1768): waiting for server to start....
2021:03:04-16:37:32 fw01-1 repctl[6105]: [i] execute(1768): .
2021:03:04-16:37:33 fw01-1 repctl[6105]: [i] execute(1768): .
2021:03:04-16:37:34 fw01-1 repctl[6105]: [i] execute(1768): .
2021:03:04-16:37:35 fw01-1 repctl[6105]: [i] execute(1768): .
2021:03:04-16:37:36 fw01-1 repctl[6105]: [i] execute(1768): .
2021:03:04-16:37:37 fw01-1 repctl[6105]: [i] execute(1768): .
2021:03:04-16:37:38 fw01-1 repctl[6105]: [i] execute(1768): .
2021:03:04-16:37:39 fw01-1 repctl[6105]: [i] execute(1768): .
2021:03:04-16:37:40 fw01-1 repctl[6105]: [i] execute(1768): .
2021:03:04-16:37:41 fw01-1 repctl[6105]: [i] execute(1768): .
2021:03:04-16:37:42 fw01-1 repctl[6105]: [i] execute(1768): .
2021:03:04-16:37:43 fw01-1 repctl[6105]: [i] execute(1768): .
2021:03:04-16:37:44 fw01-1 repctl[6105]: [i] execute(1768): .
2021:03:04-16:37:45 fw01-1 repctl[6105]: [i] execute(1768): .
2021:03:04-16:37:46 fw01-1 repctl[6105]: [i] execute(1768): .
2021:03:04-16:37:47 fw01-1 repctl[6105]: [i] execute(1768): .
2021:03:04-16:37:48 fw01-1 repctl[6105]: [i] execute(1768): .
2021:03:04-16:37:49 fw01-1 repctl[6105]: [i] execute(1768): .
2021:03:04-16:37:50 fw01-1 repctl[6105]: [i] execute(1768): .
2021:03:04-16:37:51 fw01-1 repctl[6105]: [i] execute(1768): .
2021:03:04-16:37:53 fw01-1 repctl[6105]: [i] execute(1768):  done
2021:03:04-16:37:53 fw01-1 repctl[6105]: [i] execute(1768): server started
2021:03:04-16:37:53 fw01-1 ha_daemon[5998]: id="38A0" severity="info" sys="System" sub="ha" seq="S:   56 53.407" name="HA control: cmd = 'sync stop 2 database'"
2021:03:04-16:37:53 fw01-1 ha_daemon[5998]: id="38A0" severity="info" sys="System" sub="ha" seq="S:   57 53.407" name="Deactivating sync process for database on node 2"
2021:03:04-16:37:53 fw01-1 repctl[6105]: [i] setup_replication(278): checkinterval 300
2021:03:04-16:37:54 fw01-1 ha_daemon[5998]: id="38A0" severity="info" sys="System" sub="ha" seq="S:   58 54.014" name="Initial synchronization finished!"
2021:03:04-16:37:54 fw01-1 ha_daemon[5998]: id="38A0" severity="info" sys="System" sub="ha" seq="S:   59 54.014" name="state change SYNCING(3) -> UNLINKED(1)"
2021:03:04-16:37:54 fw01-2 ha_daemon[11339]: id="38A0" severity="info" sys="System" sub="ha" seq="M:  114 54.808" name="Node 1 changed state: SYNCING(3) -> UNLINKED(1)"
2021:03:04-16:39:11 fw01-1 ha_daemon[5998]: id="38A3" severity="debug" sys="System" sub="ha" seq="S:   60 11.180" name="Netlink: Found link beat on eth2 again!"
2021:03:04-16:39:12 fw01-1 ha_daemon[5998]: id="38A3" severity="debug" sys="System" sub="ha" seq="S:   61 12.932" name="Netlink: Found link beat on eth1 again!"
2021:03:04-16:39:18 fw01-1 ha_daemon[5998]: id="38A3" severity="debug" sys="System" sub="ha" seq="S:   62 18.892" name="Netlink: Found link beat on eth0 again!"
2021:03:04-16:39:18 fw01-1 ha_daemon[5998]: id="38A3" severity="debug" sys="System" sub="ha" seq="S:   63 18.893" name="Netlink: Found link beat on eth0.99 again!"
2021:03:04-16:39:18 fw01-1 ha_daemon[5998]: id="38A3" severity="debug" sys="System" sub="ha" seq="S:   64 18.893" name="Netlink: Found link beat on eth0.100 again!"
2021:03:04-16:39:18 fw01-1 ha_daemon[5998]: id="38A3" severity="debug" sys="System" sub="ha" seq="S:   65 18.893" name="Netlink: Found link beat on eth0.101 again!"
2021:03:04-16:40:09 fw01-2 ha_daemon[11339]: id="38A0" severity="info" sys="System" sub="ha" seq="M:  115 09.733" name="HA control: cmd = 'reboot 1'"
2021:03:04-16:40:09 fw01-1 ha_daemon[5998]: id="38A0" severity="info" sys="System" sub="ha" seq="S:   66 09.733" name="Received command to reboot!"
2021:03:04-16:40:12 fw01-1 ha_daemon[5998]: id="38A0" severity="info" sys="System" sub="ha" seq="S:   67 12.522" name="HA daemon shutting down (SIGTERM)"
2021:03:04-16:40:12 fw01-1 ha_daemon[5998]: id="38A0" severity="info" sys="System" sub="ha" seq="S:   68 12.522" name="Executing (nowait) /etc/init.d/ha_mode disable"
2021:03:04-16:40:12 fw01-1 ha_daemon[5998]: id="38A0" severity="info" sys="System" sub="ha" seq="S:   69 12.523" name="--- Node is disabled ---"
2021:03:04-16:40:12 fw01-1 ha_daemon[5998]: id="38A0" severity="info" sys="System" sub="ha" seq="S:   70 12.523" name="Executing (nowait) /etc/init.d/ha_mode shutdown"
2021:03:04-16:40:12 fw01-1 ha_mode[19081]: calling shutdown
2021:03:04-16:40:12 fw01-1 ha_mode[19080]: calling disable
2021:03:04-16:40:12 fw01-1 ha_mode[19080]: disable: waiting for last ha_mode done
2021:03:04-16:40:12 fw01-1 ha_mode[19080]: Switching disable mode
2021:03:04-16:40:12 fw01-1 ha_mode[19080]: disable done (started at 16:40:12)
2021:03:04-16:40:12 fw01-1 repctl[6105]: [i] execute(1768): waiting for server to shut down...
2021:03:04-16:40:12 fw01-1 repctl[6105]: [i] execute(1768): .
2021:03:04-16:40:12 fw01-1 ha_mode[19081]: shutdown: waiting for last ha_mode done
2021:03:04-16:40:12 fw01-1 ha_mode[19081]: /var/mdw/scripts/confd-sync: /usr/local/bin/confd-sync stopped
2021:03:04-16:40:12 fw01-1 ha_mode[19081]: shutdown done (started at 16:40:12)
2021:03:04-16:40:12 fw01-1 ha_daemon[5998]: id="38A0" severity="info" sys="System" sub="ha" seq="S:   71 12.639" name="HA daemon exits (SIGTERM)"
2021:03:04-16:40:13 fw01-1 repctl[6105]: [i] execute(1768):  done
2021:03:04-16:40:13 fw01-1 repctl[6105]: [i] execute(1768): server stopped
2021:03:04-16:40:13 fw01-1 repctl[6105]: [i] execute(1768): waiting for server to start....
2021:03:04-16:40:14 fw01-1 repctl[6105]: [i] execute(1768):  done
2021:03:04-16:40:14 fw01-1 repctl[6105]: [i] execute(1768): server started
2021:03:04-16:40:14 fw01-1 repctl[6105]: [i] terminate(2321): exit due to signal TERM
2021:03:04-16:40:15 fw01-2 ha_daemon[11339]: id="38C1" severity="error" sys="System" sub="ha" seq="M:  116 15.062" name="Node 1 is dead, received no heart beats"
2021:03:04-16:40:15 fw01-2 ha_daemon[11339]: id="38A0" severity="info" sys="System" sub="ha" seq="M:  117 15.063" name="Executing (wait) /usr/local/bin/confd-setha mode master master_ip 198.19.250.2 slave_ip ''"
2021:03:04-16:40:15 fw01-2 ha_daemon[11339]: id="38A0" severity="info" sys="System" sub="ha" seq="M:  118 15.190" name="Executing (nowait) /etc/init.d/ha_mode topology_changed"
2021:03:04-16:40:15 fw01-2 ha_mode[24979]: calling topology_changed
2021:03:04-16:40:15 fw01-2 ha_mode[24979]: topology_changed: waiting for last ha_mode done
2021:03:04-16:40:15 fw01-2 ha_mode[24979]: repctl[24994]: [i] daemonize_check(1480): daemonized, see syslog for further messages
2021:03:04-16:40:15 fw01-2 repctl[24994]: [i] daemonize_check(1480): daemonized, see syslog for further messages
2021:03:04-16:40:15 fw01-2 repctl[24994]: [i] daemonize_check(1497): trying to signal daemon and exit
2021:03:04-16:40:15 fw01-2 repctl[11559]: [i] recheck(1057): got HUP: replication recheck triggered Setup_replication_done = 1
2021:03:04-16:40:15 fw01-2 ha_mode[24979]: topology_changed done (started at 16:40:15)
2021:03:04-16:40:15 fw01-2 ha_daemon[11339]: id="38A0" severity="info" sys="System" sub="ha" seq="M:  119 15.791" name="Reading cluster configuration"
2021:03:04-16:40:30 fw01-2 ha_daemon[11339]: id="38A0" severity="info" sys="System" sub="ha" seq="M:  120 30.938" name="Monitoring interfaces for link beat: eth4 eth2 eth1 eth0"
2021:03:04-16:40:52 fw01-2 conntrack-tools[13140]: no dedicated links available!
2021:03:04-16:40:52 fw01-2 ha_daemon[11339]: id="38A3" severity="debug" sys="System" sub="ha" seq="M:  121 52.713" name="Netlink: Lost link beat on eth3!"
2021:03:04-16:41:01 fw01-2 ha_daemon[11339]: id="38A3" severity="debug" sys="System" sub="ha" seq="M:  122 01.183" name="Netlink: Found link beat on eth3 again!"
2021:03:04-16:41:15 fw01-2 conntrack-tools[13140]: no dedicated links available!
2021:03:04-16:41:15 fw01-2 ha_daemon[11339]: id="38A3" severity="debug" sys="System" sub="ha" seq="M:  123 15.106" name="Netlink: Lost link beat on eth3!"
2021:03:04-16:41:30 fw01-2 ha_daemon[11339]: id="38A3" severity="debug" sys="System" sub="ha" seq="M:  124 30.075" name="Netlink: Found link beat on eth3 again!"
2021:03:04-16:41:32 fw01-2 conntrack-tools[13140]: no dedicated links available!
2021:03:04-16:41:32 fw01-2 ha_daemon[11339]: id="38A3" severity="debug" sys="System" sub="ha" seq="M:  125 32.742" name="Netlink: Lost link beat on eth3!"
2021:03:04-16:41:35 fw01-2 ha_daemon[11339]: id="38A3" severity="debug" sys="System" sub="ha" seq="M:  126 35.599" name="Netlink: Found link beat on eth3 again!"
2021:03:04-16:41:40 fw01-2 ha_daemon[11339]: id="38A0" severity="info" sys="System" sub="ha" seq="M:  127 40.252" name="Access granted to remote node 1!"
2021:03:04-16:41:40 fw01-2 ha_daemon[11339]: id="38A0" severity="info" sys="System" sub="ha" seq="M:  128 40.342" name="Node 1 changed version! 0.000000 -> 9.705003"
2021:03:04-16:41:40 fw01-2 ha_daemon[11339]: id="38C0" severity="info" sys="System" sub="ha" seq="M:  129 40.342" name="Node 1 is alive"
2021:03:04-16:41:40 fw01-2 ha_daemon[11339]: id="38A0" severity="info" sys="System" sub="ha" seq="M:  130 40.342" name="Node 1 changed state: DEAD(2048) -> SYNCING(2)"
2021:03:04-16:41:40 fw01-2 ha_daemon[11339]: id="38A0" severity="info" sys="System" sub="ha" seq="M:  131 40.342" name="Node 1 changed role: DEAD -> SLAVE"
2021:03:04-16:41:40 fw01-2 ha_daemon[11339]: id="38A0" severity="info" sys="System" sub="ha" seq="M:  132 40.343" name="Executing (wait) /usr/local/bin/confd-setha mode master master_ip 198.19.250.2 slave_ip 198.19.250.1"
2021:03:04-16:41:40 fw01-2 ha_daemon[11339]: id="38A0" severity="info" sys="System" sub="ha" seq="M:  133 40.475" name="Executing (nowait) /etc/init.d/ha_mode topology_changed"
2021:03:04-16:41:40 fw01-2 ha_mode[25471]: calling topology_changed
2021:03:04-16:41:40 fw01-2 ha_mode[25471]: topology_changed: waiting for last ha_mode done
2021:03:04-16:41:40 fw01-2 ha_mode[25471]: repctl[25489]: [i] daemonize_check(1480): daemonized, see syslog for further messages
2021:03:04-16:41:40 fw01-2 repctl[25489]: [i] daemonize_check(1480): daemonized, see syslog for further messages
2021:03:04-16:41:40 fw01-2 repctl[25489]: [i] daemonize_check(1497): trying to signal daemon and exit
2021:03:04-16:41:40 fw01-2 repctl[11559]: [i] recheck(1057): got HUP: replication recheck triggered Setup_replication_done = 1
2021:03:04-16:41:40 fw01-2 ha_mode[25471]: topology_changed done (started at 16:41:40)
2021:03:04-16:41:41 fw01-2 ha_daemon[11339]: id="38A0" severity="info" sys="System" sub="ha" seq="M:  134 41.229" name="Reading cluster configuration"
2021:03:04-16:41:46 fw01-2 ha_daemon[11339]: id="38A0" severity="info" sys="System" sub="ha" seq="M:  135 46.062" name="Set syncing.files for node 1"
2021:03:04-16:41:50 fw01-2 ha_daemon[11339]: id="38A0" severity="info" sys="System" sub="ha" seq="M:  136 50.752" name="Clear syncing.files for node 1"
2021:03:04-16:41:53 fw01-2 ha_daemon[11339]: id="38A0" severity="info" sys="System" sub="ha" seq="M:  137 53.355" name="Node 1 changed state: SYNCING(2) -> SYNCING(3)"
2021:03:04-16:41:56 fw01-2 ha_daemon[11339]: id="38A0" severity="info" sys="System" sub="ha" seq="M:  138 56.390" name="Monitoring interfaces for link beat: eth4 eth2 eth1 eth0"
2021:03:04-16:43:56 fw01-1 repctl[4486]: [i] stop_backup_mode(765): stopped backup mode at 00000001000002850000009E<30>Mar  4 16:44:00 repctl[4486]: [i] execute(1768): waiting for server to start....
2021:03:04-16:44:01 fw01-1 repctl[4486]: [i] execute(1768): .
2021:03:04-16:44:02 fw01-1 repctl[4486]: [i] execute(1768): .
2021:03:04-16:44:03 fw01-1 repctl[4486]: [i] execute(1768): .
2021:03:04-16:44:05 fw01-1 repctl[4486]: [i] execute(1768): .
2021:03:04-16:44:06 fw01-1 repctl[4486]: [i] execute(1768): .
2021:03:04-16:44:07 fw01-1 repctl[4486]: [i] execute(1768): .
2021:03:04-16:44:08 fw01-1 repctl[4486]: [i] execute(1768): .
2021:03:04-16:44:09 fw01-1 repctl[4486]: [i] execute(1768): .
2021:03:04-16:44:10 fw01-1 repctl[4486]: [i] execute(1768): .
2021:03:04-16:44:11 fw01-1 repctl[4486]: [i] execute(1768): .
2021:03:04-16:44:12 fw01-1 repctl[4486]: [i] execute(1768): .
2021:03:04-16:44:13 fw01-1 repctl[4486]: [i] execute(1768): .
2021:03:04-16:44:14 fw01-1 repctl[4486]: [i] execute(1768): .
2021:03:04-16:44:15 fw01-1 repctl[4486]: [i] execute(1768): .
2021:03:04-16:44:16 fw01-1 repctl[4486]: [i] execute(1768): .
2021:03:04-16:44:17 fw01-1 repctl[4486]: [i] execute(1768): .
2021:03:04-16:44:18 fw01-1 repctl[4486]: [i] execute(1768): .
2021:03:04-16:44:19 fw01-1 repctl[4486]: [i] execute(1768): .
2021:03:04-16:44:20 fw01-1 repctl[4486]: [i] execute(1768):  done
2021:03:04-16:44:20 fw01-1 repctl[4486]: [i] execute(1768): server started
2021:03:04-16:44:20 fw01-1 ha_daemon[4439]: id="38A0" severity="info" sys="System" sub="ha" seq="S:   56 20.044" name="HA control: cmd = 'sync stop 2 database'"
2021:03:04-16:44:20 fw01-1 ha_daemon[4439]: id="38A0" severity="info" sys="System" sub="ha" seq="S:   57 20.044" name="Deactivating sync process for database on node 2"
2021:03:04-16:44:20 fw01-1 repctl[4486]: [i] recheck(1057): got HUP: replication recheck triggered Setup_replication_done = 0
2021:03:04-16:44:20 fw01-1 repctl[4486]: [i] execute(1768): pg_ctl: server is running (PID: 10916)
2021:03:04-16:44:20 fw01-1 repctl[4486]: [i] execute(1768): /usr/pgsql92-64/bin/postgres "-D" "/var/storage/pgsql92/data"
2021:03:04-16:44:20 fw01-1 ha_daemon[4439]: id="38A0" severity="info" sys="System" sub="ha" seq="S:   58 20.303" name="HA control: cmd = 'sync start 2 database'"
2021:03:04-16:44:20 fw01-1 ha_daemon[4439]: id="38A0" severity="info" sys="System" sub="ha" seq="S:   59 20.303" name="Activating sync process for database on node 2"
2021:03:04-16:44:20 fw01-1 repctl[4486]: [i] execute(1768): waiting for server to shut down...
2021:03:04-16:44:20 fw01-1 repctl[4486]: [i] execute(1768): .
2021:03:04-16:44:21 fw01-1 repctl[4486]: [i] execute(1768):  done
2021:03:04-16:44:21 fw01-1 repctl[4486]: [i] execute(1768): server stopped
2021:03:04-16:44:22 fw01-1 repctl[4486]: [i] start_backup_mode(744): starting backup mode at 0000000100000285000000A0
2021:03:04-16:44:22 fw01-1 ha_daemon[4439]: id="38A0" severity="info" sys="System" sub="ha" seq="S:   60 22.456" name="HA control: cmd = 'sync start 2 database'"
2021:03:04-16:44:22 fw01-1 ha_daemon[4439]: id="38A0" severity="info" sys="System" sub="ha" seq="S:   61 22.456" name="Activating sync process for database on node 2"
2021:03:04-16:46:18 fw01-1 repctl[4486]: [i] stop_backup_mode(765): stopped backup mode at 0000000100000285000000A0
2021:03:04-16:46:21 fw01-1 repctl[4486]: [i] execute(1768): waiting for server to start....
2021:03:04-16:46:22 fw01-1 repctl[4486]: [i] execute(1768): .
2021:03:04-16:46:23 fw01-1 repctl[4486]: [i] execute(1768): .
2021:03:04-16:46:24 fw01-1 repctl[4486]: [i] execute(1768): .
2021:03:04-16:46:26 fw01-1 repctl[4486]: [i] execute(1768): .
2021:03:04-16:46:27 fw01-1 repctl[4486]: [i] execute(1768): .
2021:03:04-16:46:28 fw01-1 repctl[4486]: [i] execute(1768): .
2021:03:04-16:46:29 fw01-1 repctl[4486]: [i] execute(1768): .
2021:03:04-16:46:30 fw01-1 repctl[4486]: [i] execute(1768): .
2021:03:04-16:46:31 fw01-1 repctl[4486]: [i] execute(1768): .
2021:03:04-16:46:32 fw01-1 repctl[4486]: [i] execute(1768): .
2021:03:04-16:46:33 fw01-1 repctl[4486]: [i] execute(1768): .
2021:03:04-16:46:34 fw01-1 repctl[4486]: [i] execute(1768): .
2021:03:04-16:46:35 fw01-1 repctl[4486]: [i] execute(1768): .
2021:03:04-16:46:36 fw01-1 repctl[4486]: [i] execute(1768): .
2021:03:04-16:46:37 fw01-1 repctl[4486]: [i] execute(1768): .
2021:03:04-16:46:38 fw01-1 repctl[4486]: [i] execute(1768): .
2021:03:04-16:46:39 fw01-1 repctl[4486]: [i] execute(1768): .
2021:03:04-16:46:40 fw01-1 repctl[4486]: [i] execute(1768): .
2021:03:04-16:46:41 fw01-2 repctl[11559]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1
2021:03:04-16:46:41 fw01-1 repctl[4486]: [i] execute(1768):  done
2021:03:04-16:46:41 fw01-1 repctl[4486]: [i] execute(1768): server started
2021:03:04-16:46:41 fw01-1 ha_daemon[4439]: id="38A0" severity="info" sys="System" sub="ha" seq="S:   62 41.856" name="HA control: cmd = 'sync stop 2 database'"
2021:03:04-16:46:41 fw01-1 ha_daemon[4439]: id="38A0" severity="info" sys="System" sub="ha" seq="S:   63 41.856" name="Deactivating sync process for database on node 2"
2021:03:04-16:46:41 fw01-1 repctl[4486]: [i] setup_replication(278): checkinterval 300
2021:03:04-16:46:41 fw01-1 repctl[4486]: [i] setup_replication(278): checkinterval 300
2021:03:04-16:46:42 fw01-1 ha_daemon[4439]: id="38A0" severity="info" sys="System" sub="ha" seq="S:   64 42.244" name="Initial synchronization finished!"
2021:03:04-16:46:42 fw01-1 ha_daemon[4439]: id="38A0" severity="info" sys="System" sub="ha" seq="S:   65 42.244" name="state change SYNCING(3) -> UNLINKED(1)"
2021:03:04-16:46:42 fw01-2 ha_daemon[11339]: id="38A0" severity="info" sys="System" sub="ha" seq="M:  139 42.634" name="Node 1 changed state: SYNCING(3) -> UNLINKED(1)"

Gibt es noch andere Logs, die für die Analyse interessant wären?



This thread was automatically locked due to age.
Parents Reply Children
No Data