Hallo,
seit ca. 3 Wochen haben wir Probleme mit dem HA Cluster. Über Nacht ist Node-1 auf UNLINKED gesprungen.
Was ich bis jetzt probiert habe:
- Node-1 neu gestartet
- Kabel zwischen beiden Nodes getauscht
- Operation mode auf Node-2 einmal auf Off gestellt und wieder aktiviert
- Node-1 komplett per USB Stick neu installiert
Als letzte Möglichkeit würde mir noch einfallen, Node-2 einmal neu zu installieren. Aber vielleicht hat einer eine Idee, bevor ich das mache bzw. den Sophos Support einschalte.
Beide haben die Version 9.705-3 und es handelt sich um eine SG 230.
Hier die Logs, während sich Node-1 nach der Neuinstallation wieder verbindet. Um 16:40 Uhr habe ich Node-1 einmal neu gestartet.
2021:03:04-14:48:41 fw01-2 conntrack-tools[13140]: no dedicated links available! 2021:03:04-14:48:41 fw01-2 ha_daemon[11339]: id="38A3" severity="debug" sys="System" sub="ha" seq="M: 60 41.621" name="Netlink: Lost link beat on eth3!" 2021:03:04-14:48:57 fw01-2 ha_daemon[11339]: id="38A3" severity="debug" sys="System" sub="ha" seq="M: 61 57.126" name="Netlink: Found link beat on eth3 again!" 2021:03:04-14:49:00 fw01-2 ha_daemon[11339]: id="38A1" severity="warn" sys="System" sub="ha" seq="M: 62 00.207" name="Got misformed HA message type = 9 len = 64, msg.len = 28" 2021:03:04-14:49:00 fw01-2 ha_daemon[11339]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 63 00.207" name="Autojoin of 198.19.250.123 granted! Searching for unused node ID..." 2021:03:04-14:49:00 fw01-2 ha_daemon[11339]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 64 00.207" name="Found unused node id 1!" 2021:03:04-14:49:00 fw01-2 ha_daemon[11339]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 65 00.207" name="New node 1" 2021:03:04-14:49:00 fw01-2 conntrack-tools[13140]: no dedicated links available! 2021:03:04-14:49:00 fw01-2 ha_daemon[11339]: id="38A3" severity="debug" sys="System" sub="ha" seq="M: 66 00.609" name="Netlink: Lost link beat on eth3!" 2021:03:04-14:49:04 fw01-2 ha_daemon[11339]: id="38A3" severity="debug" sys="System" sub="ha" seq="M: 67 04.938" name="Netlink: Found link beat on eth3 again!" 2021:03:04-14:49:07 fw01-2 ha_daemon[11339]: id="38A3" severity="debug" sys="System" sub="ha" seq="M: 68 07.593" name="Netlink: Lost link beat on eth3!" 2021:03:04-14:49:07 fw01-2 conntrack-tools[13140]: no dedicated links available! 2021:03:04-14:49:10 fw01-2 ha_daemon[11339]: id="38A3" severity="debug" sys="System" sub="ha" seq="M: 69 10.407" name="Netlink: Found link beat on eth3 again!" 2021:03:04-14:49:14 fw01-2 ha_daemon[11339]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 70 14.677" name="Access granted to remote node 1!" 2021:03:04-14:49:14 fw01-2 ha_daemon[11339]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 71 14.744" name="Node 1 joined with version 9.705003" 2021:03:04-14:49:14 fw01-2 ha_daemon[11339]: id="38C0" severity="info" sys="System" sub="ha" seq="M: 72 14.744" name="Node 1 is alive" 2021:03:04-14:49:14 fw01-2 ha_daemon[11339]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 73 14.744" name="Node 1 changed state: DEAD(2048) -> SYNCING(2)" 2021:03:04-14:49:14 fw01-2 ha_daemon[11339]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 74 14.744" name="Node 1 changed role: UNKNOWN -> SLAVE" 2021:03:04-14:49:14 fw01-2 ha_daemon[11339]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 75 14.744" name="Executing (wait) /usr/local/bin/confd-setha mode master master_ip 198.19.250.2 slave_ip 198.19.250.1" 2021:03:04-14:49:14 fw01-2 ha_daemon[11339]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 76 14.856" name="Executing (nowait) /etc/init.d/ha_mode topology_changed" 2021:03:04-14:49:14 fw01-2 ha_mode[21345]: calling topology_changed 2021:03:04-14:49:14 fw01-2 ha_mode[21345]: topology_changed: waiting for last ha_mode done 2021:03:04-14:49:15 fw01-2 ha_mode[21345]: repctl[21362]: [i] daemonize_check(1480): daemonized, see syslog for further messages 2021:03:04-14:49:15 fw01-2 repctl[21362]: [i] daemonize_check(1480): daemonized, see syslog for further messages 2021:03:04-14:49:15 fw01-2 repctl[21362]: [i] daemonize_check(1497): trying to signal daemon and exit 2021:03:04-14:49:15 fw01-2 repctl[11559]: [i] recheck(1057): got HUP: replication recheck triggered Setup_replication_done = 1 2021:03:04-14:49:15 fw01-2 ha_mode[21345]: topology_changed done (started at 14:49:14) 2021:03:04-14:49:15 fw01-2 ha_daemon[11339]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 77 15.551" name="Reading cluster configuration" 2021:03:04-14:49:20 fw01-2 ha_daemon[11339]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 78 20.062" name="Set syncing.files for node 1" 2021:03:04-14:49:24 fw01-2 ha_daemon[11339]: id="38C1" severity="error" sys="System" sub="ha" seq="M: 79 24.063" name="Node 1 is dead, received no heart beats" 2021:03:04-14:49:24 fw01-2 ha_daemon[11339]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 80 24.063" name="Executing (wait) /usr/local/bin/confd-setha mode master master_ip 198.19.250.2 slave_ip ''" 2021:03:04-14:49:24 fw01-2 ha_daemon[11339]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 81 24.168" name="Executing (nowait) /etc/init.d/ha_mode topology_changed" 2021:03:04-14:49:24 fw01-2 ha_mode[21661]: calling topology_changed 2021:03:04-14:49:24 fw01-2 ha_mode[21661]: topology_changed: waiting for last ha_mode done 2021:03:04-14:49:24 fw01-2 ha_mode[21661]: repctl[21677]: [i] daemonize_check(1480): daemonized, see syslog for further messages 2021:03:04-14:49:24 fw01-2 repctl[21677]: [i] daemonize_check(1480): daemonized, see syslog for further messages 2021:03:04-14:49:24 fw01-2 repctl[21677]: [i] daemonize_check(1497): trying to signal daemon and exit 2021:03:04-14:49:24 fw01-2 repctl[11559]: [i] recheck(1057): got HUP: replication recheck triggered Setup_replication_done = 1 2021:03:04-14:49:24 fw01-2 ha_mode[21661]: topology_changed done (started at 14:49:24) 2021:03:04-14:49:24 fw01-2 ha_daemon[11339]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 82 24.621" name="Reading cluster configuration" 2021:03:04-14:49:30 fw01-2 ha_daemon[11339]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 83 30.731" name="Monitoring interfaces for link beat: eth4 eth2 eth1 eth0" 2021:03:04-14:49:50 fw01-2 conntrack-tools[13140]: no dedicated links available! 2021:03:04-14:49:50 fw01-2 ha_daemon[11339]: id="38A3" severity="debug" sys="System" sub="ha" seq="M: 84 50.811" name="Netlink: Lost link beat on eth3!" 2021:03:04-14:54:24 fw01-2 repctl[11559]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1 2021:03:04-14:55:48 fw01-2 ha_daemon[11339]: id="38A1" severity="warn" sys="System" sub="ha" seq="M: 85 48.567" name="filesync_end(): initial sync failed, status = 0x500" 2021:03:04-14:59:24 fw01-2 repctl[11559]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1 2021:03:04-15:04:25 fw01-2 repctl[11559]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1 2021:03:04-15:09:25 fw01-2 repctl[11559]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1 2021:03:04-15:14:25 fw01-2 repctl[11559]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1 2021:03:04-15:19:25 fw01-2 repctl[11559]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1 2021:03:04-15:24:25 fw01-2 repctl[11559]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1 2021:03:04-15:29:25 fw01-2 repctl[11559]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1 2021:03:04-15:34:25 fw01-2 repctl[11559]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1 2021:03:04-15:39:24 fw01-2 ha_daemon[11339]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 86 24.063" name="Executing (wait) /usr/local/bin/confd-setha mode master master_ip 198.19.250.2 slave_ip ''" 2021:03:04-15:39:24 fw01-2 ha_daemon[11339]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 87 24.179" name="Executing (nowait) /etc/init.d/ha_mode check" 2021:03:04-15:39:24 fw01-2 ha_mode[8628]: calling check 2021:03:04-15:39:24 fw01-2 ha_mode[8628]: check: waiting for last ha_mode done 2021:03:04-15:39:24 fw01-2 ha_mode[8628]: check_ha() role=MASTER, status=ACTIVE 2021:03:04-15:39:24 fw01-2 ha_mode[8628]: check done (started at 15:39:24) 2021:03:04-15:39:25 fw01-2 repctl[11559]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1 2021:03:04-15:44:25 fw01-2 repctl[11559]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1 2021:03:04-15:49:26 fw01-2 repctl[11559]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1 2021:03:04-15:54:26 fw01-2 repctl[11559]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1 2021:03:04-15:59:26 fw01-2 repctl[11559]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1 2021:03:04-16:04:26 fw01-2 repctl[11559]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1 2021:03:04-16:09:26 fw01-2 repctl[11559]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1 2021:03:04-16:14:26 fw01-2 repctl[11559]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1 2021:03:04-16:19:27 fw01-2 repctl[11559]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1 2021:03:04-16:23:59 fw01-2 ha_daemon[11339]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 88 59.714" name="HA control: cmd = 'remove 1'" 2021:03:04-16:24:27 fw01-2 repctl[11559]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1 2021:03:04-16:24:49 fw01-2 ha_daemon[11339]: id="38A3" severity="debug" sys="System" sub="ha" seq="M: 89 49.902" name="Netlink: Found link beat on eth3 again!" 2021:03:04-16:25:04 fw01-2 ha_daemon[11339]: id="38A3" severity="debug" sys="System" sub="ha" seq="M: 90 04.909" name="Netlink: Lost link beat on eth3!" 2021:03:04-16:25:04 fw01-2 conntrack-tools[13140]: no dedicated links available! 2021:03:04-16:25:19 fw01-2 ha_daemon[11339]: id="38A3" severity="debug" sys="System" sub="ha" seq="M: 91 19.335" name="Netlink: Found link beat on eth3 again!" 2021:03:04-16:25:22 fw01-2 ha_daemon[11339]: id="38A1" severity="warn" sys="System" sub="ha" seq="M: 92 22.365" name="Got misformed HA message type = 9 len = 64, msg.len = 28" 2021:03:04-16:25:22 fw01-2 ha_daemon[11339]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 93 22.365" name="Autojoin of 198.19.250.115 granted! Searching for unused node ID..." 2021:03:04-16:25:22 fw01-2 ha_daemon[11339]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 94 22.365" name="Found unused node id 1!" 2021:03:04-16:25:22 fw01-2 ha_daemon[11339]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 95 22.365" name="New node 1" 2021:03:04-16:25:23 fw01-2 conntrack-tools[13140]: no dedicated links available! 2021:03:04-16:25:23 fw01-2 ha_daemon[11339]: id="38A3" severity="debug" sys="System" sub="ha" seq="M: 96 23.165" name="Netlink: Lost link beat on eth3!" 2021:03:04-16:25:26 fw01-2 ha_daemon[11339]: id="38A3" severity="debug" sys="System" sub="ha" seq="M: 97 26.767" name="Netlink: Found link beat on eth3 again!" 2021:03:04-16:25:29 fw01-2 ha_daemon[11339]: id="38A3" severity="debug" sys="System" sub="ha" seq="M: 98 29.729" name="Netlink: Lost link beat on eth3!" 2021:03:04-16:25:29 fw01-2 conntrack-tools[13140]: no dedicated links available! 2021:03:04-16:25:32 fw01-2 ha_daemon[11339]: id="38A3" severity="debug" sys="System" sub="ha" seq="M: 99 32.587" name="Netlink: Found link beat on eth3 again!" 2021:03:04-16:25:37 fw01-2 ha_daemon[11339]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 100 37.018" name="Access granted to remote node 1!" 2021:03:04-16:25:37 fw01-2 ha_daemon[11339]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 101 37.087" name="Node 1 joined with version 9.705003" 2021:03:04-16:25:37 fw01-2 ha_daemon[11339]: id="38C0" severity="info" sys="System" sub="ha" seq="M: 102 37.087" name="Node 1 is alive" 2021:03:04-16:25:37 fw01-2 ha_daemon[11339]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 103 37.087" name="Node 1 changed state: DEAD(2048) -> SYNCING(2)" 2021:03:04-16:25:37 fw01-2 ha_daemon[11339]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 104 37.087" name="Node 1 changed role: UNKNOWN -> SLAVE" 2021:03:04-16:25:37 fw01-2 ha_daemon[11339]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 105 37.088" name="Executing (wait) /usr/local/bin/confd-setha mode master master_ip 198.19.250.2 slave_ip 198.19.250.1" 2021:03:04-16:25:37 fw01-2 ha_daemon[11339]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 106 37.203" name="Executing (nowait) /etc/init.d/ha_mode topology_changed" 2021:03:04-16:25:37 fw01-2 ha_mode[19726]: calling topology_changed 2021:03:04-16:25:37 fw01-2 ha_mode[19726]: topology_changed: waiting for last ha_mode done 2021:03:04-16:25:37 fw01-2 ha_mode[19726]: repctl[19744]: [i] daemonize_check(1480): daemonized, see syslog for further messages 2021:03:04-16:25:37 fw01-2 repctl[19744]: [i] daemonize_check(1480): daemonized, see syslog for further messages 2021:03:04-16:25:37 fw01-2 repctl[19744]: [i] daemonize_check(1497): trying to signal daemon and exit 2021:03:04-16:25:37 fw01-2 repctl[11559]: [i] recheck(1057): got HUP: replication recheck triggered Setup_replication_done = 1 2021:03:04-16:25:37 fw01-2 ha_mode[19726]: topology_changed done (started at 16:25:37) 2021:03:04-16:25:37 fw01-2 ha_daemon[11339]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 107 37.762" name="Reading cluster configuration" 2021:03:04-16:25:43 fw01-2 ha_daemon[11339]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 108 43.062" name="Set syncing.files for node 1" 2021:03:04-16:25:46 fw01-2 ha_daemon[11339]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 109 46.096" name="Node 1 changed state: SYNCING(2) -> SYNCING(3)" 2021:03:04-16:25:52 fw01-2 ha_daemon[11339]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 110 52.944" name="Monitoring interfaces for link beat: eth4 eth2 eth1 eth0" 2021:03:04-16:29:04 fw01-1 ha_daemon[5998]: id="38A0" severity="info" sys="System" sub="ha" seq="S: 53 04.593" name="Monitoring interfaces for link beat: eth4 eth2 eth1 eth0" 2021:03:04-16:30:37 fw01-2 repctl[11559]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1 2021:03:04-16:32:54 fw01-1 repctl[6105]: [i] stop_backup_mode(765): stopped backup mode at 000000010000028500000098 2021:03:04-16:32:58 fw01-1 repctl[6105]: [i] execute(1768): waiting for server to start.... 2021:03:04-16:32:59 fw01-1 repctl[6105]: [i] execute(1768): . 2021:03:04-16:33:00 fw01-1 repctl[6105]: [i] execute(1768): . 2021:03:04-16:33:01 fw01-1 repctl[6105]: [i] execute(1768): . 2021:03:04-16:33:02 fw01-1 repctl[6105]: [i] execute(1768): . 2021:03:04-16:33:03 fw01-1 repctl[6105]: [i] execute(1768): . 2021:03:04-16:33:04 fw01-1 repctl[6105]: [i] execute(1768): . 2021:03:04-16:33:05 fw01-1 repctl[6105]: [i] execute(1768): . 2021:03:04-16:33:06 fw01-1 repctl[6105]: [i] execute(1768): . 2021:03:04-16:33:07 fw01-1 repctl[6105]: [i] execute(1768): . 2021:03:04-16:33:08 fw01-1 repctl[6105]: [i] execute(1768): . 2021:03:04-16:33:09 fw01-1 repctl[6105]: [i] execute(1768): . 2021:03:04-16:33:10 fw01-1 repctl[6105]: [i] execute(1768): . 2021:03:04-16:33:11 fw01-1 repctl[6105]: [i] execute(1768): . 2021:03:04-16:33:12 fw01-1 repctl[6105]: [i] execute(1768): . 2021:03:04-16:33:13 fw01-1 repctl[6105]: [i] execute(1768): . 2021:03:04-16:33:14 fw01-1 repctl[6105]: [i] execute(1768): . 2021:03:04-16:33:15 fw01-1 repctl[6105]: [i] execute(1768): . 2021:03:04-16:33:16 fw01-1 repctl[6105]: [i] execute(1768): . 2021:03:04-16:33:17 fw01-1 repctl[6105]: [i] execute(1768): . 2021:03:04-16:33:18 fw01-1 repctl[6105]: [i] execute(1768): . 2021:03:04-16:33:20 fw01-1 repctl[6105]: [i] execute(1768): . 2021:03:04-16:33:21 fw01-1 repctl[6105]: [i] execute(1768): . 2021:03:04-16:33:22 fw01-1 repctl[6105]: [i] execute(1768): . 2021:03:04-16:33:23 fw01-1 repctl[6105]: [i] execute(1768): . 2021:03:04-16:33:24 fw01-1 repctl[6105]: [i] execute(1768): . 2021:03:04-16:33:25 fw01-1 repctl[6105]: [i] execute(1768): . 2021:03:04-16:33:26 fw01-1 repctl[6105]: [i] execute(1768): . 2021:03:04-16:33:27 fw01-1 repctl[6105]: [i] execute(1768): . 2021:03:04-16:33:28 fw01-1 repctl[6105]: [i] execute(1768): . 2021:03:04-16:33:29 fw01-1 repctl[6105]: [i] execute(1768): stopped waiting 2021:03:04-16:33:29 fw01-1 repctl[6105]: [i] execute(1768): pg_ctl: could not start server 2021:03:04-16:33:29 fw01-1 repctl[6105]: [i] execute(1768): Examine the log output. 2021:03:04-16:33:29 fw01-1 repctl[6105]: [e] prepare_secondary(352): prepare_secondary: Local db(pg_ctl) start failed 2021:03:04-16:33:36 fw01-1 repctl[6105]: [i] start_backup_mode(744): starting backup mode at 00000001000002850000009A 2021:03:04-16:33:36 fw01-1 ha_daemon[5998]: id="38A0" severity="info" sys="System" sub="ha" seq="S: 54 36.853" name="HA control: cmd = 'sync start 2 database'" 2021:03:04-16:33:36 fw01-1 ha_daemon[5998]: id="38A0" severity="info" sys="System" sub="ha" seq="S: 55 36.853" name="Activating sync process for database on node 2" 2021:03:04-16:35:36 fw01-2 ha_daemon[11339]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 111 36.435" name="Clear syncing.files for node 1" 2021:03:04-16:35:37 fw01-2 repctl[11559]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1 2021:03:04-16:35:43 fw01-2 ha_daemon[11339]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 112 43.062" name="Set syncing.files for node 1" 2021:03:04-16:35:55 fw01-2 ha_daemon[11339]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 113 55.172" name="Clear syncing.files for node 1" 2021:03:04-16:37:29 fw01-1 repctl[6105]: [i] stop_backup_mode(765): stopped backup mode at 00000001000002850000009B 2021:03:04-16:37:31 fw01-1 repctl[6105]: [i] execute(1768): waiting for server to start.... 2021:03:04-16:37:32 fw01-1 repctl[6105]: [i] execute(1768): . 2021:03:04-16:37:33 fw01-1 repctl[6105]: [i] execute(1768): . 2021:03:04-16:37:34 fw01-1 repctl[6105]: [i] execute(1768): . 2021:03:04-16:37:35 fw01-1 repctl[6105]: [i] execute(1768): . 2021:03:04-16:37:36 fw01-1 repctl[6105]: [i] execute(1768): . 2021:03:04-16:37:37 fw01-1 repctl[6105]: [i] execute(1768): . 2021:03:04-16:37:38 fw01-1 repctl[6105]: [i] execute(1768): . 2021:03:04-16:37:39 fw01-1 repctl[6105]: [i] execute(1768): . 2021:03:04-16:37:40 fw01-1 repctl[6105]: [i] execute(1768): . 2021:03:04-16:37:41 fw01-1 repctl[6105]: [i] execute(1768): . 2021:03:04-16:37:42 fw01-1 repctl[6105]: [i] execute(1768): . 2021:03:04-16:37:43 fw01-1 repctl[6105]: [i] execute(1768): . 2021:03:04-16:37:44 fw01-1 repctl[6105]: [i] execute(1768): . 2021:03:04-16:37:45 fw01-1 repctl[6105]: [i] execute(1768): . 2021:03:04-16:37:46 fw01-1 repctl[6105]: [i] execute(1768): . 2021:03:04-16:37:47 fw01-1 repctl[6105]: [i] execute(1768): . 2021:03:04-16:37:48 fw01-1 repctl[6105]: [i] execute(1768): . 2021:03:04-16:37:49 fw01-1 repctl[6105]: [i] execute(1768): . 2021:03:04-16:37:50 fw01-1 repctl[6105]: [i] execute(1768): . 2021:03:04-16:37:51 fw01-1 repctl[6105]: [i] execute(1768): . 2021:03:04-16:37:53 fw01-1 repctl[6105]: [i] execute(1768): done 2021:03:04-16:37:53 fw01-1 repctl[6105]: [i] execute(1768): server started 2021:03:04-16:37:53 fw01-1 ha_daemon[5998]: id="38A0" severity="info" sys="System" sub="ha" seq="S: 56 53.407" name="HA control: cmd = 'sync stop 2 database'" 2021:03:04-16:37:53 fw01-1 ha_daemon[5998]: id="38A0" severity="info" sys="System" sub="ha" seq="S: 57 53.407" name="Deactivating sync process for database on node 2" 2021:03:04-16:37:53 fw01-1 repctl[6105]: [i] setup_replication(278): checkinterval 300 2021:03:04-16:37:54 fw01-1 ha_daemon[5998]: id="38A0" severity="info" sys="System" sub="ha" seq="S: 58 54.014" name="Initial synchronization finished!" 2021:03:04-16:37:54 fw01-1 ha_daemon[5998]: id="38A0" severity="info" sys="System" sub="ha" seq="S: 59 54.014" name="state change SYNCING(3) -> UNLINKED(1)" 2021:03:04-16:37:54 fw01-2 ha_daemon[11339]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 114 54.808" name="Node 1 changed state: SYNCING(3) -> UNLINKED(1)" 2021:03:04-16:39:11 fw01-1 ha_daemon[5998]: id="38A3" severity="debug" sys="System" sub="ha" seq="S: 60 11.180" name="Netlink: Found link beat on eth2 again!" 2021:03:04-16:39:12 fw01-1 ha_daemon[5998]: id="38A3" severity="debug" sys="System" sub="ha" seq="S: 61 12.932" name="Netlink: Found link beat on eth1 again!" 2021:03:04-16:39:18 fw01-1 ha_daemon[5998]: id="38A3" severity="debug" sys="System" sub="ha" seq="S: 62 18.892" name="Netlink: Found link beat on eth0 again!" 2021:03:04-16:39:18 fw01-1 ha_daemon[5998]: id="38A3" severity="debug" sys="System" sub="ha" seq="S: 63 18.893" name="Netlink: Found link beat on eth0.99 again!" 2021:03:04-16:39:18 fw01-1 ha_daemon[5998]: id="38A3" severity="debug" sys="System" sub="ha" seq="S: 64 18.893" name="Netlink: Found link beat on eth0.100 again!" 2021:03:04-16:39:18 fw01-1 ha_daemon[5998]: id="38A3" severity="debug" sys="System" sub="ha" seq="S: 65 18.893" name="Netlink: Found link beat on eth0.101 again!" 2021:03:04-16:40:09 fw01-2 ha_daemon[11339]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 115 09.733" name="HA control: cmd = 'reboot 1'" 2021:03:04-16:40:09 fw01-1 ha_daemon[5998]: id="38A0" severity="info" sys="System" sub="ha" seq="S: 66 09.733" name="Received command to reboot!" 2021:03:04-16:40:12 fw01-1 ha_daemon[5998]: id="38A0" severity="info" sys="System" sub="ha" seq="S: 67 12.522" name="HA daemon shutting down (SIGTERM)" 2021:03:04-16:40:12 fw01-1 ha_daemon[5998]: id="38A0" severity="info" sys="System" sub="ha" seq="S: 68 12.522" name="Executing (nowait) /etc/init.d/ha_mode disable" 2021:03:04-16:40:12 fw01-1 ha_daemon[5998]: id="38A0" severity="info" sys="System" sub="ha" seq="S: 69 12.523" name="--- Node is disabled ---" 2021:03:04-16:40:12 fw01-1 ha_daemon[5998]: id="38A0" severity="info" sys="System" sub="ha" seq="S: 70 12.523" name="Executing (nowait) /etc/init.d/ha_mode shutdown" 2021:03:04-16:40:12 fw01-1 ha_mode[19081]: calling shutdown 2021:03:04-16:40:12 fw01-1 ha_mode[19080]: calling disable 2021:03:04-16:40:12 fw01-1 ha_mode[19080]: disable: waiting for last ha_mode done 2021:03:04-16:40:12 fw01-1 ha_mode[19080]: Switching disable mode 2021:03:04-16:40:12 fw01-1 ha_mode[19080]: disable done (started at 16:40:12) 2021:03:04-16:40:12 fw01-1 repctl[6105]: [i] execute(1768): waiting for server to shut down... 2021:03:04-16:40:12 fw01-1 repctl[6105]: [i] execute(1768): . 2021:03:04-16:40:12 fw01-1 ha_mode[19081]: shutdown: waiting for last ha_mode done 2021:03:04-16:40:12 fw01-1 ha_mode[19081]: /var/mdw/scripts/confd-sync: /usr/local/bin/confd-sync stopped 2021:03:04-16:40:12 fw01-1 ha_mode[19081]: shutdown done (started at 16:40:12) 2021:03:04-16:40:12 fw01-1 ha_daemon[5998]: id="38A0" severity="info" sys="System" sub="ha" seq="S: 71 12.639" name="HA daemon exits (SIGTERM)" 2021:03:04-16:40:13 fw01-1 repctl[6105]: [i] execute(1768): done 2021:03:04-16:40:13 fw01-1 repctl[6105]: [i] execute(1768): server stopped 2021:03:04-16:40:13 fw01-1 repctl[6105]: [i] execute(1768): waiting for server to start.... 2021:03:04-16:40:14 fw01-1 repctl[6105]: [i] execute(1768): done 2021:03:04-16:40:14 fw01-1 repctl[6105]: [i] execute(1768): server started 2021:03:04-16:40:14 fw01-1 repctl[6105]: [i] terminate(2321): exit due to signal TERM 2021:03:04-16:40:15 fw01-2 ha_daemon[11339]: id="38C1" severity="error" sys="System" sub="ha" seq="M: 116 15.062" name="Node 1 is dead, received no heart beats" 2021:03:04-16:40:15 fw01-2 ha_daemon[11339]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 117 15.063" name="Executing (wait) /usr/local/bin/confd-setha mode master master_ip 198.19.250.2 slave_ip ''" 2021:03:04-16:40:15 fw01-2 ha_daemon[11339]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 118 15.190" name="Executing (nowait) /etc/init.d/ha_mode topology_changed" 2021:03:04-16:40:15 fw01-2 ha_mode[24979]: calling topology_changed 2021:03:04-16:40:15 fw01-2 ha_mode[24979]: topology_changed: waiting for last ha_mode done 2021:03:04-16:40:15 fw01-2 ha_mode[24979]: repctl[24994]: [i] daemonize_check(1480): daemonized, see syslog for further messages 2021:03:04-16:40:15 fw01-2 repctl[24994]: [i] daemonize_check(1480): daemonized, see syslog for further messages 2021:03:04-16:40:15 fw01-2 repctl[24994]: [i] daemonize_check(1497): trying to signal daemon and exit 2021:03:04-16:40:15 fw01-2 repctl[11559]: [i] recheck(1057): got HUP: replication recheck triggered Setup_replication_done = 1 2021:03:04-16:40:15 fw01-2 ha_mode[24979]: topology_changed done (started at 16:40:15) 2021:03:04-16:40:15 fw01-2 ha_daemon[11339]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 119 15.791" name="Reading cluster configuration" 2021:03:04-16:40:30 fw01-2 ha_daemon[11339]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 120 30.938" name="Monitoring interfaces for link beat: eth4 eth2 eth1 eth0" 2021:03:04-16:40:52 fw01-2 conntrack-tools[13140]: no dedicated links available! 2021:03:04-16:40:52 fw01-2 ha_daemon[11339]: id="38A3" severity="debug" sys="System" sub="ha" seq="M: 121 52.713" name="Netlink: Lost link beat on eth3!" 2021:03:04-16:41:01 fw01-2 ha_daemon[11339]: id="38A3" severity="debug" sys="System" sub="ha" seq="M: 122 01.183" name="Netlink: Found link beat on eth3 again!" 2021:03:04-16:41:15 fw01-2 conntrack-tools[13140]: no dedicated links available! 2021:03:04-16:41:15 fw01-2 ha_daemon[11339]: id="38A3" severity="debug" sys="System" sub="ha" seq="M: 123 15.106" name="Netlink: Lost link beat on eth3!" 2021:03:04-16:41:30 fw01-2 ha_daemon[11339]: id="38A3" severity="debug" sys="System" sub="ha" seq="M: 124 30.075" name="Netlink: Found link beat on eth3 again!" 2021:03:04-16:41:32 fw01-2 conntrack-tools[13140]: no dedicated links available! 2021:03:04-16:41:32 fw01-2 ha_daemon[11339]: id="38A3" severity="debug" sys="System" sub="ha" seq="M: 125 32.742" name="Netlink: Lost link beat on eth3!" 2021:03:04-16:41:35 fw01-2 ha_daemon[11339]: id="38A3" severity="debug" sys="System" sub="ha" seq="M: 126 35.599" name="Netlink: Found link beat on eth3 again!" 2021:03:04-16:41:40 fw01-2 ha_daemon[11339]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 127 40.252" name="Access granted to remote node 1!" 2021:03:04-16:41:40 fw01-2 ha_daemon[11339]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 128 40.342" name="Node 1 changed version! 0.000000 -> 9.705003" 2021:03:04-16:41:40 fw01-2 ha_daemon[11339]: id="38C0" severity="info" sys="System" sub="ha" seq="M: 129 40.342" name="Node 1 is alive" 2021:03:04-16:41:40 fw01-2 ha_daemon[11339]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 130 40.342" name="Node 1 changed state: DEAD(2048) -> SYNCING(2)" 2021:03:04-16:41:40 fw01-2 ha_daemon[11339]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 131 40.342" name="Node 1 changed role: DEAD -> SLAVE" 2021:03:04-16:41:40 fw01-2 ha_daemon[11339]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 132 40.343" name="Executing (wait) /usr/local/bin/confd-setha mode master master_ip 198.19.250.2 slave_ip 198.19.250.1" 2021:03:04-16:41:40 fw01-2 ha_daemon[11339]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 133 40.475" name="Executing (nowait) /etc/init.d/ha_mode topology_changed" 2021:03:04-16:41:40 fw01-2 ha_mode[25471]: calling topology_changed 2021:03:04-16:41:40 fw01-2 ha_mode[25471]: topology_changed: waiting for last ha_mode done 2021:03:04-16:41:40 fw01-2 ha_mode[25471]: repctl[25489]: [i] daemonize_check(1480): daemonized, see syslog for further messages 2021:03:04-16:41:40 fw01-2 repctl[25489]: [i] daemonize_check(1480): daemonized, see syslog for further messages 2021:03:04-16:41:40 fw01-2 repctl[25489]: [i] daemonize_check(1497): trying to signal daemon and exit 2021:03:04-16:41:40 fw01-2 repctl[11559]: [i] recheck(1057): got HUP: replication recheck triggered Setup_replication_done = 1 2021:03:04-16:41:40 fw01-2 ha_mode[25471]: topology_changed done (started at 16:41:40) 2021:03:04-16:41:41 fw01-2 ha_daemon[11339]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 134 41.229" name="Reading cluster configuration" 2021:03:04-16:41:46 fw01-2 ha_daemon[11339]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 135 46.062" name="Set syncing.files for node 1" 2021:03:04-16:41:50 fw01-2 ha_daemon[11339]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 136 50.752" name="Clear syncing.files for node 1" 2021:03:04-16:41:53 fw01-2 ha_daemon[11339]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 137 53.355" name="Node 1 changed state: SYNCING(2) -> SYNCING(3)" 2021:03:04-16:41:56 fw01-2 ha_daemon[11339]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 138 56.390" name="Monitoring interfaces for link beat: eth4 eth2 eth1 eth0" 2021:03:04-16:43:56 fw01-1 repctl[4486]: [i] stop_backup_mode(765): stopped backup mode at 00000001000002850000009E<30>Mar 4 16:44:00 repctl[4486]: [i] execute(1768): waiting for server to start.... 2021:03:04-16:44:01 fw01-1 repctl[4486]: [i] execute(1768): . 2021:03:04-16:44:02 fw01-1 repctl[4486]: [i] execute(1768): . 2021:03:04-16:44:03 fw01-1 repctl[4486]: [i] execute(1768): . 2021:03:04-16:44:05 fw01-1 repctl[4486]: [i] execute(1768): . 2021:03:04-16:44:06 fw01-1 repctl[4486]: [i] execute(1768): . 2021:03:04-16:44:07 fw01-1 repctl[4486]: [i] execute(1768): . 2021:03:04-16:44:08 fw01-1 repctl[4486]: [i] execute(1768): . 2021:03:04-16:44:09 fw01-1 repctl[4486]: [i] execute(1768): . 2021:03:04-16:44:10 fw01-1 repctl[4486]: [i] execute(1768): . 2021:03:04-16:44:11 fw01-1 repctl[4486]: [i] execute(1768): . 2021:03:04-16:44:12 fw01-1 repctl[4486]: [i] execute(1768): . 2021:03:04-16:44:13 fw01-1 repctl[4486]: [i] execute(1768): . 2021:03:04-16:44:14 fw01-1 repctl[4486]: [i] execute(1768): . 2021:03:04-16:44:15 fw01-1 repctl[4486]: [i] execute(1768): . 2021:03:04-16:44:16 fw01-1 repctl[4486]: [i] execute(1768): . 2021:03:04-16:44:17 fw01-1 repctl[4486]: [i] execute(1768): . 2021:03:04-16:44:18 fw01-1 repctl[4486]: [i] execute(1768): . 2021:03:04-16:44:19 fw01-1 repctl[4486]: [i] execute(1768): . 2021:03:04-16:44:20 fw01-1 repctl[4486]: [i] execute(1768): done 2021:03:04-16:44:20 fw01-1 repctl[4486]: [i] execute(1768): server started 2021:03:04-16:44:20 fw01-1 ha_daemon[4439]: id="38A0" severity="info" sys="System" sub="ha" seq="S: 56 20.044" name="HA control: cmd = 'sync stop 2 database'" 2021:03:04-16:44:20 fw01-1 ha_daemon[4439]: id="38A0" severity="info" sys="System" sub="ha" seq="S: 57 20.044" name="Deactivating sync process for database on node 2" 2021:03:04-16:44:20 fw01-1 repctl[4486]: [i] recheck(1057): got HUP: replication recheck triggered Setup_replication_done = 0 2021:03:04-16:44:20 fw01-1 repctl[4486]: [i] execute(1768): pg_ctl: server is running (PID: 10916) 2021:03:04-16:44:20 fw01-1 repctl[4486]: [i] execute(1768): /usr/pgsql92-64/bin/postgres "-D" "/var/storage/pgsql92/data" 2021:03:04-16:44:20 fw01-1 ha_daemon[4439]: id="38A0" severity="info" sys="System" sub="ha" seq="S: 58 20.303" name="HA control: cmd = 'sync start 2 database'" 2021:03:04-16:44:20 fw01-1 ha_daemon[4439]: id="38A0" severity="info" sys="System" sub="ha" seq="S: 59 20.303" name="Activating sync process for database on node 2" 2021:03:04-16:44:20 fw01-1 repctl[4486]: [i] execute(1768): waiting for server to shut down... 2021:03:04-16:44:20 fw01-1 repctl[4486]: [i] execute(1768): . 2021:03:04-16:44:21 fw01-1 repctl[4486]: [i] execute(1768): done 2021:03:04-16:44:21 fw01-1 repctl[4486]: [i] execute(1768): server stopped 2021:03:04-16:44:22 fw01-1 repctl[4486]: [i] start_backup_mode(744): starting backup mode at 0000000100000285000000A0 2021:03:04-16:44:22 fw01-1 ha_daemon[4439]: id="38A0" severity="info" sys="System" sub="ha" seq="S: 60 22.456" name="HA control: cmd = 'sync start 2 database'" 2021:03:04-16:44:22 fw01-1 ha_daemon[4439]: id="38A0" severity="info" sys="System" sub="ha" seq="S: 61 22.456" name="Activating sync process for database on node 2" 2021:03:04-16:46:18 fw01-1 repctl[4486]: [i] stop_backup_mode(765): stopped backup mode at 0000000100000285000000A0 2021:03:04-16:46:21 fw01-1 repctl[4486]: [i] execute(1768): waiting for server to start.... 2021:03:04-16:46:22 fw01-1 repctl[4486]: [i] execute(1768): . 2021:03:04-16:46:23 fw01-1 repctl[4486]: [i] execute(1768): . 2021:03:04-16:46:24 fw01-1 repctl[4486]: [i] execute(1768): . 2021:03:04-16:46:26 fw01-1 repctl[4486]: [i] execute(1768): . 2021:03:04-16:46:27 fw01-1 repctl[4486]: [i] execute(1768): . 2021:03:04-16:46:28 fw01-1 repctl[4486]: [i] execute(1768): . 2021:03:04-16:46:29 fw01-1 repctl[4486]: [i] execute(1768): . 2021:03:04-16:46:30 fw01-1 repctl[4486]: [i] execute(1768): . 2021:03:04-16:46:31 fw01-1 repctl[4486]: [i] execute(1768): . 2021:03:04-16:46:32 fw01-1 repctl[4486]: [i] execute(1768): . 2021:03:04-16:46:33 fw01-1 repctl[4486]: [i] execute(1768): . 2021:03:04-16:46:34 fw01-1 repctl[4486]: [i] execute(1768): . 2021:03:04-16:46:35 fw01-1 repctl[4486]: [i] execute(1768): . 2021:03:04-16:46:36 fw01-1 repctl[4486]: [i] execute(1768): . 2021:03:04-16:46:37 fw01-1 repctl[4486]: [i] execute(1768): . 2021:03:04-16:46:38 fw01-1 repctl[4486]: [i] execute(1768): . 2021:03:04-16:46:39 fw01-1 repctl[4486]: [i] execute(1768): . 2021:03:04-16:46:40 fw01-1 repctl[4486]: [i] execute(1768): . 2021:03:04-16:46:41 fw01-2 repctl[11559]: [i] recheck(1057): got ALRM: replication recheck triggered Setup_replication_done = 1 2021:03:04-16:46:41 fw01-1 repctl[4486]: [i] execute(1768): done 2021:03:04-16:46:41 fw01-1 repctl[4486]: [i] execute(1768): server started 2021:03:04-16:46:41 fw01-1 ha_daemon[4439]: id="38A0" severity="info" sys="System" sub="ha" seq="S: 62 41.856" name="HA control: cmd = 'sync stop 2 database'" 2021:03:04-16:46:41 fw01-1 ha_daemon[4439]: id="38A0" severity="info" sys="System" sub="ha" seq="S: 63 41.856" name="Deactivating sync process for database on node 2" 2021:03:04-16:46:41 fw01-1 repctl[4486]: [i] setup_replication(278): checkinterval 300 2021:03:04-16:46:41 fw01-1 repctl[4486]: [i] setup_replication(278): checkinterval 300 2021:03:04-16:46:42 fw01-1 ha_daemon[4439]: id="38A0" severity="info" sys="System" sub="ha" seq="S: 64 42.244" name="Initial synchronization finished!" 2021:03:04-16:46:42 fw01-1 ha_daemon[4439]: id="38A0" severity="info" sys="System" sub="ha" seq="S: 65 42.244" name="state change SYNCING(3) -> UNLINKED(1)" 2021:03:04-16:46:42 fw01-2 ha_daemon[11339]: id="38A0" severity="info" sys="System" sub="ha" seq="M: 139 42.634" name="Node 1 changed state: SYNCING(3) -> UNLINKED(1)"
Gibt es noch andere Logs, die für die Analyse interessant wären?
This thread was automatically locked due to age.