HA A/P: 2nd node shuts down all the time

Hi there,

I have an issue with a newly set up UTM 220 cluster.

The first node runs very well. Interfaces eth0, eth1, eth3 and eth5 are connected to the switch, separated by VLANs.

Now I want to add the second node to the cluster. It boots up, reports UP2DATE and then shuts down ...

The connections are wired correctly. Only eth3 (HA) is connected directly by cable, not through a switch.

eth0 = LAN, eth1 = WAN, eth3 = HA, eth5 = WLAN

config for cluster:

Active-Passive
Autoconf of new nodes enabled


2013:11:26-17:49:17 firewall conntrack-tools[7660]: no dedicated links available!
2013:11:26-17:50:20 firewall ha_daemon[15458]: id="38A0" severity="info" sys="System" sub="ha" name="Cold Rollback enabled!"
2013:11:26-17:50:20 firewall ha_daemon[15458]: id="38A0" severity="info" sys="System" sub="ha" name="Reading cluster configuration"
2013:11:26-17:50:20 firewall ha_daemon[15458]: id="38A0" severity="info" sys="System" sub="ha" name="Set ASG version to 9.106017"
2013:11:26-17:50:20 firewall ha_daemon[15458]: id="38A0" severity="info" sys="System" sub="ha" name="Set ASG appliance to 220"
2013:11:26-17:50:20 firewall ha_daemon[15459]: id="38A0" severity="info" sys="System" sub="ha" name="Starting ASG HA daemon v1.0.0 in universal time-sharing mode on interface eth3 with name ctp1"
2013:11:26-17:50:20 firewall ha_daemon[15459]: id="38A0" severity="info" sys="System" sub="ha" name="--- Node is disabled ---"
2013:11:26-17:50:20 firewall ha_daemon[15459]: id="38A0" severity="info" sys="System" sub="ha" name="Monitoring interfaces for link beat: eth0 eth5 "
2013:11:26-17:50:20 firewall conntrack-tools[7660]: no dedicated links available!
2013:11:26-17:50:20 firewall conntrack-tools[7660]: no dedicated links available!
2013:11:26-17:50:20 firewall conntrack-tools[7660]: no dedicated links available!
2013:11:26-17:50:20 firewall conntrack-tools[7660]: no dedicated links available!
2013:11:26-17:50:20 firewall conntrack-tools[7660]: no dedicated links available!
2013:11:26-17:50:21 firewall conntrack-tools[7660]: no dedicated links available!
2013:11:26-17:50:23 firewall-1 conntrack-tools[7660]: no dedicated links available!
2013:11:26-17:50:23 firewall-1 conntrack-tools[7660]: no dedicated links available!
2013:11:26-17:50:24 firewall-1 ha_daemon[15459]: id="38A0" severity="info" sys="System" sub="ha" name="Found no master node, taking over!"
2013:11:26-17:50:24 firewall-1 ha_daemon[15459]: id="38B0" severity="info" sys="System" sub="ha" name="Switching to Master mode"
2013:11:26-17:50:24 firewall-1 ha_daemon[15459]: id="38A0" severity="info" sys="System" sub="ha" name="Initdead time over"
2013:11:26-17:50:25 firewall-1 ha_daemon[15459]: id="38A0" severity="info" sys="System" sub="ha" name="Monitoring interfaces for link beat: eth0 eth5 "
2013:11:26-17:50:25 firewall-1 ha_daemon[15459]: id="38A3" severity="debug" sys="System" sub="ha" name="Netlink: Lost link beat on eth0!"
2013:11:26-17:50:25 firewall-1 ha_daemon[15459]: id="38A3" severity="debug" sys="System" sub="ha" name="Netlink: Found link beat on eth5 again!"
2013:11:26-17:50:25 firewall-1 repctl[15872]:  execute(2324): pg_ctl: server is running (PID: 3728)
2013:11:26-17:50:25 firewall-1 ha_daemon[15459]: id="38A3" severity="debug" sys="System" sub="ha" name="Netlink: Lost link beat on eth5!"
2013:11:26-17:50:25 firewall-1 ha_daemon[15459]: id="38A3" severity="debug" sys="System" sub="ha" name="Netlink: Found link beat on eth0 again!"
2013:11:26-17:50:25 firewall-1 ha_daemon[15459]: id="38A3" severity="debug" sys="System" sub="ha" name="Netlink: Found link beat on eth5 again!"
2013:11:26-17:50:25 firewall-1 repctl[15872]:  execute(2324): pg_ctl: server is running (PID: 3728)
2013:11:26-17:50:25 firewall-1 repctl[15872]:  start_hawatch(1691): forked repctl hawatch daemon, pid 15879
2013:11:26-17:50:25 firewall-1 conntrack-tools[7660]: committing all external caches
2013:11:26-17:50:25 firewall-1 conntrack-tools[7660]: Committed 0 new entries
2013:11:26-17:50:25 firewall-1 conntrack-tools[7660]: commit has taken 0.001357 seconds
2013:11:26-17:50:25 firewall-1 conntrack-tools[7660]: flushing caches
2013:11:26-17:50:25 firewall-1 conntrack-tools[7660]: resync with master conntrack table
2013:11:26-17:50:26 firewall-1 repctl[15872]:  setup_replication(233): checkinterval 300
2013:11:26-17:50:26 firewall-1 repctl[15909]:  daemonize_check(2008): trying to signal daemon
2013:11:26-17:50:27 firewall-1 ha_daemon[15459]: id="38A0" severity="info" sys="System" sub="ha" name="Cold Rollback disabled!"
2013:11:26-17:50:27 firewall-1 ha_daemon[15459]: id="38A0" severity="info" sys="System" sub="ha" name="Reading cluster configuration"
2013:11:26-17:50:29 firewall-1 ha_daemon[15459]: id="38A0" severity="info" sys="System" sub="ha" name="Reading cluster configuration"
2013:11:26-17:51:40 firewall-1 ha_daemon[15459]: id="38A0" severity="info" sys="System" sub="ha" name="Monitoring interfaces for link beat: eth0 eth5 "
2013:11:26-17:53:13 firewall-1 ha_daemon[15459]: id="38A3" severity="debug" sys="System" sub="ha" name="Netlink: Found link beat on eth3 again!"
2013:11:26-17:53:14 firewall-1 conntrack-tools[7660]: no dedicated links available!
2013:11:26-17:53:14 firewall-1 ha_daemon[15459]: id="38A3" severity="debug" sys="System" sub="ha" name="Netlink: Lost link beat on eth3!"
2013:11:26-17:53:16 firewall-1 ha_daemon[15459]: id="38A3" severity="debug" sys="System" sub="ha" name="Netlink: Found link beat on eth3 again!"
2013:11:26-17:53:35 firewall-1 conntrack-tools[7660]: no dedicated links available!
2013:11:26-17:53:35 firewall-1 ha_daemon[15459]: id="38A3" severity="debug" sys="System" sub="ha" name="Netlink: Lost link beat on eth3!"
2013:11:26-17:53:38 firewall-1 ha_daemon[15459]: id="38A3" severity="debug" sys="System" sub="ha" name="Netlink: Found link beat on eth3 again!"
2013:11:26-17:54:21 firewall-1 conntrack-tools[7660]: no dedicated links available!
2013:11:26-17:54:21 firewall-1 ha_daemon[15459]: id="38A3" severity="debug" sys="System" sub="ha" name="Netlink: Lost link beat on eth3!"
2013:11:26-17:54:23 firewall-1 ha_daemon[15459]: id="38A3" severity="debug" sys="System" sub="ha" name="Netlink: Found link beat on eth3 again!"
2013:11:26-17:54:26 firewall-1 ha_daemon[15459]: id="38A0" severity="info" sys="System" sub="ha" name="Autojoin of 198.19.250.156 granted! Seaching for unused node ID..."
2013:11:26-17:54:26 firewall-1 ha_daemon[15459]: id="38A0" severity="info" sys="System" sub="ha" name="Found unused node id 2!"
2013:11:26-17:54:26 firewall-1 ha_daemon[15459]: id="38A0" severity="info" sys="System" sub="ha" name="Node 2 joined cluster"
2013:11:26-17:54:27 firewall-1 conntrack-tools[7660]: no dedicated links available!
2013:11:26-17:54:27 firewall-1 ha_daemon[15459]: id="38A3" severity="debug" sys="System" sub="ha" name="Netlink: Lost link beat on eth3!"
2013:11:26-17:54:30 firewall-1 ha_daemon[15459]: id="38A3" severity="debug" sys="System" sub="ha" name="Netlink: Found link beat on eth3 again!"
2013:11:26-17:54:34 firewall-1 conntrack-tools[7660]: no dedicated links available!
2013:11:26-17:54:34 firewall-1 ha_daemon[15459]: id="38A3" severity="debug" sys="System" sub="ha" name="Netlink: Lost link beat on eth3!"
2013:11:26-17:54:37 firewall-1 ha_daemon[15459]: id="38A3" severity="debug" sys="System" sub="ha" name="Netlink: Found link beat on eth3 again!"
2013:11:26-17:54:39 firewall-1 conntrack-tools[7660]: no dedicated links available!
2013:11:26-17:54:39 firewall-1 ha_daemon[15459]: id="38A3" severity="debug" sys="System" sub="ha" name="Netlink: Lost link beat on eth3!"
2013:11:26-17:54:42 firewall-1 ha_daemon[15459]: id="38A3" severity="debug" sys="System" sub="ha" name="Netlink: Found link beat on eth3 again!"
2013:11:26-17:54:46 firewall-1 ha_daemon[15459]: id="38A0" severity="info" sys="System" sub="ha" name="Access granted to remote node 2!"
2013:11:26-17:54:48 firewall-1 ha_daemon[15459]: id="38A0" severity="info" sys="System" sub="ha" name="Node 2 joined with version 9.105009"
2013:11:26-17:54:48 firewall-1 ha_daemon[15459]: id="38C0" severity="info" sys="System" sub="ha" name="Node 2 is alive!"
2013:11:26-17:54:48 firewall-1 ha_daemon[15459]: id="38A0" severity="info" sys="System" sub="ha" name="Node 2 changed state: DEAD -> UP2DATE"
2013:11:26-17:54:57 firewall-1 ha_daemon[15459]: id="38A0" severity="info" sys="System" sub="ha" name="HA daemon of node 2 is restarting, waiting 900 seconds before declaring node as dead"
2013:11:26-17:55:27 firewall-1 conntrack-tools[7660]: no dedicated links available!
2013:11:26-17:55:27 firewall-1 ha_daemon[15459]: id="38A3" severity="debug" sys="System" sub="ha" name="Netlink: Lost link beat on eth3!"
2013:11:26-18:09:56 firewall-1 ha_daemon[15459]: id="38C1" severity="info" sys="System" sub="ha" name="Node 2 is dead, received no heart beats!"
2013:11:26-18:09:57 firewall-1 repctl[18878]:  daemonize_check(2008): trying to signal daemon
2013:11:26-18:09:59 firewall-1 ha_daemon[15459]: id="38A0" severity="info" sys="System" sub="ha" name="Reading cluster configuration"
2013:11:26-18:10:43 firewall-1 ha_daemon[15459]: id="38A0" severity="info" sys="System" sub="ha" name="Monitoring interfaces for link beat: eth0 eth5 "
2013:11:26-18:21:59 firewall-1 ha_daemon[15459]: id="38A0" severity="info" sys="System" sub="ha" name="Monitoring interfaces for link beat: eth0 eth5 "

After that, the node is dead ...

I did a factory reset of the joining node and also cleared all the config on the HA page (switched off, too).

I thought it was down to the cables or connections, but those are OK.

If you switch it on again, it shuts down again ;)
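A log like the one above is easier to read once it is condensed. The following is a minimal sketch, assuming the `name="..."` message texts look exactly like the ones in this thread; the function name `summarize_ha_log` and the sample list are made up for illustration and are not part of any Sophos tool. It counts how often the link beat was lost per interface and collects the node state changes.

```python
import re
from collections import Counter

# Extracts the name="..." field that ha_daemon writes on each line.
NAME_RE = re.compile(r'name="([^"]*)"')
# Message formats as they appear in the log posted in this thread.
LOST_RE = re.compile(r"Netlink: Lost link beat on (\w+)!")
STATE_RE = re.compile(r"Node (\d+) changed state: (\S+) -> (\S+)")


def summarize_ha_log(lines):
    """Return (lost-link counts per interface, list of node state changes)."""
    lost = Counter()
    states = []
    for line in lines:
        field = NAME_RE.search(line)
        if field is None:
            continue  # e.g. conntrack-tools or repctl lines without name=
        message = field.group(1)
        flap = LOST_RE.match(message)
        if flap:
            lost[flap.group(1)] += 1
            continue
        change = STATE_RE.match(message)
        if change:
            node, old, new = change.groups()
            states.append((int(node), old, new))
    return lost, states


# A few lines copied from the log above.
sample_log = [
    '2013:11:26-17:54:27 firewall-1 ha_daemon[15459]: id="38A3" severity="debug" sys="System" sub="ha" name="Netlink: Lost link beat on eth3!"',
    '2013:11:26-17:54:30 firewall-1 ha_daemon[15459]: id="38A3" severity="debug" sys="System" sub="ha" name="Netlink: Found link beat on eth3 again!"',
    '2013:11:26-17:54:34 firewall-1 conntrack-tools[7660]: no dedicated links available!',
    '2013:11:26-17:54:34 firewall-1 ha_daemon[15459]: id="38A3" severity="debug" sys="System" sub="ha" name="Netlink: Lost link beat on eth3!"',
    '2013:11:26-17:54:48 firewall-1 ha_daemon[15459]: id="38A0" severity="info" sys="System" sub="ha" name="Node 2 changed state: DEAD -> UP2DATE"',
]

lost, states = summarize_ha_log(sample_log)
print(dict(lost))
print(states)
```

Feeding it the whole log makes the repeated link loss on eth3 and the single DEAD -> UP2DATE transition of node 2 stand out from the noise.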


  • Shutdown the Slave.
    Disconnect Ethernet cables from the Slave.
    Power on the Slave.
    Do a Factory Reset on the Slave.
    Shutdown the Slave.
    Reconnect the Ethernet cables.
    On the Master, set the Internal interface as the backup interface for the heartbeat.
    Power the Slave on and don't touch anything for a few minutes.

    Any luck with that?

    Cheers - Bob
     
    Sophos UTM Community Moderator
    Sophos Certified Architect - UTM
    Sophos Certified Engineer - XG
    Gold Solution Partner since 2005
    MediaSoft, Inc. USA
  • Hi,

    I tried it that way, but got the same result.

    What was the difference between the two nodes? The firmware ...

    I set up the second machine as a single node, updated the firmware to the same version as the master, rebooted, and

    ... it joined the cluster and stayed up ;)

    So the cause was the version mismatch: while the second node was trying to update, it got killed and shut down before ...

    Thanks ;) I hope this information helps someone else ;)
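The mismatch is visible in the master's HA log: it reports `Set ASG version to 9.106017` for itself and `Node 2 joined with version 9.105009` for the new node. A minimal sketch of a check for this, assuming the "Set ASG version" line reflects the firmware of the node that wrote the log (an assumption, not something the thread confirms); `find_version_mismatches` is a hypothetical helper name.

```python
import re

# Message formats as they appear in the HA log posted in this thread.
LOCAL_RE = re.compile(r'name="Set ASG version to ([\d.]+)"')
JOIN_RE = re.compile(r'name="Node (\d+) joined with version ([\d.]+)"')


def find_version_mismatches(lines):
    """Return (node_id, node_version, local_version) for each joining node
    whose reported version differs from the local node's version."""
    local_version = None
    mismatches = []
    for line in lines:
        local = LOCAL_RE.search(line)
        if local:
            local_version = local.group(1)
            continue
        join = JOIN_RE.search(line)
        if join and local_version is not None and join.group(2) != local_version:
            mismatches.append((int(join.group(1)), join.group(2), local_version))
    return mismatches


# Two lines copied from the first log in this thread.
ha_log = [
    '2013:11:26-17:50:20 firewall ha_daemon[15458]: id="38A0" severity="info" sys="System" sub="ha" name="Set ASG version to 9.106017"',
    '2013:11:26-17:54:48 firewall-1 ha_daemon[15459]: id="38A0" severity="info" sys="System" sub="ha" name="Node 2 joined with version 9.105009"',
]

for node_id, node_version, local_version in find_version_mismatches(ha_log):
    print(f"node {node_id} runs {node_version}, local node runs {local_version}")
```

If the list comes back empty, the firmware versions in the log agree and the cause is likely elsewhere.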
  • Hello, I am experiencing a similar problem with a cluster of two SG450 units, where the second unit turns itself off after a minute or so instead of entering the expected HA SLAVE state.

    Firmware is aligned on both units at version 9.213-4. The second unit was factory reset via the LCD panel and then configured for ZeroConf via the console:

    cc set ha itfhw REF_ItfEthEth3A3Intel
    cc set ha status zeroconf

    Here is the HA log from the Master:

    2015:07:13-18:56:45 DBPROFW01-1 ha_daemon[21336]: id="38A3" severity="debug" sys="System" sub="ha" name="Netlink: Found link beat on eth3 again!"
    
    2015:07:13-18:56:57 DBPROFW01-1 conntrack-tools[21368]: no dedicated links available!
    2015:07:13-18:56:57 DBPROFW01-1 ha_daemon[21336]: id="38A3" severity="debug" sys="System" sub="ha" name="Netlink: Lost link beat on eth3!"
    2015:07:13-19:00:19 DBPROFW01-1 ha_daemon[21336]: id="38A3" severity="debug" sys="System" sub="ha" name="Netlink: Found link beat on eth3 again!"
    2015:07:13-19:00:22 DBPROFW01-1 ha_daemon[21336]: id="38A0" severity="info" sys="System" sub="ha" name="Autojoin of 198.19.250.156 granted! Seaching for unused node ID..."
    2015:07:13-19:00:22 DBPROFW01-1 ha_daemon[21336]: id="38A0" severity="info" sys="System" sub="ha" name="Found unused node id 2!"
    2015:07:13-19:00:23 DBPROFW01-1 conntrack-tools[21368]: no dedicated links available!
    2015:07:13-19:00:23 DBPROFW01-1 ha_daemon[21336]: id="38A3" severity="debug" sys="System" sub="ha" name="Netlink: Lost link beat on eth3!"
    2015:07:13-19:00:26 DBPROFW01-1 ha_daemon[21336]: id="38A3" severity="debug" sys="System" sub="ha" name="Netlink: Found link beat on eth3 again!"
    2015:07:13-19:00:28 DBPROFW01-1 conntrack-tools[21368]: no dedicated links available!
    2015:07:13-19:00:28 DBPROFW01-1 ha_daemon[21336]: id="38A3" severity="debug" sys="System" sub="ha" name="Netlink: Lost link beat on eth3!"
    2015:07:13-19:00:31 DBPROFW01-1 ha_daemon[21336]: id="38A3" severity="debug" sys="System" sub="ha" name="Netlink: Found link beat on eth3 again!"
    2015:07:13-19:00:36 DBPROFW01-1 ha_daemon[21336]: id="38A0" severity="info" sys="System" sub="ha" name="Access granted to remote node 2!"
    2015:07:13-19:00:41 DBPROFW01-1 ha_daemon[21336]: id="38A0" severity="info" sys="System" sub="ha" name="Node 2 joined with version 9.213004"
    2015:07:13-19:00:41 DBPROFW01-1 ha_daemon[21336]: id="38C0" severity="info" sys="System" sub="ha" name="Node 2 is alive!"
    2015:07:13-19:00:41 DBPROFW01-1 ha_daemon[21336]: id="38A0" severity="info" sys="System" sub="ha" name="Node 2 changed state: DEAD -> SYNCING"
    2015:07:13-19:00:41 DBPROFW01-1 repctl[8147]:  daemonize_check(1873): trying to signal daemon
    2015:07:13-17:02:54 DBPROFW01-2 repctl[7057]:  execute(2190): .
    2015:07:13-19:00:51 DBPROFW01-1 ha_daemon[21336]: id="38C1" severity="info" sys="System" sub="ha" name="Node 2 is dead, received no heart beats!"
    2015:07:13-19:00:51 DBPROFW01-1 repctl[8227]:  daemonize_check(1873): trying to signal daemon
    2015:07:13-17:02:55 DBPROFW01-2 repctl[7057]:  execute(2190): done
    2015:07:13-17:02:55 DBPROFW01-2 repctl[7057]:  execute(2190): waiting for server to start....
    2015:07:13-17:02:56 DBPROFW01-2 repctl[7057]:  execute(2190): done
    2015:07:13-17:02:57 DBPROFW01-2 repctl[7057]: [w] master_connection(2437): check_dbh: -1
    2015:07:13-17:03:00 DBPROFW01-2 repctl[7057]: [e] db_connect(2560): timeout while connecting to database
    2015:07:13-17:03:00 DBPROFW01-2 repctl[7057]: [e] master_connection(2467): (timeout)
    2015:07:13-19:01:18 DBPROFW01-1 conntrack-tools[21368]: no dedicated links available!
    2015:07:13-19:01:18 DBPROFW01-1 ha_daemon[21336]: id="38A3" severity="debug" sys="System" sub="ha" name="Netlink: Lost link beat on eth3!"

    Of course, eth3 is directly linked via a crossover cable and eth0 is configured as the backup interface.

    Thank you in advance for your help.
  • "configured for ZeroConf via console"

    You've outsmarted yourself. ;) Factory reset. Don't touch the config on the Slave, even via the command line.

    Cheers - Bob