
2x SG430 - HA communication broke down after starting the Up2Date process - Slave in unknown state

Hello,

On Friday at noon we tried to install the two most recent updates.

Unfortunately, the cluster crashed during the process.

Since then, there have been no further entries at all in the high-availability log (HA log).

The last entries look like this:

2018:03:16-13:10:01 dialin-2 ha_daemon[4836]: id="38A0" severity="info" sys="System" sub="ha" seq="M:  466 01.166" name="HA control: cmd = 'up2date 9.508010'"
2018:03:16-13:10:01 dialin-2 ha_daemon[4836]: id="38A0" severity="info" sys="System" sub="ha" seq="M:  467 01.166" name="Initiating up2date on node 1 to version 9.508010"
2018:03:16-13:10:01 dialin-1 ha_daemon[4808]: id="38A0" severity="info" sys="System" sub="ha" seq="S:  438 01.166" name="state change ACTIVE(0) -> UP2DATE(256)"
2018:03:16-13:10:01 dialin-1 ha_daemon[4808]: id="38A0" severity="info" sys="System" sub="ha" seq="S:  439 01.166" name="Starting local up2date 9.506002 -> 9.508010"
2018:03:16-13:10:01 dialin-1 ha_daemon[4808]: id="38A0" severity="info" sys="System" sub="ha" seq="S:  440 01.167" name="Executing (nowait) /etc/init.d/ha_mode disable"
2018:03:16-13:10:01 dialin-1 ha_daemon[4808]: id="38A0" severity="info" sys="System" sub="ha" seq="S:  441 01.167" name="--- Node is disabled ---"
2018:03:16-13:10:01 dialin-1 ha_mode[9587]: calling disable
2018:03:16-13:10:01 dialin-1 ha_mode[9587]: disable: waiting for last ha_mode done
2018:03:16-13:10:01 dialin-1 ha_mode[9587]: Switching disable mode
2018:03:16-13:10:01 dialin-1 ha_mode[9587]: disable done (started at 13:10:01)
2018:03:16-13:10:01 dialin-1 ha_up2date[9586]: already running(9592) (exit 3)
2018:03:16-13:10:01 dialin-1 repctl[19687]: [i] execute(1768): waiting for server to shut down...
2018:03:16-13:10:01 dialin-1 repctl[19687]: [i] execute(1768): .
2018:03:16-13:10:02 dialin-2 ha_daemon[4836]: id="38A0" severity="info" sys="System" sub="ha" seq="M:  468 02.010" name="Node 1 changed state: ACTIVE(0) -> UP2DATE(256)"
2018:03:16-13:10:02 dialin-2 ha_daemon[4836]: id="38A0" severity="info" sys="System" sub="ha" seq="M:  469 02.011" name="Executing (nowait) /etc/init.d/ha_mode topology_changed"
2018:03:16-13:10:02 dialin-2 ha_mode[31156]: calling topology_changed
2018:03:16-13:10:02 dialin-2 ha_mode[31156]: topology_changed: waiting for last ha_mode done
2018:03:16-13:10:02 dialin-2 ha_mode[31156]: topology_changed done (started at 13:10:02)
2018:03:16-13:10:02 dialin-1 repctl[19687]: [i] execute(1768):  done
2018:03:16-13:10:02 dialin-1 repctl[19687]: [i] execute(1768): server stopped
2018:03:16-13:10:02 dialin-1 repctl[19687]: [i] execute(1768): waiting for server to start....
2018:03:16-13:10:03 dialin-2 repctl[8098]: [i] terminate(2321): exit due to signal TERM
2018:03:16-13:10:03 dialin-1 repctl[19687]: [i] execute(1768):  done
2018:03:16-13:10:03 dialin-1 repctl[19687]: [i] execute(1768): server started
2018:03:16-13:10:03 dialin-1 repctl[19687]: [i] terminate(2321): exit due to signal TERM
2018:03:16-14:30:10 dialin-2 ha_daemon[4836]: id="38A0" severity="info" sys="System" sub="ha" seq="M:  470 10.609" name="Monitoring interfaces for link beat: eth10 eth11 eth6 eth0 eth13 eth16 eth5 eth4 eth2 eth14 eth3 eth15 eth7 eth12 eth1 eth17"

What can we do here?
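To confirm where the HA log goes quiet and which state transitions were last reported, the log can be scanned with a small script. The following is a minimal Python sketch, assuming the line format quoted above (`<YYYY:MM:DD-HH:MM:SS> <host> <process>[<pid>]: <message>`) and the two state-change message forms that appear in it. The function names `parse`, `state_changes` and `largest_gap` are hypothetical helpers, not part of Sophos UTM.

```python
import re
from datetime import datetime

# Assumed line format (taken from the excerpt above):
# "<YYYY:MM:DD-HH:MM:SS> <host> <process>[<pid>]: <message>"
LINE_RE = re.compile(r"^(\d{4}:\d{2}:\d{2}-\d{2}:\d{2}:\d{2}) (\S+) (\S+?)\[\d+\]: (.*)$")
# Local form:            'state change ACTIVE(0) -> UP2DATE(256)'
# Peer form (as seen by the master): 'Node 1 changed state: ACTIVE(0) -> UP2DATE(256)'
STATE_RE = re.compile(r"(?:state change|changed state:) (\w+)\(\d+\) -> (\w+)\(\d+\)")
TS_FORMAT = "%Y:%m:%d-%H:%M:%S"


def parse(lines):
    """Yield (timestamp, host, process, message) for each line matching the assumed format."""
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            ts, host, proc, msg = m.groups()
            yield datetime.strptime(ts, TS_FORMAT), host, proc, msg


def state_changes(lines):
    """Return (timestamp, reporting host, old state, new state) for every ha_daemon state transition."""
    changes = []
    for ts, host, proc, msg in parse(lines):
        s = STATE_RE.search(msg)
        if proc == "ha_daemon" and s:
            changes.append((ts, host, s.group(1), s.group(2)))
    return changes


def largest_gap(lines):
    """Return (gap, timestamp before it, timestamp after it) for the longest silence
    between consecutive entries, or None if fewer than two lines parse."""
    stamps = sorted(ts for ts, _, _, _ in parse(lines))
    best = None
    for prev, cur in zip(stamps, stamps[1:]):
        if best is None or cur - prev > best[0]:
            best = (cur - prev, prev, cur)
    return best
```

Run against the excerpt above, `largest_gap` points at the roughly 80-minute silence between 13:10:03 and 14:30:10, and `state_changes` shows node 1 (dialin-1) entering UP2DATE with no later transition back to ACTIVE.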


  • Hello Patrick,

    Try a hard reboot of the Slave (node 1).  If that doesn't work, try disabling HA and then re-enabling it.

    If it still doesn't work, I would be tempted to re-image it from ISO. If you do that, remember that it has to be on the same version as the Master (node 2). It's been about five years since this happened to one of my clients, and I don't remember whether we also had to disable and re-enable HA.

    Any luck with any of those ideas?

    Cheers - Bob

     
    Sophos UTM Community Moderator
    Sophos Certified Architect - UTM
    Sophos Certified Engineer - XG
    Gold Solution Partner since 2005
    MediaSoft, Inc. USA
  • Hello Bob!

    Many thanks for the tip!
    We hard powered off node 1, left it without power for 30 seconds, and then restarted it.
    After we triggered the update again, it completed successfully.

    [SOLVED]