New SG230 in Cluster - Slave always losing HA eth3 Link

Hello Community,

We bought two SG230 appliances, and I configured them as a cluster (HA link on eth3).

Now we keep getting this mail from the UTM:

The High Availability System is active and fully functional.

You are receiving this message due to one of the following cases:
* The HA System has been successfully initiated for the first time.
* All HA nodes have returned into Active state again
- after they had been Unlinked or
- after they had been Dead and a reboot was forced or
- after an Up2Date process has been successfully finished.

We previously ran two ASG 8 units as an HA cluster and never got this error.

We tried a new cable and a new patch panel, with the same result.

HA LOG:

2016:02:23-05:45:47 UTM9CL-1 ha_daemon[4648]: id="38A3" severity="debug" sys="System" sub="ha" seq="M:  537 47.901" name="Netlink: Lost link beat on eth3!"
2016:02:23-05:45:47 UTM9CL-1 conntrack-tools[5179]: no dedicated links available!
2016:02:23-05:45:49 UTM9CL-1 ha_daemon[4648]: id="38C1" severity="error" sys="System" sub="ha" seq="M:  538 49.509" name="Node 2 is dead, received no heart beats"
2016:02:23-05:45:49 UTM9CL-1 ha_daemon[4648]: id="38A0" severity="info" sys="System" sub="ha" seq="M:  539 49.509" name="Executing (wait) /usr/local/bin/confd-setha mode master master_ip 198.19.250.1 slave_ip ''"
2016:02:23-05:45:49 UTM9CL-1 ha_daemon[4648]: id="38A0" severity="info" sys="System" sub="ha" seq="M:  540 49.587" name="Executing (nowait) /etc/init.d/ha_mode topology_changed"
2016:02:23-05:45:49 UTM9CL-1 ha_mode[10580]: calling topology_changed
2016:02:23-05:45:49 UTM9CL-1 ha_mode[10580]: topology_changed: waiting for last ha_mode done
2016:02:23-05:45:49 UTM9CL-1 ha_mode[10580]: daemonized...
2016:02:23-05:45:49 UTM9CL-1 repctl[10596]: [i] daemonize_check(1362): trying to signal daemon
2016:02:23-05:45:49 UTM9CL-1 ha_mode[10580]: topology_changed done (started at 05:45:49)
2016:02:23-05:45:49 UTM9CL-1 ha_daemon[4648]: id="38A0" severity="info" sys="System" sub="ha" seq="M:  541 49.845" name="Reading cluster configuration"
2016:02:23-05:45:50 UTM9CL-2 ha_daemon[4154]: id="38A3" severity="debug" sys="System" sub="ha" seq="S: 1019 50.838" name="Netlink: Found link beat on eth3 again!"
2016:02:23-05:45:51 UTM9CL-1 ha_daemon[4648]: id="38A0" severity="info" sys="System" sub="ha" seq="M:  542 51.327" name="Node 2 changed version! 0.000000 -> 9.351003"
2016:02:23-05:45:51 UTM9CL-1 ha_daemon[4648]: id="38A1" severity="warn" sys="System" sub="ha" seq="M:  543 51.327" name="Lost heartbeat message from node 2! Expected 8522690 but got 8522694"
2016:02:23-05:45:51 UTM9CL-1 ha_daemon[4648]: id="38C0" severity="info" sys="System" sub="ha" seq="M:  544 51.327" name="Node 2 is alive"
2016:02:23-05:45:51 UTM9CL-1 ha_daemon[4648]: id="38A0" severity="info" sys="System" sub="ha" seq="M:  545 51.327" name="Node 2 changed state: DEAD(2048) -> ACTIVE(0)"
2016:02:23-05:45:51 UTM9CL-1 ha_daemon[4648]: id="38A0" severity="info" sys="System" sub="ha" seq="M:  546 51.327" name="Node 2 changed role: DEAD -> SLAVE"
2016:02:23-05:45:51 UTM9CL-1 ha_daemon[4648]: id="38A0" severity="info" sys="System" sub="ha" seq="M:  547 51.327" name="Executing (wait) /usr/local/bin/confd-setha mode master master_ip 198.19.250.1 slave_ip 198.19.250.2"
2016:02:23-05:45:51 UTM9CL-2 ha_daemon[4154]: id="38A0" severity="info" sys="System" sub="ha" seq="S: 1020 51.557" name="Node 1 changed version! 0.000000 -> 9.351003"
2016:02:23-05:45:51 UTM9CL-2 ha_daemon[4154]: id="38A1" severity="warn" sys="System" sub="ha" seq="S: 1021 51.557" name="Lost heartbeat message from node 1! Expected 8919099 but got 8919103"
2016:02:23-05:45:51 UTM9CL-2 ha_daemon[4154]: id="38C0" severity="info" sys="System" sub="ha" seq="S: 1022 51.557" name="Node 1 is alive"
2016:02:23-05:45:51 UTM9CL-2 ha_daemon[4154]: id="38A0" severity="info" sys="System" sub="ha" seq="S: 1023 51.557" name="Node 1 changed state: DEAD(2048) -> ACTIVE(0)"
2016:02:23-05:45:51 UTM9CL-2 ha_daemon[4154]: id="38A0" severity="info" sys="System" sub="ha" seq="S:    0 51.557" name="Node 1 changed role: DEAD -> MASTER"
2016:02:23-05:45:51 UTM9CL-2 ha_daemon[4154]: id="38A0" severity="info" sys="System" sub="ha" seq="S:    1 51.557" name="Executing (nowait) /etc/init.d/ha_mode topology_changed"
2016:02:23-05:45:51 UTM9CL-2 ha_mode[26445]: calling topology_changed
2016:02:23-05:45:51 UTM9CL-1 ha_daemon[4648]: id="38A0" severity="info" sys="System" sub="ha" seq="M:  548 51.574" name="Executing (nowait) /etc/init.d/ha_mode topology_changed"
2016:02:23-05:45:51 UTM9CL-1 ha_mode[10725]: calling topology_changed
2016:02:23-05:45:51 UTM9CL-1 ha_mode[10725]: topology_changed: waiting for last ha_mode done
2016:02:23-05:45:51 UTM9CL-1 ha_mode[10725]: daemonized...
2016:02:23-05:45:51 UTM9CL-1 repctl[10751]: [i] daemonize_check(1362): trying to signal daemon
2016:02:23-05:45:51 UTM9CL-1 ha_mode[10725]: topology_changed done (started at 05:45:51)
2016:02:23-05:45:52 UTM9CL-1 ha_daemon[4648]: id="38A0" severity="info" sys="System" sub="ha" seq="M:  549 52.391" name="Reading cluster configuration"
2016:02:23-05:45:52 UTM9CL-2 ha_mode[26435]: daemonized...
2016:02:23-05:45:52 UTM9CL-2 repctl[26471]: [i] daemonize_check(1362): trying to signal daemon
2016:02:23-05:45:52 UTM9CL-2 ha_mode[26435]: topology_changed done (started at 05:45:50)
2016:02:23-05:45:52 UTM9CL-2 ha_mode[26445]: topology_changed: waiting for last ha_mode done
2016:02:23-05:45:52 UTM9CL-2 ha_mode[26445]: daemonized...
2016:02:23-05:45:52 UTM9CL-2 repctl[26506]: [i] daemonize_check(1362): trying to signal daemon
2016:02:23-05:45:52 UTM9CL-2 ha_mode[26445]: topology_changed done (started at 05:45:51)

Any ideas?
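
For anyone who wants to see how often the link flaps, here is a minimal Python sketch that pulls the "Lost link beat" and "Found link beat" events out of an HA log like the one above. The line format is assumed from this excerpt only, and the function and variable names (`link_events`, `SAMPLE`) are made up for illustration; they are not part of any Sophos tool.

```python
import re
from datetime import datetime

# Assumed layout, taken from the excerpt above:
# <timestamp> <host> ha_daemon[<pid>]: ... name="<message>"
LINE_RE = re.compile(
    r'^(?P<ts>\d{4}:\d{2}:\d{2}-\d{2}:\d{2}:\d{2}) (?P<host>\S+) '
    r'ha_daemon\[\d+\]: .*name="(?P<name>[^"]*)"'
)
# Matches both "Lost link beat on eth3!" and "Found link beat on eth3 again!"
LINK_RE = re.compile(r'Netlink: (Lost|Found) link beat on (\S+?)( again)?!')


def link_events(lines):
    """Return (timestamp, host, 'lost' or 'found', interface) tuples."""
    events = []
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # not an ha_daemon line in the assumed format
        link = LINK_RE.search(m.group("name"))
        if link:
            ts = datetime.strptime(m.group("ts"), "%Y:%m:%d-%H:%M:%S")
            events.append((ts, m.group("host"), link.group(1).lower(), link.group(2)))
    return events


# Two lines copied from the log in this post.
SAMPLE = [
    '2016:02:23-05:45:47 UTM9CL-1 ha_daemon[4648]: id="38A3" severity="debug" '
    'sys="System" sub="ha" seq="M:  537 47.901" name="Netlink: Lost link beat on eth3!"',
    '2016:02:23-05:45:50 UTM9CL-2 ha_daemon[4154]: id="38A3" severity="debug" '
    'sys="System" sub="ha" seq="S: 1019 50.838" name="Netlink: Found link beat on eth3 again!"',
]

for ts, host, state, iface in link_events(SAMPLE):
    print(ts.isoformat(), host, iface, state)
```

Run over a full day of log lines, the list of events shows whether the drops cluster at certain times or on one node, which may help when talking to support.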

 



This thread was automatically locked due to age.
  • You might try the following command with 1500 and 2000 to make sure that the MTUs match:

    cc set ha advanced mtu 1500

    Cheers - Bob

     
    Sophos UTM Community Moderator
    Sophos Certified Architect - UTM
    Sophos Certified Engineer - XG
    Gold Solution Partner since 2005
    MediaSoft, Inc. USA
  • Hello Bob,

    Should I run this command from the shell? (Is it possible to configure it in the web interface?)

    First with:

    cc set ha advanced mtu 1500

    And if the link loss continues, then with:

    cc set ha advanced mtu 2000 ?
  • Yes, from the shell, Meinhart. You can run ifconfig eth3 first to see where the Master is now, so you can just try the other.

    Cheers - Bob
    PS Welcome to the UTM Community!
     
  • Hello Bob,


    Thank you for the answer.

    I connected via SSH to the IP address of my UTM.

    ifconfig eth3 shows that the MTU of eth3 is 2000. Question: is this the value from the physical interface of my Master?

    If I now set the MTU to 1500, do I have to reboot the Master to apply the same value on the Slave?

    Or does the command "cc set ha advanced mtu 1500" apply to both appliances?

    Sorry for my bad English. I didn't use Google Translate, this is school English :)
  • I've seen a couple of situations where the Master and Slave had different MTU settings, which, as I remember, caused the problem you're describing.

    The commands only affect the Master, as there's a whole separate process to log in to the Slave. After logging in to the Slave, the commands you enter apply only to the Slave. If you want to check the Slave, log in as loginuser, connect with ha_utils ssh, and then run the ifconfig command on the Slave.

    Were the MTUs different?

    Cheers - Bob
    PS Your English is fine, Meinhart.
     
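
If you save the `ifconfig eth3` output from both nodes, the comparison Bob describes can be sketched in a few lines of Python. This is only an illustration: the two output snippets below are invented examples in the common net-tools layouts, not output captured from an SG230, and `parse_mtu` and `mtus_match` are hypothetical helper names.

```python
import re

# net-tools prints either "MTU:2000" (older layout) or "mtu 2000" (newer layout).
MTU_RE = re.compile(r'\bmtu[:\s]\s*(\d+)', re.IGNORECASE)


def parse_mtu(ifconfig_output):
    """Return the MTU found in ifconfig output as an int, or None if absent."""
    m = MTU_RE.search(ifconfig_output)
    return int(m.group(1)) if m else None


def mtus_match(master_output, slave_output):
    """True only if both outputs contain an MTU and the values are equal."""
    master, slave = parse_mtu(master_output), parse_mtu(slave_output)
    return master is not None and master == slave


# Invented sample text for illustration only.
MASTER_OUT = "eth3      Link encap:Ethernet\n          UP BROADCAST RUNNING MULTICAST  MTU:2000  Metric:1"
SLAVE_OUT = "eth3: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 2000"

print("Master MTU:", parse_mtu(MASTER_OUT))
print("Slave MTU:", parse_mtu(SLAVE_OUT))
print("Match:", mtus_match(MASTER_OUT, SLAVE_OUT))
```

A mismatch here would point back at the MTU setting; matching values mean the cause has to be looked for elsewhere.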
  • Hello Bob,

    Thanks for your detailed answer. I checked both MTUs and they are both 2000.

    Is it useful to set it to 1500?

    If it's not the MTU, then what is the problem?
  • 2000 is the best, so the problem isn't the MTU.

    Do you have the Internal interface configured as the backup for the heartbeat? Have you tried changing the cables and switch ports that connect one eth3 to the other? If so, then it's time to get Sophos Support involved.

    Cheers - Bob
     
  • Hello Bob,

    I checked everything:

    cable
    in-house patch field (different ports)

    Same problem.

    I also opened a case with Sophos. They wrote back:

    "Please check your cabling, there is no hardware defect."

    That's all...
  • They just emailed you without looking at your UTM?   If you have a paid subscription, I would insist that support look at your box.

    Cheers - Bob

     
  • Okay. Today I opened a new ticket...

    Thanks, Bob. When you come to Germany, near Frankfurt, call me and I'll invite you for lunch or dinner. :-)