SFOS 18.0.3 MR-3 - QuickHA still has issues :-(

I have 1 x SG 210 Rev. 3 running.

I added a new SG 210 Rev. 3 and registered it with the same Sophos Central account as no. 1, but with another LAN IP on the same subnet as no. 1.

I set up QuickHA on both appliances; both came up "green" after a while of syncing back and forth.

After 12 hours I did a "Switch to passive device".

All traffic failed over, so far so good, but the webadmin on No. is extremely slow and I got this bug that has haunted me since SFOS 17.

I can see from the release notes that this heartbeat bug should have been fixed, but it's not. Or am I doing something wrong?

From device no 2, on no 1 it's activated:



This thread was automatically locked due to age.
  • There are several things to consider.

    First of all, QuickHA mode and Interactive mode are only the "setup" options. In the end you get an HA, no matter which option you used to build it.

    Second, there is a bug with Heartbeat if you set up the HB first and then the HA. I would recommend deregistering the HA and waiting 1-2 minutes, then registering the cluster. It should sync the configuration files. The issue with stopped HB caused by missing files will be fixed in the near future, but it can be worked around by simply re-registering.

    An HA cluster in Central (currently) has only basic HA support. That means both appliances can be used, if activated. Therefore the second appliance is not activated by default. That's why you see management deactivated on the second node. You can activate it if you like. Either way, both appliances should be visible in Central.

    Full HA support is currently under development and should be coming soon.

  • Hi Lucar,

    1) I am aware of what I get, it's just not working right :-)

    2) I have now tried this, but the issue remains on the aux: heartbeat is still stopped. Here is the hbtrust log:

    2020-11-09 02:10:00 INFO hbtrust[28805]:71 main:: - Locking HBtrust by setting LOCK_EX on /bin/hbtrust (prune)
    2020-11-09 02:10:00 INFO hbtrust[28805]:98 main:: - Executing: CERTREFRESH
    2020-11-09 02:10:00 FATAL Certificate.pm[28805]:134 SFOS::HBtrust::Central::Certificate::certificate_refresh - Seems that we got called by accident since we are not registred. Exiting.

    heartbeatd.log is not even populated :-/

    So 18.5 is still where we want to go for Zeroconf HA?

    We have many customers with HA; today we are driving around to 5 of them for "repair the aux XG that is dead" tasks.

    The aux/master does not always come up after a reboot or firmware update, so the HA will eventually "die".
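
The FATAL line in the hbtrust excerpt above is easy to miss when checking several HA pairs for the same symptom. A minimal sketch of how such lines could be scanned, assuming the log format is exactly what the three quoted lines show (timestamp, level, source, message). The names `parse_hbtrust` and `not_registered` and the regular expression are made up for illustration and are not part of SFOS; the match string keeps the log's own spelling "registred".

```python
import re
from typing import Iterable, List, NamedTuple


class HbtrustEntry(NamedTuple):
    timestamp: str
    level: str
    source: str
    message: str


# Assumed layout, inferred only from the lines quoted in this thread:
# "<date> <time> <LEVEL> <source> <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) "
    r"(?P<source>\S+) "
    r"(?P<msg>.*)$"
)

# The three log lines as posted above.
SAMPLE = """\
2020-11-09 02:10:00 INFO hbtrust[28805]:71 main:: - Locking HBtrust by setting LOCK_EX on /bin/hbtrust (prune)
2020-11-09 02:10:00 INFO hbtrust[28805]:98 main:: - Executing: CERTREFRESH
2020-11-09 02:10:00 FATAL Certificate.pm[28805]:134 SFOS::HBtrust::Central::Certificate::certificate_refresh - Seems that we got called by accident since we are not registred. Exiting.
"""


def parse_hbtrust(lines: Iterable[str]) -> List[HbtrustEntry]:
    """Parse lines that match the assumed format; skip anything else."""
    entries = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            entries.append(
                HbtrustEntry(m["ts"], m["level"], m["source"], m["msg"])
            )
    return entries


def not_registered(entries: Iterable[HbtrustEntry]) -> bool:
    """True if any FATAL entry carries the 'not registred' message."""
    return any(
        e.level == "FATAL" and "not registred" in e.message for e in entries
    )


entries = parse_hbtrust(SAMPLE.splitlines())
flagged = not_registered(entries)
```

On a real appliance the lines would come from the hbtrust log file instead of `SAMPLE`; its path is not given in this thread, so it is left out here.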

  • This has nothing to do with QuickHA. 

    Simply re-register the HA; that should be enough.

    Do your HAs die during the upgrade process from V17.5 to V18, or within V18?

    This needs to be investigated and has, once again, nothing to do with QuickHA. 

  • 1) I know - just stating that it is not advisable for customers atm.

    2) Now I did a register HA, disabled HA on both, waited 5 min, enabled it again, waited 5 min, did a switchover - heartbeat sync is still not working. Also did this at another customer, same issue.

    3) They can die out of the blue, e.g. when you reboot the master; seen on 17 and 18.

    4) In my opinion it has: running without QuickHA, or "HA" generally, the one standalone appliance never dies...
