
SFOS 17.1.3 MR-3 heartbeat broken

17.1.3 MR-3 broke my heartbeat

 

/log/heartbeatd.log

Tried this, with no luck:

community.sophos.com/.../127642



  • Can you take a look into the hb_trust.log and heartbeatd.log? 

    Also, if you try to rejoin the HB with a new registration, does that not work either? 

    Can you post both logs? The current logs and the logs after rejoining? 

     

    *Edit*

    I updated 8 appliances, and all of them worked fine after the update. 

    Is this an HA setup? 

  • I am back ;)

     

    Thanks for the prompt response!

    As I run HA A/P with 2xSG210, I did a failover, and the other device worked perfectly!

    Now I did a failover again and it's broken, look at the log files:

    hbtrust.log

    2018-09-26 02:09:58 INFO hbtrust[10525]:64 main:: - Locking HBtrust by setting LOCK_EX on /bin/hbtrust (prune)
    2018-09-26 02:09:58 INFO hbtrust[10525]:86 main:: - Executing: CERTREFRESH
    2018-09-26 02:09:58 FATAL Certificate.pm[10525]:109 SFOS::HBtrust::Central::Certificate::certificate_refresh - Seems that we got called by accident since we are not registred. Exiting.
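
A minimal sketch for picking the FATAL entry out of a log like the one above. It assumes every line follows the `<date> <time> <LEVEL> <source>[<pid>]:<line> <message>` layout seen in this thread; the names `LINE_RE`, `parse_line` and `fatal_entries` are made up for illustration and are not part of SFOS.

```python
import re

# Layout inferred from the log lines posted in this thread (an assumption,
# not a documented format):
#   <date> <time> <LEVEL> <source>[<pid>]:<line> <message>
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) "
    r"(?P<source>[^\[\s]+)\[(?P<pid>\d+)\]:(?P<lineno>\d+) "
    r"(?P<message>.*)$"
)


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    return match.groupdict() if match else None


def fatal_entries(lines):
    """Keep only the parsed entries whose level is FATAL."""
    parsed = (parse_line(line) for line in lines)
    return [entry for entry in parsed if entry and entry["level"] == "FATAL"]


# The three hbtrust.log lines from the post, copied verbatim.
SAMPLE = [
    "2018-09-26 02:09:58 INFO hbtrust[10525]:64 main:: - Locking HBtrust by setting LOCK_EX on /bin/hbtrust (prune)",
    "2018-09-26 02:09:58 INFO hbtrust[10525]:86 main:: - Executing: CERTREFRESH",
    "2018-09-26 02:09:58 FATAL Certificate.pm[10525]:109 SFOS::HBtrust::Central::Certificate::certificate_refresh - Seems that we got called by accident since we are not registred. Exiting.",
]

for entry in fatal_entries(SAMPLE):
    print(entry["ts"], entry["source"], entry["message"])
```

Run against the lines above, this leaves only the `Certificate.pm` entry saying the device is not registered, which matches the fix that later worked (re-registering).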

    heartbeatd.log

    2018-09-26 20:57:58 INFO Main.cpp[14151]:125 initLogger - Heartbeat daemon build time: 12:00:11 Sep 19 2018
    2018-09-26 20:57:58 INFO Main.cpp[14151]:197 main - Heartbeat daemon starting
    2018-09-26 20:57:58 INFO Main.cpp[14151]:219 main - Maximum connected clients: 10000
    2018-09-26 20:57:58 INFO Main.cpp[14151]:132 needed_files_missing - blocking until missing files exist:
    2018-09-26 20:57:58 INFO Main.cpp[14151]:134 needed_files_missing - /conf/sysfiles/heartbeatd/ep_cert.crt
    2018-09-26 20:57:58 INFO Main.cpp[14151]:134 needed_files_missing - /conf/sysfiles/heartbeatd/server.crt
    2018-09-26 20:57:58 INFO Main.cpp[14151]:134 needed_files_missing - /conf/sysfiles/heartbeatd/server.key
    2018-09-26 20:58:49 INFO Main.cpp[14151]:260 operator() - Got SIGNAL so daemon is going to stop
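
A small sketch that lists which files the daemon says it is waiting for, based only on the `needed_files_missing` lines shown above. The function name `missing_files` is hypothetical, and the marker string is an assumption taken from this one log excerpt.

```python
# Marker text as it appears in the heartbeatd.log excerpt in this thread.
MARKER = "needed_files_missing - "


def missing_files(lines):
    """Collect the paths reported after the marker, in order, without duplicates."""
    found = []
    for line in lines:
        _, sep, rest = line.partition(MARKER)
        rest = rest.strip()
        # Skip the "blocking until missing files exist:" header; keep absolute paths only.
        if sep and rest.startswith("/") and rest not in found:
            found.append(rest)
    return found


# Lines copied verbatim from the post.
HEARTBEATD_SAMPLE = [
    "2018-09-26 20:57:58 INFO Main.cpp[14151]:132 needed_files_missing - blocking until missing files exist:",
    "2018-09-26 20:57:58 INFO Main.cpp[14151]:134 needed_files_missing - /conf/sysfiles/heartbeatd/ep_cert.crt",
    "2018-09-26 20:57:58 INFO Main.cpp[14151]:134 needed_files_missing - /conf/sysfiles/heartbeatd/server.crt",
    "2018-09-26 20:57:58 INFO Main.cpp[14151]:134 needed_files_missing - /conf/sysfiles/heartbeatd/server.key",
    "2018-09-26 20:58:49 INFO Main.cpp[14151]:260 operator() - Got SIGNAL so daemon is going to stop",
]

for path in missing_files(HEARTBEATD_SAMPLE):
    print(path)
```

For this excerpt it reports the endpoint certificate plus the server certificate and key, i.e. the daemon never started serving because its certificate material was absent on this node.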

    Also, the HB monitor in the GUI looks the old way:

    When I do failover to the "working device" it looks like this:

    So HA is definitely not good with 17.1.3 MR-3. I must admit that this is the first time I have updated a live HA environment; before the update, HB and failover worked flawlessly.

  • Maybe there was a mistake in the Update Process. 

    So let's do the following: 

    Do a takeover to the working appliance.

    Clear the registration. Wait a couple of minutes.

    Check the license on mySophos to see which appliance has the assets (subscription) attached to it. 

    Do a takeover to that appliance if it is not currently the master. 

    Register the HA again to Central. 

    Wait a couple of minutes and perform another takeover to see if it works again on the auxiliary appliance. 

  • manbearpig said:

     

    Register the HA again to Central. 

     

     

    Just to make sure, you meant HB right? ;)

  • Well, I assumed you meant HB, and I can confirm it now WORKS on both appliances again :-D

    Thanks!

  • I think there was an initial issue with the HA and the HB, so there is basically no direct relation to the Up2Date to MR3. Just a coincidence.

  • I think there was an initial issue with the HA and the HB, so there is basically no direct relation to the Up2Date to MR3. Just a coincidence.

     

     

    I understand. I wish HA in XG were like ZEROCONF on UTM :-)

    The CyberRoam sync process is not my favorite. Here are the message views from the two XGs; they are not the same either ;)

    Device 1:

    Device 2:

     

    But thanks for helping out! ;)