This discussion has been locked.

QuickHA Active/Active broken

Hi all,

I tried to set up an active/active cluster using the new QuickHA feature. I put both devices in QuickHA mode and it started with an SSH handshake, so far so good. Unfortunately, nothing happened even after waiting much longer than the four minutes mentioned, so I stopped QuickHA discovery on one device, intending to try interactive mode instead. On the other device, however, stopping QuickHA discovery does nothing: clicking the button has no effect. Even after rebooting the device, it is still "stuck":

Only applog.log shows something: "Mar 10 16:46:44 ha: disablequickmodeha: dedicated link empty: quick mode not enabled?". Is the interface just broken, or what is happening here? My first try was on the first v18 build, but even after upgrading to build 339 it still looks like this.

Regards



This thread was automatically locked due to age.
  • FormerMember

    Hi Julian Wagner,

    Could you please verify that you are activating QuickHA mode as outlined in the help section:


    To use QuickHA, do the following.

    • Connect the XG Firewall devices using a network cable plugged into the dedicated HA port on both units.
    • Sign in to the web admin console of the primary XG Firewall and go to System services > High availability.
    • Select the Initial device role.
    • Ensure QuickHA is selected. You’ll see default settings (which you can change), as described in the steps that follow.
    • QuickHA generates a Passphrase automatically. You can also change the passphrase manually.

    Note: You can't enable HA if you turned on STP on a bridge interface.

    Note: The passphrase is used only once to generate the SSH keys used to encrypt communication over the HA link. It's then deleted.

    Quick HA selects a Dedicated HA link automatically. You can also select an interface manually.

    Note: By default, QuickHA selects the first unbound interface. If this is not available, it uses the first DMZ port. This interface will be renamed QuickHA Mode interface and assigned an IPv4 address from the link local range, 169.254.0.0/16.

     

    CAUTION
    If Quick HA selects a DMZ port that’s already in use, its current configuration will be overwritten.

    • Click Initiate HA.
    • Sign in to the web admin console of the auxiliary XG Firewall and go to System services > High availability.
    • Select Auxiliary as the device role.
    • Select QuickHA and enter the same Passphrase used on the primary XG Firewall.
    • Click Initiate HA. You see a message about the configuration being overwritten. This is because the configuration will be synchronized from the primary XG Firewall.
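The note above says the QuickHA interface gets an IPv4 address from the link local range 169.254.0.0/16. As a rough sketch (not a Sophos tool), the following Python snippet checks whether an address read off an interface falls in that range, which may help confirm whether a port was actually claimed by QuickHA. The function name and the sample addresses are made up for illustration.

```python
import ipaddress

# Link local range named in the help text quoted above.
LINK_LOCAL = ipaddress.ip_network("169.254.0.0/16")


def in_quickha_range(addr: str) -> bool:
    """Return True if addr lies in 169.254.0.0/16.

    Hypothetical helper: it only tests the range, it cannot tell
    whether QuickHA itself assigned the address.
    """
    return ipaddress.ip_address(addr) in LINK_LOCAL


# Sample addresses are invented for illustration.
for candidate in ("169.254.12.1", "192.168.1.1"):
    print(candidate, in_quickha_range(candidate))
```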

    Please check whether you are following these steps.

    Thanks,

     

  • Hi H_Patel,

    all requirements in the following article were met: https://community.sophos.com/kb/en-us/123174

    I would refrain from using QuickHA for now anyway and configure it manually... But the problem is that I cannot stop the discovery process on one of the devices...

    Thanks and regards

  • Julian,

    from advanced shell, please check the HA logs:

    tail -f /log/msync.log | grep ha

    tail -f /log/applog.log | grep ha

    Thanks

  • Hi lferrara,

    as mentioned, nothing in msync.log matches 'cat /log/<log>.log | grep "ha:"', and the only matching entry in applog.log appears when I try to deactivate the discovery via the GUI: "Mar 10 16:46:44 ha: disablequickmodeha: dedicated link empty: quick mode not enabled?".
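For anyone who copies the logs off the box: a plain `grep ha` also matches any line that merely contains the letters "ha", so filtering on the "ha:" tag is more precise. Below is a minimal Python sketch of that filter. The line layout is an assumption inferred from the single applog.log entry quoted in this thread, and the second sample line is made up.

```python
import re

# Assumed layout "Mon DD HH:MM:SS tag: message", inferred from the one
# applog.log line quoted in this thread; real logs may differ.
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) (?P<tag>[\w-]+): (?P<msg>.*)$"
)


def ha_entries(lines):
    """Return (timestamp, message) pairs for lines whose tag is exactly 'ha'."""
    found = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and match.group("tag") == "ha":
            found.append((match.group("ts"), match.group("msg")))
    return found


sample = [
    "Mar 10 16:46:44 ha: disablequickmodeha: dedicated link empty: quick mode not enabled?",
    "Mar 10 16:46:45 other: change applied",  # invented line; contains "ha" but is not an HA entry
]

for ts, msg in ha_entries(sample):
    print(ts, "|", msg)
```

To run it against a copied log, pass an open file object instead of `sample`, since the function only iterates over lines.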

    Regards

  • Uhm. I guess there is a new log file in v18.

    Check:

    ha_pair.log and ha_tunnel.log

    Still no HA units on v18. Sorry about that!

  • Hi Luk,

    good to know, but both logs are empty. Possibly because no HA was established in the first place? From what I have gathered so far, I would speculate that this is a bug in the GUI: the HA service is not really stuck in discovery mode, it is only displayed that way. When I try to disable the discovery, it reports that no discovery is running ("quick mode not enabled") and stays that way. I opened a ticket with Sophos at the beginning of last week, which seemed to be "stuck" as well, as I only got my first response today. Maybe they will be able to troubleshoot this issue.

    Regards

  • Hello,

    I came across your thread, as I am experiencing the same problem.

    For me, trying to disable QuickHA in WebAdmin just spins the loading circle for 2 seconds and then nothing happens.

    I am on v18 Build339. 

    How did it go with Sophos Support? Did they manage to help with this issue? I am going to file a support ticket now as well.

  • Hello,

    did you find a solution for your problem?

    I have the same ...

  • Hello, 

    To share it with you and anyone else who is having this problem,

    I filed a ticket to Sophos Support, and an engineer was able to resolve this issue.

    It appears that when QuickHA was first activated, the information was not correctly registered in the internal database.

    As a result, it was not possible to disable QuickHA from the GUI, because the database held no correct reference to the port to disable.

    This misconfiguration meant that even after restoring backups, the DB remained unchanged, which made the problem persist regardless of which backup was restored.

    The engineer manually registered the previous configuration in the DB, and only then was it possible to successfully disable QuickHA.

    I was advised to use Interactive mode instead, and now I am successfully running HA in Active-Passive mode.

    It appears that QuickHA is still buggy enough to cause system-level corruption, which is pretty difficult to troubleshoot, as none of the HA-related commands give any useful information.
