This discussion has been locked.

HA and FailSafe

I am having a succession of issues with our Sophos firewall.

I converted from SG (an SG210 pair in HA) to XG a few weeks ago and it has been going well. I didn't do a migration; I built the new config from scratch. It was an HA pair under SG, so I broke the HA and left the SG powered on with the cables unplugged in case I needed to fail back.

Today I noticed that I was unable to change any SSL VPN settings. It would say the settings were accepted, but when I went back to the screen the old settings were there again. The system load was also in a warning state all the time.

I figured it was about time to ditch the old SG completely and put HA under XG, so I built up the other SG appliance and made an HA cluster. I didn't want to reboot before I had HA in case something went wrong.

With the HA implemented, I rebooted the Primary and the whole network went down. The Primary got stuck on a blank LCD and the Auxiliary never took over. I have seen this a number of times recently - there seems to be some watchdog logic missing from the HA implementation in XG.
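To illustrate the kind of watchdog logic meant here, below is a minimal Python sketch of a heartbeat timeout that lets an auxiliary node promote itself when the primary goes silent. It is not Sophos's implementation and says nothing about how XG HA actually works; the class name `HeartbeatWatchdog`, the function `decide_role`, the role strings and the timeout value are all assumptions made for illustration.

```python
import time
from typing import Callable


class HeartbeatWatchdog:
    """Tracks heartbeats from a peer node (hypothetical helper, not a Sophos API)."""

    def __init__(self, timeout_s: float, clock: Callable[[], float] = time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock
        # Treat construction time as the first heartbeat.
        self.last_beat = clock()

    def beat(self) -> None:
        """Record that a heartbeat was just received from the peer."""
        self.last_beat = self.clock()

    def peer_is_dead(self) -> bool:
        """True once no heartbeat has arrived for longer than the timeout."""
        return self.clock() - self.last_beat > self.timeout_s


def decide_role(current_role: str, watchdog: HeartbeatWatchdog) -> str:
    """An auxiliary promotes itself when the primary stops sending heartbeats."""
    if current_role == "auxiliary" and watchdog.peer_is_dead():
        return "primary"
    return current_role
```

The point of the sketch is that a primary hung on a blank LCD stops sending heartbeats just as a powered-off one does, so a timeout on the auxiliary side would cover both cases. The clock is injectable only so the logic can be exercised without real waiting.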

After removing power from the old Primary, the other node took over and the network came back up again (and I was now able to save SSL VPN settings).

The old node was now in failsafe mode because it was unable to load its config. Good thing I put HA in before I rebooted!

I did a factory reset on the old node to bring it out of failsafe mode and prepared to add it back as the Auxiliary (a pain, because I will then have to do a node swap, break the HA, then rebuild so that the licensing all works properly).

But now on the Primary I can't access the HA screen; it just sits there spinning. And I don't want to reboot it in case it breaks too.

Any suggestions? I'm almost thinking I should restore a backup to the original Primary, factory reset the original Auxiliary, and then build it up that way. At least then the only downtime is the time it takes me to swap some cables over.

Thanks

James



  • Hi,

    I heard of an issue with HA in which the AUX appliance had been idle for too long, and the AUX and its hard drive got stuck.

    I would suggest opening a support case to find out the proper way to deal with this.

    Maybe your aux is broken. 

  • I booted up with System Rescue and ran the SMART disk tests. A short test passed but the long test recorded a failure.

    The replacement unit was an SG210rev2 (failed unit was a rev1), so I then had to wait for a rev1, which finally arrived yesterday.

    It's now in and working, but HA failover still takes ages. It has been 20 minutes since I rebooted the primary and the aux is still syncing. The LCD panel is blank, 'system ha show details' hangs, RED tunnels are down, and login fails ("Login Failed"). It all comes up eventually, but should it really take this long?

    James