Upgrade to v18.0.4 MR-4 broke HA

I have 3 HA pairs of Sophos XG firewalls. I ran into an issue upgrading my last pair tonight. Coming from 17.5.12, it prompted me that it would reboot both devices simultaneously. It rebooted and upgraded what was the secondary device, which then became primary in the HA pair; that was normal from previous firmware upgrades. However, the original primary FW now shows Fault for its HA status.

What is strange is that if I console into the working FW of the pair and ping the HA pair IP of the secondary FW, it responds like it's up, but I cannot ping the LAN-side IP of the secondary FW. I will be looking at it on site first thing in the morning to see whether it's bricked or not.

Any suggestions or things to look for?

Thanks in advance.



  • Won't help you, but may help others: I don't suppose you happened to follow https://support.sophos.com/support/s/article/KB-000039405?language=en_US ? I've been assured that if we follow these steps it should prevent upgrade failures on HA clusters.

  • thanks,

    I walked in this morning and the one FW had a blank display, the LAN and WAN lights were off, and HA was off. When I disconnected the HA cable the FW responded and looked like it was rebooting, so I let it sit to see what would happen. It booted up and said it was still on 17.5.12 MR-12, and it immediately started trying to take over as the primary firewall, which I didn't want. So I had to reconnect the HA cable to keep the FW on v18.0.4 MR-4 as the primary.

    What is the suggested way to upgrade the FW on 17.5.12 to v18.0.4 MR-4 and have it sync up with the current primary firewall in this scenario?

    thanks

  • I think it is just a matter of following that guide, which talks through a few checks to make sure the HA is healthy prior to the upgrade.

  • Hi, for this scenario you can break the HA, upgrade both appliances separately to 18.0.4 MR-4, and rebuild the HA.

    Make sure to take a backup of the appliance before upgrading.

    Hope this helps

  • Sounds like what happened when we upgraded 17.5 MR12 to 18 MR1.

    At that time I added a comment on this thread:

    https://community.sophos.com/xg-firewall/f/discussions/122635/xg210---upgrade-from-17-5-12-mr-12-to-18-0-1-mr-1-build396---major-problems

    So it's still not fixed, and only a "rare" condition?

  • That's pretty much exactly what happened. As was posted there, if you reboot the aux FW, it never upgraded to 18.0.4 MR-4, so it doesn't know it should now be the aux firewall and attempts to take control as the primary. As DeveshM said, we have to get a window to break the HA, upgrade the FW that didn't successfully upgrade, and rebuild the HA.

  • We followed exactly the same upgrade path.

    As we had had issues with smaller updates, we were onsite during this update, which is quite a big one as there are changes in the internal structure. The update worked well and took about 60-75 minutes on an XG 550 Active-Passive cluster, including some prerequisites, e.g. booting both firewalls one after the other before the update. During the process there was a situation where both firewalls booted.

    Do you have a cable for the HA between the two firewalls? If not, there might be some timing issues regarding the availability of the network (e.g. spanning tree), which might lead to a split-brain situation and make the firewall think that the partner is not available. You should be able to identify this in the logs of the firewall and the switch.

    The policies were split up into policies, NAT rules and SD-WAN rules, as documented. In the long term they might need some consolidation. Look into the knowledge base or watch the videos. But this part went well.

    However, we had some issues with ping and the Teams configuration. The latter we had also seen before the update, and we hoped to get rid of it through the update. The ping issue was new. If we disabled fastpath or ran tcpdump, the ping issue went away. This is under observation by Sophos.

    I guess updating the second firewall manually and reestablishing HA should not be a big issue.
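
For anyone scripting around the "break HA, upgrade the lagging unit, rebuild HA" approach suggested above, here is a minimal sketch of a sanity check to run before rebuilding the pair. It only compares two firmware strings that you have typed in or copied from each unit yourself. It does not talk to the firewalls, and the function names `parse_firmware` and `ready_to_rebuild_ha` are made up for illustration, as is the assumption that the version string looks like "18.0.4 MR-4".

```python
import re


def parse_firmware(version: str) -> tuple[int, ...]:
    """Turn a string such as 'V18.0.4 MR-4' into (18, 0, 4, 4).

    Assumes the 'major.minor.patch MR-n' shape used in this thread.
    """
    match = re.search(r"(\d+)\.(\d+)\.(\d+)\s*MR-?(\d+)", version, re.IGNORECASE)
    if match is None:
        raise ValueError(f"unrecognised firmware string: {version!r}")
    return tuple(int(part) for part in match.groups())


def ready_to_rebuild_ha(primary: str, auxiliary: str) -> bool:
    """Return True only when both units report the same firmware."""
    return parse_firmware(primary) == parse_firmware(auxiliary)


if __name__ == "__main__":
    # Values copied by hand from each unit; hypothetical input.
    primary_fw = "V18.0.4 MR-4"
    auxiliary_fw = "17.5.12 MR-12"
    if ready_to_rebuild_ha(primary_fw, auxiliary_fw):
        print("Versions match; OK to rebuild HA.")
    else:
        print("Version mismatch; upgrade the lagging unit before rebuilding HA.")
```

With the two versions from this thread the check reports a mismatch, which is the state the pair was left in after the failed upgrade.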