Upgrade to V18.0.4 MR-4 broke HA

I have three HA pairs of Sophos XG firewalls. I ran into an issue upgrading the last pair tonight, coming from 17.5.12. The upgrade prompted me that it would reboot both devices simultaneously. It rebooted and upgraded what was the secondary device, which then became primary in the HA pair; that matches what I saw in previous firmware upgrades. However, the original primary firewall now shows "fault" as its HA status.

What is strange is that if I console into the working firewall of the pair and ping the HA pair IP of the secondary firewall, it responds as if it is up, but I cannot ping the LAN-side IP of the secondary firewall. I will be on site first thing in the morning to see whether it is bricked.

Any suggestions or things to look for?

Thanks in advance.



  • We followed exactly the same upgrade path.


    We had been having some smaller issues, so we were on site during this update, which is quite a big one because of changes in the internal structure. The update worked well and took about 60-75 minutes on an XG 550 active-passive cluster, including some prerequisites, e.g. booting both firewalls one after the other before the update. During the process there was a moment when both firewalls rebooted.

    Do you have a direct cable for the HA link between the two firewalls? If not, there might be timing issues with the availability of the network (e.g. spanning tree), which can lead to a split-brain situation and make the firewall think that its partner is not available. You should be able to identify this in the logs of the firewall and the switch.

    The policies were split into policies, NAT rules and SD-WAN rules, as documented. In the long term they might need some consolidation. Look into the knowledge base or watch the videos for details. This part went well.

    However, we had some issues with ping and with the Teams configuration. We had already seen the latter before the update and hoped the update would get rid of it. The ping issue was new. If we disabled FastPath or ran tcpdump, the ping issue went away. Sophos is looking into this.

    I guess updating the second firewall manually and re-establishing HA should not be a big issue.
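
The reachability check from the original post (the HA address answers, the LAN address does not) can be written down as a small script. This is only a sketch, not a Sophos tool: `is_reachable` and `interpret` are hypothetical helper names, it relies on the operating system's `ping` command, and it assumes a host that can reach both addresses. With a dedicated HA link that is often not the case, so the two results may have to be taken by hand from the firewall console and passed to `interpret` directly.

```python
import platform
import subprocess


def is_reachable(address: str, timeout_s: int = 2) -> bool:
    """Send one ICMP echo request using the system ping command."""
    # Windows uses -n for the packet count, most other systems use -c.
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    try:
        result = subprocess.run(
            ["ping", count_flag, "1", address],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, OSError):
        # Timed out, or no ping binary available: treat as unreachable.
        return False
    return result.returncode == 0


def interpret(ha_link_up: bool, lan_up: bool) -> str:
    """Map the two ping results to a rough first diagnosis (assumed wording)."""
    if ha_link_up and lan_up:
        return "both interfaces answer"
    if ha_link_up:
        return "peer is powered on, LAN side not answering"
    if lan_up:
        return "LAN answers, HA link does not"
    return "peer unreachable"
```

For example, `interpret(is_reachable(ha_ip), is_reachable(lan_ip))`, where `ha_ip` and `lan_ip` are placeholders for your own addresses. The case in the original post corresponds to `interpret(True, False)`, which suggests the unit has booted and is probably not bricked.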
