
XG210 - Upgrade from 17.5.12 MR-12 to 18.0.1 MR-1-Build396 - major problems

Hi,

We attempted the upgrade described in the subject for a client.

We experience the following issues:

1. Random issues with traffic flow. Some servers could only intermittently ping across the router to the internet.

-- This turned out to be fixed by disabling hardware acceleration.

 

2. IPsec site-to-site VPN with policy routing (to AWS)

-- No traffic would flow across the IPsec link until I enabled NAT. I used conntrack to identify the sessions; they looked correct and matched. The firewall policy test evaluated as expected. I tested with a ping from the firewall, and that got through to the remote network OK.

 

3. Remote users connecting in with L2TP VPN

-- On V17, this was rock solid. On V18 the VPNs were randomly dropping. There were LCP errors in the log as well.

 

4. Voice quality - Phones connect (routed) across the Sophos between two local LAN segments.

-- On V17, this was fine. On V18, it sounded like there was constant, very low-level packet loss, or perhaps the occasional packet experiencing jitter. It sounded like there were weird compression artifacts or something similar.

 

We've given up. A Sophos engineer rolled back to V17, and all these problems seemed to go away.

 

Googling around shows we're not the only ones experiencing these weird problems.

 

 

Is this typical for the 'upgrade pain' to V18?

Is V18 ready for production use?



  • Hi,

    you mention the latest release, but which version exactly?
    ian

  • Hi,

     

    I just updated the post - SFOS 18.0.1 MR-1-Build396

     

    Cheers!

    Shaun

  • We took the same upgrade path on an XG430 last weekend, and it all started with an HA cluster failure. The slave upgraded to 18, but the master remained on 17.5. The master displayed the slave as faulty, but the slave thought it was also the master.

    Because of the new communication protocol between them, they could no longer communicate over the HA link. In fact, they sent us thousands of emails saying that HAuser had failed to log in over the HA link and that the source IP had been blocked for 5 minutes. GREAT work!

    In the end, we deleted and re-imaged the original master to v18 and tried to restore the backup from v17.5. That is when we noticed that a machine has to be re-registered with Sophos after a restore, which is not possible when you do not have WAN access. Why not allow a machine to be rebuilt and joined to a cluster without needing to register it and have internet access? Our WAN access runs over a LAG and several VLANs, and it would take a lot of time to rebuild everything manually just to register.

    In the end, it took us more than 3 hours to get the cluster nodes back together because of other weird things. Quick pairing and interactive pairing failed several times.

  • @LHezrog

     

    Wow! That sounds horrific. Other than your HA nightmare during the upgrade, did you experience any other weird issues on v18?

    Not having a compatible cluster protocol, or at least something that fails cleanly, is a massive oversight.

    I also detest the idea of being forced to "register" a device just so you can get it configured. Disabling the device's functionality until you've registered it suggests the vendor is pigheadedly trying to protect its revenue at the expense of the customer experience. Sure, validate the licence, but perhaps allow a 30-day grace period from when you first bring it online so you can get your comms going? Or just a really annoying nag message with a 30-second wait when you first log in?

    In any case, it seems like migrating to v18 is problematic for many of us. As much as I like the Sophos XG, it'll be hard for me to recommend it to anyone while these problems persist.

  • In the end we deleted and re-imaged the original master to v18 and tried to restore the backup from v17.5. There we noticed it is required to re-register a machine after restore against Sophos again - something that is not possible, when you do not have WAN. Why you do not allow to rebuild a machine and join it to a cluster without need to register and have internet access? We have access WAN over a LAG and several VLANs and it would take lots of time to rebuild everything manually just to register.

     

    Simple trick to do this: 

    On the master, create a firewall rule from the HA link zone to WAN ("Allow Zone to WAN") and enable NAT (v17.5).

    On the auxiliary node, configure the HA link as a WAN interface, using the primary's new HA link IP as the default gateway.

    Activate the second node. 

    Revert these changes on both appliances and reconfigure HA.

     

    This can be done remotely within 5 minutes. 

  • IPsec tunnel to AWS: regardless of what you did in the past, I would recommend moving to VTI (route-based) in V18 to connect to AWS/Azure. This should be better in any case.

     

    Did you disable the DPI engine? Did this solve anything?

     

  • This still seems to be a bug. I tried upgrading from 17.5.14 to 18 MR3 and started getting constant "hauser can't log in" alerts. I tried rebooting, but it's not coming back online now.

  • Isn't it unbelievable that Sophos still has a broken upgrade path in its major product line? They know that v17 and v18 have incompatible HA communication and that this is not fixable? Even worse, they actively block the old HA partner because the two don't speak the same language. I wonder how many hours have been wasted on crashed HA environments.

    You will have to manually upgrade the v17 node and then do your best to get the two back together into HA. If you call support, you will be stuck in this situation for several days.

  • Actually, this seems to be a rare issue, and it can most likely be fixed by rebuilding HA.

    Because we built a new HA channel, it seems that under rare conditions this issue occurs on some installations.

    This is not a general issue at all.