This discussion has been locked.

Active sessions don't fail back when primary gateway restored

Hardware: XG-125

Firmware version: 17.06

----------------------------

Configuration:

- Gateway 1 - Active, weight 1, connected to a low latency terrestrial connection

- Gateway 2 - Backup, inherit weight from primary, connected to a satellite connection

- Firewall Rule - Primary gateway: Gateway 1 - Backup Gateway: Gateway 2

 

Behavior:

Failover -  Works beautifully. When Gateway 1 fails (pings fail) traffic flips over to Gateway 2. 

 

Failback - When Gateway 1 comes back up, network flows that either failed over to Gateway 2 or were already active on Gateway 2 when Gateway 1 came back up DO NOT fail back to Gateway 1.

This is problematic for us because long-lived, high-bandwidth flows remain on the satellite network instead of flipping back to the low-latency terrestrial connection. Any suggestions on how to resolve this so that flows fail back properly to the primary gateway?



This thread was automatically locked due to age.
  • Hi,

    I am not sure about your device configuration, so I am sharing a basic idea of how the backup gateway can be configured.

     

    You should also assign more weight to the primary gateway (in my personal experience) and adjust the Gateway Failover Timeout.

     

    Regards,

    Deepak Kumar

  • This is exactly the configuration we currently have on the backup gateway.

  • Hi,

    When the link restores automatically, do you see the "ACTIVE" status for it in the Control Center?

    Regards,

    Deepak Kumar

  • Yes.

    New flows go out via the correct gateway. Flows which failed over from Gateway 1 to Gateway 2 (or started on Gateway 2) do not fail back to Gateway 1 when it comes back up.

  • Any update on this?  We are having the same issue with our Sophos XG135 on SFOS 17.1.2 MR-2.  We can't get our IP phones to connect to the primary WAN after a failover.  Rebooting the phones does not solve the issue.  Disconnecting WAN2 or a firewall reboot seem to be the only options.

  • As far as I know, this works as designed.

    Basically, XG tracks active sessions via conntrack and holds each one on the WAN interface it was established on. If a session were moved back to the restored WAN interface, it would cause a conntrack / stateful firewall mismatch, and most services on the internet would not cope with it.

    You need to set up a new connection in order to get traffic flowing over the restored gateway.

    There is something called the TCP handshake. https://en.wikipedia.org/wiki/Transmission_Control_Protocol#Connection_establishment

    So if you perform the handshake via WAN2 (during failover) and then want to switch to WAN1 after failback, the connection would break, because your traffic would come from a different IP and not the original source IP of WAN2. So XG holds the existing connection on WAN2 and builds all new connections on WAN1.
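
To make the pinning behaviour concrete, here is a minimal toy model in Python. It is only a sketch of the idea described above, not Sophos code: the `Router` class, its method names, and the rule that entries on a failed gateway are dropped at failover are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Router:
    """Toy model of gateway selection with connection tracking (hypothetical)."""
    primary: str = "WAN1"
    backup: str = "WAN2"
    primary_up: bool = True
    # Maps a flow 5-tuple to the gateway chosen when the flow was first seen.
    conntrack: dict = field(default_factory=dict)

    def route(self, flow):
        """Return the gateway for a packet; new flows get pinned to a gateway."""
        if flow not in self.conntrack:
            self.conntrack[flow] = self.primary if self.primary_up else self.backup
        return self.conntrack[flow]

    def set_primary_up(self, up):
        """Change primary link state.

        Assumption: on failure, entries pinned to the primary are dropped so
        the next packet re-pins to the backup. On recovery nothing is dropped,
        which is the behaviour this thread complains about.
        """
        if not up:
            for flow in [f for f, gw in self.conntrack.items() if gw == self.primary]:
                del self.conntrack[flow]
        self.primary_up = up

    def flush(self):
        """Clear all tracked flows, forcing every flow to be re-evaluated."""
        self.conntrack.clear()


if __name__ == "__main__":
    r = Router()
    phone = ("udp", "192.168.1.10", 5060, "203.0.113.5", 5060)
    print(r.route(phone))        # primary is up, flow pinned to WAN1
    r.set_primary_up(False)
    print(r.route(phone))        # failover, flow re-pinned to WAN2
    r.set_primary_up(True)
    print(r.route(phone))        # still WAN2: the entry was never removed
    r.flush()
    print(r.route(phone))        # back on WAN1 after the table is cleared
```

In this model the only way an existing flow returns to WAN1 is for its entry to disappear, either because the connection ends or because something clears the table.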

  • Many other firewalls allow a forced break of all sessions, or of selected sessions, networks, objects, etc. when you fail back to the primary connection.  We are seeing this issue with IP phones and wireless access points that connect to cloud servers.  Even rebooting the equipment does not start a new session on WAN1.  To make matters worse, there is no way to take the interface down in the GUI.  I've got more failover options in a $200 Zywall.

  • Per the latest update on my ticket with Sophos Support, this will be fixed in MR3. I've also seen an interface mockup that shows it as a selectable option to clear connection tracking on a switch back to the primary gateway.

  • What you describe is exactly the reason I opened up a ticket with customer support. Long-lived connections will remain on the backup gateway until the connection is broken and a new connection is established. This is especially problematic in cases where the backup connection(s) is/are metered or lower-bandwidth/higher-latency than your primary connection.

    I have other WAN gateway products in the network I manage that act like your other devices - after a set period of time, they break the connections by deleting the connection tracking entries to fail the traffic back to the most preferred gateway. 
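
A rough sketch of the "delete tracking entries after a set period" behaviour described above, in Python. This is not how any particular product implements it: the function name `failback_flush`, the dict-based table, and the 60-second default hold-down are hypothetical. On a generic Linux router the real deletion would be done with something like the conntrack-tools `conntrack -D` command; I am not claiming XG exposes that.

```python
def failback_flush(conntrack, backup_gw, primary_up_since, now, hold_down=60.0):
    """Remove flows pinned to the backup gateway once the primary is stable.

    conntrack        -- dict mapping flow 5-tuple -> gateway name (toy table)
    backup_gw        -- name of the backup gateway, e.g. "WAN2"
    primary_up_since -- timestamp when the primary came back, or None if down
    now              -- current timestamp, same units as primary_up_since
    hold_down        -- seconds the primary must stay up before flushing
                        (assumed value; avoids flapping on an unstable link)

    Returns the list of flows that were removed.
    """
    if primary_up_since is None or now - primary_up_since < hold_down:
        return []
    stale = [flow for flow, gw in conntrack.items() if gw == backup_gw]
    for flow in stale:
        del conntrack[flow]
    return stale


if __name__ == "__main__":
    table = {
        ("tcp", "192.168.1.20", 40000, "198.51.100.7", 443): "WAN2",
        ("udp", "192.168.1.10", 5060, "203.0.113.5", 5060): "WAN1",
    }
    # Primary restored at t=100; at t=130 the hold-down has not elapsed yet.
    print(failback_flush(table, "WAN2", primary_up_since=100.0, now=130.0))
    # At t=170 the primary has been up for 70 s, so WAN2 entries are removed.
    print(failback_flush(table, "WAN2", primary_up_since=100.0, now=170.0))
```

The hold-down is the design choice that matters here: flushing immediately on recovery would bounce long-lived flows back and forth if the primary link is flapping.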
