
Failover WAN not failing over

Hey guys,

 

So I have 2 WAN links, one Active and one Backup. Failover is set to trigger if the Primary fails a TCP check to 4.2.2.2 on port 80 after 5 sec.

My Primary WAN port is Port 3 and shows Disconnected

My Backup port is Port 8 and shows Connected

 

No matter what I try I cannot get the XG to failover to the backup connection.

Even swapping Active and Backup will not make them work.

 

In the FW rules I have the Primary and Secondary connections set, so they should fail over.

 

What am I missing?
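
For anyone wanting to reproduce the failover condition outside the XG, here is a minimal sketch of the check described above (a TCP connection to 4.2.2.2 on port 80 with a 5 second timeout). The names `tcp_probe` and `pick_gateway` are made up for illustration and are not Sophos functions. The probe simply uses the default route of the machine it runs on, so it only approximates what the XG's WAN link manager does per interface.

```python
import socket


def tcp_probe(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and unreachable networks.
        return False


def pick_gateway(primary_ok: bool, backup_ok: bool) -> str:
    """Active/Backup selection (hypothetical): prefer the primary while it is healthy."""
    if primary_ok:
        return "primary"
    if backup_ok:
        return "backup"
    return "none"
```

Running `tcp_probe("4.2.2.2", 80, timeout=5.0)` from a LAN client while Port 3 is down would show whether outbound TCP actually gets out through Port 8, independent of what the XG dashboard reports.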



This thread was automatically locked due to age.
  • Hi 

     

    Try to keep the WAN link load balance in the Firewall rule and see if that works. It's interesting that traffic does not pass through Port8 when Port3 is down. Have you created any additional static routes or advanced firewall rules in the backend?

  • Jaydeep said:
    WAN link load balance in the Firewall rule and see if that works.

     

    Hey Jaydeep - yes we tried that as well. Sophos support even created a rule and tried all they could.

     

    I strongly believe it's an issue with two PPPoE connections and how the XG handles a change such as failover. This XG used to have a Static and a PPPoE and it failed over fine.

    A few updates and a change to WAN and no failover.

    Another Level 2 session booked for Monday night

  • Hi,

    The issue is that while the WAN link manager triggers failovers, it does not trigger a DHCP (PPPoE) refresh on the failed link, so if the secondary link has failed at some stage, the failover process fails to connect.

    This should be relatively easy to prove. Get two cheap modem/routers that can handle PPPoE and have their NAT function disabled. Set one up on each link and monitor for a couple of days. Try causing a link to fail, e.g. disconnect the PPPoE interface cable, then reconnect it, then do the same with the other modem's PPPoE interface cable.

    Ian

  • rfcat_vk said:
    The issue is that while the WAN link manager triggers failovers, it does not trigger a DHCP (PPPoE) refresh on the failed link, so if the secondary link has failed at some stage, the failover process fails to connect.

     

    Might have something to do with it. However, the XG does fail over to the secondary link and it's Active. My RED device reconnects to the failover WAN fine, and from the XG itself you can ping / resolve the internet.

    But anything on the LAN side of the XG gets a Firewall VIOLATION error. Sophos Support couldn't work it out. We set up specific rules to force the traffic via the failover WAN, but it just got blocked by the Firewall.

    Left the Support guy scratching his head.

  • Get them to do an edit and save of the offending interface (no changes).

    Ian

  • Level 2 Engineer jumped in last night and ran many logs and tests

    Admitted it has him stumped also.....

     

    It really looks like the NAT doesn't kick in, but the connection is up etc.

    Now we wait for them to check the logs

  • Another Hot date with a Sophos Engineer in a romantic Comms room environment tonight at 7.30pm :-)

     

    The Menu tonight is additional Log taking:

     

    "We need to check with generating the TCP  traffic at the time when failover happens instead of icmp 

    Also we may need to collect some additional debug logs in order to understand the issue in better manner"

     

    It is a very weird issue though. 

    I know it worked when I had Static / PPPoE - I wonder if now I have PPPoE / PPPoE it gets stuck / confused.

     

    Anyways the investigation continues.....

     

     

  • Keep up the good work, we might see a fix for this issue.

    This morning I took my XG offline while trying to update my UTM. The XG was offline for about 20 minutes and when reconnected would not pass traffic. It was also very slow to respond to GUI login attempts. The WAN link showed failure even though I received messages indicating that the link was restored.

    I had to edit the WAN interface and save it before the XG requested a new IP address from the ISP.

    Ian

  • I still think we have different issues.

     

    My Failover PPPoE is connected and incoming connectivity is available, i.e. the RED reconnects to the backup fine.

     

    My issue is more from LAN to WAN: no traffic passes and I get a VIOLATION error. The FW rules that are required are correct, but it won't pass traffic, as if the NAT is broken.

  • That describes my issue exactly.

    Ian

  • Then as we wrap up tonight I will disable then re-enable the backup PPPoE and see if it connects

  • Just to add a little more fuel to the fire, it is an IPv4 issue. Judging by the email messages, the IPv6 interface comes up correctly.

    Ian

  • Another 3 hours in a Comms room failing over for Sophos to take log files.

    No update as yet except it's still broken.

     

    I did try editing the PPPoE interface but no change. Maybe it fixes yours as you are DHCP where I am Static.

  • Sad that doesn't work for your setup. It fixed mine when I had PPPoE, DHCP and static as well as DHCP FTTC.

    Also, it doesn't happen on the UTM, just the XG.

    Ian

  • Hi,

    This morning, being a little brave, I decided to see if I could emulate your issue, so I set my UTM to be a pretend ISP. The interfaces were all set to static IPs.

    All my tests passed.

    Tests were run with the speedtest.net application running:

    1/. disable the interface on the UTM - successful failover

    2/. disable the alternate interface and enable the main interface - successful failover

    3/. disable MASQ on the UTM so the link looks like it is up but fails the WAN link failover test - successful failover

    4/. disable the alternate MASQ on the UTM - successful failover

     

    One application, though, speedtest.net (the application, not the web site), now refuses to connect regardless of the interface being used.

    Ian

    Just reviewed the firewall rule for the speedtest.net application to find out why it worked, and found I had left the default WAN load balancing setting in place.

  • Hmmm I am not sure why mine doesn't failover.

     

    Has to be some combination of my two PPPoE or ISP upsetting the XG.

    It's traffic from the LAN side being blocked, so it's definitely the XG.....

     

  • Hi M8ey.

    did this ever work?

    Using 8.4.4.4 as a ping test target shows quite a significant delay from my network, over 100ms, whereas 1.1.1.1 is under 10ms.

    I seem to remember, way back around the XG's first launch, something about two PPPoE connections not working correctly, but I cannot remember any specifics.

    Of course my testing was all done using Ethernet connections, not PPPoE.

    Ian

  • rfcat_vk said:
    did this ever work?

     

    Sure did - back when it worked my Primary was Static and the failover was PPPoE.