Gateway 1 - Main ISP keeps getting disconnected, unable to find logs

Hi All,

I would like to ask for your help with an issue we are having.

We have a dual WAN setup with two ISPs, ISP#1 and ISP#2.

Since Thursday, ISP#1 has been disconnecting every few hours, and each time it stays down for 15 to 20 minutes.

The problem is that although we have failover configured, several of our web servers are not set up for failover.

So whenever ISP#1 is down, we lose connectivity to those web servers.
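Failover of this kind is usually driven by a gateway health check. As a rough illustration only (this is not Sophos's actual dead gateway detection code, and the thresholds are assumed values, not SFOS defaults), a detector that probes the gateway and counts consecutive failures can be sketched in Python. It shows how a few lost probe replies are enough to move traffic to ISP#2 even when the line itself is healthy:

```python
from dataclasses import dataclass


@dataclass
class GatewayMonitor:
    """Illustrative dead-gateway detector (not Sophos's implementation).

    The link is declared DOWN after `fail_threshold` consecutive failed
    probes and UP again after `recover_threshold` consecutive successes.
    """
    fail_threshold: int = 3     # assumed value, not an SFOS default
    recover_threshold: int = 3  # assumed value, not an SFOS default
    up: bool = True
    fails: int = 0              # consecutive failed probes so far
    oks: int = 0                # consecutive successful probes so far

    def record(self, probe_ok: bool) -> bool:
        """Feed one probe result and return the resulting link state."""
        if probe_ok:
            self.fails = 0
            self.oks += 1
            if not self.up and self.oks >= self.recover_threshold:
                self.up = True
        else:
            self.oks = 0
            self.fails += 1
            if self.up and self.fails >= self.fail_threshold:
                self.up = False
        return self.up


def active_gateway(primary_up: bool, backup_up: bool) -> str:
    """Active/backup failover: prefer ISP1 whenever it is considered up."""
    if primary_up:
        return "ISP1"
    if backup_up:
        return "ISP2"
    return "none"


if __name__ == "__main__":
    mon = GatewayMonitor()
    # One good probe, then three lost replies in a row: the link is marked
    # DOWN and traffic moves to ISP2, even if the line itself never dropped.
    for ok in [True, False, False, False]:
        state = mon.record(ok)
    print(state, active_gateway(state, True))  # False ISP2
```

Because only consecutive failures count, one good reply resets the counter; a probe target that drops replies in bursts would still trip the threshold while the WAN line stays up.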

I have confirmed that during each 15-20 minute outage, Internet traffic is still flowing through the MikroTik router (the ISP#1 router).

I am 100% sure that the MikroTik router has Internet access for the entire time Sophos shows the link as down.
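To back this up with timestamps that can be lined up against the times Sophos marks the gateway down, an independent reachability logger can help. A minimal sketch, assuming a Linux host with the iputils `ping` (where `-W` is a timeout in seconds; other platforms differ) and a target you choose yourself, for example the ISP#1 public IP probed from a machine outside the network:

```python
import subprocess
import sys
import time
from datetime import datetime


def ping_once(host: str, timeout_s: int = 2) -> bool:
    """Send one ICMP echo via the system ping (Linux iputils flags assumed)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def outage_windows(samples):
    """Turn (timestamp, reachable) samples into (start, end) outage windows.

    An outage still in progress at the last sample gets end=None.
    """
    windows = []
    start = None
    for ts, ok in samples:
        if not ok and start is None:
            start = ts
        elif ok and start is not None:
            windows.append((start, ts))
            start = None
    if start is not None:
        windows.append((start, None))
    return windows


def monitor(host: str, interval_s: float = 10.0) -> None:
    """Probe `host` forever, printing a timestamped line on every UP/DOWN change."""
    last = None
    while True:
        ok = ping_once(host)
        if ok != last:
            stamp = datetime.now().isoformat(timespec="seconds")
            print(f"{stamp} {host} {'UP' if ok else 'DOWN'}", flush=True)
            last = ok
        time.sleep(interval_s)


if __name__ == "__main__":
    monitor(sys.argv[1])
```

Run it as, for example, `python3 reach_log.py 203.0.113.10` (both the file name and the documentation-range IP are placeholders). If the logger shows no DOWN line while Sophos reports ISP#1 as disconnected, that points at the firewall's health check rather than the line; `outage_windows` can turn a recorded sample list into start/end pairs for that comparison.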

Since I could not find any meaningful errors in the Sophos logs, I replaced the Ethernet cables from the ONT to the MikroTik router, from the MikroTik to the Sophos, and from the Sophos to the switch.

Yesterday I also upgraded to the latest firmware, SFOS 17.5.12 MR-12.HF052220.1.

However, the issue started happening again this morning.

Is there anywhere other than dgd.log where I can find meaningful errors?
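One low-tech way to make dgd.log easier to read is to pull out only the lines that mention state changes and compare them with the outage times. A minimal sketch, assuming you have copied dgd.log off the appliance; the keyword list is a guess, since the log's exact wording is not shown in this thread, so adjust it to what your file actually contains:

```python
import re
import sys

# Assumed keywords; word boundaries keep "up" from matching "group" or "update".
PATTERN = re.compile(r"\b(down|up|fail\w*|timeout|unreachable)\b", re.IGNORECASE)


def interesting_lines(lines, pattern=PATTERN):
    """Yield (line_number, text) for every line matching `pattern`."""
    for n, line in enumerate(lines, start=1):
        if pattern.search(line):
            yield n, line.rstrip("\n")


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "dgd.log"
    with open(path, encoding="utf-8", errors="replace") as f:
        for n, text in interesting_lines(f):
            print(f"{n}: {text}")
```

Usage would be something like `python3 dgd_filter.py dgd.log` (hypothetical file name).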

I called Sophos Support and spent 2 days chasing them, with a total of 3 hours on calls, but the escalation engineer told me it's an ISP issue.

I am 100% certain that it is NOT an ISP issue, as everything works fine on the MikroTik router every time ISP#1 goes down in Sophos.

If you could point me in the right direction, or to the logs that would help determine what is causing the problem, I would very much appreciate it.

Thank you.

 


  •  

    I meant CPU idle is at 95%, so only 5% of the CPU has been in use over this weekend.

    I generated a 48-hour report: even with only 5% CPU usage, I still experienced several disconnects.

    Without useful logs from the Sophos Firewall, I can only troubleshoot by isolation.

    I can confirm that when Sophos shows ISP1 as disconnected, I can still ping the Sophos public IP from an outside network.

    I have now moved ISP1 to a different port on the Sophos Firewall and recreated all firewall policies to point to the new ISP1 interface.

    However, the disconnects still happen.

  • Hi,

    the issue will be with the ping: XG does not see the ping response, so it thinks the link is down when in reality there is nothing wrong with the link.

    I haven't configured failover myself, even though it is almost mandatory: you have to convince XG not to activate it on failure.

    Ian

  • You are right. It now looks like the issue is with the ping and XG, as the link is actually NOT down. I really appreciate you staying with me on this one.

  • It is going to be something stupid, like a watch timer failing to handle two different requests for the service correctly, and the supervisor timer cutting in after a delay.

    But that will be up to the devs to identify and fix. By the way, this is not the first time this issue has been raised.

    Ian

  •  

    It is going to be another long night for me. I deleted ISP1's original port interface, and ISP1 is now on a completely different port.

    All firewall rules related to ISP1 have been deleted and recreated.

    Now it is a waiting game: the last disconnect was at 2:00pm today, so I will wait until 2:00am.

    Hopefully everything goes well and there are no more disconnects. If it happens again, though, I will remove the WAN failover completely and see what happens.

    Thanks

  • Okay, so despite all the changes, ISP1 still disconnected after 8 hours.

    I have now removed the dual WAN failover and will have to wait and see what happens next.

  •  

    So here is the kicker: after I removed the dual WAN failover last night at 10:30pm, just 2 hours later I started receiving this email notice:

    "User '-' failed to login from 'X.X.X.X' using ssh because of wrong credentials" (X.X.X.X - Public IP)

    The notices arrive only 2-5 minutes apart, so I have received lots of emails since last night.

    The thing is, SSH and Ping over WAN are both disabled (unchecked) under Device Access.

    Removing the dual WAN failover setup has triggered attacks on our firewall.

  • Hi,

    sorry, that doesn't make sense. I disabled those notifications because, like you, I have disabled those external features.

    More than likely you now have a stable external DNS registration, which the attackers are using.

    I put a rule in place at the top to drop all external connections to my XG. You can find how to create the rule in this thread:

    https://community.sophos.com/products/xg-firewall/f/firewall-and-policies/118893/geoip

    It has stopped a lot of junk appearing in the log viewer.

    Ian

  • You are right, it does not make sense at all. So I did the trick you mentioned: saving the ISP1 configuration without changing anything.

    I did the same under Device Access, checking and then unchecking the SSH box for LAN, WAN, and VPN. It has been 30 minutes and the attacks have not recurred yet.

    So far, it has been exactly 12 hours since I deleted the dual WAN failover, and ISP1 has not disconnected yet.

    Something is going on with Sophos that I cannot pin down.

  •  

    So it has been 18 hours since I deleted the ISP2 interface, which disabled the dual WAN failover.

    There has been no disconnect so far on ISP1. I am planning to add the ISP2 interface back to the Sophos Firewall.

    My worry is that ISP1 will start disconnecting again once I reactivate the dual WAN failover.

    As this is clearly a problem on my Sophos device, is there anything else I can look at in XG to find out why it fails over even though the ISP1 line is active?