This discussion has been locked.
You can no longer post new replies to this discussion. If you have a question you can start a new discussion

Sophos XGS 116 loses WAN connection after a few hours of normal operation

In our production environment we have replaced our old firewall with a Sophos XGS 116. We replicated the rules from the old firewall and everything seems to be going well: our applications are accessible, we can manage our databases... in short, the rules and communications work properly.

But the following happens: after a certain time (2-3 hours), Sophos considers the WAN port "dead" and marks its status red. The firewall itself can no longer reach the internet, and we cannot reach it from Sophos Central. We can still access it from one of our servers through private IPs.

To clarify, everything else continues to work: the applications, the database, and the NAS do not go down. Only the Sophos itself is left without a connection, so it cannot be monitored from outside or receive updates, and we cannot access it from Sophos Central.

When we reboot the Sophos, the WAN goes back to green and is accessible from Sophos Central (i.e. working fine). After a while (2-3 hours) it drops again. We also found that a reboot is not needed to restore connectivity: simply opening the Port 1 (WAN) configuration and saving it without making any changes restarts the port, and it works again for a few hours.

Today we restarted it at around 8:52 (7:52Z).

This is what it shows with tail -f /log/nSXLd.log

This is what tail -f /log/dgd.log shows

Yesterday we brought it up at 19:42Z and it went down at 23:19Z.

(what is crossed out is the IP of the housing provider)

When down, the housing provider address cannot be pinged:

(what is crossed out is the IP of the housing provider)

When it's down, this is what the housing provider sees on its side (I have replaced data for security):

X.Y.Z.81 - [MAC Provider] Interface ARPA Bundle-Ether1.1108
X.Y.Z.82 00:00:02 0000.0000.0000 Incomplete ARPA Bundle-Ether1.1108
X.Y.Z.83 01:45:23 [MAC Sophos] Dynamic ARPA Bundle-Ether1.1108
X.Y.Z.84 01:52:30 [MAC Sophos] Dynamic ARPA Bundle-Ether1.1108
X.Y.Z.85 01:40:44 [MAC Sophos] Dynamic ARPA Bundle-Ether1.1108

When operational, what you see is this:

X.Y.Z.81 - [MAC Provider] Interface ARPA Bundle-Ether1.1108
X.Y.Z.82 00:24:32 [MAC Sophos] Dynamic ARPA Bundle-Ether1.1108
X.Y.Z.83 00:29:14 [MAC Sophos] Dynamic ARPA Bundle-Ether1.1108
X.Y.Z.84 00:29:14 [MAC Sophos] Dynamic ARPA Bundle-Ether1.1108
X.Y.Z.85 00:29:14 [MAC Sophos] Dynamic ARPA Bundle-Ether1.1108
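The difference between the two tables is the X.Y.Z.82 entry, which is Incomplete while the WAN is down. As a rough sketch for spotting this quickly, the helper below lists every address that the provider's table shows as Incomplete. It assumes the table has been saved as plain text in the same column layout as above; the function name `incomplete_arp_entries` and the file name in the usage comment are hypothetical.

```shell
# Print the address (first column) of every entry whose state is
# "Incomplete", i.e. the provider's device has not learned a MAC for it.
# Reads the ARP table text from stdin.
incomplete_arp_entries() {
  awk '{
    for (i = 2; i <= NF; i++) {
      if ($i == "Incomplete") { print $1; break }
    }
  }'
}

# Hypothetical usage, with the provider output saved to a file:
#   incomplete_arp_entries < provider_arp_down.txt
```

Run against the "down" table above it should print only X.Y.Z.82, and nothing for the "operational" table.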

The failover rule we have set now is this:

We have also tried pinging the provider's IP, and even a combined rule with the provider's IP and 8.8.8.8, with the same result.

This is the configuration of port 1 - WAN (again, I have censored information):

(when down, status (Estado) is red)

I have searched the forum for similar cases and there seem to be some (on other models), but we couldn't find a solution. Can you help us? Thanks in advance.



  • This sounds like a weird interface issue. Take a look at a tcpdump. If you see packets sent out on the interface, it is more likely a problem on the gateway / ISP end: the link is there, but nobody replies anymore.

    By "saving the interface", the interface restarts and announces its IP and MAC (ARP) again. There could be a problem with ARP caches on the gateway.

  • I ran:

    tcpdump -ni any icmp 

    while pinging the ISP's IP from the Sophos diagnostic screen. All packets were lost, and this was the result:

    13:33:19.618801 Port1, OUT: IP X.Y.Z.82 > X.Y.Z.81: ICMP echo request, id 7498, seq 0, length 64
    13:33:19.618808 mv-pcimux0, OUT: IP X.Y.Z.82 > X.Y.Z.81: ICMP echo request, id 7498, seq 0, length 64
    13:33:20.618897 Port1, OUT: IP X.Y.Z.82 > X.Y.Z.81: ICMP echo request, id 7498, seq 1, length 64
    13:33:20.618903 mv-pcimux0, OUT: IP X.Y.Z.82 > X.Y.Z.81: ICMP echo request, id 7498, seq 1, length 64
    13:33:21.619203 Port1, OUT: IP X.Y.Z.82 > X.Y.Z.81: ICMP echo request, id 7498, seq 2, length 64
    13:33:21.619212 mv-pcimux0, OUT: IP X.Y.Z.82 > X.Y.Z.81: ICMP echo request, id 7498, seq 2, length 64
    13:33:22.619313 Port1, OUT: IP X.Y.Z.82 > X.Y.Z.81: ICMP echo request, id 7498, seq 3, length 64
    13:33:22.619323 mv-pcimux0, OUT: IP X.Y.Z.82 > X.Y.Z.81: ICMP echo request, id 7498, seq 3, length 64
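For longer captures, a quick tally of requests versus replies makes the "packets leave, nothing comes back" pattern easier to see. This is a minimal sketch that assumes tcpdump output in the same format as the capture above (interface name in the second field, followed by a comma); the function name `icmp_summary` is hypothetical.

```shell
# Count ICMP echo requests and replies seen on one interface.
# Reads tcpdump text output from stdin; $1 is the interface name
# (defaults to Port1). Lines for other interfaces, such as
# mv-pcimux0, are ignored so packets are not counted twice.
icmp_summary() {
  awk -v ifc="${1:-Port1}," '
    $2 == ifc && /ICMP echo request/ { req++ }
    $2 == ifc && /ICMP echo reply/   { rep++ }
    END { printf "requests=%d replies=%d\n", req, rep }
  '
}

# Example, reusing the capture command from this post:
#   tcpdump -ni any icmp | icmp_summary Port1
```

With the eight lines above as input, this reports 4 requests and 0 replies on Port1.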

  • Hi, during the issue, the capture above shows the firewall sending packets out of the intended interface, Port1, towards the gateway IP. This indicates that the firewall has an ARP entry for the gateway IP X.Y.Z.81 and is not losing it.

    Also, you can check this with the command below from the XG shell while the issue is occurring (when the WAN connection is lost).

    # arp -n | grep X.Y.Z.81

    If the ARP entry is there, we should get a reply from the next device, but the capture above shows none. So the next device probably does not know where X.Y.Z.82 is located, i.e. it has no entry for X.Y.Z.82 in its ARP table.

    Have you checked this on the next router or device when the WAN connection is lost? Can you see an ARP entry for X.Y.Z.82 on that device? If the entry is missing and they have an option to add it manually, you could try that as a temporary workaround until they investigate why the device loses the ARP entry after some hours.

    If the next device already has an ARP entry for X.Y.Z.82, then check on that device what happens to the ICMP echo request packets sent out via Port1 (why they are not answered, or whether they are dropped, etc.).

    Alternatively, the test below may help (only if feasible, as it may require a few hours of downtime with the actual WAN ISP cable removed).

    Another way to narrow this down is to connect a laptop to Port1, setting the laptop's IP to X.Y.Z.81 and its gateway to X.Y.Z.82 (the firewall interface IP), which gives a kind of point-to-point link between Port1 and the laptop. Keep a continuous ping running from the laptop to the firewall and vice versa, make sure the laptop does not go to sleep, and confirm whether the WAN connection is lost in this setup. If it stays up, then the ISP's next device needs to be checked for the WAN connection loss.

    Regards,

    Vishal Ranpariya
    Technical Account Manager | Sophos Technical Support


  • With arp -n | grep X.Y.Z.81 we get the following:

    ? (X.Y.Z.81) at XX:YY:ZZ:WW:TT:CC [ether] on Port1

    On the next device, X.Y.Z.81, the ISP tells me that when the connection is down it sees no ARP entry (after I reboot, it does).

    I have to say that in the many years we worked with a firewall from another company, nothing similar ever happened to us. And the only change was swapping one device for the other...
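The `arp -n` check used above can be turned into a small classifier, which is handy when logging the state repeatedly during an outage. This is only a sketch: the function name `arp_state` is hypothetical, and it assumes unresolved entries appear as `<incomplete>` in the same line format as the output shown above, which has not been confirmed on this firewall.

```shell
# Classify the ARP entry for one IP address.
# Reads `arp -n` output from stdin; $1 is the IP to look for.
# Prints "resolved", "incomplete", or "missing".
# Assumption: unresolved entries contain the text "<incomplete>".
arp_state() {
  awk -v ip="($1)" '
    $2 == ip {
      found = 1
      state = ($0 ~ /incomplete/) ? "incomplete" : "resolved"
    }
    END { print (found ? state : "missing") }
  '
}

# Example on the firewall shell:
#   arp -n | arp_state X.Y.Z.81
```

A "resolved" result during an outage matches what was reported here: the firewall still knows the gateway's MAC, so the missing entry is on the other side.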

  • How long does your ISP keep a connection active if the DHCP server does not see a renew request?

    ian

    XG115W - v19.5.1 mr-1 - Home


  • My ISP configured the following yesterday:

    arp X.Y.Z.82 A.B.C arpA interface Bundle-Ether1.1108

    and not only did the WAN port come up, it also hasn't gone down again. It may be a workaround, but we believe this manual ARP configuration should not be necessary.

  • Could you give us more information about this ISP device? I saw similar problems with a local Austrian ISP, which used a bridge and flushed the ARP table way too fast.

  • Specifically, it is a data center. There is no DHCP; everything is configured with static IPs.

    The strangest thing is that the hosts inside (connected to the Sophos) all work, but not the Sophos itself (unless we restart it or restart the port).

    With the static ARP configuration on the data center device, it has not gone down again. But we insist that we shouldn't have to do that manual configuration; it should work without that setting.

  • This sounds like a data center device problem. Dropping an ARP entry even while the device is sending data sounds like a problem to me.
