Sophos XGS 116 loses WAN connection after a few hours of normal operation

Question

In our production environment we have replaced our old firewall with a Sophos XGS 116. We have replicated the rules we had in our old firewall and everything seems to be going well: our applications are accessible, we can manage our databases... in short, the rules and communications work properly. 
However, after a certain time (2-3 hours) Sophos considers the WAN port "dead": its status turns red, the firewall cannot reach the internet, and we cannot reach it from Sophos Central. We can still access it from one of our servers over private IPs.
To be clear, everything else keeps working: the applications, the database, and the NAS do not go down. Only the Sophos itself is left without a connection, so it cannot be monitored from outside, receive updates, or be reached from Sophos Central.
When we reboot Sophos, the WAN goes back to green and is accessible from Sophos Central (i.e. working fine). After a while (2-3 hours) it drops again. We also found that a reboot is not needed to restore connectivity: simply opening the Port 1 (WAN) configuration and saving it without making any changes restarts the port, and it works again for a few hours.
Today we restarted it at around 8:52 (07:52Z).
This is what tail -f /log/nSXLd.log shows:
 
This is what tail -f /log/dgd.log shows:
Yesterday we brought it up at 19:42Z and it went down at 23:19Z.
 
(the redacted value is the housing provider's IP)
 When down, the housing provider address cannot be pinged: 
 
(the redacted value is the housing provider's IP)
When it is down, this is what the housing provider sees on its side (I have replaced data for security):
X.Y.Z.81  -         [MAC Provider]  Interface   ARPA  Bundle-Ether1.1108
X.Y.Z.82  00:00:02  0000.0000.0000  Incomplete  ARPA  Bundle-Ether1.1108
X.Y.Z.83  01:45:23  [MAC Sophos]    Dynamic     ARPA  Bundle-Ether1.1108
X.Y.Z.84  01:52:30  [MAC Sophos]    Dynamic     ARPA  Bundle-Ether1.1108
X.Y.Z.85  01:40:44  [MAC Sophos]    Dynamic     ARPA  Bundle-Ether1.1108

When operational, what you see is this:

X.Y.Z.81  -         [MAC Provider]  Interface   ARPA  Bundle-Ether1.1108
X.Y.Z.82  00:24:32  [MAC Sophos]    Dynamic     ARPA  Bundle-Ether1.1108
X.Y.Z.83  00:29:14  [MAC Sophos]    Dynamic     ARPA  Bundle-Ether1.1108
X.Y.Z.84  00:29:14  [MAC Sophos]    Dynamic     ARPA  Bundle-Ether1.1108
X.Y.Z.85  00:29:14  [MAC Sophos]    Dynamic     ARPA  Bundle-Ether1.1108
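
The difference between the two outputs is the X.Y.Z.82 row, which is Incomplete when the WAN is down. If the provider can share such dumps regularly, a small filter like the one below could flag unresolved entries automatically. This is only a sketch: the function name is made up for illustration, and it assumes one ARP entry per line with the address in the first column, as in the listings above.

```shell
# Hypothetical helper: reads provider ARP rows on stdin and prints the
# address (first column) of every row whose state is "Incomplete".
incomplete_arp_entries() {
    awk '{
        for (i = 2; i <= NF; i++) {
            if ($i == "Incomplete") { print $1; break }
        }
    }'
}

# Example: incomplete_arp_entries < provider_arp_dump.txt
```
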
 The failover rule we have set now is this: 
 
We have also tried pinging the provider's IP, and even a combined rule using both the provider's IP and 8.8.8.8, with the same result.
This is the configuration of Port 1 - WAN (again, with information redacted):

(when down, status (Estado) is red) 
 
I have searched the forum for similar cases and there seem to be some (on other models), but we could not find a solution. Can you help us? Thanks in advance.

Vishal_R · Answer

Hi David FM, As per the capture above, during the issue the firewall is still sending packets out of the intended interface (Port1) towards the gateway IP. This indicates that the firewall has, and is not losing, the ARP entry for the gateway IP X.Y.Z.81.

You can also check this with the command below from the XG shell while the issue is occurring (when the WAN connection is lost).

# arp -n | grep X.Y.Z.81

If the ARP entry is there, we should get a reply from the next device, but according to the capture above that is not the case here. So the next device probably does not know where X.Y.Z.82 is located, i.e. it has no ARP entry for X.Y.Z.82 in its ARP table.
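
To see whether the gateway entry really stays resolved for the whole 2-3 hours, the check can be repeated on a timer and logged. The following is a minimal sketch, not a Sophos-provided tool: `gw_arp_state` is a hypothetical helper, and it assumes `arp -n` prints either the net-tools layout (address in the first column) or the BusyBox layout (`? (address) at ...`). The exact format on the firewall shell should be confirmed first.

```shell
# Hypothetical helper: classify one IP's entry in `arp -n` output (stdin).
# Prints "resolved", "incomplete", or "missing".
gw_arp_state() {
    awk -v ip="$1" '
        $1 == ip || $2 == "(" ip ")" {
            found = 1
            # Unresolved entries contain the word "incomplete".
            state = (index($0, "incomplete") > 0) ? "incomplete" : "resolved"
        }
        END { print (found ? state : "missing") }
    '
}

# Possible usage (log path and 60 s interval are assumptions):
# while true; do
#     echo "$(date -u '+%Y-%m-%dT%H:%M:%SZ') $(arp -n | gw_arp_state X.Y.Z.81)"
#     sleep 60
# done >> /tmp/gw_arp.log
```

Comparing the timestamps in such a log with the time the WAN turns red would show whether the firewall's own ARP entry changes at the moment of failure.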

Have you checked this on the next router or device when the WAN connection is lost? Can you see the ARP entry for X.Y.Z.82 on that device? If the entry is missing and the device allows adding an ARP entry manually, you could try that as a temporary workaround until they investigate why the device loses the ARP entry after a few hours.

If the next device already has an ARP entry for X.Y.Z.82, then check and confirm on that device what is happening to the ICMP echo request packets that have been forwarded out via Port1 (why they are not answered, or whether they are dropped, on that device).

Alternatively, the following test may help narrow this down (only if feasible, as it needs a few hours of downtime with the actual WAN ISP cable removed).

Connect a laptop directly to Port1, setting the laptop's IP to X.Y.Z.81 and its gateway to X.Y.Z.82 (the firewall interface IP), which gives point-to-point connectivity between Port1 and the laptop. Keep a continuous ping running from the laptop to the firewall and from the firewall to the laptop, make sure the laptop does not go into sleep mode, and check whether the WAN connection is still lost in this setup. If it stays up, the WAN connection loss needs to be investigated on the ISP's next-hop device.
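
For the laptop test, it helps to record exactly when the ping starts or stops answering, rather than watching the output for hours. Below is a rough sketch assuming a Linux laptop with a POSIX shell. Both function names are hypothetical, and the `ping -c 1 -W 2` flags in the usage comment are the Linux iputils ones and differ on Windows and macOS.

```shell
# Hypothetical helper: run one probe command and print "up" or "down"
# depending on its exit status.
probe_state() {
    if "$@" >/dev/null 2>&1; then echo up; else echo down; fi
}

# Hypothetical helper: read states from stdin and print a UTC-timestamped
# line only when the state differs from the previous one.
log_transitions() {
    prev=""
    while read -r state; do
        if [ "$state" != "$prev" ]; then
            echo "$(date -u '+%Y-%m-%dT%H:%M:%SZ') $state"
            prev="$state"
        fi
    done
}

# Possible usage from the laptop (5 s interval is an assumption):
# while true; do probe_state ping -c 1 -W 2 X.Y.Z.82; sleep 5; done | log_transitions
```

If this log shows no "down" transition over several hours, that supports the conclusion that the firewall port itself stays up.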
