IP becomes randomly unreachable

Appliance XG 135
SFOS 18.0.1 MR-1

Hello everybody,

Since deploying the XG firewall appliance, we have encountered an issue with one of our external servers. All services hosted on this server randomly become unavailable because the SSL handshake times out: the client sends a ClientHello message and the server never responds with a ServerHello. While the server is unresponsive (about 5-10 minutes), it remains reachable from networks that are not routed through the firewall, e.g. other DSL connections or mobile.

Currently only the firewall and NAT rules are set up; no policies whatsoever are applied.

I also ran a tcpdump, and it shows the same thing as a manual GET request with curl: the client sends a ClientHello and the server does not respond.
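You can reproduce the symptom without a browser or curl by attempting the handshake with a timeout and recording whether a reply ever arrives. The sketch below is a minimal Python version of that check. It assumes direct TCP access to the server. The function name `probe_tls_handshake` and the outcome labels are hypothetical and do not come from the thread. Certificate checks are disabled because only the handshake itself matters here.

```python
import socket
import ssl


def probe_tls_handshake(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Try a TLS handshake against host:port and classify the outcome.

    Returns one of (labels are hypothetical, chosen for this sketch):
      "ok"                - the handshake completed
      "connect_failed"    - the TCP connection itself could not be opened
      "handshake_timeout" - TCP connected and a ClientHello was sent, but no
                            reply arrived within `timeout` seconds
      "handshake_error"   - the peer answered, but the handshake failed
    """
    context = ssl.create_default_context()
    # Only the reachability of the handshake matters, not the certificate.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:  # refused, unreachable, or TCP connect timed out
        return "connect_failed"
    try:
        # wrap_socket() sends the ClientHello right away and blocks until the
        # handshake finishes or the socket timeout expires.
        with context.wrap_socket(sock, server_hostname=host):
            return "ok"
    except socket.timeout:  # an alias of TimeoutError on Python 3.10+
        return "handshake_timeout"
    except OSError:  # ssl.SSLError, connection reset, etc.
        return "handshake_error"
    finally:
        sock.close()  # harmless if wrap_socket() already took over the socket
```

Run it in a loop from a client behind the XG and from one outside it. During an outage, the first client should report `handshake_timeout` while the second still reports `ok`.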

The only things that get logged are invalid packets and packets that don't correspond to an active connection, as you can see in the attached screenshot.

Do you have any idea what the issue could be?

LHerzog over 5 years ago

I would check routing here. It seems to me that not all packets are seen by your XG, and those that are get discarded.

Check whether the switch or the XG is doing the VLAN routing. If both are, fix it.
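The reasoning here is that a stateful firewall which sees only part of a flow discards the rest as invalid. A toy model makes this concrete. The Python sketch below is a simplification and does not describe how SFOS actually tracks connections. The `ToyStatefulFilter` class, the `Packet` fields and the addresses are all hypothetical. A real connection tracker also checks TCP state transitions, sequence numbers and timeouts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    src: str    # "ip:port"
    dst: str    # "ip:port"
    flags: str  # simplified TCP flags: "SYN", "SYN-ACK", "ACK", ...


class ToyStatefulFilter:
    """Hypothetical, heavily simplified connection tracker.

    A packet is accepted if it opens a connection (SYN) or belongs to a
    connection whose opening SYN this filter has already seen. Anything
    else is dropped, like the "no active connection" entries in this thread.
    """

    def __init__(self) -> None:
        # Each flow is keyed by its two endpoints, regardless of direction.
        self.flows = set()

    def inspect(self, pkt: Packet) -> str:
        key = frozenset((pkt.src, pkt.dst))
        if pkt.flags == "SYN":
            self.flows.add(key)
            return "accept"
        if key in self.flows:
            return "accept"
        return "drop: no matching connection"
```

If a client's SYN leaves through another path (for example, a second device that also routes) and only the server's SYN-ACK comes back through the firewall, `inspect()` drops the reply. The client then just sees a timeout.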

Giuseppe R over 5 years ago in reply to LHerzog

Hello LHerzog,

The switch is running in L2 mode and does no routing. The firewall is the only device that routes the clients' traffic. The issue occurs independently of the VLANs.

Could the high availability cluster be causing an issue in the routing process?

LHerzog over 5 years ago in reply to Giuseppe R

Is there anything in the IPS logs?

I don't think clustering is responsible for this.

Giuseppe R over 5 years ago in reply to LHerzog

The IPS logs don't show anything regarding this IP address either. I noticed two things. First, the error also occurred over the VPN that is configured on the firewall.

Second, the screenshot shows that the packets from 11 seconds earlier were denied because they were invalid. Why is there no port in the listing? Were they blocked right before they could "enter" the interface?
Is it right that the FW rule ID is "N/A"? Shouldn't it be 0 if no rule was applied?
And from what I understand, the rule type tells us which type of rule applied, so 0 means appliance access/invalid traffic.

Could one reason for this issue be packets dropped at the interface? I noticed that there were a few of them, as the attached screenshot shows.

But this would not explain why only this IP address gets blocked.
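A single drop count does not show whether drops are still happening or which link they occur on. On Linux, per-interface receive and transmit drop counters appear in `/proc/net/dev`. Comparing two snapshots shows whether the counters are still rising and on which interface, for example which LAG member. The Python sketch below assumes shell access to a Linux system that carries the traffic. Whether and how the XG exposes this file is not covered in this thread. The function names and the interface names in the usage example are hypothetical.

```python
def parse_drop_counters(proc_net_dev: str) -> dict[str, tuple[int, int]]:
    """Return {interface: (rx_drop, tx_drop)} from /proc/net/dev text."""
    counters = {}
    for line in proc_net_dev.splitlines()[2:]:  # first two lines are headers
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = [int(value) for value in data.split()]
        # Receive: bytes packets errs drop fifo frame compressed multicast
        # Transmit: bytes packets errs drop fifo colls carrier compressed
        counters[name.strip()] = (fields[3], fields[11])
    return counters


def drop_deltas(before: dict[str, tuple[int, int]],
                after: dict[str, tuple[int, int]]) -> dict[str, tuple[int, int]]:
    """How many rx/tx drops each interface gained between two snapshots."""
    return {
        name: (after[name][0] - before[name][0], after[name][1] - before[name][1])
        for name in after
        if name in before
    }
```

Usage: read `open("/proc/net/dev").read()` once during normal operation and again after an outage, then pass both snapshots to `drop_deltas()`. If the counters do not change while the server is unreachable, interface drops are unlikely to be the cause.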

Rule Type List:
0 = appliance access/invalid traffic
1 = Network rule
2 = User rule
3 = Business rule
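The rule type list above can be used as a lookup table when going through many log entries. The Python sketch below is hypothetical. The function name is illustrative, and treating "N/A" as "no rule ID recorded" is an interpretation, not documented XG behaviour.

```python
# Rule type codes as listed in this post.
RULE_TYPES = {
    0: "appliance access/invalid traffic",
    1: "Network rule",
    2: "User rule",
    3: "Business rule",
}


def describe_rule_columns(fw_rule_id: str, rule_type: int) -> str:
    """Turn the FW rule ID and rule type columns of a log entry into text.

    Assumption (not confirmed in this thread): "N/A" in the rule ID column
    means that no firewall rule ID was recorded for the packet.
    """
    kind = RULE_TYPES.get(rule_type, f"unknown rule type {rule_type}")
    if fw_rule_id.strip().upper() == "N/A":
        return f"{kind}, no firewall rule ID recorded"
    return f"{kind}, firewall rule ID {int(fw_rule_id)}"
```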

Sorry for the number of questions, but this issue is driving me crazy.

LHerzog over 5 years ago in reply to Giuseppe R

This still looks to me like a connection issue where the firewall does not see all packets coming from this host IP, like the misconfigured routing I mentioned earlier.

Do you have Web Security enabled? If so, I would also try putting the IP of your port 443 server on the exclusion list.

You could check this post, which describes a similar issue:

community.sophos.com/.../tcp443-blocked-by-fw-rule-0-could-not-associate-packet-to-any-connection

Giuseppe R over 5 years ago in reply to LHerzog

Thank you for your response. I also think this is due to misconfigured routing, but I cannot imagine on which device.

When one client cannot connect to this server, no other client can either, whether managed or unmanaged, Windows, Mac, iOS or Android, or in another VLAN. Our switches are in L2 mode and don't do any routing. The network is fairly small; there are no L3 routing devices other than the firewall, which discards the packets.

Currently the policy is set up but inactive; I added the IP to the exception list anyway and will test that.

One thing that caught my attention last week was the Spanning Tree Protocol (STP) on the switches. We have two switches connected directly to each other, but only the main switch is connected directly to the firewall. What caught my attention was that our "master" switch, which is connected directly to the firewall, is not the STP root bridge; the smaller one is.

Is it possible that some packets get lost due to a misconfigured STP? (Even though we don't need it in such a small network and could turn it off.)
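STP elects the switch with the lowest bridge ID as root bridge. The bridge ID compares the configured bridge priority first and uses the MAC address only as a tie-breaker. If both switches keep the default priority, the one with the lower MAC address wins, regardless of which switch you consider the "master". That may explain why the smaller switch is root here. The Python sketch below models only that comparison. The switch names, MAC addresses and the lowered priority of 4096 are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bridge:
    name: str
    priority: int  # STP bridge priority; 32768 is the usual default
    mac: str       # bridge MAC address, "aa:bb:cc:dd:ee:ff"

    def bridge_id(self) -> tuple[int, int]:
        # The bridge ID orders by priority first; the MAC only breaks ties.
        return (self.priority, int(self.mac.replace(":", ""), 16))


def elect_root(bridges: list[Bridge]) -> Bridge:
    """Return the bridge with the lowest bridge ID, i.e. the STP root."""
    return min(bridges, key=lambda b: b.bridge_id())
```

With equal default priorities, the lower MAC address decides. Lowering the priority on the main switch makes it root no matter what its MAC address is.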

EDIT: I looked at the link, but I couldn't find any useful information there.

Thanks in advance

LHerzog over 5 years ago in reply to Giuseppe R

STP can of course be the cause if it is constantly rebuilding the topology. But I guess you would see other major outages as well if you had issues there, not only with this single WAN IP.

I could also imagine this being related to insecure ciphers offered by the web server, since, as you say, it happens only during the SSL handshake, or to insecure protocols like SSL or TLS 1.0. You can configure the XG to drop or decrypt such traffic. But I must admit I have no experience with this, because we're still on v17.5.
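To test the insecure-protocol hypothesis, check which protocol version and cipher the server actually negotiates. Ideally, do this from a connection that bypasses the XG, since handshakes through it are the ones failing. The Python sketch below is a minimal version of that check. The function names are hypothetical, and the set of "insecure" versions reflects common current practice rather than an XG setting.

```python
import socket
import ssl

# Protocol versions widely treated as insecure (names as returned by
# ssl.SSLSocket.version()).
INSECURE_VERSIONS = {"SSLv2", "SSLv3", "TLSv1", "TLSv1.1"}


def negotiated_tls(host: str, port: int = 443, timeout: float = 5.0) -> tuple[str, str]:
    """Connect to host:port and return (protocol version, cipher name)."""
    context = ssl.create_default_context()
    # Inspect what the server offers; do not validate its certificate.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    # Note: recent Python/OpenSSL builds typically refuse TLS 1.0/1.1 by
    # default, so a server that only speaks those raises ssl.SSLError here.
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.version(), tls.cipher()[0]


def is_insecure(version: str) -> bool:
    return version in INSECURE_VERSIONS
```

If `negotiated_tls()` reports TLS 1.2 or 1.3 with a modern cipher, the "non-decryptable traffic" settings quoted below are less likely to explain the dropped handshakes.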

https://docs.sophos.com/nsg/sophos-firewall/18.0/releasenotes/en-us/nsg/sfos/releasenotes/rn_SSLTLSInspectionSettings.html

"You can specify the re-signing certificate authorities to sign SSL/TLS server certificates after XG Firewall intercepts, decrypts, and inspects secure traffic. You can specify the settings to drop or reject non-decryptable traffic, which includes insecure protocol versions and occurrences, such as SSL compression and connections that exceed the decryption capabilities of the firewall. "

FormerMember over 5 years ago in reply to Giuseppe R

Hi Giuseppe R

The dropped packets at the interface could be caused by a bad cable or faulty interface hardware on the switch or the firewall.

Could you please try changing the cable and let us know how it turns out?

Thanks,

Giuseppe R over 5 years ago in reply to LHerzog

I disabled SSL decryption completely and also put the IP on the exclusion list, even though the latter step is unnecessary. I also disabled STP on both switches in the network.

Regarding the dropped packets on the interface: I cannot imagine that the cables are faulty. That would mean that 3 cables (they are bundled in a LAG) from different manufacturers are all defective.

Since the last screenshot, the count hasn't even increased, and it also wouldn't explain why only one IP cannot be reached.

I will observe the effect of these changes over the course of next week and report back if I have anything new.

Thank you all for your help.

Giuseppe R over 5 years ago in reply to Giuseppe R

I looked into this, and it seems that the issue hasn't appeared again.

EDIT:

The entries in the firewall log persist and appear periodically, but since then I haven't received any user reports about the server being unreachable. So there is no real resolution for this issue. I will keep this thread updated if I receive new reports.