This discussion has been locked.
You can no longer post new replies to this discussion. If you have a question you can start a new discussion

Strange drops

We have a customer with a phone switchboard application that periodically freezes, either at an application level (can't click anything), or it just won't show incoming calls. In both cases it can sometimes unfreeze, and then all the calls that have come in in the meantime suddenly flash on the screen. We've ruled out AV as the cause and are now looking into the problem being at the network layer.

drop-packet-capture shows this at the time of freezing:

2017-05-23 08:58:14 0101021 IP 10.10.90.2.8779 > 10.10.10.112.43470 : proto TCP: P 3007061919:3007062115(196) win 330 checksum : 55314
0x0000:  4500 00ec 18b4 4000 3f06 a9d2 0a0a 5a02  E.....@.?.....Z.
0x0010:  <remainder of the packet redacted>
Date=2017-05-23 Time=08:58:14 log_id=0101021 log_type=Firewall log_component=Firewall_Rule log_subtype=Denied log_status=N/A log_priority=Alert duration=N/A in_dev=Lag.90 out_dev=Lag.10 inzone_id=1 outzone_id=8 source_mac=00:1a:e8:8b:15:b4 dest_mac=00:e0:20:11:08:fc l3_protocol=IP source_ip=10.10.90.2 dest_ip=10.10.10.112 l4_protocol=TCP source_port=8779 dest_port=43470 fw_rule_id=0 policytype=1 live_userid=0 userid=0 user_gp=0 ips_id=0 sslvpn_id=0 web_filter_id=0 hotspot_id=0 hotspotuser_id=0 hb_src=0 hb_dst=0 dnat_done=0 proxy_flags=0 icap_id=0 app_filter_id=0 app_category_id=0 app_id=0 category_id=0 bandwidth_id=0 up_classid=0 dn_classid=0 source_nat_id=0 cluster_node=0 inmark=0x0 nfqueue=101 scanflags=0 gateway_offset=0 max_session_bytes=0 drop_fix=0 ctflags=33554472 connid=2341170016 masterid=0 status=398 state=3 sent_pkts=N/A recv_pkts=N/A sent_bytes=N/A recv_bytes=N/A tran_src_ip=N/A tran_src_port=N/A tran_dst_ip=N/A tran_dst_port=N/A

Then the same again exactly 2 minutes later (even the checksum is the same).

The connection came good another minute later.
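For anyone wanting to pick the capture above apart, here is a minimal Python sketch (not a Sophos tool) that splits the key=value drop record into a dictionary and decodes the first 16 bytes of the hex dump as an IPv4 header. The function names are made up for illustration. Decoded, the header gives a total length of 236 bytes, which would fit a 20-byte IP header plus a 20-byte TCP header (assuming no TCP options) plus the 196-byte payload. An identical sequence range and checksum two minutes apart would be consistent with the sender retransmitting the same segment, though that is an inference from the capture, not something the log states.

```python
import ipaddress
import struct


def parse_drop_record(line: str) -> dict:
    """Split a key=value drop record into a dict (values are kept as strings)."""
    fields = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if sep:  # ignore tokens that are not key=value pairs
            fields[key] = value
    return fields


def decode_ipv4_prefix(hex_words: str) -> dict:
    """Decode the first 16 bytes of an IPv4 header given as space-separated hex."""
    raw = bytes.fromhex(hex_words.replace(" ", ""))
    (ver_ihl, _tos, total_len, ident, flags_frag,
     ttl, proto, hdr_csum, src) = struct.unpack("!BBHHHBBH4s", raw[:16])
    return {
        "version": ver_ihl >> 4,
        "header_length": (ver_ihl & 0x0F) * 4,   # IHL is counted in 32-bit words
        "total_length": total_len,
        "identification": ident,
        "dont_fragment": bool(flags_frag & 0x4000),
        "ttl": ttl,
        "protocol": proto,                        # 6 is TCP
        "header_checksum": hdr_csum,
        "src": str(ipaddress.IPv4Address(src)),
    }
```

Comparing `source_port`, `dest_port` and `connid` across the two parsed records is one way to check whether both drops belong to the same connection.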

Any idea where to look next?

thanks

James



  • Hi,

    I've just found this thread, as I'm suffering the exact same issue with STAS on our XG310 running the latest 17.1 firmware. So it seems the STAS issue still exists and hasn't been fixed.

    Like others here, I have no user-based rules but have been getting strange connection dropouts, and turning STAS off fixes them.

    However, we had STAS switched on so that the XG could identify users and help attribute traffic to them, and without STAS turned on the UTQ doesn't appear to report anything.

    Is there another option I can switch on to identify users and get the UTQ working, without causing the dropouts?

     

    I have also had a support ticket open, #8082675, since a few days after we purchased the device, and nobody in support seemed able to help or to know about the issue!

     

    Matthew

  • Hi Matthew,

    The issue still exists, although on our unit it is not as bad as it was since we set system auth cta unauth-traffic drop-period to 0. Even so, some of our traffic is still routed through a layer 3 switch to keep this from causing problems with inter-VLAN routing. From what product management tells me, 17.2 will add an option to turn off the learning period completely, which should prevent this from happening. I was originally told maybe July for 17.2, but with 17.1 being so delayed, I bet we won't see 17.2 until October.

    Mike

  • Hi,

    Short question (I haven't been able to read everything, sorry): which authentication features do you need if you are not using user-based policies?

    I am aware of such an issue, but most of the time I am able to disable STAS on the XG in question because, as mentioned before, no authentication is needed.

    Cheers

  • Not sure if people are still seeing this. We sure are. After a whole lot of digging, I found that when I tested the IPs of users seeing problems/drops with the troubleshooting tools on the Advanced tab of the STAS suite application, I would get an error: either "RPC server unavailable" or a similar notification. After a crash course in WMI, I found that the PTR records on our DNS servers were not updating, and that was causing trouble with the WMI verification. I am in the process of correcting and updating our PTR records in DNS and turning STAS back on at our smaller sites to test things out. I am not sure yet whether this is the fix for us, but I wanted to put it out there in case it helps others.
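A rough way to spot stale PTR records like the ones described above is a forward-confirmed reverse lookup: resolve the IP to a name, resolve that name back, and see whether the original IP comes out again. The sketch below uses only Python's standard `socket` module; `check_ptr` and its status strings are hypothetical names, and it assumes the machine running it uses the same DNS servers as the STAS host.

```python
import socket


def check_ptr(ip, reverse=socket.gethostbyaddr, forward=socket.gethostbyname_ex):
    """Return (status, detail) for a forward-confirmed reverse DNS check of one IP.

    The resolver functions are parameters so they can be swapped out for testing.
    """
    try:
        hostname, _aliases, _addrs = reverse(ip)
    except OSError as exc:  # socket.herror and socket.gaierror are OSError subclasses
        return "no-ptr", str(exc)
    try:
        _name, _aliases, addresses = forward(hostname)
    except OSError as exc:
        return "no-forward", f"{hostname}: {exc}"
    if ip in addresses:
        return "ok", hostname
    # The PTR points at a name that now belongs to a different address.
    return "mismatch", f"{hostname} resolves to {addresses}"
```

Running `check_ptr` over the client IPs that see drops and looking for anything other than "ok" should narrow down which records need fixing, though a clean result does not prove the WMI check itself will pass.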

  • I ran into this issue as well; disabling STAS seems to resolve it.

  • We are having issues with our XG as well. Performance gets worse over time. TLS handshakes in particular take a long time or fail for a while, so pages fail to load.

    Disabled STAS today to see if anything changes. Might be the reason for our trouble.

  • After disabling STAS, our performance issues disappeared immediately. Thanks for the hint towards STAS.

  • Hi Jelle,

    What version of the firmware are you running on the XG firewall?

    Since we updated to 17.1.3 we have not had anything like as many issues with STAS turned on as previously.

    So we have currently got it turned back on so that we can still get the User Threat Quotient reports. Even though currently we do not have any policies that will limit users, we would look to implement them if the UTQ report highlights problem users.

    I don't feel that just turning off features of a device that we have purchased is an acceptable way of working just because the feature contains issues. They should be fixed so that the features that were published when purchasing the device can actually be used!

    It's like buying a house with automatic user identification for access through the front door, only to find it doesn't work and you have to get extra keys cut to do a more complex workaround!

    I still want to be able to use the simple to set up STAS feature and not the more complex alternatives. Otherwise should STAS be removed from the feature list until it works?

    Matthew

  • Hi Matthew,

    Of course this is not acceptable, but it brought back the ability to use the internet without permanent issues. I'm still waiting for an answer from Sophos regarding our issue and informed them in the meantime that STAS was the real problem behind it.

    We currently have 17.1.1 installed and didn't install MR2 or MR3 as there were too many issues with these releases.

    Waiting for MR4 to appear on our XG. Will then check the behaviour of STAS. But I think and hope that Sophos Central integration in 17.5 offers the possibility to remove STAS.

  • Matthew or Jelle, do you have several specific users who have consistent trouble while others seem unaffected, or does everybody have pretty consistent trouble when STAS is on? We are running 17.1.3 MR-3. We have a fair number of users who get network drops and some users who are seemingly unaffected. That obviously doesn't fly very well, so STAS is turned off at the moment.

    What result do you get if you run the STAS polling utilities tests on the advanced tab of the STAS tools if you put in an IP of a user that is having trouble? STAS does not need to be active on the firewall to run the tests. It just does a couple checks against the client computer. 
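If the polling test comes back with the "RPC server unavailable" error mentioned earlier in the thread, one quick thing to rule out is plain reachability of the client's RPC endpoint mapper on TCP port 135 from the STAS server. This is only a sketch under the assumption that WMI polling is in use; `tcp_port_open` is a hypothetical helper, and since WMI also uses dynamically assigned ports, an open port 135 is necessary but not sufficient.

```python
import socket


def tcp_port_open(host, port=135, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, or name not resolvable
        return False
```

Calling `tcp_port_open("<client IP>")` from the machine that runs the STAS collector against a working client and a failing client should show whether the difference is at the network or firewall level.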

  • Seems that some users had more issues but that might be related to the usage. For example users working a lot with Teamviewer or VPN had of course more trouble than users just browsing the internet now and then. But I'm gonna check as soon as possible.

  • We have STAS successfully working on one firewall.  We have three others that we could not get it working without drops and had to turn it off.  

    For the working firewall, I logged multiple calls with Sophos support, and it was a nightmare to get working. For the others, I had neither the time nor the patience to work with Sophos to get it working.

    Our users with VPNs and on connections such as Teamviewer were the ones who noticed it as they would get booted off when the connection dropped. Sporadic internet users noticed it occasionally.

    Our current version is 17.1.3 MR-3.

    I hope the issue gets resolved soon too!

  • Had some more progress on the issue. One of our offices that had consistent issues with the XG periodically blocking traffic from only certain machines had zero issues yesterday. It looks like the trick for us (I hope, still verifying) was ensuring that the netlogon service was running on the client machines. Apparently at some point our install image had some settings changed and netlogon was set to manual instead of automatic. As soon as the service was started on a machine giving us grief, the issue went away. Not entirely sure of all the hows and whys in this case, but that's what seems to be working at the moment. So we have set a group policy to ensure the service is running and set to automatic, and we are turning STAS back on in all offices.

     

    Hope this helps.
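To check a handful of client machines for the netlogon state described above before rolling out a group policy, something like the following could be run locally on a Windows client. It is a sketch that shells out to the built-in `sc query` and `sc qc` commands and assumes English-style output with `STATE` and `START_TYPE` lines; the function names are hypothetical. A service set to manual shows up as `DEMAND_START`, automatic as `AUTO_START`.

```python
import re
import subprocess


def parse_sc_field(output, field):
    """Return the symbolic value of a field such as STATE or START_TYPE from sc.exe output."""
    pattern = rf"^\s*{re.escape(field)}\s*:\s*\d+\s+(\w+)"
    match = re.search(pattern, output, re.MULTILINE)
    return match.group(1) if match else None


def netlogon_status(run=subprocess.run):
    """Query the local Netlogon service (Windows only); returns (state, start_type)."""
    query = run(["sc", "query", "netlogon"], capture_output=True, text=True).stdout
    config = run(["sc", "qc", "netlogon"], capture_output=True, text=True).stdout
    return parse_sc_field(query, "STATE"), parse_sc_field(config, "START_TYPE")
```

A result other than `("RUNNING", "AUTO_START")` from `netlogon_status()` would flag a machine worth looking at.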