This discussion has been locked.

Strange drops

We have a customer with a phone switchboard application that periodically freezes, either at an application level (can't click anything), or it just won't show incoming calls. In both cases it can sometimes unfreeze, and then all the calls that have come in in the meantime suddenly flash on the screen. We've ruled out AV as the cause and are now looking into the problem being at the network layer.

drop-packet-capture shows this at the time of freezing:

2017-05-23 08:58:14 0101021 IP 10.10.90.2.8779 > 10.10.10.112.43470 : proto TCP: P 3007061919:3007062115(196) win 330 checksum : 55314
0x0000:  4500 00ec 18b4 4000 3f06 a9d2 0a0a 5a02  E.....@.?.....Z.
0x0010:  <remainder of the packet redacted>
Date=2017-05-23 Time=08:58:14 log_id=0101021 log_type=Firewall log_component=Firewall_Rule log_subtype=Denied log_status=N/A log_priority=Alert duration=N/A in_dev=Lag.90 out_dev=Lag.10 inzone_id=1 outzone_id=8 source_mac=00:1a:e8:8b:15:b4 dest_mac=00:e0:20:11:08:fc l3_protocol=IP source_ip=10.10.90.2 dest_ip=10.10.10.112 l4_protocol=TCP source_port=8779 dest_port=43470 fw_rule_id=0 policytype=1 live_userid=0 userid=0 user_gp=0 ips_id=0 sslvpn_id=0 web_filter_id=0 hotspot_id=0 hotspotuser_id=0 hb_src=0 hb_dst=0 dnat_done=0 proxy_flags=0 icap_id=0 app_filter_id=0 app_category_id=0 app_id=0 category_id=0 bandwidth_id=0 up_classid=0 dn_classid=0 source_nat_id=0 cluster_node=0 inmark=0x0 nfqueue=101 scanflags=0 gateway_offset=0 max_session_bytes=0 drop_fix=0 ctflags=33554472 connid=2341170016 masterid=0 status=398 state=3 sent_pkts=N/A recv_pkts=N/A sent_bytes=N/A recv_bytes=N/A tran_src_ip=N/A tran_src_port=N/A tran_dst_ip=N/A tran_dst_port=N/A

Then the same again exactly 2 minutes later (even the checksum is the same).

The connection came good another minute later.
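For anyone reading a similar capture, here is a minimal Python sketch that pulls the useful fields out of the two log lines above. It assumes the summary line uses tcpdump-style flag letters (S for SYN, P for PSH), which is an assumption about the output format rather than something confirmed here. The key=value record is abbreviated to a few fields, and the helper names are made up for illustration.

```python
import re

# Summary line copied from the capture above.
SUMMARY = ("2017-05-23 08:58:14 0101021 IP 10.10.90.2.8779 > 10.10.10.112.43470 : "
           "proto TCP: P 3007061919:3007062115(196) win 330 checksum : 55314")

# Abbreviated copy of the key=value record from the capture above.
RECORD = ("log_id=0101021 log_component=Firewall_Rule log_subtype=Denied "
          "in_dev=Lag.90 out_dev=Lag.10 fw_rule_id=0 nfqueue=101 "
          "connid=2341170016 status=398 state=3")

SUMMARY_RE = re.compile(
    r"IP (?P<src>[\d.]+)\.(?P<sport>\d+) > (?P<dst>[\d.]+)\.(?P<dport>\d+) : "
    r"proto (?P<proto>\w+): (?P<flags>\S+) (?P<seq_start>\d+):(?P<seq_end>\d+)"
    r"\((?P<length>\d+)\)"
)


def parse_summary(line):
    """Extract endpoints, flags and sequence range from the summary line."""
    match = SUMMARY_RE.search(line)
    if match is None:
        raise ValueError("unrecognised summary line")
    fields = match.groupdict()
    for key in ("sport", "dport", "seq_start", "seq_end", "length"):
        fields[key] = int(fields[key])
    return fields


def parse_record(line):
    """Turn a space-separated key=value record into a dict of strings."""
    return dict(pair.split("=", 1) for pair in line.split() if "=" in pair)


summary = parse_summary(SUMMARY)
record = parse_record(RECORD)

# Assumption: an "S" in the flags field would mark a SYN (tcpdump convention).
is_syn = "S" in summary["flags"]
seq_span = summary["seq_end"] - summary["seq_start"]

print("flags:", summary["flags"], "| SYN:", is_syn)
print("payload bytes:", summary["length"], "| sequence span:", seq_span)
print("subtype:", record["log_subtype"], "| fw_rule_id:", record["fw_rule_id"])
```

The sequence span matches the 196-byte payload and the flags field carries no S, so the dropped packet was carrying data on an existing connection. It was also logged as Denied with fw_rule_id=0.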

Any idea where to look next?

thanks

James



  • James,

    Create a firewall rule from 10.10.90.2 TCP 8779 to 10.10.10.112 port 43470. Log ID 0101021 means that the traffic was dropped by the firewall.

    Regards

     

  • You can see from the packet that this is not a SYN packet, this is a packet from an established connection that has been dropped for no obvious reason. The connection has not timed out - it is still present in conntrack. In this case, the connection came good again after a bit and the application unfroze. So for some reason XG is deciding that it occasionally doesn't like something about packets in the middle of a connection.

    I have done the following:

    set advanced-firewall bypass-stateful-firewall-config add source_network 10.10.90.0 source_netmask 255.255.255.0 dest_network 10.10.10.0 dest_netmask 255.255.255.0

    set advanced-firewall bypass-stateful-firewall-config add source_network 10.10.10.0 source_netmask 255.255.255.0 dest_network 10.10.90.0 dest_netmask 255.255.255.0

    which disables connection tracking and inspection between the two networks, and the problem has not occurred in the 8ish hours since I put that in place. Previously the problem would have occurred many times in that time.

    I will raise a ticket with Sophos if I get another day of trouble free connectivity. If the customer is in agreement I might try removing those rules and see if the problem returns, just to prove the fix.
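Because the bypass has to be added in both directions, it is easy to mistype one of the two commands. The following is a small Python sketch that generates the pair from two CIDR networks. The command text is copied from the two lines in the reply above; the function name `bypass_commands` is hypothetical, and the script only prints strings to paste into the console, it does not talk to the firewall.

```python
import ipaddress

# Command template copied from the two bypass lines quoted in the reply above.
TEMPLATE = (
    "set advanced-firewall bypass-stateful-firewall-config add "
    "source_network {src} source_netmask {smask} "
    "dest_network {dst} dest_netmask {dmask}"
)


def bypass_commands(net_a, net_b):
    """Return the two console commands (A->B and B->A) for a pair of networks."""
    a = ipaddress.ip_network(net_a)
    b = ipaddress.ip_network(net_b)
    commands = []
    for src, dst in ((a, b), (b, a)):
        commands.append(TEMPLATE.format(
            src=str(src.network_address), smask=str(src.netmask),
            dst=str(dst.network_address), dmask=str(dst.netmask),
        ))
    return commands


commands = bypass_commands("10.10.90.0/24", "10.10.10.0/24")
for command in commands:
    print(command)
```

`ipaddress.ip_network` rejects an address with host bits set (for example `10.10.90.2/24`), which catches one common typo before anything reaches the firewall.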

  • James, glad to hear you may be seeing some light at the end of the tunnel too :). PM coming to you shortly. I hope this resolves your issue for now until Sophos can figure out how to fix it, or I'll just stop using STAS altogether. It is going to be very disappointing if Sophos cannot figure out a way to allow STAS for reporting purposes without blocking or dropping traffic. Let's see what happens. PM me back or post if you get any updates on your case.

    Matthew, I am in the exact same situation. I only use STAS for reporting. I too hope it's a bug and not by design. Hopefully I will hear something back soon from the GES or dev team.

    Mike 

  • Hi MichaelBolton,

    Could you share with us the case# so we may look into it?

  • Does anyone have any updates on this issue? Did disabling STAS seem to take care of it for you? From what I am told, the amount of authentication traffic going into the XG is causing a panic condition in the access_server. Development is working to get a fix in for MR5. I can no longer wait. XG is dropping too much traffic that it shouldn't even be looking at. What I was initially told was that XG will try to route traffic to a firewall rule if the packet has a "user" tied to it. I have shown Sophos support that mine is still dropping traffic even though no user data is in the packet and the firewall rules are not user based. I really hope they get this fixed soon. I know you guys are working to fix issues with the development, but I am ready to jump ship to another vendor. I have a firewall that can't be used when a feature Sophos advertises is enabled. I am also now being told that in MR4, firewall acceleration (the FastPath packet optimization) is disabled when you run in HA. That is another feature that XG was bought for and can't be used, and no one even knows unless they get someone on the GES team.

    Mike

  • Since turning off STAS with "system auth cta disable" a week ago I haven't had a single reported issue. I even removed the connection tracking bypass rules and things are still good, so even LAN<->LAN traffic is working well, so I think it's safe to say that for my network STAS was definitely causing a problem.

    Running without any user logging isn't ideal, but compared to the alternative it's an acceptable outcome in the short term. Of course, when a fix is released we will have to proceed very carefully, as I really don't want to see any issues after enabling cta again.

    When you talk about HA disabling firewall acceleration in MR4, are you referring to Active/Active only or is Active/Passive affected too? I'm running the latter.

  • Good to hear. I completely disabled STAS yesterday and did not see any drops either. 

    I am running active-passive as well. Firewall acceleration is disabled on my firewalls and I was told it was done on all HA systems running MR4. I am not clear if it disabled it on existing systems or not. I had to rebuild my HA setup and for sure it disables it if you create HA in MR4. Maybe an upgrade from MR3 will leave it enabled still? They "hope" to have it back with V17. You can check yours by going to console and typing  system firewall-acceleration show

    Mike

  • I just stumbled onto this article.  I am having the EXACT same issue!  I have pounded my head on this for a month!  Just turning off STAS fixes it for now?

     

    So I do this by entering system auth cta disable in the console?

     

    I also do not have any user rules created. If I just created a user rule, would this help, or did you still experience issues?

  • Hi April,

    It appears from my testing and James' testing, just disabling STAS fixes the issue for now. 

     

    Yes, that is the command from the console to disable STAS.

     

    In my testing, even when creating user based rules, the XG would still drop packets. Some days it worked fine, some days it did not. I just decided to disable STAS completely yesterday as my users could not deal with the issue any longer.

  • I am so glad to stumble upon this thread.  You are describing my issue to a T!  My users were ready to throw IT out the window and were demanding we get rid of the XG firewall.  We also had days where it worked perfectly such as two days this past week.

    I have also disabled it.

    I am severely frustrated with support.  The reasons I have gotten are: not enough bandwidth, cable bad, switch bad, network issues and just flat out it isn't our problem but yours.

    Thank you gentlemen!

  • FWIW, firewall acceleration is enabled on the AP HA setup I've been working with:

    console> system firewall-acceleration show
    Firewall Acceleration is Enabled.
