XGS Stopped Working (18.5.3)

(2) Sophos XGS4500 (SFOS 18.5.3 MR-3-Build408) HA

I was wondering if anyone has seen this issue.
Yesterday our XGS just stopped passing traffic (nothing would go through). The XGS itself was accessible internally (web interface and SSH) and could communicate with external hosts. Traffic simply would not pass through it, and I could not find any errors anywhere.

Forcing a failover to the auxiliary XGS solved the issue and everything came back up. After switching back to the primary, everything is fine as well.

The XGS has been solid for months with no issues.
The only change that has been made to the XGS was the update to 18.5.3 last week.

I am now a little uneasy about the stability of 18.5.3 and am thinking of rolling back to 18.5.2.
Wanted to see if anyone else has seen this issue or if it was just a fluke.

Thank You,
-Peter Mastrangelo



  • We have an open case with Sophos regarding this (Case# 05098782-050987). This has now been escalated all the way through global escalation specialists (GES) to Development (basically as high as it gets) - Development reference number: NC-92066

    We've had the issue with 18.5.2, 18.5.3 and 19.0.

    Pair of XGS 116 in HA. The primary stops passing traffic and HA fails over to the auxiliary. If you reboot the failed XGS, HA is restored. If you don't reboot the failed XGS, the new primary will eventually fail too, leaving you with no internet connectivity.

    I would suggest those affected open a case with Sophos and reference our case number and the development reference number. If you already have a case I would suggest you pass our case details on to your current support specialist.

  • Hello everyone, does anyone have any feedback yet? In our case, support is still analyzing the problem.

  • In our case, support serviced and analyzed the unit but gave no conclusive diagnosis. We are currently running on the secondary unit, which does not have the problem. When possible, we will put the problematic unit back into operation with a monitoring system collecting logs, to try to find the reason traffic stops.

  • We are logging a permanent serial console connection to both XGS units to see if we can capture a kernel dump when an XGS fails. Waiting for a failure now.

  • Thank you for your feedback. Since it affects both firewalls in the HA pair, we will set up a new firewall and test whether it occurs there too.

  • Does everyone here have XGS machines, or XG machines as well? I wonder whether it is an issue with the co-processor.

  • In our case it is XG, but I read that there are problems with XGS too.

  • OK, so it is mixed hardware. Too bad.

    I remember we had an issue where a specific config change caused the XG HA to become unresponsive, and at some point it did not pass traffic until the HA Aux node (yes, the slave node!) was rebooted.

    Is it possible that this issue begins with a config change? Check the admin audit log for changes, then check whether the HA logs line up with them.

    In our case we could see it start when we made the config change:



    ==> /log/ha_tunnel.log <==
    Mar 02 18:16:40 ssh: connect to host hapeer port 22: Connection refused
    Mar 02 18:16:41 ssh: connect to host hapeer port 22: Connection refused
    Mar 02 18:16:42 ssh: connect to host hapeer port 22: Connection refused
    Mar 02 18:16:47 ssh: connect to host hapeer port 22: Connection timed out
    Mar 02 18:16:52 ssh: connect to host hapeer port 22: Connection timed out
    Mar 02 18:16:57 ssh: connect to host hapeer port 22: Connection timed out
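
    The cross-check suggested above (do HA tunnel errors start right after a config change?) could be scripted roughly as follows. This is only a sketch in Python, assuming the ha_tunnel.log line format shown in the excerpt. The year (these timestamps omit it), the change time, the 5-minute window, and the helper names parse_ha_errors and errors_after_change are all hypothetical. The admin audit log is not parsed here because its format isn't shown in this thread, so the change time has to be read off by hand.

```python
import re
from datetime import datetime, timedelta

# Matches lines like:
#   Mar 02 18:16:40 ssh: connect to host hapeer port 22: Connection refused
LINE_RE = re.compile(
    r"^\s*(\w{3} \d{2} \d{2}:\d{2}:\d{2}) "
    r"ssh: connect to host hapeer port 22: (.+?)\s*$"
)

def parse_ha_errors(lines, year):
    """Return (timestamp, message) pairs for hapeer SSH failures.

    The log timestamps carry no year, so the caller must supply one.
    """
    events = []
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            ts = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
            events.append((ts, m.group(2)))
    return events

def errors_after_change(events, change_time, window=timedelta(minutes=5)):
    """Return the failures logged within `window` after a config change."""
    return [(ts, msg) for ts, msg in events
            if change_time <= ts <= change_time + window]

# Sample lines copied from the excerpt above.
SAMPLE = """\
Mar 02 18:16:40 ssh: connect to host hapeer port 22: Connection refused
Mar 02 18:16:41 ssh: connect to host hapeer port 22: Connection refused
Mar 02 18:16:42 ssh: connect to host hapeer port 22: Connection refused
Mar 02 18:16:47 ssh: connect to host hapeer port 22: Connection timed out
Mar 02 18:16:52 ssh: connect to host hapeer port 22: Connection timed out
Mar 02 18:16:57 ssh: connect to host hapeer port 22: Connection timed out
"""

if __name__ == "__main__":
    events = parse_ha_errors(SAMPLE.splitlines(), year=2022)  # year is assumed
    change = datetime(2022, 3, 2, 18, 16, 30)  # hypothetical audit-log time
    for ts, msg in errors_after_change(events, change):
        print(ts.isoformat(sep=" "), msg)
```

    If failures cluster shortly after a change recorded in the audit log, that at least supports the config-change theory. It would not prove it.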
