Howdy
Hopefully this information finds and helps some of you.
For the last year we have been dealing with daily outages of our UTMs in multiple locations.
Sophos tech support hasn't been able to find the issue and therefore hasn't been too helpful. We also called in various third-party support companies and had no luck there. In a last-ditch effort, Sophos replaced our hardware free of charge. (The hardware they gave us was beaten to hell, so I wasn't happy about that.)
Still, the super high CPU issue has continued daily for a year, causing the units to lock up and fail every day. There have been a handful of ways to fix it: a hard reboot, waiting it out and hopefully getting in via SSH to kill the runaway process, or the postgresql92 rebuild trick. None of these are welcome workarounds when you have hundreds of employees trying to use the inet pipes.
Solution!
Today I can report we have found the cause (at least for our issues). In a word: HTML5 proxy. Okay, yes, that's two words technically, but it doesn't change the fact that it's the cause of countless hard conversations with my bosses.
Apparently when literally one person logs in to the HTML5 portal, the resources just start slowly climbing until the unit flat out bricks. We have tested this in both directions: a person OUTSIDE the network using the HTML5 portal to access an internal node, and a person INSIDE using the HTML5 portal at a branch location to access a node there. The result is the same in both cases. The resources start climbing and the systems eventually crash.
We had never been able to figure out why this problem was so damn random. It could happen in the middle of the night or a few times per day. We now understand why: people were logging in from home.
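If you want to check whether your own lockups follow the same pattern, a timestamped load log makes it easier to line up the slow climb against portal logins. Here is a minimal sketch of that idea, not something we ran ourselves. It assumes a Linux shell with Python 3 available (I have not verified that on the UTM itself), and the names parse_loadavg, is_climbing, watch, and load.log are made up for illustration.

```python
import time
from datetime import datetime


def parse_loadavg(text):
    """Return the 1-, 5-, and 15-minute load averages from /proc/loadavg text."""
    one, five, fifteen = text.split()[:3]
    return float(one), float(five), float(fifteen)


def is_climbing(samples, min_samples=5):
    """True if the last min_samples readings are strictly increasing."""
    recent = samples[-min_samples:]
    if len(recent) < min_samples:
        return False
    return all(a < b for a, b in zip(recent, recent[1:]))


def watch(path="/proc/loadavg", interval=60, logfile="load.log"):
    """Append a timestamped 1-minute load reading to logfile every interval seconds."""
    history = []
    while True:
        with open(path) as f:
            one, _, _ = parse_loadavg(f.read())
        # Keep only a short window; that is all is_climbing needs.
        history = (history + [one])[-10:]
        flag = " CLIMBING" if is_climbing(history) else ""
        stamp = datetime.now().isoformat(timespec="seconds")
        with open(logfile, "a") as out:
            out.write(f"{stamp} {one:.2f}{flag}\n")
        time.sleep(interval)
```

Calling watch() starts the loop. Afterwards, compare the timestamps on the CLIMBING lines against when people logged in to the HTML5 portal.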
So the solution for us is, obviously, to turn the HTML5 portal off. Maybe this will work for you if you're having a similar issue.
-Jayson