This discussion has been locked.

100% CPU load on SG230 out of nowhere - What could have caused it?

Hey guys,

I'd like to ask for your help with an incident we had today with our SG230. We switched from a WatchGuard to Sophos about 4 weeks ago and use most of the features, such as Email Protection, Web Protection, and Webserver Protection.

We normally see a CPU utilization of 10-25% with very few peaks of up to 80-90%. Today, around 11:00, CPU utilization suddenly went to a constant 100%, which caused most inbound and outbound connections to time out. We then did an HA failover, which cleared things up for about 10 minutes before the problems came back. After another HA failover, everything returned to normal.

I don't see anything unusual in the individual feature logs; the only thing that stands out is a significant peak in the number of spam mails (we are talking about ~50 spam mails).
I can't believe that this number of spam mails could have caused such a high CPU load...

Here are our monthly email statistics:

Here are our CPU statistics for today:

So, we are looking for a way to troubleshoot today's incident. I found a Sophos knowledge base article on how to identify the top 20 CPU-consuming processes: https://www.sophos.com/de-de/support/knowledgebase/115767.aspx

As we are back to normal now, that article does not help me at the moment. Is there any way to clearly identify which process or feature went berserk this morning?

Firmware version is: 9.402-7

Thanks in advance for any advice and best regards from Germany



  • Hi, Sascha, and welcome to the UTM Community!

    I wonder if this isn't the same mistake I made many years ago when first learning this tool (UTM now, ASG then).  By any chance do you have "External (Network)" in 'Local Networks' on the 'Global' tab of 'Network Protection >> Intrusion Prevention'?

    Cheers - Bob

  • Good morning Bob,
    thank you for your warm welcome and your suggestion.

    No, we don't have "External (Network)" in the 'Local Networks' list of the IPS; the configuration there seems to be OK. IPS was the first thing I disabled while the UTM was running at 100% CPU, because it was the first thing that came to mind. However, that had no immediate effect...

    The second suspect was the dual AV scanning we had enabled for Email and Web Proxy. We disabled it yesterday, but I'm not a big fan of trial and error.

    Is there any chance of finding out exactly which process caused that high load yesterday?

    Thanks ;)
    Sascha

    Today, everything is back to normal.

  • The AV issue was my next guess.  Check the Up2Date log for yesterday to see if you got an AV pattern update just before the 100% CPU problem, and, if so, which one.

    There's some bug somewhere, and I think it's in cssd.  On some boxes, single scan with Avira causes a problem, and on others single scan with Sophos does.  Some boxes have a problem with dual scan, so you just have to try all three ways.  In my experience, watching top and the Up2Date Live Log (to be sure you've gotten a pattern update) gives instant feedback.

  • Hi Bob,
    there were several updates running at that time. Here's a snippet from our Up2Date log from the time the problem started:


    2016:06:16-11:23:59 Hostname-X auisys[4923]: No suitable packages of type <man9> found, skipping
    <30>Jun 16 11:23:59 auisys[4923]: No suitable packages of type <cadata> found, skipping
    2016:06:16-11:23:59 Hostname-X auisys[4923]: Install u2d packages <ohelp9>
    2016:06:16-11:23:59 Hostname-X auisys[4923]: Starting installing up2date packages for type 'ohelp9'
    2016:06:16-11:24:00 Hostname-X auisys[4923]: Installing up2date package: /var/up2date/ohelp9/u2d-ohelp9-9.126-127.patch.tgz.gpg
    2016:06:16-11:24:00 Hostname-X auisys[4923]: Verifying up2date package signature
    2016:06:16-11:24:00 Hostname-X auisys[4923]: Unpacking installation instructions
    2016:06:16-11:24:00 Hostname-X auisys[4923]: parsing installation instructions
    2016:06:16-11:24:00 Hostname-X auisys[4923]: id="371B" severity="info" sys="system" sub="up2date" name="up2date package is already installed, removing" status="failed" file="/var/up2date/ohelp9/u2d-ohelp9-9.126-127.patch.tgz.gpg" action="preinst_check" package="ohelp9"
    2016:06:16-11:24:00 Hostname-X auisys[4923]: id="371Z" severity="info" sys="system" sub="up2date" name="Successfully installed Up2Date package" status="success" action="install" package_version="9.127" package="ohelp9"
    2016:06:16-11:24:00 Hostname-X auisys[4923]: [INFO-306] New Pattern Up2Dates installed
    2016:06:16-11:24:00 Hostname-X auisys[4923]: Install u2d packages <aptp>
    2016:06:16-11:24:00 Hostname-X auisys[4923]: Starting installing up2date packages for type 'aptp'
    2016:06:16-11:24:00 Hostname-X auisys[4923]: Installing up2date package: /var/up2date/aptp/u2d-aptp-9.16120-16121.patch.tgz.gpg
    2016:06:16-11:24:00 Hostname-X auisys[4923]: Verifying up2date package signature
    2016:06:16-11:24:00 Hostname-X auisys[4923]: Unpacking installation instructions
    2016:06:16-11:24:00 Hostname-X auisys[4923]: parsing installation instructions
    2016:06:16-11:24:00 Hostname-X auisys[4923]: id="371B" severity="info" sys="system" sub="up2date" name="up2date package is already installed, removing" status="failed" file="/var/up2date/aptp/u2d-aptp-9.16120-16121.patch.tgz.gpg" action="preinst_check" package="aptp"
    2016:06:16-11:24:00 Hostname-X auisys[4923]: id="371Z" severity="info" sys="system" sub="up2date" name="Successfully installed Up2Date package" status="success" action="install" package_version="9.16121" package="aptp"
    2016:06:16-11:24:00 Hostname-X auisys[4923]: [INFO-306] New Pattern Up2Dates installed
    2016:06:16-11:24:00 Hostname-X auisys[4923]: Install u2d packages <avira-xvdf>
    2016:06:16-11:24:00 Hostname-X auisys[4923]: Starting installing up2date packages for type 'avira-xvdf'
    2016:06:16-11:24:00 Hostname-X auisys[4923]: Installing up2date package: /var/up2date/avira-xvdf/u2d-avira-xvdf-9.1743-1744.patch.tgz.gpg
    2016:06:16-11:24:00 Hostname-X auisys[4923]: Verifying up2date package signature
    2016:06:16-11:24:01 Hostname-X auisys[4923]: Unpacking installation instructions
    2016:06:16-11:24:01 Hostname-X auisys[4923]: parsing installation instructions
    2016:06:16-11:24:01 Hostname-X auisys[4923]: id="371B" severity="info" sys="system" sub="up2date" name="up2date package is already installed, removing" status="failed" file="/var/up2date/avira-xvdf/u2d-avira-xvdf-9.1743-1744.patch.tgz.gpg" action="preinst_check" package="avira-xvdf"
    2016:06:16-11:24:01 Hostname-X auisys[4923]: id="371Z" severity="info" sys="system" sub="up2date" name="Successfully installed Up2Date package" status="success" action="install" package_version="9.1744" package="avira-xvdf"
    2016:06:16-11:24:01 Hostname-X auisys[4923]: [INFO-306] New Pattern Up2Dates installed
    2016:06:16-11:24:01 Hostname-X auisys[4923]: Install u2d packages <geoip>
    2016:06:16-11:24:01 Hostname-X auisys[4923]: Starting installing up2date packages for type 'geoip'
    2016:06:16-11:24:01 Hostname-X auisys[4923]: Installing up2date package: /var/up2date/geoip/u2d-geoip-7.110.tgz.gpg
    2016:06:16-11:24:01 Hostname-X auisys[4923]: Verifying up2date package signature
    2016:06:16-11:24:01 Hostname-X auisys[4923]: Unpacking installation instructions
    2016:06:16-11:24:01 Hostname-X auisys[4923]: parsing installation instructions
    2016:06:16-11:24:01 Hostname-X auisys[4923]: id="371B" severity="info" sys="system" sub="up2date" name="up2date package is already installed, removing" status="failed" file="/var/up2date/geoip/u2d-geoip-7.110.tgz.gpg" action="preinst_check" package="geoip"
    2016:06:16-11:24:01 Hostname-X auisys[4923]: id="371Z" severity="info" sys="system" sub="up2date" name="Successfully installed Up2Date package" status="success" action="install" package_version="7.110" package="geoip"
    2016:06:16-11:24:01 Hostname-X auisys[4923]: [INFO-306] New Pattern Up2Dates installed
    2016:06:16-11:24:01 Hostname-X auisys[4923]: Install u2d packages <ipsbundle>
    2016:06:16-11:24:01 Hostname-X auisys[4923]: Starting installing up2date packages for type 'ipsbundle'
    2016:06:16-11:24:01 Hostname-X auisys[4923]: Installing up2date package: /var/up2date/ipsbundle/u2d-ipsbundle-9.234.tgz.gpg
    2016:06:16-11:24:01 Hostname-X auisys[4923]: Verifying up2date package signature
    2016:06:16-11:24:01 Hostname-X auisys[4923]: Unpacking installation instructions
    2016:06:16-11:24:01 Hostname-X auisys[4923]: parsing installation instructions
    2016:06:16-11:24:01 Hostname-X auisys[4923]: id="371B" severity="info" sys="system" sub="up2date" name="up2date package is already installed, removing" status="failed" file="/var/up2date/ipsbundle/u2d-ipsbundle-9.234.tgz.gpg" action="preinst_check" package="ipsbundle"
    2016:06:16-11:24:01 Hostname-X auisys[4923]: id="371Z" severity="info" sys="system" sub="up2date" name="Successfully installed Up2Date package" status="success" action="install" package_version="9.234" package="ipsbundle"
    2016:06:16-11:24:01 Hostname-X auisys[4923]: [INFO-306] New Pattern Up2Dates installed
    2016:06:16-11:24:01 Hostname-X auisys[4923]: Install u2d packages <savi>
    2016:06:16-11:24:01 Hostname-X auisys[4923]: Starting installing up2date packages for type 'savi'
    2016:06:16-11:24:02 Hostname-X auisys[4923]: Installing up2date package: /var/up2date/savi/u2d-savi-9.9317-9318.patch.tgz.gpg
    2016:06:16-11:24:02 Hostname-X auisys[4923]: Verifying up2date package signature
    2016:06:16-11:24:02 Hostname-X auisys[4923]: Unpacking installation instructions
    2016:06:16-11:24:02 Hostname-X auisys[4923]: parsing installation instructions
    2016:06:16-11:24:02 Hostname-X auisys[4923]: id="371B" severity="info" sys="system" sub="up2date" name="up2date package is already installed, removing" status="failed" file="/var/up2date/savi/u2d-savi-9.9317-9318.patch.tgz.gpg" action="preinst_check" package="savi"
    2016:06:16-11:24:02 Hostname-X auisys[4923]: id="371Z" severity="info" sys="system" sub="up2date" name="Successfully installed Up2Date package" status="success" action="install" package_version="9.9318" package="savi"
    2016:06:16-11:24:02 Hostname-X auisys[4923]: [INFO-306] New Pattern Up2Dates installed
    2016:06:16-11:24:03 Hostname-X auisys[4923]: Up2Date Package Installer finished, exiting
    2016:06:16-11:24:03 Hostname-X auisys[4923]: id="3716" severity="info" sys="system" sub="up2date" name="Up2Date Package Installer finished, exiting"

    Is there anything unusual here that might explain the 100% load?

    Thanks again! ;)

    Sascha
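
A long Up2Date excerpt like the one above is easier to compare against the CPU graph once it is reduced to "which package was installed when". The following is a minimal sketch, assuming the log keeps the `id="371Z"` success-line layout shown in the excerpt; the function name `installed_packages` and the `SAMPLE` list are made up for illustration, and the script is meant to run on any machine with Python 3 against a saved copy of the log.

```python
import re

# Matches the structured "371Z" success lines as they appear in the excerpt above.
INSTALLED = re.compile(
    r'^(?P<ts>\d{4}:\d{2}:\d{2}-\d{2}:\d{2}:\d{2}) \S+ auisys\[\d+\]: '
    r'id="371Z".*?package_version="(?P<version>[^"]+)" package="(?P<package>[^"]+)"'
)


def installed_packages(lines):
    """Return (timestamp, package, version) for each successful install line."""
    results = []
    for line in lines:
        match = INSTALLED.search(line.strip())
        if match:
            results.append((match["ts"], match["package"], match["version"]))
    return results


# Three lines copied from the excerpt above; only the 371Z lines should match.
SAMPLE = [
    '2016:06:16-11:24:00 Hostname-X auisys[4923]: Verifying up2date package signature',
    '2016:06:16-11:24:01 Hostname-X auisys[4923]: id="371Z" severity="info" sys="system" '
    'sub="up2date" name="Successfully installed Up2Date package" status="success" '
    'action="install" package_version="9.1744" package="avira-xvdf"',
    '2016:06:16-11:24:02 Hostname-X auisys[4923]: id="371Z" severity="info" sys="system" '
    'sub="up2date" name="Successfully installed Up2Date package" status="success" '
    'action="install" package_version="9.9318" package="savi"',
]

for ts, package, version in installed_packages(SAMPLE):
    print(ts, package, version)
```

For the sample lines this lists the avira-xvdf and savi pattern installs with their timestamps, which can then be lined up against the start of the 100% CPU period.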

  • If you notice it again, SSH into the system and run "top". Of course, this only works while the issue is occurring, but it will tell you what is going on.
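
If there is only time to grab a quick snapshot during the incident, the output of `ps -eo pid,pcpu,comm` can be saved to a file and ranked afterwards. The sketch below assumes that three-column layout; the function name `top_cpu` and the process names and numbers in `SAMPLE_PS` are hypothetical and not taken from a real UTM. Note that `ps` reports CPU use averaged over each process's lifetime, so `top` remains the better live view.

```python
def top_cpu(ps_output, limit=20):
    """Return up to `limit` (cpu_percent, pid, command) tuples, highest CPU first."""
    rows = []
    for line in ps_output.strip().splitlines()[1:]:  # skip the header row
        pid, cpu, command = line.split(None, 2)
        rows.append((float(cpu), int(pid), command))
    return sorted(rows, reverse=True)[:limit]


# Hypothetical snapshot in the layout produced by `ps -eo pid,pcpu,comm`.
SAMPLE_PS = """\
  PID %CPU COMMAND
 4923  1.5 auisys
 5120 97.0 example-scanner
    1  0.0 init
"""

for cpu, pid, command in top_cpu(SAMPLE_PS, limit=2):
    print(f"{cpu:5.1f}% pid={pid} {command}")
```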

  • There were Up2Dates for both avira and savi patterns, so no clues there.  I would generate an atop file and then compare it to the Up2Date log and the Daily CPU graph:

    1. As root at the command line, add the following two lines to /etc/crontab-static first thing in the morning.

      20 11 * * * root /sbin/audld.plx --nosys --trigger
      20 11 * * * root atop -w /home/atop.log 1 600

    2. In WebAdmin, set 'Pattern Update Interval' to "Manual". This causes /etc/crontab to be rewritten with the two lines above.  The UTM is now set to run a pattern update only once a day, at 11:20 AM.  At the same moment the update begins, atop starts recording performance information every second and continues for ten minutes (600 samples).
    3. Sometime after 11:40 AM, comment out the two lines in /etc/crontab-static by putting # at the beginning of each, and then set the pattern update interval back to the desired value in WebAdmin.
    4. Pull up the CPU graph. If you didn't hit 100%, run rm /home/atop.log.
    5. If you hit 100%, pull up the Up2Date log to see what was happening during the 100% period and, at the command line, run atop -r /home/atop.log.  Step through the log using t to go forward a second and T to go backward.

    Anything interesting learned?  If not, you may need to leave pattern updates at once daily for a day or two.

    Cheers - Bob
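
As a quick check of the timing in the steps above: the two trailing arguments to atop are the sampling interval in seconds and the number of samples, so `1 600` started at 11:20 covers roughly 11:20 to 11:30. A small sketch of that arithmetic follows; `recording_window` is a made-up helper name, not part of atop.

```python
from datetime import datetime, timedelta


def recording_window(start_hhmm, interval_seconds, samples):
    """Return (start, end) wall-clock times covered by an atop recording."""
    start = datetime.strptime(start_hhmm, "%H:%M")
    end = start + timedelta(seconds=interval_seconds * samples)
    return start.strftime("%H:%M:%S"), end.strftime("%H:%M:%S")


# Values from the crontab line: start 11:20, 1-second interval, 600 samples.
print(recording_window("11:20", 1, 600))  # ('11:20:00', '11:30:00')
```

Waiting until after 11:40 before removing the cron lines therefore leaves a margin after the recording has finished. A longer window is just a larger sample count, for example `1 1800` for thirty minutes, at the cost of a bigger log file.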

  • Thanks, Bob, for your very detailed instructions. We will do so and see if we find anything interesting. Currently the UTM is working normally, even during updates.

    Cheers
    Sascha
