This discussion has been locked.

After Upgrade to 9.100-16 CPU load is 100%

Hi,

I use UTM 9 at home and tried to update it with Up2Date. The UTM runs as a virtual machine on ESXi 5.1.

After the upgrade to 9.100-16, the CPU load has been at 100% for hours. A reboot does not change anything.

I took a snapshot before the update, and after reverting to that snapshot (UTM 9.006-5) everything is OK again.

Is there any solution for the problem with 9.100-16?

Thanks for feedback.

Greetings
Erwin

This thread was automatically locked due to age.

0 jeff.welling over 12 years ago

When you're running 9.1, enable SSH Access, SSH in, become root, and run `top`. What process is using the CPU the most? Watch top while you also have the dashboard open. If the dashboard and top disagree, consider top correct.

0 BAlfson over 12 years ago

Whenever I hear a complaint about 100% CPU, I suspect a broken PostgreSQL database. If a look at top confirms that, run, as root:
/etc/init.d/postgresql rebuild

If that doesn't work, run the command with postgresql92 instead of just postgresql.

Cheers - Bob

Sophos UTM Community Moderator
Sophos Certified Architect - UTM
Sophos Certified Engineer - XG
Gold Solution Partner since 2005

MediaSoft, Inc. USA

0 Ekurzi17 over 12 years ago in reply to BAlfson

I tried what Bob recommended:

root # /etc/init.d/postgresql92 rebuild

But after a reboot, the problem is still there.

The vSphere client also shows this high CPU utilisation for the UTM VM.

top shows:
top - 20:19:18 up 4 min,  1 user,  load average: 3.58, 2.30, 0.97
Tasks: 137 total,   5 running, 130 sleeping,   0 stopped,   2 zombie
Cpu(s): 30.1%us, 69.4%sy,  0.0%ni,  0.5%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   1541940k total,  1188696k used,   353244k free,    53196k buffers
Swap:  1048572k total,        0k used,  1048572k free,   413904k cached

  PID USER      PR  NI  VIRT  RES  SHR S   %CPU %MEM    TIME+  COMMAND
 2962 root      20   0  6868 3124 1736 R     33  0.2   0:41.75 syslog-ng
 4010 postgres  20   0  560m  36m  36m S     10  2.4   0:13.61 postgres
32295 root      20   0     0    0    0 Z     10  0.0   0:00.31 confd.p 
15446 root      20   0 46952  31m 1988 S      4  2.1   0:02.63 confd.plx
10748 wwwrun    20   0 67104  63m 7728 S      2  4.2   0:03.84 webadmin.plx
 3620 root      20   0 44484  28m 1360 S      1  1.9   0:00.92 confd.plx
   10 root      20   0     0    0    0 R      0  0.0   0:01.26 rcu_sched
 3331 root      20   0  6252 4664  420 S      0  0.3   0:01.25 haveged
 3407 root      20   0  8052 5976 1988 S      0  0.4   0:00.11 confd-qrunner.p
 4119 root      20   0 11576 9744 2488 S      0  0.6   0:02.22 selfmonng.plx
16300 wwwrun    20   0 11088 4176 2356 S      0  0.3   0:00.03 httpd
    1 root      20   0  1912  592  520 S      0  0.0   0:01.27 init
    2 root      20   0     0    0    0 S      0  0.0   0:00.00 kthreadd
    3 root      20   0     0    0    0 S      0  0.0   0:00.03 ksoftirqd/0

0 jeff.welling over 12 years ago

Thankfully the email preserved the formatting.

So we can see syslog-ng using some CPU in that top snapshot, but if you keep watching, does syslog-ng continue to use CPU? Does anything else? Watch the %CPU column and add up the numbers as they update to get an idea of whether you're even coming close to 100% CPU. In the snapshot you showed us, you're not, so I think the Dashboard is simply inaccurate.
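Adding up the %CPU column by hand gets tedious while top keeps refreshing. Here is a minimal Python sketch of that sum, assuming the output was captured as text (for example with `top -b -n 1`, if the installed top supports batch mode). The function name and the `SAMPLE` constant are made up for illustration; the rows are copied from the snapshot posted in this thread.

```python
def sum_process_cpu(top_output: str) -> float:
    """Add up the %CPU column of the process table in captured top output."""
    total = 0.0
    cpu_index = None  # column position of %CPU, known once the header is seen
    for line in top_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "PID" and "%CPU" in fields:
            cpu_index = fields.index("%CPU")
            continue
        if cpu_index is not None and len(fields) > cpu_index:
            try:
                total += float(fields[cpu_index])
            except ValueError:
                pass  # not a process row, ignore it
    return total


# Rows copied from the top snapshot posted in this thread.
SAMPLE = """\
Tasks: 137 total,   5 running, 130 sleeping,   0 stopped,   2 zombie

  PID USER      PR  NI  VIRT  RES  SHR S   %CPU %MEM    TIME+  COMMAND
 2962 root      20   0  6868 3124 1736 R     33  0.2   0:41.75 syslog-ng
 4010 postgres  20   0  560m  36m  36m S     10  2.4   0:13.61 postgres
32295 root      20   0     0    0    0 Z     10  0.0   0:00.31 confd.p
15446 root      20   0 46952  31m 1988 S      4  2.1   0:02.63 confd.plx
10748 wwwrun    20   0 67104  63m 7728 S      2  4.2   0:03.84 webadmin.plx
 3620 root      20   0 44484  28m 1360 S      1  1.9   0:00.92 confd.plx
   10 root      20   0     0    0    0 R      0  0.0   0:01.26 rcu_sched
"""

print(sum_process_cpu(SAMPLE))  # 60.0
```

Note that in top's default mode each process's %CPU is relative to a single CPU, so on a multi-CPU VM the per-process sum and the overall figure are not directly comparable.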

0 Ekurzi17 over 12 years ago in reply to jeff.welling

But why does the vSphere client show a CPU utilisation of nearly 100% for this virtual machine?

Before the update (or after going back to firmware version 9.006-5 by resetting the machine to an earlier snapshot of the VM), the CPU utilisation of the VM is about 15%.

TOP before update:

top - 21:11:45 up  1:54,  1 user,  load average: 0.19, 0.34, 0.78
Tasks: 128 total,   1 running, 125 sleeping,   0 stopped,   2 zombie
Cpu(s):  0.3%us,  0.3%sy,  0.0%ni, 99.3%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   1542220k total,  1219348k used,   322872k free,    93480k buffers
Swap:  1048572k total,        0k used,  1048572k free,   422580k cached

  PID USER      PR  NI  VIRT  RES  SHR S   %CPU %MEM    TIME+  COMMAND
17625 root      20   0 11544 9636 2468 S      1  0.6   0:04.22 selfmonng.plx
18545 httpprox  20   0 1072m 158m 7028 S      0 10.5   0:08.85 httpproxy
19837 root      20   0     0    0    0 S      0  0.0   0:00.02 kworker/1:1
20111 root      20   0  2696 1100  816 R      0  0.1   0:00.06 top
    1 root      20   0  1912  592  520 S      0  0.0   0:02.41 init
    2 root      20   0     0    0    0 S      0  0.0   0:00.06 kthreadd
    3 root      20   0     0    0    0 S      0  0.0   0:00.02 ksoftirqd/0
    5 root      20   0     0    0    0 S      0  0.0   0:04.20 kworker/u:0
    6 root      RT   0     0    0    0 S      0  0.0   0:00.00 migration/0
    7 root      RT   0     0    0    0 S      0  0.0   0:00.00 migration/1
    9 root      20   0     0    0    0 S      0  0.0   0:00.03 ksoftirqd/1
   11 root       0 -20     0    0    0 S      0  0.0   0:00.00 khelper
   12 root      20   0     0    0    0 S      0  0.0   0:00.01 kworker/u:1
  100 root      20   0     0    0    0 S      0  0.0   0:00.28 sync_supers
  102 root      20   0     0    0    0 S      0  0.0   0:00.00 bdi-default
  104 root       0 -20     0    0    0 S      0  0.0   0:00.00 kblockd
  267 root      20   0     0    0    0 S      0  0.0   0:00.00 khubd
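The `Cpu(s)` summary line of each snapshot already gives an overall figure that can be set against what vSphere reports. A small Python sketch, assuming the summary line has the format shown in this thread; the function name and the two constants are only illustrative, and the lines themselves are copied from the two top outputs posted here.

```python
import re


def cpu_busy_percent(summary_line: str) -> float:
    """Return 100 minus the idle percentage found in a top 'Cpu(s)' line."""
    match = re.search(r"([\d.]+)\s*%?\s*id\b", summary_line)
    if match is None:
        raise ValueError("no idle field found in: " + summary_line)
    return round(100.0 - float(match.group(1)), 1)


# Summary lines copied from the two snapshots in this thread.
AFTER_UPGRADE = "Cpu(s): 30.1%us, 69.4%sy,  0.0%ni,  0.5%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st"
BEFORE_UPGRADE = "Cpu(s):  0.3%us,  0.3%sy,  0.0%ni, 99.3%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st"

print(cpu_busy_percent(AFTER_UPGRADE))   # 99.5
print(cpu_busy_percent(BEFORE_UPGRADE))  # 0.7
```

Read this way, the snapshot taken on 9.100-16 shows the VM almost fully busy, most of it in system (`sy`) time, while the snapshot taken on 9.006-5 shows it almost fully idle.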

0 jeff.welling over 12 years ago

This I have no explanation for. If you have Support with Sophos then you may want to open a support ticket to dig in to that further.

0 SteveU over 12 years ago

I may have seen this same problem. Do you see several entries in /var/log/system.log from syslog-ng stating POLLERR, or something similar?

I was also getting errors when I would try to check the usage of service and network objects.

I run an active/passive cluster, and was only seeing the problem on one node. To fix it, I destroyed the offending node, installed a new one from the 9.1 ISO and rejoined to the cluster.

0 BarryG over 12 years ago

Hi, check to see if any of the logs are growing very fast; that could explain the high syslog CPU usage; then you'd need to look at those logs and see what is causing them to grow.

Barry
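One way to check for fast-growing logs is to record file sizes twice and compare them. The following Python sketch does that; it assumes the logs sit directly in a directory such as `/var/log` (the location of `system.log` mentioned in this thread) and that Python is available wherever the logs can be read. All function names are hypothetical.

```python
import os
import time


def snapshot_sizes(directory):
    """Map each regular file directly inside `directory` to its size in bytes."""
    sizes = {}
    for entry in os.scandir(directory):
        if entry.is_file(follow_symlinks=False):
            sizes[entry.path] = entry.stat(follow_symlinks=False).st_size
    return sizes


def growth_between(before, after):
    """Return (path, bytes grown) pairs, fastest-growing first.

    Files that shrank, stayed the same, or disappeared are skipped.
    """
    grown = []
    for path, new_size in after.items():
        delta = new_size - before.get(path, 0)
        if delta > 0:
            grown.append((path, delta))
    return sorted(grown, key=lambda item: item[1], reverse=True)


def fastest_growing(directory, interval_seconds=60):
    """Take two size snapshots `interval_seconds` apart and compare them."""
    before = snapshot_sizes(directory)
    time.sleep(interval_seconds)
    after = snapshot_sizes(directory)
    return growth_between(before, after)


# Example call (not executed here because it waits for the interval):
# for path, grown in fastest_growing("/var/log", interval_seconds=60):
#     print(path, grown)
```

A file that was rotated or truncated between the two snapshots will be under-reported, so a short interval is probably the safer choice.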

0 paulba2k over 12 years ago in reply to SteveU

I found the same problem. I upgraded two HA (master/slave) virtual machines to 9.1 and ended up with the master running at 99% CPU, with syslog-ng taking up about 30% (the rest did not add up to 99%, but the total agreed with VMware). I promoted the slave to master, deleted the old master, and rebuilt it from scratch using a 9.003 OVF. It took a while to get everything back to normal, but I was able to do it on a live system with no complaints from the users (nothing new, anyway).

Paul

0 wiseguy over 12 years ago
Hello,

I have the same problem:
After updating from version 9.003... (I can't say exactly which) to 9.100-16, the CPU is permanently at 100%. The data and log partition is nearly full. I found that the syslog had grown to 900 MB, so I deleted it. After that, the log grew again really fast (maybe 100 MB in 5 minutes!). The only entries you can see in it now are:
2013:05:20-22:33:09 EF-RT-ASG syslog-ng[2866]: POLLERR occurred while idle; fd='91'
2013:05:20-22:33:09 EF-RT-ASG syslog-ng[2866]: POLLERR occurred while idle; fd='86'
2013:05:20-22:33:09 EF-RT-ASG syslog-ng[2866]: POLLERR occurred while idle; fd='91'
2013:05:20-22:33:09 EF-RT-ASG syslog-ng[2866]: POLLERR occurred while idle; fd='84'
2013:05:20-22:33:09 EF-RT-ASG syslog-ng[2866]: POLLERR occurred while idle; fd='86'
2013:05:20-22:33:09 EF-RT-ASG syslog-ng[2866]: POLLERR occurred while idle; fd='91'
2013:05:20-22:33:09 EF-RT-ASG syslog-ng[2866]: POLLERR occurred while idle; fd='86'
2013:05:20-22:33:09 EF-RT-ASG syslog-ng[2866]: POLLERR occurred while idle; fd='84'
(That's only a small part, of course ;-))

I deactivated logging entirely on that UTM. After that, the CPU went back to normal for a short time. Then it went to 100% again (now without logging).

BAlfson wrote:
Whenever I hear a complaint about 100% CPU, I suspect a broken PostgreSQL database. If a look at top confirms that, run, as root:
/etc/init.d/postgresql rebuild
If that doesn't work, run the command with postgresql92 instead of just postgresql.

I tried that with postgresql92, but I don't think it had any effect.

BarryG wrote:
Hi, check to see if any of the logs are growing very fast; that could explain the high syslog CPU usage; then you'd need to look at those logs and see what is causing them to grow.

How can I deactivate one specific log (in my case only the syslog)?

I hope someone can help me. Otherwise I will have to reinstall everything next week.
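To see how many file descriptors are involved in the POLLERR flood, the messages can be tallied per `fd` value. A rough Python sketch, assuming the line format of the excerpt quoted in this post; the pattern, function name, and `SAMPLE` constant are only illustrative, and the sample lines are the eight lines from that excerpt.

```python
import re
from collections import Counter

# Matches the syslog-ng message format seen in the excerpt in this thread.
POLLERR_PATTERN = re.compile(
    r"syslog-ng\[\d+\]: POLLERR occurred while idle; fd='(\d+)'"
)


def count_pollerr_by_fd(lines):
    """Count POLLERR messages per file descriptor number (kept as a string)."""
    counts = Counter()
    for line in lines:
        match = POLLERR_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# The eight lines quoted in this thread.
SAMPLE = """\
2013:05:20-22:33:09 EF-RT-ASG syslog-ng[2866]: POLLERR occurred while idle; fd='91'
2013:05:20-22:33:09 EF-RT-ASG syslog-ng[2866]: POLLERR occurred while idle; fd='86'
2013:05:20-22:33:09 EF-RT-ASG syslog-ng[2866]: POLLERR occurred while idle; fd='91'
2013:05:20-22:33:09 EF-RT-ASG syslog-ng[2866]: POLLERR occurred while idle; fd='84'
2013:05:20-22:33:09 EF-RT-ASG syslog-ng[2866]: POLLERR occurred while idle; fd='86'
2013:05:20-22:33:09 EF-RT-ASG syslog-ng[2866]: POLLERR occurred while idle; fd='91'
2013:05:20-22:33:09 EF-RT-ASG syslog-ng[2866]: POLLERR occurred while idle; fd='86'
2013:05:20-22:33:09 EF-RT-ASG syslog-ng[2866]: POLLERR occurred while idle; fd='84'
"""

for fd, count in count_pollerr_by_fd(SAMPLE.splitlines()).most_common():
    print(fd, count)
```

In the quoted excerpt only three descriptors (84, 86 and 91) show up, all within the same second. To run this against a real file, pass the open file object instead of `SAMPLE.splitlines()`.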