This discussion has been locked.

After Upgrade to 9.100-16 CPU load is 100%

Hi,

I use UTM 9 at home and tried to update with Up2Date. The UTM runs as a virtual machine on ESXi 5.1.

After the upgrade to 9.100-16, the CPU load stays at 100% for hours. A reboot does not help.

I took a snapshot before the update, and after reverting to that snapshot (UTM 9.006-5) everything is OK again.

Is there any solution for the problem with 9.100-16?

Thanks for any feedback.

Greetings
Erwin

0 BAlfson over 13 years ago

Other users that reported "POLLERR" in the log had to re-install from ISO.

Cheers - Bob

Sophos UTM Community Moderator
Sophos Certified Architect - UTM
Sophos Certified Engineer - XG
Gold Solution Partner since 2005

MediaSoft, Inc. USA

0 RChadwick over 13 years ago

I have the same problem. I just reinstalled from the 9.1 ISO, restored the config file, and I still have 100% CPU.
I finally managed to SSH in, and ran top. It looks like kworker is the problem.

Any ideas how to fix this? I'm worried my fanless hardware is going to cook itself!

top - 11:36:01 up 8 min,  2 users,  load average: 4.66, 3.36, 1.66
Tasks: 143 total,   6 running, 135 sleeping,   0 stopped,   2 zombie
Cpu(s):  5.0%us, 92.4%sy,  2.3%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.3%si,  0.0%st
Mem:   2066016k total,   942312k used,  1123704k free,    17472k buffers
Swap:  1048572k total,        0k used,  1048572k free,   330456k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
  216 root      20   0     0    0    0 R 46.5  0.0   2:53.45 kworker/0:1
 1002 root      20   0     0    0    0 R 46.5  0.0   2:53.61 kworker/0:2
 5286 wwwrun    20   0 66936  63m 7708 S  2.0  3.1   0:07.20 webadmin.plx
 5600 root      20   0 46984  31m 1996 S  1.3  1.6   0:03.81 confd.plx
 5959 root      39  19 39592  32m 4096 R  0.7  1.6   0:01.39 gen_inline_repo
   10 root      20   0     0    0    0 R  0.3  0.0   0:00.66 rcu_sched
 2622 root      20   0  7068 3184 1688 S  0.3  0.2   0:01.41 syslog-ng
 3529 postgres  20   0  561m 2328 1328 S  0.3  0.1   0:00.04 postgres
 4400 root      20   0 31272  22m 2644 S  0.3  1.1   0:00.46 smtpd.bin
 5272 wwwrun    20   0 11144 4576 2672 S  0.3  0.2   0:00.17 httpd
 5398 snort     19  -1  471m 214m 2400 S  0.3 10.6   0:16.06 snort_inline
 5712 root      39  19  8968 7216 2288 S  0.3  0.3   0:00.27 create_rrd_grap
 5731 postgres  20   0  562m 5748 4472 S  0.3  0.3   0:00.30 postgres
    1 root      20   0  1912  592  516 S  0.0  0.0   0:00.33 init
    2 root      20   0     0    0    0 S  0.0  0.0   0:00.00 kthreadd
    3 root      20   0     0    0    0 S  0.0  0.0   0:00.06 ksoftirqd/0
    5 root       0 -20     0    0    0 S  0.0  0.0   0:00.00 kworker/0:0H
    6 root      20   0     0    0    0 S  0.0  0.0   0:00.01 kworker/u:0
    7 root       0 -20     0    0    0 S  0.0  0.0   0:00.00 kworker/u:0H
    8 root      RT   0     0    0    0 S  0.0  0.0   0:00.00 migration/0
    9 root      20   0     0    0    0 S  0.0  0.0   0:00.02 rcu_bh
   11 root       0 -20     0    0    0 S  0.0  0.0   0:00.00 khelper
   12 root      20   0     0    0    0 S  0.0  0.0   0:00.01 kworker/u:1
  108 root      20   0     0    0    0 S  0.0  0.0   0:00.00 bdi-default
  110 root       0 -20     0    0    0 S  0.0  0.0   0:00.00 kblockd
  205 root      20   0     0    0    0 S  0.0  0.0   0:00.00 khubd
  321 root      20   0     0    0    0 S  0.0  0.0   0:00.00 kswapd0
  383 root      20   0     0    0    0 S  0.0  0.0   0:00.00 fsnotify_mark
  399 root       0 -20     0    0    0 S  0.0  0.0   0:00.00 crypto
  988 root       0 -20     0    0    0 S  0.0  0.0   0:00.00 edac-poller
  997 root       0 -20     0    0    0 S  0.0  0.0   0:00.00 deferwq
 1054 root       0 -20     0    0    0 S  0.0  0.0   0:00.00 ata_sff

EDIT: I ran the "/etc/init.d/postgresql92 rebuild" command, and rebooted, and the problem seems fixed. I think I'll wait a while before upgrading my home router.
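
For anyone who wants to pull the busy kernel threads out of a top capture like the one above, here is a minimal Python sketch. It assumes the classic 12-column top layout shown in this post (PID through COMMAND); the names `parse_top_rows`, `busy_kernel_workers` and the 10% threshold are made up for illustration and are not part of any UTM tooling.

```python
from typing import List, NamedTuple


class ProcSample(NamedTuple):
    pid: int
    user: str
    cpu: float
    command: str


def parse_top_rows(text: str) -> List[ProcSample]:
    """Parse per-process rows from classic 12-column `top` output."""
    rows = []
    for line in text.splitlines():
        fields = line.split(None, 11)
        # Process rows start with a numeric PID; header lines do not.
        if len(fields) < 12 or not fields[0].isdigit():
            continue
        try:
            cpu = float(fields[8])  # ninth column is %CPU
        except ValueError:
            continue
        rows.append(ProcSample(int(fields[0]), fields[1], cpu, fields[11].strip()))
    return rows


def busy_kernel_workers(rows: List[ProcSample], threshold: float = 10.0) -> List[ProcSample]:
    """Return kworker threads at or above `threshold` percent CPU (arbitrary cutoff)."""
    return [r for r in rows if r.command.startswith("kworker") and r.cpu >= threshold]


# A few lines copied from the capture in this post.
SAMPLE = """\
top - 11:36:01 up 8 min,  2 users,  load average: 4.66, 3.36, 1.66
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
  216 root      20   0     0    0    0 R 46.5  0.0   2:53.45 kworker/0:1
 1002 root      20   0     0    0    0 R 46.5  0.0   2:53.61 kworker/0:2
 5286 wwwrun    20   0 66936  63m 7708 S  2.0  3.1   0:07.20 webadmin.plx
"""

if __name__ == "__main__":
    for proc in busy_kernel_workers(parse_top_rows(SAMPLE)):
        print(proc.pid, proc.command, proc.cpu)
```

Feeding it the output of `top -b -n 1` saved to a file should work the same way, as long as the column layout matches.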

0 RChadwick over 13 years ago

Well, I thought all was OK. A few hours later, I noticed a security camera wasn't accessible. I logged in to the UTM, and the CPU is at 100% again. I'm just going to revert to the previous version.

0 jeff.welling over 13 years ago

"Kworker, what is it and why is it hogging so much CPU?" - Ask Ubuntu
Very interesting regarding kworker.

RChadwick: If you have Support, please file a ticket with them at your convenience. I'm pretty darn sure kworker is not supposed to be using that much CPU.

0 alainp_01 over 13 years ago in reply to jeff.welling

Hi,

I have the same problem here (Astaro Virtual Appliance / VMware 5.1 / trial licence) after upgrading to 9.1 GA.

I tried rebuilding the PostgreSQL database, but the CPU is still at 100%.

astaro:/var/log # /etc/init.d/postgresql rebuild
-bash: /etc/init.d/postgresql: No such file or directory
astaro:/var/log # /etc/init.d/postgresql92 rebuild
Rebuilding PostgreSQL database, all reporting data will be lost!
Enter "yes" to continue...
yes
:: Stopping PostgreSQL                                                                         done
:: Initializing the PostgreSQL database                                                        done
:: Upgrading PostgreSQLpsql: FATAL:  "base/11564" is not a valid data directory
DETAIL:  File "base/11564/PG_VERSION" does not contain valid data.
HINT:  You might need to initdb.
                                                                                               done
:: Starting PostgreSQL                                                                         done
:: Restarting SMTP Proxy
:: Stopping SMTP Proxy
[ ok ]
:: Starting SMTP Proxy
[ ok ]
[ ok ]

/var/log/system is growing very fast.

Here is the tail of it:


2013:05:24-14:36:56 astaro syslog-ng[12793]: POLLERR occurred while idle; fd='67'
2013:05:24-14:36:56 astaro postgres[13778]: [3-1] FATAL:  role "reporting" does not exist
2013:05:24-14:36:56 astaro syslog-ng[12793]: POLLERR occurred while idle; fd='62'
2013:05:24-14:36:56 astaro postgres[13780]: [3-1] FATAL:  role "reporting" does not exist
2013:05:24-14:36:56 astaro syslog-ng[12793]: POLLERR occurred while idle; fd='39'
2013:05:24-14:36:56 astaro postgres[13782]: [3-1] FATAL:  role "reporting" does not exist
2013:05:24-14:36:56 astaro syslog-ng[12793]: POLLERR occurred while idle; fd='67'
2013:05:24-14:36:56 astaro postgres[13784]: [3-1] FATAL:  role "reporting" does not exist
2013:05:24-14:36:56 astaro syslog-ng[12793]: POLLERR occurred while idle; fd='62'

Any ideas?

Regards,

Alain Parmentier
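
To get a quick picture of which messages are flooding the log, the tail above can be tallied with a short script. This is only a sketch: it assumes every line follows the `timestamp host program[pid]: message` shape seen in the excerpt, and the function name `summarize` is invented for this example.

```python
import re
from collections import Counter

# Matches lines such as:
# 2013:05:24-14:36:56 astaro syslog-ng[12793]: POLLERR occurred while idle; fd='67'
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}:\d{2}:\d{2}-\d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) (?P<prog>[^\[\s:]+)\[(?P<pid>\d+)\]: (?P<msg>.*)$"
)
FD_RE = re.compile(r"fd='(\d+)'")


def summarize(lines):
    """Count log lines per program and POLLERR messages per file descriptor."""
    by_program = Counter()
    pollerr_fds = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip lines that do not fit the assumed layout
        by_program[match.group("prog")] += 1
        message = match.group("msg")
        if "POLLERR" in message:
            fd = FD_RE.search(message)
            if fd:
                pollerr_fds[int(fd.group(1))] += 1
    return by_program, pollerr_fds


if __name__ == "__main__":
    import sys

    # Usage (path taken from the post): tail -n 1000 /var/log/system | python3 this_script.py
    programs, fds = summarize(sys.stdin)
    print("lines per program:", dict(programs))
    print("POLLERR per fd:", dict(fds))
```

In the excerpt the same handful of file descriptors repeat, interleaved with the PostgreSQL `role "reporting" does not exist` failures, which is the pattern a tally like this makes visible.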

0 alainp_01 over 13 years ago in reply to alainp_01

See "[UPDATED] Reported issues in 9.1 GA release"

https://community.sophos.com/products/unified-threat-management/astaroorg/f/52/t/28917

25868: performance regression in ins_accounting(): Postgres running on 100%
>> Will be included in 9.101

0 jeff.welling over 13 years ago

If logging and reporting are important functions for your network, you might want to re-image to 9.006 until 9.1 is fixed. I suspect the entire reporting system has failed as a result of that message. It's interesting that it appears to be a database error and that the rebuild didn't correct the problem.

0 alainp_01 over 13 years ago in reply to jeff.welling

Logging and reporting are not the issue for me.

The 100% CPU load is more annoying, because the UTM runs as a virtual machine, so it eats CPU and slows down the other VM.

For now I've restricted the maximum CPU allocation for the UTM in VMware.

0 JamesGolden over 13 years ago

Same problem since last night's upgrade from 9.005-5 to 9.100-16. Watching top, syslog-ng is using 50-64% of the CPU or more. system.log shows continuous "POLLERR occurred while idle" messages. What is really strange is that the dashboard shows every interface except the HA/Cluster one as down, even though we're still getting traffic in and out. These are HA-linked ASG320 (hardware type 320C) appliances. The /var/log partition (/dev/sda7) has 63GB free. Not sure what to check next. I'm worried that I may need to rebuild to the previous version.

0 kurtleroy over 13 years ago

Same issue after the upgrade. It may run for hours or a couple of days, but ultimately the CPU spikes to 99-100% and someone has to physically go to the unit to restart it. There is no clear-cut cause for the spike: no jump in traffic, users, or VPN clients, nothing that immediately stands out. This is getting extremely difficult to manage, considering we run 24/7 and everything comes through the UTM 320s running in HA. Getting out of bed, driving 20 miles, pushing a couple of buttons, and driving back is getting old quick.

This is what it looks like every time, and there's no way to get any information on what's going on, because with the CPU maxed out we can't get any reports or logs to come up...
Thanks
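
One way to have at least some data left after a lockup is to append the load average to a file at a fixed interval, so the last entries before the spike survive the restart. The sketch below is an assumption-heavy illustration: it presumes a Python 3 interpreter is available on the box (nothing in this thread confirms that), the log path is hypothetical, and the function names are invented. It only reads `/proc/loadavg`, which is standard on Linux.

```python
import time


def parse_loadavg(text):
    """Parse /proc/loadavg content into (1min, 5min, 15min, runnable, total)."""
    parts = text.split()
    one, five, fifteen = (float(p) for p in parts[:3])
    runnable, total = (int(n) for n in parts[3].split("/"))
    return one, five, fifteen, runnable, total


def format_sample(text, timestamp):
    """Build one log line from raw /proc/loadavg content and a timestamp string."""
    one, five, fifteen, runnable, total = parse_loadavg(text)
    return (
        f"{timestamp} load={one:.2f}/{five:.2f}/{fifteen:.2f} "
        f"runnable={runnable}/{total}"
    )


def log_forever(path="/var/log/loadavg-watch.log", interval=30):
    """Append a sample every `interval` seconds (path and interval are hypothetical)."""
    while True:
        with open("/proc/loadavg") as src:
            raw = src.read()
        stamp = time.strftime("%Y-%m-%dT%H:%M:%S")
        # Reopen per write so each line is flushed before a possible hang.
        with open(path, "a") as out:
            out.write(format_sample(raw, stamp) + "\n")
        time.sleep(interval)


if __name__ == "__main__":
    log_forever()
```

A load average that climbs steadily before the box becomes unreachable would at least show when the problem started, which could then be matched against the other logs.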