We installed two ASG425s in an Active/Passive configuration about 11 days ago. At the moment we only have the Firewall, HTTP/FTP Proxy, Antivirus, AntiSpyware and HA subsystems running. We have v7.502 installed.
The CPU load was never really high; we had an average load of about 10%. That changed today: the CPU load now often peaks at 100% for several minutes at a time, resulting in an average load of over 70% during the past few hours. After reading several posts I saw that there have been some postgres problems, and it seems the high load always occurs while postgres is working. A ps aux | grep postgres shows:
postgres 3571 0.0 0.2 48820 5340 ? S Jan20 0:08 /usr/bin/postgres -D /var/storage/pgsql/data
postgres 3637 0.0 1.6 49036 34996 ? Ss Jan20 0:41 postgres: writer process
postgres 3638 0.0 0.0 48820 1136 ? Ss Jan20 0:07 postgres: wal writer process
postgres 3639 0.0 0.0 49148 1400 ? Ss Jan20 0:01 postgres: autovacuum launcher process
postgres 3640 0.0 0.0 6972 1048 ? Ss Jan20 0:47 postgres: stats collector process
postgres 3880 0.0 0.1 6084 3556 ? Ss Jan20 0:03 slon_control
postgres 7198 0.0 1.8 50332 37848 ? Ss Jan20 6:24 postgres: ha_sync pop3 198.19.250.2(49612) idle
postgres 7362 0.0 0.6 50128 13352 ? Ss Jan20 0:00 postgres: postgres smtp 198.19.250.2(49621) idle
postgres 7379 0.0 0.9 50320 19024 ? Ss Jan20 0:00 postgres: postgres smtp 127.0.0.1(44289) idle
postgres 13437 0.0 0.0 2036 696 ? S Jan21 0:00 slon asg_cluster dbname=reporting user=ha_sync
postgres 13438 0.0 0.0 2040 700 ? S Jan21 0:00 slon asg_cluster dbname=pop3 user=ha_sync
postgres 13439 0.0 1.8 96476 37540 ? Sl Jan21 0:51 slon asg_cluster dbname=pop3 user=ha_sync
postgres 13440 0.0 1.9 99092 40380 ? Sl Jan21 2:04 slon asg_cluster dbname=reporting user=ha_sync
postgres 13450 0.0 1.8 50208 37236 ? Ss Jan21 1:21 postgres: ha_sync reporting [local] idle
postgres 13451 0.0 1.7 50208 36272 ? Ss Jan21 0:02 postgres: ha_sync pop3 [local] idle
postgres 13458 0.0 1.9 52560 40208 ? Ss Jan21 1:52 postgres: ha_sync reporting [local] idle
postgres 13459 0.0 1.9 52448 40416 ? Ss Jan21 14:01 postgres: ha_sync reporting [local] idle
postgres 13460 0.0 1.8 50636 38700 ? Ss Jan21 4:39 postgres: ha_sync reporting [local] idle
postgres 13465 0.0 1.9 52500 40044 ? Ss Jan21 1:14 postgres: ha_sync pop3 [local] idle
postgres 13468 0.0 1.9 51936 39668 ? Ss Jan21 1:33 postgres: ha_sync pop3 [local] idle
postgres 13469 0.0 1.8 50628 38156 ? Ss Jan21 0:07 postgres: ha_sync pop3 [local] idle
postgres 13482 0.0 1.8 50188 37288 ? Ss Jan21 2:22 postgres: ha_sync pop3 198.19.250.2(38350) idle
postgres 13483 0.3 2.0 55804 42452 ? Ds Jan21 54:33 postgres: ha_sync reporting 198.19.250.2(38351) FETCH
postgres 13485 0.9 1.9 51616 39696 ? Ss Jan21 165:21 postgres: reporting reporting [local] idle
postgres 16150 0.0 1.8 50240 37560 ? Ss Jan23 10:19 postgres: ha_sync reporting 198.19.250.2(47747) idle
postgres 9952 0.0 0.9 50032 18680 ? Ss 00:15 0:31 postgres: postgres smtp 198.19.250.2(40711) idle
postgres 27381 0.0 0.7 50032 15068 ? Ss 14:21 0:01 postgres: postgres smtp 127.0.0.1(42425) idle
postgres 11410 0.0 0.3 52016 7204 ? Ss 14:54 0:00 postgres: reporting reporting [local] idle
postgres 11441 0.0 0.2 50116 4608 ? Ss 14:54 0:00 postgres: reporting reporting [local] idle
postgres 24225 20.2 4.9 125704 102444 ? Ds 15:17 2:25 postgres: reporting reporting [local] SELECT
postgres 31679 0.1 0.6 181760 14328 ? Ds 15:24 0:00 postgres: autovacuum worker process reporting
postgres 2112 0.0 0.3 165428 7356 ? Ss 15:26 0:00 postgres: autovacuum worker process reporting
postgres 4800 0.0 0.1 49676 3792 ? Ds 15:29 0:00 postgres: autovacuum worker process pop3
which seems like quite a lot of processes to me. In addition, top shows a high wa value (waiting for I/O to complete) while postgres is using a lot of CPU, which results in 0% CPU idle:
top - 15:23:06 up 11 days, 15:30, 1 user, load average: 11.15, 8.85, 6.91
Tasks: 144 total, 1 running, 141 sleeping, 0 stopped, 2 zombie
Cpu0 : 35.4%us, 5.1%sy, 0.0%ni, 0.0%id, 51.5%wa, 2.0%hi, 6.1%si, 0.0%st
Cpu1 : 47.5%us, 12.1%sy, 0.0%ni, 0.0%id, 32.3%wa, 2.0%hi, 6.1%si, 0.0%st
Mem: 2067084k total, 2012572k used, 54512k free, 6164k buffers
Swap: 1052248k total, 132k used, 1052116k free, 1135420k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
13483 postgres 16 0 55976 40m 35m D 47 2.0 53:53.52 postgres
2205 chroot 16 0 332m 109m 7588 S 30 5.4 95:47.67 httpproxy
13485 postgres 16 0 51616 38m 36m S 10 1.9 165:07.92 postgres
7277 root 14 -1 27736 25m 708 S 7 1.3 126:07.40 ulogd
24225 postgres 18 0 122m 72m 8448 D 5 3.6 1:07.88 postgres
7790 root 14 -1 3720 2736 472 S 4 0.1 58:56.98 ctsyncd
13459 postgres 16 0 52448 39m 36m D 3 2.0 14:00.68 postgres
12112 postgres 15 0 177m 158m 35m D 3 7.9 2:45.18 postgres
9188 root 15 0 14140 11m 2732 S 2 0.6 12:31.78 websec-reporter
5174 root 15 0 2936 1984 684 S 1 0.1 12:34.63 syslog-ng
30795 root 15 0 13908 8404 7032 S 1 0.4 6:11.78 winbindd
1 root 16 0 720 280 244 S 0 0.0 0:01.07 init
2 root RT 0 0 0 0 S 0 0.0 0:00.70 migration/0
[...]
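To see what the busy backends (13483 in FETCH, 24225 in SELECT) are actually executing, I was going to query pg_stat_activity on the box. This is just a sketch: it assumes the 8.x catalog layout (procpid/current_query columns) and that local connections as the postgres user are trusted, which I have not verified on the ASG:

psql -U postgres -d reporting -c "SELECT procpid, usename, query_start, current_query FROM pg_stat_activity WHERE current_query <> '<IDLE>';"

That should at least show whether it is the reporting queries themselves or the slon replication to the passive node that is keeping the disks busy.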
Is this some kind of bug, or is there a known workaround for this problem? I really do not like this behaviour, and I can't imagine why the problem appeared out of nowhere; we have not changed anything in the last few days. Any help would be much appreciated.
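In the meantime, to confirm that it really is disk I/O stalling the CPUs and not something else, I have been watching the wa column in vmstat alongside top; a rough sketch, assuming vmstat is available on the appliance shell:

vmstat 5

If iostat from sysstat happens to be installed as well, iostat -x 5 would also show which device is saturated.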
Thanks