[7.400] 100% CPU and postgres: reporting reporting [local] SELECT

Hello,

I have a problem with CPU load on an ASG320 running version 7.400.

My Process List:


USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.0    720   280 ?        S    Feb27   0:01 init [3]  
root         2  0.0  0.0      0     0 ?        RN   Feb27   0:00 [ksoftirqd/0]
root         3  0.0  0.0      0     0 ?        S
root     22039  0.0  1.4  30888 15392 ?        R    15:23   0:00      \_ confd [worker[:P]rpc]
root      2902  0.0  1.0  13692 10652 ?        Ss   Feb27   6:09 dns-resolver.plx
root      2936  0.0  0.5   8172  5244 ?        S    Feb27   0:14 /usr/local/bin/sysmond
root      3012  0.0  0.9  15816 10204 ?        Ss   Feb27   0:01 /var/aua/aua.bin
root     21703  0.1  0.0      0     0 ?        Z    15:23   0:00  \_ [aua.bin] 
root      3018  0.0  1.2  19364 13248 ?        S    Feb27   5:32 /usr/local/bin/notifier.plx
root      3054  0.0  0.0   1804   712 ?        Ss   Feb27   0:00 /usr/sbin/cron
root     20354  0.0  0.0   1828   560 ?        S    15:17   0:00  \_ /usr/sbin/cron
root     20355  0.0  1.2  16308 12680 ?        SNs  15:17   0:00      \_ /usr/local/bin/gen_inline_reporting_data.plx
root      3119  0.0  0.0   1760   448 ?        Ss   Feb27   0:00 /usr/local/bin/asg_ha_zeroconf
postgres  3144  0.0  0.5  48812  5348 ?        S    Feb27   0:04 /usr/bin/postgres -D /var/storage/pgsql/data
postgres  3150  0.0  3.3  48948 34944 ?        Ss   Feb27   0:03  \_ postgres: writer process                    
postgres  3151  0.0  0.1  48812  1040 ?        Ss   Feb27   0:03  \_ postgres: wal writer process                
postgres  3152  0.0  0.1  49108  1344 ?        Ss   Feb27   0:00  \_ postgres: autovacuum launcher process       
postgres  3153  0.0  0.1   6964  1068 ?        Ss   Feb27   0:08  \_ postgres: stats collector process           
postgres  3997  0.1  3.6  49636 37036 ?        Ss   Feb27  10:57  \_ postgres: reporting reporting [local] idle  
postgres  4282  0.0  3.2  50040 33448 ?        Ss   Feb27   0:07  \_ postgres: postgres smtp 127.0.0.1(38821) idle
postgres 20358 38.2  5.4  88708 56136 ?        Rs   15:17   2:38  \_ postgres: reporting reporting [local] SELECT
postgres 21515  0.1  0.4  50024  4640 ?        Ss   15:21   0:00  \_ postgres: postgres smtp 127.0.0.1(51058) idle
root      3195  0.0  6.8  73104 69940 ?        S    Feb27   0:30 /var/mdw/mdw_daemon.plx
root      3220  0.1  0.8  14152  8856 ?        S    Feb27   7:21 /usr/local/bin/selfmonng.plx
root      3255  0.0  0.6  13552  6340 ?        S    Feb27   0:00  \_ [timewarp check]
root     21133  0.0  0.1   2520  1108 ?        S    Mar01   0:00  \_ /bin/bash /usr/bin/ctasd_connect_check.sh
root     21136  0.0  0.1   3332  1232 ?        S    Mar01   0:00      \_ /tmp/ctasd -q --timeout=5 --tries=1 http://resolver1.ast.ct
root      3227  0.0  0.0   1500   512 ?        Ss   Feb27   0:00 /usr/local/bin/daemon-watcher selfmonng.plx /usr/local/bin/selfmonn
root      3228  0.0  0.0   1708   676 tty1     Ss+  Feb27   0:00 /sbin/mingetty --noclear --no-hostname tty1
root      3229  0.0  0.0   1708   656 tty2     Ss+  Feb27   0:00 /sbin/mingetty --no-hostname tty2
root      3230  0.0  0.0   1708   660 tty3     Ss+  Feb27   0:00 /sbin/mingetty --no-hostname tty3
root      3231  0.0  0.0   1712   660 tty4     Ss+  Feb27   0:00 /sbin/mingetty --no-hostname tty4
root      3256  0.4  0.2   3904  2272 ?        S

In it you can see one process using about 40% CPU:
"postgres 20358 38.2 5.4 88708 56136 ? Rs 15:17 2:38 \_ postgres: reporting reporting [local] SELECT"

Is anyone else seeing higher CPU load like this?
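
For anyone who wants to compare, a saved process list can be ranked by %CPU with a short script. This is only a rough sketch: it assumes the ps column layout shown above (USER PID %CPU ... COMMAND), the helper name top_cpu is made up for illustration, and the sample contains just three lines copied from the listing.

```python
def top_cpu(ps_output, n=3):
    """Return the n busiest processes as (cpu_percent, pid, command) tuples."""
    rows = []
    for line in ps_output.splitlines():
        # Columns: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
        fields = line.split(None, 10)
        if len(fields) < 11 or fields[0] == "USER":
            continue  # skip the header, blank lines and truncated lines
        try:
            rows.append((float(fields[2]), int(fields[1]), fields[10].strip()))
        except ValueError:
            continue  # skip anything that does not parse as numbers
    return sorted(rows, reverse=True)[:n]


# Three lines taken from the listing above (the last one is truncated there too).
sample = r"""
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
postgres  3997  0.1  3.6  49636 37036 ?        Ss   Feb27  10:57  \_ postgres: reporting reporting [local] idle
postgres 20358 38.2  5.4  88708 56136 ?        Rs   15:17   2:38  \_ postgres: reporting reporting [local] SELECT
root      3256  0.4  0.2   3904  2272 ?        S
"""

for cpu, pid, command in top_cpu(sample):
    print(f"{cpu:5.1f}%  pid {pid}  {command}")
```

Truncated lines are skipped rather than guessed at, so the result only covers complete rows.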

Regards,
WaMaR


0 BAlfson over 16 years ago

Depending on the size of your log files, Astaro has said that PostgreSQL might take many many hours to complete reorganizing after the upgrade from 7.306 to 7.400.

Might that be your problem?

Cheers - Bob

Sophos UTM Community Moderator
Sophos Certified Architect - UTM
Sophos Certified Engineer - XG
Gold Solution Partner since 2005

MediaSoft, Inc. USA

0 WaMaR over 16 years ago in reply to BAlfson

My ASG320's uptime with version 7.400 is 5d 6h 19m. I think that's a different problem.

0 William Warren over 16 years ago in reply to WaMaR

My ASG320's uptime with version 7.400 is 5d 6h 19m. I think that's a different problem.

How is this a problem?

Owner: Emmanuel Technology Consulting

http://etc-md.com

Former Sophos SG(Astaro) advocate/researcher/Silver Partner

PfSense w/Suricata, ntopng,

Other addons to follow

0 BAlfson over 16 years ago in reply to William Warren

If you haven't upgraded the RAM in your ASG320, that might help.

It looks like you caught PostgreSQL at a busy time. The busy process started at 15:17, and if you looked at the process list soon after that, then it would show a high percentage of the usage. It seems strange to me that it pops up and pegs your CPU usage almost every 15 minutes. Since the 100% peak at 2:30 is so narrow, it doesn't look like you have PostgreSQL files that are very large.
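
As a rough check of that reasoning, the ps figures themselves suggest how soon after 15:17 the list was taken. This sketch assumes the usual procps meaning of %CPU (CPU time used divided by the time the process has existed); the function names are invented for illustration.

```python
def to_seconds(cpu_time):
    """Convert a ps TIME value such as '2:38' (minutes:seconds) to seconds."""
    minutes, seconds = cpu_time.split(":")
    return int(minutes) * 60 + int(seconds)


def elapsed_seconds(cpu_time, cpu_percent):
    """Estimate how long a process has existed from its ps TIME and %CPU.

    Assumes %CPU = CPU time / elapsed time, so elapsed = CPU time / (%CPU / 100).
    """
    return to_seconds(cpu_time) / (cpu_percent / 100.0)


# Values for PID 20358 in the posted process list: TIME 2:38, %CPU 38.2
elapsed = elapsed_seconds("2:38", 38.2)
print(f"PID 20358 had existed for about {elapsed / 60:.1f} minutes")
```

Under that assumption the process was roughly seven minutes old, which fits the other entries in the list that started at 15:23. A short-lived query that is caught early will always show a high lifetime percentage.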

The next time you see the Dashboard meter going up to 100%, log in via ssh, run top and press P (uppercase; M would sort by memory instead) to sort by CPU usage and see which process is at the top of the list.

But, I'd have to agree with William that no user complaints and no downtime probably means that the situation is not wrong, just different.

Still, I am curious to understand the regular spikes and, since you've obviously had this unit for a while, why the rollover at 2:30 happens so quickly if you have large files.

Cheers - Bob


0 BAlfson over 16 years ago in reply to William Warren

Have you upgraded the RAM in your ASG?

I'd agree with William that no downtime and no user complaints is an indication that this isn't a problem, just different.

Still, I have two questions that maybe someone could answer. Why does the unit hit 100% every 15 minutes most of the day? And, since WaMaR has been around for quite a while, doesn't it stand to reason that his files are large enough for the peak at 2:30 to be much wider?

The next time you see the ASG reaching 100% on the Dashboard, log in with ssh, run top and press P (uppercase; M would sort by memory instead) to see the processes using the most CPU. Please let us know if it's consistently postgres.

Cheers - Bob


0 WaMaR over 16 years ago in reply to BAlfson

I still have 1GB RAM, but my RAM/Swap Usage is low.

My CPU load is lower today, but when it gets worse again I will run the top command from the shell.

0 StephanG over 16 years ago in reply to WaMaR

Same problem here.

I updated about 2-3 weeks ago and the box says:

postgres 20866 64.0 2.4 84996 51800 ? Rs 20:10 0:03 \_ postgres: reporting reporting [local] SELECT

I have an ASG425 with 2 GB RAM.

So what's the issue?

Greets
Stephan

0 BAlfson over 16 years ago in reply to StephanG

Stephan, I don't recognize that line as one from top - what is it from?

Since the earlier exchange, a solution to the problem was found. Search here for PostgreSQL. I think it has to do with reindexing, but it might have required deleting then restoring.

Cheers - Bob


0 andreas over 16 years ago in reply to BAlfson

Instead of purging everything, I'd suggest you start by reducing the database keep times in the WebAdmin Reporting::Settings. If you set hold times to a short timespan, the databases will stay small and the overall load will decrease. Please be advised that the cleanup process only runs twice a day and will therefore not be effective immediately.

Cheers,
Andreas


0 drees over 16 years ago in reply to andreas

I'm also affected by this bug (see https://community.sophos.com/products/unified-threat-management/astaroorg/f/51/t/20453)

So far I've reduced reporting from 6 months to 3 months and it's still spiking.

First I'm going to do some tweaking of the postgresql.conf (by editing /var/storage/pgsql/data/postgresql.conf.fixed) and see if I can turn up logging to help identify the slow query.

Edit: Done. It looks like a dozen or so queries against the accounting_data table that take 25-30 seconds each to run. They also seem to generate quite large temporary files (150MB or so).
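
For anyone repeating this, slow statements can be pulled out of the PostgreSQL log with a few lines of script. This is a sketch under assumptions: it expects the "duration: ... ms  statement: ..." lines that PostgreSQL writes when log_min_duration_statement is enabled, the 20-second threshold is arbitrary, and the two sample log lines are made up for illustration (they are not copied from a real log).

```python
import re

SLOW_MS = 20_000  # arbitrary threshold in milliseconds, chosen for illustration

# Matches lines such as "LOG:  duration: 27412.508 ms  statement: SELECT ..."
DURATION_RE = re.compile(r"duration: (\d+(?:\.\d+)?) ms\s+statement: (.*)")


def slow_statements(log_lines, threshold_ms=SLOW_MS):
    """Return (duration_ms, statement) pairs at or above the threshold, slowest first."""
    hits = []
    for line in log_lines:
        match = DURATION_RE.search(line)
        if match and float(match.group(1)) >= threshold_ms:
            hits.append((float(match.group(1)), match.group(2)))
    return sorted(hits, reverse=True)


# Hypothetical sample lines; real entries also carry whatever log_line_prefix adds.
sample_log = [
    "LOG:  duration: 27412.508 ms  statement: SELECT ... FROM accounting_data WHERE dayabs = 733532 ...",
    "LOG:  duration: 3.214 ms  statement: SELECT 1",
]

for duration_ms, statement in slow_statements(sample_log):
    print(f"{duration_ms / 1000:.1f} s  {statement}")
```

On a real box the list of lines would come from reading the log file, whose location depends on the logging settings in postgresql.conf.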

Here's the first query out of the dozen:

SELECT resolve_service(ip_protocol || '/' || l4_dport) as service_str,
       resolve_protocol(ip_protocol) as protocol_str,
       ip_protocol || '/' || l4_dport as service,
       transform_SI(CAST(sum(raw_in_pktlen) AS BIGINT),1) as sum_tra_in_str,
       sum(raw_in_pktlen) as sum_tra_in_int,
       transform_SI(CAST(sum(raw_out_pktlen) AS BIGINT),1) as sum_tra_out_str,
       sum(raw_out_pktlen) as sum_tra_out_int,
       transform_SI(CAST(sum(raw_in_pktlen+raw_out_pktlen) AS BIGINT),1) as sum_tra_total_str,
       sum(raw_in_pktlen+raw_out_pktlen) as sum_tra_total_int,
       transform_common(CAST(sum(flow_count) AS BIGINT)) as cnt_flow_str,
       sum(flow_count) as cnt_flow_int,
       transform_common(CAST(sum(raw_in_pktcount+raw_out_pktcount) AS BIGINT)) as sum_pktcount_str,
       sum(raw_in_pktcount+raw_out_pktcount) as sum_pktcount_int,
       l4_dport as port
FROM accounting_data
WHERE dayabs = 733532
GROUP BY service,ip_protocol,port
ORDER BY sum_tra_total_int DESC,sum_tra_out_int DESC,sum_tra_total_int

Next step is to analyze the query and see if any tuning parameters help.