[BUG][8.001] NMAP DOS of Astaro

Let me explain our setup really quick before I launch into what is going on. We have Astaro, running on VMware, serving as the gateway for a very small (~8 machines) internal network with very, very low network traffic (almost none). The few of us outside the network who need access to it use a very simple PPTP setup where we log in and get an internal IP. All pretty standard.
Anyway, up until yesterday we were running the Astaro v8 beta software, but we decided to upgrade all the way up to 8.001. Today I needed to do a port scan on the internal network while I was outside of it, so I fired up the VPN and ran the following nmap:

> nmap -sS -p1-65535 192.168.0.0/24

To my surprise, about halfway through the scan I got a notice that my connection to the VPN had been dropped. I assumed that either I had triggered some security feature or this had been a coincidence, so I asked a coworker to try to VPN in to the machine; no connection. About this time Nagios (sitting outside the VPN) started freaking out about all of the machines inside the VPN (NRPE, using simple port forwarding). I went to vSphere and hit Astaro's console, only to find that it wouldn't respond to any key presses at all (it was at the Astaro splash screen). I finally had to reboot it in vSphere.
I believed, at the time, that the problem was just coincidental to my scan; after all, we had done the exact same scan many times in the past. But a second attempt caused the same behavior. On a third attempt, I tail -f'd all of my log files and, other than a rapidly exploding packet filter log, nothing unusual was reported up to the very moment that everything froze.
I used vSphere's Performance tab and noticed a few things. First, around the time my nmap started, the CPU usage on Astaro jumped to 100% and stayed there even after the machine froze. Now, the server that this is on has two 2 GHz quad-core CPUs and Astaro has full access to all of those resources as it needs them, so I doubt the box's performance is an issue. Disk usage is not high either (~62% of the allocated space).
I'm not ready to rule out VMware as the culprit here, but I do want to remind anyone reading this that we never had this problem with the v8 beta versions, even when we were running much higher network traffic.
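
For a rough sense of how much traffic that scan can throw at the packet filter, here is a back-of-the-envelope sketch in Python. The function name is made up for illustration. It is only an upper bound: nmap normally port-scans only the hosts that answer its host discovery, and the count ignores retransmissions. The second figure assumes that only the ~8 live machines get the full port range.

```python
import ipaddress

def probe_upper_bound(cidr, first_port, last_port):
    """Upper bound on port probes: every address in the range times every port."""
    network = ipaddress.ip_network(cidr)
    ports = last_port - first_port + 1
    return network.num_addresses * ports

if __name__ == "__main__":
    total = probe_upper_bound("192.168.0.0/24", 1, 65535)
    # Assumption: only the ~8 live machines are actually port-scanned.
    live = 8 * 65535
    print(f"whole /24: {total:,} probes at most; 8 live hosts: {live:,} probes")
```

Even the smaller figure is over half a million packets for the firewall to evaluate, which fits with the packet filter log growing as fast as it did.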

0 Enekk over 15 years ago

I ran the test with top sorting by CPU use and this is what I found:
1. ulogd began to eat up more and more CPU until things finally froze (it ended at ~40%, but it jumped higher at times). This is especially a problem as ulogd runs at a nice of -1...
2. postgres would pop in every so often and consume quite a bit of CPU.
3. pfilter-reporte was also high up there, keeping around 15% of the CPU until ulogd grew too large and took up its share of the CPU as well (postgres is a nice of 0 so ulogd was able to eat its CPU allocation).
Next up, attempting to crash things from the internal network.
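
Since the console stops responding at the moment of the freeze, it may help to log the busiest processes to a file every few seconds instead of watching top live. Below is a minimal sketch, assuming Python is available on the box (not verified for Astaro) and a standard Linux /proc layout. The function names, the 5-second interval and the round count are illustrative choices, not anything from Astaro.

```python
import os
import time

def parse_stat(line):
    """Return (pid, command, utime + stime in clock ticks) from a /proc/<pid>/stat line."""
    # The command name sits in parentheses and may itself contain spaces,
    # so split on the last ')' rather than on whitespace.
    head, _, tail = line.rpartition(")")
    pid_text, _, comm = head.partition("(")
    fields = tail.split()
    # fields[0] is the state (field 3 in proc(5)); utime and stime are fields 14 and 15.
    return int(pid_text), comm, int(fields[11]) + int(fields[12])

def snapshot():
    """Map pid -> (command, cumulative CPU ticks) for every readable process."""
    procs = {}
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/stat") as handle:
                pid, comm, ticks = parse_stat(handle.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or the line was unreadable
        procs[pid] = (comm, ticks)
    return procs

def busiest(before, after, count=5):
    """Processes that used the most CPU ticks between two snapshots, largest first."""
    deltas = []
    for pid, (comm, ticks) in after.items():
        if pid in before:
            deltas.append((ticks - before[pid][1], pid, comm))
    return sorted(deltas, reverse=True)[:count]

def watch(interval=5.0, rounds=120):
    """Print the top CPU consumers once per interval; redirect stdout to a file."""
    previous = snapshot()
    for _ in range(rounds):
        time.sleep(interval)
        current = snapshot()
        stamp = time.strftime("%H:%M:%S")
        for ticks, pid, comm in busiest(previous, current):
            print(f"{stamp} {comm:<16} pid={pid:<6} +{ticks} ticks", flush=True)
        previous = current
```

Calling `watch()` before starting the scan and sending the output to a file should leave a record of which process was climbing right up to the freeze, as long as the last lines make it to disk.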

0 RFCat_vk_01 over 15 years ago in reply to Enekk

Hi,
have a look at your packet filter rules to see if logging is enabled on any of them.

Ian M

0 Enekk over 15 years ago in reply to RFCat_vk_01

RFCat_vk_01 wrote:
"have a look at your packet filter rule to see if logging is enabled on it."

I've looked through all of our rules (very few of them) and have not seen any that have logging enabled. I also went to the advanced tab of the packet filter settings and made sure all of the logging options were disabled there.

As to the "Can I crash things from the internal network?" question: the answer is no, at least not with a single machine the way I can from the external network. The ulogd process keeps coming up and pulling a ton of CPU, and the total usage jumps all over the place (at one point 90%; oddly, that was postgres going nuts), but ulogd keeps releasing its CPU allocation.

The machine outside the firewall is running nmap 5 while the one inside is running 5.21 and it seems like they scan a bit differently.  5 seems to cast a much wider net while scanning (i.e. it looks at a larger IP space at a time) while 5.21 seems more conservative in the number of IPs it scans at a time.

I'm at a loss. I'd be willing to bet that doing this from one or two more machines internally (i.e. moving to a DDoS) would bork things, but perhaps it has more to do with the combination of traffic coming over the VPN and routing rules. Really wish we had the money for a support contract so we could look into this more.

0 Billybob over 15 years ago in reply to RFCat_vk_01

Also forgot to mention the VPN Remote Access Reporting, which supposedly uses more cycles. But that has been available in earlier betas, so it wouldn't affect just this version in particular.

If this thing crashes from the internal LAN, something is definitely wrong. Do you have huge I/O wait times?

@Ian, I forgot about the per-rule logging option, but I assume he has been running the same setup in earlier betas.

0 Enekk over 15 years ago in reply to Billybob

Just wanted to add that work hours are rapidly dwindling here and I will have to leave, but I'll be back on this in the morning if anyone has any other tests/ideas.

0 Billybob over 15 years ago in reply to Enekk

I am at a loss at this point but here is what I would look for:

I/O wait time: if it is large enough and your hard drive is spinning wildly, then Astaro is to blame. Otherwise it's not playing nice with the virtual environment.

Best of luck and sorry about the rant earlier, you weren't supposed to see it ;)
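
One way to put a number on the I/O wait suggestion is to compare two readings of the aggregate "cpu" line in /proc/stat. The sketch below assumes a Linux kernel that exposes the iowait counter and that Python is available on the box (not verified for Astaro); the function names are made up for illustration. Keep in mind that iowait measured inside a VMware guest may not reflect what the host's disks are actually doing.

```python
import time

def parse_cpu_line(line):
    """Split the aggregate 'cpu' line of /proc/stat into a list of tick counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(value) for value in parts[1:]]

def iowait_percent(before, after):
    """Share of CPU time spent waiting on I/O between two 'cpu' lines, in percent."""
    deltas = [b - a for a, b in zip(parse_cpu_line(before), parse_cpu_line(after))]
    # Order per proc(5): user nice system idle iowait irq softirq steal guest guest_nice.
    # Guest time is already counted in user/nice, so only the first eight are summed.
    total = sum(deltas[:8])
    if total <= 0 or len(deltas) < 5:
        return 0.0
    return 100.0 * deltas[4] / total

def read_cpu_line():
    """Read the first line of /proc/stat, which is the aggregate 'cpu' line."""
    with open("/proc/stat") as handle:
        return handle.readline()

def sample(interval=5.0):
    """Measure the iowait percentage over one interval on the live system."""
    first = read_cpu_line()
    time.sleep(interval)
    return iowait_percent(first, read_cpu_line())
```

Running `sample()` in a loop during the scan and logging the result would show whether I/O wait really climbs before the freeze or whether the CPU is simply pegged by user and system time.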

0 Enekk over 15 years ago in reply to Billybob

Billybob wrote:
"I am at a loss at this point but here is what I would look for: I/O waiting time, if it is large enough and your hard drive is spinning wildly, then astaro is to blame. Otherwise its not playing nice with the virtual environment."

I did notice spikes in the I/O access speeds that might correlate with the crashes. I'll test more tomorrow, but I'm sure we all know the old maxim about correlation and causation.

Billybob wrote:
"Best of luck and sorry about the rant earlier, you weren't supposed to see it ;)"

No biggie, thanks for your help so far.