We have installed ASL V4.016 with an HA-license on an HA-cluster made up of two HP/Compaq DL320 servers. In addition to the two built-in 10/100/1000 NICs, we added a dual-port HP/Compaq NC3134 NIC to each server's (one and only) PCI slot, giving 4 NICs per server. According to the documentation (hardware compatibility list), the NC3134 supports heartbeat. The two servers are therefore connected to each other over one of those ports with a crossover Ethernet cable for the HA-connection.
The installation went well, and the HA-cluster came up and worked fine until we issued a shutdown command in WebAdmin. The (active) master server shut down as requested and the slave became the active server. We then also issued a shutdown to the slave (the new ACPI functions in V4.016 now manage to power off the DL320s automatically).
Now here is the problem: After we power on the master, it becomes the active server. Then we power on the slave. It comes up and beeps twice. In WebAdmin we can see that the HA-cluster is active and there is connectivity between master and slave (as expected). However, a few moments later (2 to 5 minutes), the master beeps once and unexpectedly shuts down and powers off, without us touching the cluster. The slave takes over as active.
When we power the master back on, it comes up and remains inactive. But a few moments later the active slave shuts down and powers itself off. Now the master takes over and becomes active.
This power-on/power-off cycle keeps alternating like this between the two machines. We cannot get the cluster up and running anymore, i.e. have both servers powered on at the same time.
We checked the kernel logs and astaro.org, but did not spot anything in particular that might hint at where to look further. Any ideas what we should try next? Thank you.
Regards,
Rolf