I have been having a problem with my HA system since the upgrade to 7.x. After a number of hours (it used to be 4, now it is up to 12), I get the following warning and the system stops responding:
2008:09:10-01:50:45 (none) ha_daemon[3046]: id="38A1" severity="warn" sys="System" sub="ha" name="Current load average 11.08 of node 2 is to high, please check you system!"
This is a software install, not an appliance. The box has only Intel NICs, and it worked great with version 6.x. Also, if I put this box into production, it hangs after a similar number of hours. However, another box with 3Com cards and the same config works fine.
Astaro believes it may have something to do with logging, but we still cannot find a solution. I was hoping 7.301 and the new database would solve it.
My current thoughts are:
Replace the network cables
Change the port on the switch
Change the NICs
Does anyone have any other thoughts? Bruce, I am hoping you have seen this.
Thanks,
Keith
2008:09:10-01:50:45 (none) ha_daemon[3046]: id="38A1" severity="warn" sys="System" sub="ha" name="Current load average 11.08 of node 2 is to high, please check you system!"
2008:09:10-02:54:48 (none) ha_daemon[3046]: id="38A1" severity="warn" sys="System" sub="ha" name="Current load average 35.97 of node 2 is to high, please check you system!"
2008:09:10-04:30:24 (none) ha_daemon[3046]: id="38A1" severity="warn" sys="System" sub="ha" name="Current load average 37.38 of node 2 is to high, please check you system!"
2008:09:10-04:32:34 (none) slon[6424]: [12-1] ERROR remoteListenThread_2: timeout (300 s) for event selection
2008:09:10-05:32:25 (none) ha_daemon[3046]: id="38A1" severity="warn" sys="System" sub="ha" name="Current load average 46.92 of node 2 is to high, please check you system!"
2008:09:10-06:34:26 (none) ha_daemon[3046]: id="38A1" severity="warn" sys="System" sub="ha" name="Current load average 25.01 of node 2 is to high, please check you system!"
2008:09:10-07:42:59 (none) slon[6400]: [16-1] ERROR remoteListenThread_2: timeout (300 s) for event selection
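To see how fast the load climbs between warnings (and how that lines up with the slon timeouts), it can help to pull the load-average values out of the log. Below is a rough Python sketch that assumes the lines look exactly like the excerpt above, including the "YYYY:MM:DD-HH:MM:SS" timestamp. The regex, the function name `parse_load_warnings` and the `SAMPLE` list are illustrative assumptions, not part of Astaro's tooling.

```python
import re
from datetime import datetime

# Hypothetical pattern for the ha_daemon "load average" warnings shown above.
# Assumes the "YYYY:MM:DD-HH:MM:SS" timestamp at the start of each line.
LOAD_RE = re.compile(
    r'^(?P<ts>\d{4}:\d{2}:\d{2}-\d{2}:\d{2}:\d{2}) .*?'
    r'Current load average (?P<load>[\d.]+) of node (?P<node>\d+)'
)

def parse_load_warnings(lines):
    """Return (timestamp, node, load) tuples for each load-average warning.

    Lines that do not match (e.g. the slon timeout errors) are skipped.
    """
    results = []
    for line in lines:
        m = LOAD_RE.match(line)
        if m:
            ts = datetime.strptime(m.group("ts"), "%Y:%m:%d-%H:%M:%S")
            results.append((ts, int(m.group("node")), float(m.group("load"))))
    return results

# Three lines copied from the excerpt above; the slon line is ignored.
SAMPLE = [
    '2008:09:10-01:50:45 (none) ha_daemon[3046]: id="38A1" severity="warn" sys="System" sub="ha" name="Current load average 11.08 of node 2 is to high, please check you system!"',
    '2008:09:10-04:32:34 (none) slon[6424]: [12-1] ERROR remoteListenThread_2: timeout (300 s) for event selection',
    '2008:09:10-05:32:25 (none) ha_daemon[3046]: id="38A1" severity="warn" sys="System" sub="ha" name="Current load average 46.92 of node 2 is to high, please check you system!"',
]

if __name__ == "__main__":
    for ts, node, load in parse_load_warnings(SAMPLE):
        print(ts, node, load)
    # Prints:
    # 2008-09-10 01:50:45 2 11.08
    # 2008-09-10 05:32:25 2 46.92
```

Feeding a full day of the HA log through this (instead of `SAMPLE`) would give a simple time series of the load on node 2 that could be compared against the logging activity Astaro suspects.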