
XG on SG210rev1 hardware

Hi,

Since upgrading our SG210rev1 cluster to XG 17.1.1 we have seen significant issues. HA failover was taking a long time (even after a failing disk was identified and the bad device was swapped out), so in the end I disabled HA and am now running on a single device to try to get some stability.

I am seeing the following:

  1. HA failover takes 20+ minutes to complete. If I interrupt it, the aux node goes into failsafe mode
  2. Even without HA, it is still 15 minutes or so before the device is available for use when rebooted. I suspect that this is causing the HA issue.
  3. One reboot for unknown reasons
  4. One freeze where I needed to come in and power cycle the device (rcu_sched self-detected stall on CPU { 1})
  5. System load seems high. The performance icon on the graph is mostly orange, although it is green at the moment. I don't believe the device is overloaded: under SG the load was always low, and I don't have any IPS rules in place.

I have done quite a few SG->XG migrations, but these were all on rev2 hardware, so I'm wondering if there is an issue with rev1 hardware running XG. When I RA'd the device with the failed disk I was sent a rev2 device by mistake, and that didn't seem to have any issues, although I barely tested it (it failed to form a cluster with the rev1 device, so it never saw production use).

Another possibility is that something in my configuration is triggering the slow boot time and generally high load. I can't think what, though, and I have already done a complete wipe and restore from a backup in case some corruption was being passed from node to node via HA failover.

Thanks

James


