
XG on SG210rev1 hardware

Hi,

Since upgrading our SG210rev1 cluster to XG 17.1.1 we have seen significant issues. HA failover was taking a long time (even after a failing disk was identified and the bad device was swapped out), and in the end I disabled HA and am now running on a single device to try to get some stability.

I am seeing the following:

  1. HA failover takes 20+ minutes to complete. If I interrupt it, the aux node goes into failsafe mode.
  2. Even without HA, it still takes 15 minutes or so after a reboot before the device is available for use. I suspect that this is causing the HA issue.
  3. One reboot for unknown reasons.
  4. One freeze where I needed to come in and power cycle the device (rcu_sched self-detected stall on CPU { 1}).
  5. System load seems high. The performance icon on the graph is mostly orange, although it is green at the moment. I don't believe the device is overloaded: under SG the load was always low, and I don't have any IPS rules in place.

I have done quite a few SG->XG migrations, but these were all on rev2 hardware, so I'm wondering if there is an issue with rev1 hardware running XG. When I RA'd the device with the failed disk I was sent a rev2 device by mistake, and that didn't seem to have any issues, although I barely tested it (obviously it failed to form a cluster with the rev1 device, so it never saw production use).

Another possibility is that something in my configuration is triggering the slow boot time and generally high load. I can't think what, though, and I have done a complete wipe and restore of a backup in case there was some corruption being transmitted from node to node via HA failover.

Thanks

James


