After one year of operation with ups and downs, mainly regarding firmware versions, 2019 started with a hardware issue. The primary appliance stopped working due to a dying hard disk. Unfortunately, this was detected only because internet and email stopped working. The XG UI was not available at that point, so I initiated a reboot through the console, which was still reachable. As expected, the auxiliary appliance took over, but the former primary appliance did not boot completely. So I connected a display and keyboard and saw that there were hundreds of errors during boot.
I would have expected the auxiliary device to take over much earlier, and not only after a manual reboot, but this might be something Sophos could work on in the future.
I raised a ticket with our reseller, who forwarded the issue to Sophos. A spare appliance was shipped the same day and arrived two days later.
Although I had to report the firmware installed on the device, the spare came with an older firmware version. So the first step after initial setup was to install the latest firmware, 17.5 GA, which was also installed on our two HA devices. Unfortunately, we still had the first version of 17.5 GA installed, whereas we could only download the later version of 17.5 GA for the spare.
The second problem was that we disconnected HA on the former auxiliary device because we thought we had to. Unfortunately, the licences were bound to the appliance that had stopped working, so the active device lost all its licences and, at the same time, spam came through to our mailboxes unfiltered.
So what to do? The running device had no licences and an older firmware, and HA was not available because of the different firmware versions. We decided to install the latest firmware on the active device as well, which worked without problems. Then we set up HA, which also worked. But the licences were still missing. After searching online I found the way to transfer the licences to the spare device, but they still did not become active. It turned out that we had to switch operation to the spare device, which now held the licences.
Finally, we managed to get everything working again as expected, but two things remain:
- The auxiliary device should have taken over operation earlier and automatically
- Breaking HA while the primary device is not the one the licences are bound to should not immediately end up in an unprotected situation