This discussion has been locked.

All RED Devices unable to connect

Last night at 22:58:16 ET all of our RED devices began disconnecting and are unable to reconnect.

We've made no recent configuration changes to our network or Sophos UTMs.

We have 37 REDs of various models, many of which have been in use for several years.

This is a summary of the logged events from the initial disconnection to the current looping errors:

2022:07:28-22:58:16 gateway-2 red_server[34459]: R20001GHKVGDX50: command '{"data":{"message":"Failed to send keepalive frame: Trying to send PING but expecting PONG to receive first","type":"RUNTIME_ERROR_OCCURRED"},"type":"DISCONNECT"}'
2022:07:28-22:58:16 gateway-2 red_server[34459]: R20001GHKVGDX50: Disconnecting: RUNTIME_ERROR_OCCURRED, Failed to send keepalive frame: Trying to send PING but expecting PONG to receive first


2022:07:29-08:22:09 gateway-2 red_server[43473]: SELF: New connection from XX.XX.43.9 with ID R20001GHKVGDX50 (cipher AES256-GCM-SHA384), rev1
2022:07:29-08:22:09 gateway-2 red_server[43473]: SELF: no such client: R20001GHKVGDX50
2022:07:29-08:22:09 gateway-2 red_server[43473]: R20001GHKVGDX50: Sending json message {"data":{},"type":"DEVICE_NOT_BOUND_TO_UTM"}
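For anyone triaging a similar reconnect loop across many REDs, here is a minimal sketch for pulling the affected device IDs out of a log excerpt. It assumes the line layout shown in the excerpt above (timestamp, host, red_server[pid], then a device ID or SELF, then the message). The name unbound_devices is hypothetical, and this is not a Sophos tool.

```python
import re

# Matches lines of the form seen in the excerpt above, e.g.
#   2022:07:29-08:22:09 gateway-2 red_server[43473]: R20001GHKVGDX50: <message>
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}:\d{2}:\d{2}-\d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) red_server\[(?P<pid>\d+)\]: "
    r"(?P<source>[^:\s]+): (?P<message>.*)$"
)


def unbound_devices(lines):
    """Return the sorted device IDs that were sent DEVICE_NOT_BOUND_TO_UTM."""
    ids = set()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # not a red_server line in the assumed format
        # Lines logged under SELF describe the server itself, not a device.
        if m["source"] != "SELF" and "DEVICE_NOT_BOUND_TO_UTM" in m["message"]:
            ids.add(m["source"])
    return sorted(ids)
```

Feeding it the lines of the RED log file (for example via open(path) as an iterable) would give one entry per device stuck in the loop, which makes it easier to confirm whether every RED is affected or only some.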

Failed attempts to resolve:

  • Toggling the devices' enabled status
  • Enabling and disabling tunnel compression
  • Changing UTM hostname
  • Physically restarting RED devices

We have a critical case open with Sophos, but it has been over an hour with no contact from them.



  • I've also tried: killall red_server && red_server 

    But that doesn't appear valid in our UTM. I'm root on the console and there is no instance of "red_server" running.

  • What happens if you edit and save the Server definition for the device with ID R20001GHKVGDX50?

    Cheers - Bob

     
    Sophos UTM Community Moderator
    Sophos Certified Architect - UTM
    Sophos Certified Engineer - XG
    Gold Solution Partner since 2005
    MediaSoft, Inc. USA
  • Didn't appear to have any effect.

    Worked with Sophos Support for about 3 hours today. There appears to be a communication issue between our UTM and the cloud interface that stores RED configs. The escalation engineer recommended restarting the UTM. Going to try that tonight and see what happens.

    Our case notes:

    Confirmed connection to the registry service over port 3400 but could not find configuration for the RED devices uploaded to the registry service.

    Tried restarting the RED service by enabling and then disabling debug mode for the RED devices; it did not make any difference.

    We also tried manually saving a RED's config to a USB drive and booting from that. Nothing we changed seemed to have any impact.

    I'll update once we have more information and a resolution.
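The registry check in the case notes above can be approximated with a plain TCP connect test. This is only a sketch: the function name tcp_port_open is hypothetical, and the registry hostname is not given in this thread, so you would need to supply whatever host your UTM is configured to use.

```python
import socket


def tcp_port_open(host, port, timeout=5.0):
    """Return True if a TCP connection to (host, port) can be established."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

A True result only shows that the port is reachable. As in our case, the connection can succeed while the device configuration is still missing on the other side.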

  • At 2022:07:29-17:46:02, without any action on our part, the REDs suddenly began reconnecting.

    The new log lines that appeared at that moment are below. I assume someone at Sophos did something, but there are no notes on our case.

     

    2022:07:29-17:46:02 gateway-2 red_server[53735]: SELF: RED10rev1 fw version set to 14
    2022:07:29-17:46:02 gateway-2 red_server[53735]: SELF: RED10rev2 local fw version set to 5317R2
    2022:07:29-17:46:02 gateway-2 red_server[53735]: SELF: RED10rev2 fw version set to 2005R2
    2022:07:29-17:46:02 gateway-2 red_server[53735]: SELF: RED15(w) fw version set to 1-501-bb7bd1013-b1551d2
    2022:07:29-17:46:02 gateway-2 red_server[53735]: SELF: RED20 fw version set to 1-1176-7ef037314-b1551d2
    2022:07:29-17:46:02 gateway-2 red_server[53735]: SELF: RED50 fw version set to 1-501-bb7bd1013-0000000
    2022:07:29-17:46:02 gateway-2 red_server[53735]: SELF: RED60 fw version set to 1-1176-7ef037314-b1551d2
    2022:07:29-17:46:02 gateway-2 red_server[53735]: SELF: IO::Socket::SSL Version: 1.953
    2022:07:29-17:46:02 gateway-2 red_server[53735]: SELF: Startup - waiting 15 seconds ...
    2022:07:29-17:46:17 gateway-2 red_server[53764]: UPLOAD: Uploader process starting
    2022:07:29-17:46:19 gateway-2 red_server[53735]: SELF: (Re-)loading device configurations

    It was a ~19-hour outage with no clear cause or resolution.

    If Sophos Support updates me with more info I will post it here.
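For reference, the outage length can be worked out directly from the first disconnect and the first reconnect timestamps quoted in this thread. A small sketch, assuming the UTM log timestamp layout YYYY:MM:DD-HH:MM:SS seen in the excerpts; the helper name outage_duration is hypothetical.

```python
from datetime import datetime

# Timestamp layout used in the UTM log lines quoted in this thread.
UTM_TS_FORMAT = "%Y:%m:%d-%H:%M:%S"


def outage_duration(start, end):
    """Return the time between two UTM log timestamps as a timedelta."""
    return datetime.strptime(end, UTM_TS_FORMAT) - datetime.strptime(start, UTM_TS_FORMAT)


# First disconnect and first reconnect from the posts above.
duration = outage_duration("2022:07:28-22:58:16", "2022:07:29-17:46:02")
print(duration)  # 18:47:46
```

That is 18 hours 47 minutes, which matches the ~19 hour figure.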