This discussion has been locked.

Clients appearing offline in SEC 5.1

Hi All,

I have recently installed SEC 5.1 with 10 clients in preparation for an upgrade from 4.7 later this year.

For some reason, after a period of time some clients get a little red cross on them. These can be virtual or physical machines. The machines are not turned off and are still getting updates from the SEC.

To get them back to green I can either reboot them, restart the client's message router, or re-protect them.

Anyone else had this issue?

Thanks

Tom

:39143


  • Hello Tom,

    the disconnecting clients are from the 10 "preparation" clients? Have you ever waited for them to reconnect "by themselves" (i.e. without a reboot)? The message router service is still running when they appear disconnected?

    I'd check the Sophos Network Communications Report and the Router logs on the clients first.

    Christian

    :39151
  • Hi Tom

    believe me, I have spent hours (if not days) discussing this with Sophos. We have the same situation: in our environment of approx. 2000 machines, clients lose their connection to the management server. I have also heard from many other customers that they experience the same issue. The problem is that no one has really found a solution, except the following:

    On a client with the problem, check this folder: C:\ProgramData\Sophos\Remote Management System\3\Router\Envelopes

    There should be some files in it. Sophos stores its outgoing messages in this router folder, and they are sent via the Remote Management Service. If the connection breaks and is not re-established, Sophos waits until several messages have accumulated in this folder and then automatically restarts the service. However, in a lightly configured environment with anti-virus only, this can be a problem, as only 5-10 messages are created per day.

    On your clients you can configure via a registry key how many messages the service waits for before it restarts. Use these commands (run in Windows CMD) to have it restart once 30 messages are in the folder:

    reg add "HKLM\Software\WOW6432Node\Sophos\Messaging System\Router" /v MonitorEmSender /t REG_DWORD /d 1 /f
    reg add "HKLM\Software\WOW6432Node\Sophos\Messaging System\Router" /v MonitorEmSenderMaxFailures /t REG_DWORD /d 30 /f

    Otherwise, to keep it simpler, you can also restart the "Sophos Message Router" Windows service on the affected machines after they go offline.

    Regards

    :39153
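
To see how close a client is to the restart threshold described above, you can count the files waiting in the Envelopes folder. This is only a minimal sketch: the folder path and the value 30 are taken from the post, the function names are made up for illustration, and whether the router counts pending messages in exactly this way is an assumption.

```python
from pathlib import Path

# Path as quoted in the thread; adjust it if your installation differs.
ENVELOPES_DIR = Path(r"C:\ProgramData\Sophos\Remote Management System\3\Router\Envelopes")


def count_pending_envelopes(folder: Path) -> int:
    """Return the number of regular files in folder (0 if it does not exist)."""
    if not folder.is_dir():
        return 0
    return sum(1 for entry in folder.iterdir() if entry.is_file())


def backlog_reached(folder: Path, threshold: int = 30) -> bool:
    """True once at least `threshold` files are waiting.

    The default of 30 mirrors the MonitorEmSenderMaxFailures value from the
    post; treating it as a simple file count is an assumption.
    """
    return count_pending_envelopes(folder) >= threshold


if __name__ == "__main__":
    pending = count_pending_envelopes(ENVELOPES_DIR)
    print(f"{pending} file(s) waiting in {ENVELOPES_DIR}")
    print("threshold reached" if backlog_reached(ENVELOPES_DIR) else "below threshold")
```

Run it on an affected client while it shows as disconnected; an empty folder suggests the client believes its messages were delivered.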
  • Thanks for your replies.

    Christian - the disconnecting clients are from the 10 test clients (none in particular though); the current SEC 4.7 is functioning correctly and does not suffer from this problem. I have waited a few days for them to reappear but they never do... maybe if I left it a week or so they would? I'll take a look at those logs and let you know.

    Blocker - thanks, some good info there, I'll take a look now...

    Thanks

    Tom

    :39155
  • Hi,

    Worth noting that SEC 5.2 is out so it might be worth testing that.

    As for the disconnected state: the management service will disconnect clients if the last message time is older than 24 hours. It runs this job every 24 hours. So you may want to check in SEC the last message time of the clients in this disconnected state. If it is older than 24 hours then this could be the reason.

    The last message time is updated whenever the client sends in an event message or status message. So you'd expect that if the client updates every 4 hours with a new IDE, this should be OK.

    The other way for a client to become disconnected is if the parent message router doesn't hear from the client within around 30 minutes. The client polls the server for messages roughly every 15 minutes. The server expects to hear from the client within 2 polls, so around 30 minutes. If it doesn't, it logs the client off, and this action is recorded in the server router log as a communication timeout.


    Hope it helps.

    Regards,

    Jak

    :39161
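
The two disconnect rules described above can be written down as a small check. This is a sketch of the rules as stated in the post, not of the actual server code: the 24-hour and 15-minute figures come from the post, while the function and constant names are invented here.

```python
from datetime import datetime, timedelta

# Figures as described in the post; the real server values may differ.
POLL_INTERVAL = timedelta(minutes=15)      # client polls roughly this often
ROUTER_TIMEOUT = 2 * POLL_INTERVAL         # server expects contact within 2 polls
MANAGEMENT_MAX_AGE = timedelta(hours=24)   # last message time limit


def disconnect_reasons(now: datetime,
                       last_message_time: datetime,
                       last_poll_time: datetime) -> list:
    """Return the rules (if any) under which a client would show as disconnected."""
    reasons = []
    if now - last_message_time > MANAGEMENT_MAX_AGE:
        reasons.append("last message older than 24 hours")
    if now - last_poll_time > ROUTER_TIMEOUT:
        reasons.append("no poll within about 30 minutes")
    return reasons


if __name__ == "__main__":
    now = datetime(2013, 4, 9, 8, 0, 0)
    print(disconnect_reasons(now,
                             last_message_time=now - timedelta(hours=30),
                             last_poll_time=now - timedelta(minutes=10)))
```

Comparing the "last message time" column in SEC against these two limits shows which rule, if either, would explain a red cross.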
  • jak,

    of course we are using SEC 5.2 in our environment. I have had many support engineers on site and no one has really found a solution.

    Only the presales guys told us that this is a real problem but that there is currently no way around it.

    For example, today approx. 400 servers (which are online 24/7) went offline, and they will not come back online until they're rebooted.

    We have had this situation ever since we started using Sophos, and I know that these clients will not be online again for at least a month...

    So at least for us, the regkey I posted is a solution.

    /b

    :39163
  • After they all go offline and the parent router is in this broken state, what do you get if you run (from any client with access to the router the disconnected clients are talking to):

    openssl s_client -connect [computername]:8194

    Where [computername] is the name of the computer the parent router is running on. I would suggest running the test from a machine other than the parent router for it to be a true test. Openssl.exe (if you want to run it on Windows) can be found here: http://slproweb.com/products/Win32OpenSSL.html

    Does it return the "Server certificate"?

    Does it come back promptly or take a while?

    Does it return an error?

    Out of interest, what OS and SP is the server running (parent message router)?

    Are you using message relays or do the clients communicate directly with the SEC server?

    Regards,

    Jak

    :39167
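
If installing OpenSSL on a client is not convenient, roughly the same three questions (certificate returned, how quickly, any error) can be answered with the Python standard library. This is a hedged sketch rather than an equivalent of s_client: the function name and result keys are made up, certificate verification is switched off on the assumption that the router presents a certificate the client does not trust by default, and a recent Python build may refuse the handshake if the server only offers old SSL/TLS versions.

```python
import socket
import ssl
import sys
import time


def probe_router(host: str, port: int = 8194, timeout: float = 10.0) -> dict:
    """Attempt a TLS handshake with host:port and report what happened."""
    context = ssl.create_default_context()
    # Assumption: the router's certificate is not signed by a public CA,
    # so verification is disabled. This only tests reachability.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE

    started = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            with context.wrap_socket(raw, server_hostname=host) as tls:
                cert = tls.getpeercert(binary_form=True)  # DER bytes or None
                return {"ok": True,
                        "elapsed": time.monotonic() - started,
                        "protocol": tls.version(),
                        "cert_bytes": len(cert) if cert else 0,
                        "error": None}
    except OSError as exc:  # covers refused connections, timeouts and TLS errors
        return {"ok": False,
                "elapsed": time.monotonic() - started,
                "protocol": None,
                "cert_bytes": 0,
                "error": str(exc) or type(exc).__name__}


if __name__ == "__main__" and len(sys.argv) > 1:
    print(probe_router(sys.argv[1]))
```

Pass the name of the computer running the parent router as the only argument. A long "elapsed" value or a timeout error while clients are disconnected would point at the server side.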
  • One of the clients went offline yesterday evening. It's a VM, and below is a quick summary of the machine:

    Appears with a red x in SEC

    Still receiving updates

    All Sophos services running

    No issues in 'Sophos network communications report'

    No items in 'C:\ProgramData\Sophos\Remote Management System\3\Router\Envelopes'

    Logs in 'C:\ProgramData\Sophos\Remote Management System\3\Router\Logs' show: (This is just a portion from roughly when it went offline to now)

    
    

    07.04.2013 14:19:04 0728 I RouterSystemCheck::onInfoPortsUsed() - number of user ports 4, max number of user ports 15360
    07.04.2013 15:19:04 0728 I RouterSystemCheck::onInfoPortsUsed() - number of user ports 4, max number of user ports 15360
    07.04.2013 16:19:04 0728 I RouterSystemCheck::onInfoPortsUsed() - number of user ports 4, max number of user ports 15360
    07.04.2013 17:19:05 0728 I RouterSystemCheck::onInfoPortsUsed() - number of user ports 4, max number of user ports 15360
    07.04.2013 18:19:05 0728 I RouterSystemCheck::onInfoPortsUsed() - number of user ports 4, max number of user ports 15360
    07.04.2013 19:19:05 0728 I RouterSystemCheck::onInfoPortsUsed() - number of user ports 4, max number of user ports 15360
    07.04.2013 20:19:05 0728 I RouterSystemCheck::onInfoPortsUsed() - number of user ports 4, max number of user ports 15360
    07.04.2013 21:19:05 0728 I RouterSystemCheck::onInfoPortsUsed() - number of user ports 4, max number of user ports 15360
    07.04.2013 22:19:05 0728 I RouterSystemCheck::onInfoPortsUsed() - number of user ports 4, max number of user ports 15360
    07.04.2013 23:19:05 0728 I RouterSystemCheck::onInfoPortsUsed() - number of user ports 4, max number of user ports 15360
    08.04.2013 00:19:05 0728 I RouterSystemCheck::onInfoPortsUsed() - number of user ports 4, max number of user ports 15360
    08.04.2013 01:19:06 0728 I RouterSystemCheck::onInfoPortsUsed() - number of user ports 4, max number of user ports 15360
    08.04.2013 02:19:06 0728 I RouterSystemCheck::onInfoPortsUsed() - number of user ports 4, max number of user ports 15360
    08.04.2013 03:19:06 0728 I RouterSystemCheck::onInfoPortsUsed() - number of user ports 4, max number of user ports 15360
    08.04.2013 04:19:06 0728 I RouterSystemCheck::onInfoPortsUsed() - number of user ports 4, max number of user ports 15360
    08.04.2013 05:19:06 0728 I RouterSystemCheck::onInfoPortsUsed() - number of user ports 4, max number of user ports 15360
    08.04.2013 06:19:06 0728 I RouterSystemCheck::onInfoPortsUsed() - number of user ports 4, max number of user ports 15360
    08.04.2013 07:19:06 0728 I RouterSystemCheck::onInfoPortsUsed() - number of user ports 4, max number of user ports 15360
    08.04.2013 08:19:07 0728 I RouterSystemCheck::onInfoPortsUsed() - number of user ports 4, max number of user ports 15360
    08.04.2013 09:19:07 0728 I RouterSystemCheck::onInfoPortsUsed() - number of user ports 4, max number of user ports 15360
    08.04.2013 10:19:07 0728 I RouterSystemCheck::onInfoPortsUsed() - number of user ports 4, max number of user ports 15360
    08.04.2013 11:19:07 0728 I RouterSystemCheck::onInfoPortsUsed() - number of user ports 4, max number of user ports 15360
    08.04.2013 12:19:07 0728 I RouterSystemCheck::onInfoPortsUsed() - number of user ports 4, max number of user ports 15360
    08.04.2013 13:19:07 0728 I RouterSystemCheck::onInfoPortsUsed() - number of user ports 4, max number of user ports 15360
    08.04.2013 14:19:07 0728 I RouterSystemCheck::onInfoPortsUsed() - number of user ports 4, max number of user ports 15360
    08.04.2013 15:19:07 0728 I RouterSystemCheck::onInfoPortsUsed() - number of user ports 4, max number of user ports 15360
    08.04.2013 16:19:08 0728 I RouterSystemCheck::onInfoPortsUsed() - number of user ports 4, max number of user ports 15360
    08.04.2013 17:19:08 0728 I RouterSystemCheck::onInfoPortsUsed() - number of user ports 4, max number of user ports 15360
    08.04.2013 18:19:08 0728 I RouterSystemCheck::onInfoPortsUsed() - number of user ports 4, max number of user ports 15360
    08.04.2013 19:19:08 0728 I RouterSystemCheck::onInfoPortsUsed() - number of user ports 4, max number of user ports 15360
    08.04.2013 20:19:08 0728 I RouterSystemCheck::onInfoPortsUsed() - number of user ports 4, max number of user ports 15360
    08.04.2013 21:19:08 0728 I RouterSystemCheck::onInfoPortsUsed() - number of user ports 4, max number of user ports 15360
    08.04.2013 21:36:13 07C4 I Routing to parent: id=01632A3D, origin=Router$NAMEOFSERVER:27009.Agent, dest=EM, type=EM-EntityEvent
    08.04.2013 21:36:13 07B8 I Sent message (id=01632A3D) to Router$NAMEOFSERVER
    08.04.2013 22:19:09 0728 I RouterSystemCheck::onInfoPortsUsed() - number of user ports 4, max number of user ports 15360
    08.04.2013 23:19:09 0728 I RouterSystemCheck::onInfoPortsUsed() - number of user ports 4, max number of user ports 15360
    09.04.2013 00:19:09 0728 I RouterSystemCheck::onInfoPortsUsed() - number of user ports 4, max number of user ports 15360
    09.04.2013 01:19:09 0728 I RouterSystemCheck::onInfoPortsUsed() - number of user ports 4, max number of user ports 15360
    09.04.2013 02:19:09 0728 I RouterSystemCheck::onInfoPortsUsed() - number of user ports 4, max number of user ports 15360
    09.04.2013 03:19:09 0728 I RouterSystemCheck::onInfoPortsUsed() - number of user ports 4, max number of user ports 15360
    09.04.2013 04:19:09 0728 I RouterSystemCheck::onInfoPortsUsed() - number of user ports 4, max number of user ports 15360
    09.04.2013 05:19:10 0728 I RouterSystemCheck::onInfoPortsUsed() - number of user ports 4, max number of user ports 15360
    09.04.2013 06:19:10 0728 I RouterSystemCheck::onInfoPortsUsed() - number of user ports 4, max number of user ports 15360
    09.04.2013 07:19:10 0728 I RouterSystemCheck::onInfoPortsUsed() - number of user ports 4, max number of user ports 15360
    09.04.2013 08:19:10 0728 I RouterSystemCheck::onInfoPortsUsed() - number of user ports 4, max number of user ports 15360

    :39193
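
Most of the log above is the hourly port check, which makes the two lines that matter easy to miss. Below is a minimal sketch for filtering a router log down to the other entries. It assumes the layout seen in this excerpt (day.month.year timestamp, thread ID, one-letter level, message); the function names are invented, and other log files may contain line types this pattern does not match.

```python
import re
from datetime import datetime

# Layout assumed from the excerpt: "DD.MM.YYYY HH:MM:SS <thread> <level> <message>"
LINE_RE = re.compile(
    r"^\s*(\d{2}\.\d{2}\.\d{4} \d{2}:\d{2}:\d{2}) ([0-9A-Fa-f]+) (\w) (.*)$"
)
HEARTBEAT = "RouterSystemCheck::onInfoPortsUsed"


def parse_line(line: str):
    """Return (timestamp, thread_id, level, message), or None if the line does not match."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    timestamp = datetime.strptime(match.group(1), "%d.%m.%Y %H:%M:%S")
    return timestamp, match.group(2), match.group(3), match.group(4).rstrip()


def interesting_entries(lines):
    """Yield parsed entries, skipping the hourly port-usage check."""
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None and HEARTBEAT not in parsed[3]:
            yield parsed


def last_sent_time(lines):
    """Timestamp of the last 'Sent message' entry, or None if there is none."""
    sent = [entry[0] for entry in interesting_entries(lines)
            if entry[3].startswith("Sent message")]
    return sent[-1] if sent else None
```

Feeding the file's lines to `last_sent_time` gives a timestamp to compare with the last message time that SEC shows for the same client.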
  • Hello Tom,

    thanks. Still receiving updates - this is independent of RMS.

    Looks like the client is able to and does indeed send messages upstream. So the Last message time for this client is not 21:36:13 (or near it)? The Router log on the server (if still available) normally doesn't log the messages received but it should have a corresponding entry when the message is passed to EM. If it doesn't pass on the message (can't say what could be the reason) I think it should stay in the envelopes folder.

    Nevertheless it is not clear where the "black hole" could be. Hm, what happens if you restart the Message Router on the server?

    Christian

    :39199
  • Hi Christian,

    Restarting the Message Router on the server brings the offline client back online.

    Tom 

    :39201
  • Hello Tom,

    [restarting MR on the server] brings the offline client back online
    great :smileyfrustrated: - I suspected it, but what next? It suggests that it's not the clients ... looks as if the receivers on the server stop passing on the messages while still accepting them on the client connection. Well, this is beyond my meager knowledge. I'm used to seeing more than one upstream (i.e. to the server's port 8194) connection for more than a few of the clients, for some clients also one or two additional downstream connections (on rare occasions even many) - suggesting that connections sometimes "die" without being taken down but the client notices and makes a new connection. A few checks indicate that there is always a working connection (pair), the clients appear connected in SEC. Not clean but I've never experienced a problem - and definitely not the one you describe.

    Checking one of my servers with about 2000 endpoints I found about half of them disconnected - this could or could not be correct as many of them are in semi-public computer rooms which might be closed. Now if I had the same issue the numbers should change significantly (in the direction of more connected) when I restart the server's MR. They did not - I dare to say they did not at all, the difference was just a few and in the opposite direction.  

    Why it apparently evolves into a problem at some sites I can't say. Most of our clients (several 1000) are not rebooted regularly, server and services run for months without being restarted - so "MR deterioration" doesn't seem to be the cause. 


    Christian

    :39205