Some Update Managers not updating

Hi all,

We have one Sophos Enterprise Console server and one Sophos Message Router server on site. Each client has a SUM server at their own site. Some of these SUMs don't appear to be reporting in, although no error is shown. Please note that some of them are working fine.

Our setup:

Local: ( SOPHOS > SOPHOSMR ) - INTERNET - ( CLIENT 1 SUM ---- Client 1 PCs )

Ports 8192/8194/80 are open and telnet works.
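For anyone who wants to repeat the telnet check from several SUMs without doing it by hand, here is a minimal sketch. The function name `check_ports` and the host `relay.example.com` are placeholders of mine, not anything from Sophos. It only confirms that a TCP connection can be opened, not that RMS is answering properly on the port.

```python
import socket

def check_ports(host, ports, timeout=3.0):
    """Return {port: True/False} depending on whether a TCP connection opens."""
    results = {}
    for port in ports:
        try:
            # create_connection resolves the name and performs the TCP handshake.
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            # Covers refused connections, timeouts and name resolution failures.
            results[port] = False
    return results

# Example call (hypothetical host name, ports as listed in this post):
# print(check_ports("relay.example.com", [8192, 8194, 80]))
```

A result of `True` for every port matches what telnet already shows here, so it narrows the problem down to the messaging layer rather than the firewall.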

LogViewer.exe doesn't show any errors on either server.

Windows Event viewer doesn't show any errors either.

This all started on the day we pushed out the Sophos 10 update and upgraded to Enterprise Console 5. (10th Feb, as per pic below)

http://imageshack.us/f/842/updatemanagers.jpg/

Thanks for any help.

:22387


  • Hello Sys_Engineer,

    if you look at your SUMs in the Endpoints view, are they communicating or does the Last message time display the 10th? If - as you say - the Logviewer on the SUMs shows that they are up to date you should check the RMS component (router logs and envelopes folder) on the SUMs.

    Christian

    :22393
  • Hi Christian,

    Thanks for your reply. In 'Endpoint' view the servers show as online but the 'Last Message Time' column shows the 10th Feb, yes. I've looked at the RMS logs on one of the servers but I'm not really sure what a few of the lines mean. I will highlight them below; this is a log from today. I can't paste the whole thing but hopefully this is enough. Thanks again.

    28.02.2012 21:39:12 16A8 I SOF: C:\ProgramData/Sophos/Remote Management System/3/Router/Logs/Router-20120228-083912.log
    28.02.2012 21:39:12 16A8 I Sophos Messaging Router 3.4.0.2209 starting...
    28.02.2012 21:39:12 16A8 I Setting ACE_FD_SETSIZE to 138
    28.02.2012 21:39:12 16A8 I Initializing CORBA...
    28.02.2012 21:39:12 16A8 I Setting connection cache limit to 10
    28.02.2012 21:39:12 16A8 I Creating ORB runner with 4 threads
    28.02.2012 21:39:12 16A8 I This computer is part of the domain REMAT
    28.02.2012 21:39:12 16A8 E ACE_DLL::open failed for TAO_ImR_Client: Error: check log for details.
    28.02.2012 21:39:12 16A8 E Unable to find service: ImR_Client_Adapter
    28.02.2012 21:39:12 16A8 I This router's IOR:
    IOR:010000002600000049444c3a536f70686f734d6573736167696e672f4d657373616765526f757465723a312e300000000100000000000000a0000000010102000e0000003139322e3136382e322e3235300001204100000014010f004e5550000000210000000001000000526f6f74504f4100526f7574657250657273697374656e740003000000010000004d657373616765526f7574657200000003000000000000000800000001002200004f41540100000014000000010022000100010000000000090101000000000014000000080000000100a60086000220
    28.02.2012 21:39:12 16A8 I Successfully validated this router's IOR
    28.02.2012 21:39:12 16A8 I Reading router table file
    28.02.2012 21:39:12 16A8 I Host name: REMAT2K3
    28.02.2012 21:39:12 16A8 I Local IP addresses: 192.168.2.250
    28.02.2012 21:39:12 16A8 I Resolved name: REMAT2K3.Remat.local
    28.02.2012 21:39:12 16A8 I Resolved alias/es:
    28.02.2012 21:39:12 16A8 I Resolved IP addresses: 192.168.2.250
    28.02.2012 21:39:12 16A8 I Resolved reverse names/aliases: REMAT2K3.Remat.local
    28.02.2012 21:39:12 16A8 I Waiting for messages...
    28.02.2012 21:39:12 1700 I Getting parent router IOR from SOPHOSMR.LANCOM.CO.NZ:8192
    28.02.2012 21:39:12 16A8 I RouterSystemCheck::onInfoPortsUsed() - number of user ports 28, max number of user ports 59530
    28.02.2012 21:39:13 1700 I Received parent router's IOR:
    IOR:010000002600000049444c3a536f70686f734d6573736167696e672f4d657373616765526f757465723a312e300000000100000000000000a80000000101020016000000736f70686f736d722e6c616e636f6d2e636f2e6e7a0001204100000014010f004e5550000000210000000001000000526f6f74504f4100526f7574657250657273697374656e740003000000010000004d657373616765526f757465720000000300000000000000080000000100af00004f415401000000140000000100af000100010000000000090101000000000014000000080000000100a60086000220
    28.02.2012 21:39:13 1700 I Successfully validated parent router's IOR
    28.02.2012 21:39:13 1700 I Accessing parent
    28.02.2012 21:39:13 1700 I Parent is Router$SophosMR:72015
    28.02.2012 21:39:13 1700 I RouterTableEntry::LogonToParentRouter() - logging on as active consumer
    28.02.2012 21:39:13 1700 I RouterTableEntry state (router, logging on): Router$SophosMR:72015 is passive consumer, passive supplier
    28.02.2012 21:39:13 1700 I Logged on to parent router as Router$REMAT2K3:9004
    28.02.2012 21:39:13 1700 I This computer is part of the domain REMAT
    28.02.2012 21:39:23 16C0 I Client::LogonPushPush() successfully called back to client
    28.02.2012 21:39:23 16C0 I Logged on Agent as a client
    28.02.2012 21:39:23 16F0 I Routing to Agent: id=034C92BB, origin=Router$REMAT2K3:9004, dest=Router$REMAT2K3:9004.Agent, type=EM-ClientLogon
    28.02.2012 21:39:23 16EC I Sent message (id=034C92BB) to Agent
    28.02.2012 21:39:25 16F0 I Received message for this router
    28.02.2012 21:39:25 16F0 I EM-NotifyClientUpdates originator Router$REMAT2K3:9004.Agent
    28.02.2012 21:39:25 16F0 I Received message for this router
    28.02.2012 21:39:25 16F0 I EM-GetClientStatus EMLib originator Router$REMAT2K3:9004.Agent
    28.02.2012 21:39:25 16F0 I Routing to Agent: id=054C92BD, origin=Router$REMAT2K3:9004, dest=Router$REMAT2K3:9004.Agent, type=EM-NotifyClientUpdates-Reply
    28.02.2012 21:39:25 16F0 I Routing to Agent: id=074C92BD, origin=Router$REMAT2K3:9004, dest=Router$REMAT2K3:9004.Agent, type=EM-GetClientStatus-Reply
    28.02.2012 21:39:25 16E4 I Sent message (id=054C92BD) to Agent
    28.02.2012 21:39:25 16E4 I Sent message (id=074C92BD) to Agent

    < text omitted >
    29.02.2012 08:30:20 16EC I Sent message (id=014D2B4C) to Router$SophosMR:72015
    29.02.2012 08:39:14 16A8 I RouterSystemCheck::onInfoPortsUsed() - number of user ports 52, max number of user ports 59530
    29.02.2012 08:58:14 170C I Host IP Addresses have changed
    29.02.2012 08:58:14 16A8 I Shutting down...
    29.02.2012 08:58:15 16A8 I Writing router table file
    29.02.2012 08:58:15 16A8 I Setting connection cache limit to 10
    29.02.2012 08:58:15 16A8 I Creating ORB runner with 4 threads
    29.02.2012 08:58:15 16A8 I This computer is part of the domain REMAT
    29.02.2012 08:58:15 16A8 E ACE_DLL::open failed for TAO_ImR_Client: Error: check log for details.
    29.02.2012 08:58:15 16A8 E Unable to find service: ImR_Client_Adapter
    29.02.2012 08:58:15 16A8 I This router's IOR:
    IOR:010000002600000049444c3a536f70686f734d6573736167696e672f4d657373616765526f757465723a312e300000000200000000000000a00000000101023c0d0000003139322e3136382e322e3239006101204100000014010f004e5550000000210000000001000000526f6f74504f4100526f7574657250657273697374656e740003000000010000004d657373616765526f7574657264653c0300000000000000080000000101a600004f415401000000140000000101a6000100010000000000090101000000000014000000080000000101a6008600022000000000a00000000101023c0e0000003139322e3136382e322e3235300001204100000014010f004e5550000000210000000001000000526f6f74504f4100526f7574657250657273697374656e740003000000010000004d657373616765526f7574657264653c0300000000000000080000000101a600004f415401000000140000000101a6000100010000000000090101000000000014000000080000000101a60086000220
    29.02.2012 08:58:15 16A8 I Successfully validated this router's IOR
    29.02.2012 08:58:15 16A8 I Reading router table file
    29.02.2012 08:58:15 16A8 I Host name: REMAT2K3
    29.02.2012 08:58:15 16A8 I Local IP addresses: 192.168.2.29 192.168.2.250
    29.02.2012 08:58:15 16A8 I Resolved name: REMAT2K3.Remat.local
    29.02.2012 08:58:15 16A8 I Resolved alias/es:
    29.02.2012 08:58:15 16A8 I Resolved IP addresses: 192.168.2.29 192.168.2.250
    29.02.2012 08:58:15 16A8 I Resolved reverse names/aliases: REMAT2K3.Remat.local
    29.02.2012 08:58:15 16A8 I Waiting for messages...
    29.02.2012 08:58:15 3BEC I Getting parent router IOR from SOPHOSMR.LANCOM.CO.NZ:8192
    29.02.2012 08:58:15 3BEC I Received parent router's IOR:
    IOR:010000002600000049444c3a536f70686f734d6573736167696e672f4d657373616765526f757465723a312e300000000100000000000000a80000000101020016000000736f70686f736d722e6c616e636f6d2e636f2e6e7a0001204100000014010f004e5550000000210000000001000000526f6f74504f4100526f7574657250657273697374656e740003000000010000004d657373616765526f757465720000000300000000000000080000000100af00004f415401000000140000000100af000100010000000000090101000000000014000000080000000100a60086000220
    29.02.2012 08:58:15 3BEC I Successfully validated parent router's IOR
    29.02.2012 08:58:15 3BEC I Accessing parent
    29.02.2012 08:58:20 3BEC I Parent is Router$SophosMR:72015
    29.02.2012 08:58:20 3BEC I RouterTableEntry::LogonToParentRouter() - logging on as active consumer
    29.02.2012 08:58:20 3BEC I RouterTableEntry state (router, logging on): Router$SophosMR:72015 is passive consumer, passive supplier
    29.02.2012 08:58:20 3BEC I Logged on to parent router as Router$REMAT2K3:9004
    29.02.2012 08:58:20 3BEC I This computer is part of the domain REMAT
    29.02.2012 09:00:42 3BFC I Client::LogonPushPush() successfully called back to client
    29.02.2012 09:00:42 3BFC I Writing router table file
    29.02.2012 09:00:42 3BFC I Logged on Agent as a client
    29.02.2012 09:00:42 2F90 I Routing to Agent: id=014D326A, origin=Router$REMAT2K3:9004, dest=Router$REMAT2K3:9004.Agent, type=EM-ClientLogon
    29.02.2012 09:00:42 35A0 I Sent message (id=014D326A) to Agent
    29.02.2012 09:01:02 2F90 I Routing to parent: id=014D327E, origin=Router$REMAT2K3:9004.Agent, dest=EM, type=EM-GetStatus-Reply
    29.02.2012 09:01:03 1F88 I Sent message (id=014D327E) to Router$SophosMR:72015
    29.02.2012 09:04:06 2F90 I Routing to parent: id=014D3335, origin=Router$REMAT2K3:9004.Agent, dest=EM, type=EM-GetStatus-Reply
    29.02.2012 09:04:06 28BC I Sent message (id=014D3335) to Router$SophosMR:72015
    29.02.2012 09:05:17 2F90 I Routing to parent: id=014D337D, origin=Router$REMAT2K3:9004.Agent, dest=EM, type=EM-GetStatus-Reply
    29.02.2012 09:05:17 35A0 I Sent message (id=014D337D) to Router$SophosMR:72015

    :22423
  • Hi,


    The errors you highlight aren't anything to worry about. The first thing that strikes me is these lines:


    Setting ACE_FD_SETSIZE to 138

    Setting connection cache limit to 10

    Creating ORB runner with 4 threads

    These values are those you would find on a client configuration of RMS, not a SEC server or message relay. The values for a relay should be 20640, 20512 and 16 respectively.
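    To check several routers quickly, the three startup lines can be pulled out of a router log and compared against the two sets of values quoted in this thread. This is only a sketch: the function names and the `PROFILES` table are my own, and the numbers are the ones given in this post rather than an official reference.

```python
import re

# Values as quoted in this thread (assumption: they apply to this RMS version).
PROFILES = {
    "client": {"ACE_FD_SETSIZE": 138, "connection_cache": 10, "orb_threads": 4},
    "relay": {"ACE_FD_SETSIZE": 20640, "connection_cache": 20512, "orb_threads": 16},
}

# One pattern per startup line seen in the router log.
PATTERNS = {
    "ACE_FD_SETSIZE": re.compile(r"Setting ACE_FD_SETSIZE to (\d+)"),
    "connection_cache": re.compile(r"Setting connection cache limit to (\d+)"),
    "orb_threads": re.compile(r"Creating ORB runner with (\d+) threads"),
}

def startup_settings(log_text):
    """Return the first value logged for each startup setting found."""
    found = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(log_text)
        if match:
            found[name] = int(match.group(1))
    return found

def matching_profile(settings):
    """Return 'client' or 'relay' if all three values match, else None."""
    for role, expected in PROFILES.items():
        if settings == expected:
            return role
    return None
```

    Feeding it the text of a Router-*.log file and getting "client" back for a machine that is meant to be a relay would point to the relay setup not having been applied.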


    The configuration in terms of routing looks OK, i.e. the IOR has been overridden with an externally routable address of "sophosmr.lancom.co.nz".

    If you ping SOPHOSMR.LANCOM.CO.NZ  you get 210.54.149.17 and you can telnet the ports.

    The MR is called: REMAT2K3.Remat.local and has the IPs: 192.168.2.29 and 192.168.2.250

    sophosmr.lancom.co.nz has a DNS record pointing it to 210.54.149.17. I suspect that this forwards traffic on ports 8192 and 8194 to 192.168.2.29 or 192.168.2.250.


    So I can only think that the configuration options mentioned above are limiting the system in terms of the number of requests the router can service.  I would therefore suggest:
     

    1. Stop the Router service on the relay.

    2. Edit the following registry key values under:

    HKEY_LOCAL_MACHINE\SOFTWARE\[Wow6432Node]\Sophos\Messaging System\Router


    ConnectionCache (20512 dec)

    NumORBThreads (16 dec)

    I think ACE_FD_SETSIZE might be calculated.


    3. Start the router.

    These should enable the relay to handle more concurrent connections.

    That being said, you shouldn't have to set these registry values manually, as they should be set during the process of setting up a relay. The relay is typically set up by creating a CID for it: you copy a custom mrinit.conf into the RMS sub-directory (specifying the relay details as the parentaddress) and run ConfigCID.exe to add the mrinit.conf to the catalog file cidsync.upd. Then, when you install from this CID, the relay machine is set up as a relay, which involves ClientMRInit.exe setting the appropriate "Server" class values as per:

    http://www.sophos.com/support/knowledgebase/article/14635.html

    You may want to check that this is set up; otherwise, the relay may revert to a client on update.

    Regards,

    Jak

    :22427
  • Hi Jak,

    Appreciate the log explanation.

    sophosmr.lancom.co.nz > is the message relay server

    The server local to the client, remat2k3 in this case, is just configured as a SUM.

    The mrinit.conf file, on all our SUMs, has the following value:

    "ParentRouterAddress"="SOPHOSMR.LANCOM.CO.NZ"

    This is the case on SUMs that are working as well as on some that aren't.

    Turns out the Update Managers (SUMs) which are showing the Last Updated date of 10th Feb are also showing as disconnected in the Endpoints view. This is temporarily fixed when the server is restarted but soon after it shows disconnected again.

    Thanks again,

    Gregor

    :22431
    Ahh, those are the logs of the SUM, not the relay. I should have realized from parsing the IOR that it contained local IPs. In that case the values are correct.
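    A quick way to see which addresses an IOR carries is to hex-decode it and list the readable strings inside. The sketch below does only that; it is not a proper CORBA IOR parser, and the name `ior_strings` is my own.

```python
import re

def ior_strings(ior, min_len=4):
    """Hex-decode a stringified IOR and return the printable ASCII runs in it."""
    hex_part = ior.strip()
    if hex_part.upper().startswith("IOR:"):
        hex_part = hex_part[4:]
    raw = bytes.fromhex(hex_part)
    # Runs of printable ASCII (space to tilde) at least min_len bytes long.
    pattern = re.compile(rb"[\x20-\x7e]{" + str(min_len).encode("ascii") + rb",}")
    return [run.decode("ascii") for run in pattern.findall(raw)]

# Fragment taken from "This router's IOR" in the SUM log earlier in the thread:
# ior_strings("IOR:0e0000003139322e3136382e322e32353000")  ->  ['192.168.2.250']
```

    Run on the whole IOR line, the output should include the host or IP the router advertises, which is how the local 192.168.2.x addresses show up in the SUM's own IOR and sophosmr.lancom.co.nz in the parent's.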

    The router logs show nothing odd. The machine received a new IP at 29.02.2012 08:58:14, which caused the router to reinitialize and update its IOR. The local agent service then logs on and ends up sending EM-GetStatus-Reply messages, so from those logs it all looks to be working, at least on the "client" end.

    As some background: "clients" will show with a red cross in SEC if:
     

    1. The client router logs off the server router.  You should see the router logoff message in the server router log for the client.

    2. The client misses 2 polls to the server router, which is 2 x 15 minutes (twice the getter interval).  The router log will indicate this has happened.

    3. With SEC 4.7 onwards, every 24 hours the Sophos Management Service runs a maintenance job. One of its operations is to mark as disconnected all machines which have a lastmessagetime older than 24 hours.  The last message time for a client is updated when the server receives either an EM-GetStatus-Reply or an Entity-Event.  So essentially a status message or an alert.
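    The three conditions above can be written down as a small check, which may help when reasoning about why a given SUM flips between connected and disconnected. This is a simplified model with names I made up: it treats "missed 2 polls" as more than 30 minutes since the last poll, and it applies the 24 hour rule at whatever moment you pass in rather than only when the maintenance job runs.

```python
from datetime import datetime, timedelta

POLL_INTERVAL = timedelta(minutes=15)  # getter interval quoted above
MISSED_POLLS = 2
STALE_AFTER = timedelta(hours=24)

def disconnect_reasons(now, logged_off, last_poll, last_message):
    """Return the list of conditions that would show the client as disconnected."""
    reasons = []
    if logged_off:
        reasons.append("router logged off")
    if now - last_poll > MISSED_POLLS * POLL_INTERVAL:
        reasons.append("missed two polls")
    if now - last_message > STALE_AFTER:
        reasons.append("last message older than 24 hours")
    return reasons

# A SUM that polls fine but whose last message time is stuck on 10th Feb:
now = datetime(2012, 2, 29, 9, 0)
print(disconnect_reasons(now, False, now - timedelta(minutes=5), datetime(2012, 2, 10, 12, 0)))
```

    Under this model a machine can be logged on and polling yet still be marked disconnected by the maintenance job if its status messages never update the last message time.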

    I would check that the EM-GetStatus-Reply message generated by the SUM is received on the relay router and then appears in the router logs of the router on the SEC server.  Then maybe check that you can find the corresponding entry in sophos-management-services.log and Msgn-[timestamp].log.

    So:

    SUM Agent -> SUM Router -> Relay Router -> SEC Router -> Sophos Management Service -> Database -> SEC GUI.

    Is the path for the message. 
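    To follow a SUM's status messages along that path, the "Routing to ..." lines in each router log can be filtered by where the message originated. The sketch below assumes the line layout seen in the logs posted in this thread and matches on the end of the origin, since routers further up the path appear to prepend their own name to it. The function name `messages_from` is my own.

```python
import re
from datetime import datetime

# Layout assumed from the router log lines posted in this thread.
ROUTING = re.compile(
    r"^\s*(?P<stamp>\d{2}\.\d{2}\.\d{4} \d{2}:\d{2}:\d{2}) \S+ I "
    r"Routing to (?P<target>[^:]+): id=(?P<id>[0-9A-Fa-f]+), "
    r"origin=(?P<origin>[^,\s]+), dest=(?P<dest>[^,\s]+), type=(?P<type>\S+)"
)

def messages_from(log_text, origin_suffix, msg_type="EM-GetStatus-Reply"):
    """Return (timestamp, message id, routed-to) for matching routing lines."""
    hits = []
    for line in log_text.splitlines():
        match = ROUTING.match(line)
        if not match:
            continue
        if match.group("origin").endswith(origin_suffix) and match.group("type") == msg_type:
            stamp = datetime.strptime(match.group("stamp"), "%d.%m.%Y %H:%M:%S")
            hits.append((stamp, match.group("id"), match.group("target")))
    return hits
```

    Running it with the same origin suffix (for example Router$REMAT2K3:9004.Agent) against the SUM, relay and SEC router logs for the same time window shows at which hop the status replies stop appearing, if they do.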

    Regards,

    Jak

    :22441
  • Hello Gregor,

    Sent message (id=014D337D) to Router$SophosMR:72015

    this looks like the client (SUM) is sending the message upstream to the relay (Jak, what do you think?). If I understand you correctly (SUMs... are also showing as disconnected .... This is temporarily fixed when the server is restarted but soon after it shows disconnected again) the clients (=SUMs) seem to log on to the management server but the last message time is not updated and they disconnect?

    I'd do the following:

    Restart the Message Router on one of the failing SUMs. If it has got as far as sending a message (like the line quoted on top) check the management server's router log. I expect that the logon from this SUM (in your example REMAT2K3) is logged but shortly after that an error is recorded (it should be around the time the message is sent, which might be delayed as it is passed through the message relay).

    Christian

    :22449
  • Hi Guys,

    Thanks for the replies. I came in this morning and the server (remat2k3) is showing as 'online' on the console. I'm not sure how or why, because I didn't change anything. The same is true for another SUM. However, the 'last message time' and 'up to date' status still show 10th Feb.

    These are the router logs from the Enterprise Console server. So it looks like it is reporting in.

    01.03.2012 05:21:18 1610 I Routing to EM: id=014E507E, origin=Router$SophosMR:72015.Router$REMAT2K3:9004.Agent, dest=EM, type=EM-EntityEvent
    01.03.2012 05:21:18 1680 I Sent message (id=014E507E) to EM
    01.03.2012 05:21:38 1610 I Routing to EM: id=014E5092, origin=Router$SophosMR:72015.Router$REMAT2K3:9004.Agent, dest=EM, type=EM-GetStatus-Reply
    01.03.2012 05:21:38 0DA0 I Sent message (id=014E5092) to EM

    01.03.2012 05:22:02 1610 I Routing to EM: id=014E50AA, origin=Router$SophosMR:72015.Router$REMAT2K3:9004.Agent, dest=EM, type=EM-EntityEvent
    01.03.2012 05:22:02 0458 I Sent message (id=014E50AA) to EM
    01.03.2012 05:22:22 1610 I Routing to EM: id=014E50BE, origin=Router$SophosMR:72015.Router$REMAT2K3:9004.Agent, dest=EM, type=EM-GetStatus-Reply
    01.03.2012 05:22:22 08A8 I Sent message (id=014E50BE) to EM

    01.03.2012 05:31:06 1610 I Routing to EM: id=014E52C9, origin=Router$SophosMR:72015.Router$REMAT2K3:9004.Agent, dest=EM, type=EM-GetStatus-Reply
    01.03.2012 05:31:06 08A8 I Sent message (id=014E52C9) to EM

    I'm stumped by this whole thing :S

    :22453
  • Hi guys, any idea on the above?

    :22537
  • Sorry, didn't read your post carefully.

    So the SUMs are now connected but still don't show the correct status (and a current Last Message time)? Apart from trying to track the messages sent by the SUM(s) going through the relay and their arrival on the server (as already mentioned): are the failing SUMs using the same relay (and do others use it too)? Are other clients using this relay and reporting correctly? As the router logon seems to get through in a timely manner I don't think it's a backlog. I still suspect that the messages are not accepted by the management server for some reason.

    Christian

    :22549
  • Hi Christian,

    Yes, all the SUMs/PCs use the same relay. Some work and some don't.

    Sometimes they show disconnected and at other times connected but the Last Message time is always stuck on the 10th of Feb. I really cannot see any difference between the ones connecting and the ones that don't.

    Thanks again.

    :22569