This discussion has been locked.
You can no longer post new replies to this discussion. If you have a question you can start a new discussion

SEC - "Awaiting Policy Transfer" randomly appearing

SEC Version: 4.7.0.13

Client Version: 9.7

----------------------------------------

Hey,

This has been ongoing for some time now, and I was hoping to find a permanent solution.

Basically, random computers will one day start reporting the message "Awaiting Policy Transfer". Rebooting the machine does not clear the message, and neither does forcing an update. So far the only thing that seems to work is pushing the client back down to the computer. That clears the message, but on some machines it later reappears, while on others it does not.

It is not specific to any one computer; it seems completely random, with no rhyme or reason as to why it appears.

I have done the stopping and starting of the "Sophos Message Router" service test to see if the clients are communicating back to the SEC. When I stop the service, I do receive a "red x" over the client in question and the "x" goes away when I start the service back up.

The message never seems to clear itself once it appears, and the number of clients reporting it keeps climbing unless I push the client back down to the computer.

I am going to continue to monitor to see if the clients eventually do correct themselves, but I suspect they do not.

Does anyone else have this issue or have you run into this issue? If so, how did you correct it?

Any advice would be great!

Thank you

:16273


  • Hi,

    You don't mention whether performing a comply with policy for the policy in this state helps. Does it?

    Is it always the same policy that is in this state? I.e. is it SAV, AutoUpdate, SCF?

    As far as I can tell, this is how it should work. If you restart the Sophos Agent service on the client, within about 20 seconds it will send back a status message detailing, for each managed component, whether it is compliant with policy or not. Taking the SAV policy (policyType="2") as an example, we see in the Agent log on the client:


    I SAV state observer received a status: <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <status xmlns="http://www.sophos.com/EE/EESavStatus"><csc:CompRes xmlns:csc="com.sophos\msys\csc" Res="Same" RevID="{94E7A42F-30FD-41FD-8345-23DE0C0C9CA3}" policyType="2"/>....


    So we can see that the agent is reporting "Same" (Res="Same") for policytype 2 (SAV) for the policy with ID {94E7A42F-30FD-41FD-8345-23DE0C0C9CA3}
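A minimal sketch of pulling those three attributes out of an Agent log line, in Python. It uses a regular expression rather than an XML parser because the excerpt above is truncated and so is not well-formed XML. The function name and the assumption that the attributes always appear in the order shown in the excerpt are mine, not something the product documents.

```python
import re

# Matches the <csc:CompRes .../> element seen in the Agent log excerpt.
# Assumption: attributes appear in the same order as in that excerpt.
COMP_RES = re.compile(
    r'<csc:CompRes\b[^>]*?\bRes="(?P<res>[^"]*)"'
    r'[^>]*?\bRevID="(?P<rev_id>[^"]*)"'
    r'[^>]*?\bpolicyType="(?P<policy_type>[^"]*)"'
)


def parse_comp_res(log_text):
    """Return (res, rev_id, policy_type) from the first CompRes element, or None."""
    match = COMP_RES.search(log_text)
    if match is None:
        return None
    return match.group("res"), match.group("rev_id"), match.group("policy_type")


# Sample built from the log excerpt quoted in the post.
sample = (
    '<status xmlns="http://www.sophos.com/EE/EESavStatus">'
    '<csc:CompRes xmlns:csc="com.sophos\\msys\\csc" Res="Same" '
    'RevID="{94E7A42F-30FD-41FD-8345-23DE0C0C9CA3}" policyType="2"/>'
)
print(parse_comp_res(sample))
```

Run against a copy of the Agent log, this gives the policy ID the client last reported, which can then be compared with the correlationid stored on the server.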

    So if my machine is in a group with a SAV policy called "Clients" I could run the following SQL:
    select name, correlationid from Policies where type= 2 and name ='Clients'

    and get back something like:


    name       correlationid
    -------    --------------------------------------
    Clients    {94E7A42F-30FD-41FD-8345-23DE0C0C9CA3}

    So we can see that the "version" of the policy that the client should have is:{94E7A42F-30FD-41FD-8345-23DE0C0C9CA3}.  So this machine in this state would show as "Same as policy" for the component SAV.

    If however I update the SAV policy, this will change the ID of the policy in the policies table.  Essentially it appears a new GUID is generated, so it has a new revid.

    The machine then goes into a state of "Awaiting policy transfer", as the value the client last reported is for {94E7A42F-30FD-41FD-8345-23DE0C0C9CA3} and the policy revid is now a different GUID.

    It will not show as Differs from policy until the client actually sends back a Res="Diff" for a given policy ID.
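The comparison described above can be sketched as a small Python function. This is only a model of the behaviour as inferred in this post, not the console's actual implementation; the function name, the "Unknown" fallback, and the case-insensitive GUID comparison are assumptions.

```python
def console_state(server_rev_id, reported_rev_id, reported_res):
    """Model the displayed state from the server's policy ID and the client's last status.

    server_rev_id   -- current ID of the policy on the server (correlationid)
    reported_rev_id -- RevID from the client's last status message
    reported_res    -- Res from the client's last status message ("Same" or "Diff")
    """
    # Assumption: GUIDs are compared ignoring case and surrounding whitespace.
    if reported_rev_id.strip().upper() != server_rev_id.strip().upper():
        # The client's verdict refers to an older revision of the policy,
        # so there is no verdict yet for the current one.
        return "Awaiting policy transfer"
    if reported_res == "Same":
        return "Same as policy"
    if reported_res == "Diff":
        return "Differs from policy"
    return "Unknown"  # hypothetical fallback for any other Res value


old_id = "{94E7A42F-30FD-41FD-8345-23DE0C0C9CA3}"
new_id = "{00000000-0000-0000-0000-000000000001}"  # made-up GUID standing in for the edited policy

print(console_state(old_id, old_id, "Same"))  # client matches the current revision
print(console_state(new_id, old_id, "Same"))  # policy edited, client not yet updated
```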

    Could it be that the following is happening for these machines:

    1. Machine complies with SAV policy (using SAV policy as the example policy that is in this state).

    2. Machine goes offline - red cross.

    3. Policy change is made for the policy of SAV linked to the group the machine is in.

    4. Set-Config message is created for the machine but as it's not connected the message is queued on the server with a TTL of 4 days.

    5. The client logs on on the 5th day, the set config message has been deleted, so it doesn't get the message, the client hasn't reported Differs yet (Res="Diff"), but it still reports the old policy id.  Therefore the system thinks it's still in the awaiting policy transfer state.

    In this case, performing a comply with policy for the policy in this state should fix it, provided the machine is on. If the machine is not "connected", the create-message, timeout-message scenario could repeat, leaving it in this state. Does this sound plausible as a theory?
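The timeline in steps 1 to 5 can be sketched in Python as a simple expiry check. The 4-day TTL is the figure quoted in this post; the function name and the example dates are arbitrary and only there for illustration.

```python
from datetime import datetime, timedelta, timezone

DEFAULT_TTL = timedelta(days=4)  # TTL quoted in this post for queued set-config messages


def message_delivered(created, client_reconnects, ttl=DEFAULT_TTL):
    """Return True if the client reconnects before the queued message expires."""
    return client_reconnects < created + ttl


# Arbitrary example date for the policy change (step 3).
policy_changed = datetime(2011, 9, 1, 9, 0, tzinfo=timezone.utc)

# Client back after 2 days: the message is still queued and gets delivered.
print(message_delivered(policy_changed, policy_changed + timedelta(days=2)))

# Client back on the 5th day: the message has already been deleted (step 5).
print(message_delivered(policy_changed, policy_changed + timedelta(days=5)))
```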

    Regards,

    Jak 

    EDIT: looking into this a bit further, I've found the following two registry values that can be set on the management server machine:
    HKLM\SOFTWARE\[Wow6432Node]\Sophos\EE\Management Tools\MessagingDoActionTimeout

    HKLM\SOFTWARE\[Wow6432Node]\Sophos\EE\Management Tools\MessagingSetConfigurationTimeout

    These two DWORD values are read by the management service when it creates the messages, but it seems they don't exist by default.

    They appear to override the default TTL for a message. So to change the default TTL on a set-config message you could create the MessagingSetConfigurationTimeout DWORD value above. The value is in seconds. The downside is that the envelopes directory is more likely to fill up if you send set-config messages to machines that are unable to receive them, as messages live longer with a higher TTL. So this might be worth implementing if the above theory is correct and clients aren't getting set-config messages because they time out before the client can pick them up.

    :16283
  • @jak

    Thank you for responding, and thank you for explaining in such detail!

    It never dawned on me (I apologise for not thinking of this before posting here) to check which policies weren't in compliance. For some reason I just assumed that all policies had failed to transfer correctly.

    After reading your post I checked some of the clients, and sure enough just two policies appeared consistently: "Device control policy compliance" and "Anti-virus and HIPS policy". On some clients only one of them reported a problem; on others it was both.

    I did check my local log for the example you showed and I can see these entries being logged.

    I tried to force a policy compliance on the machines reporting this message, and sure enough doing this corrected the issue on all of them!

    Thinking about the scenario you talked about, the TTL of 4 days definitely sounds like something that could be causing the problem. I did some checking on the machines that were reporting these messages, and in a lot of cases they belong to people who were recently on vacation. This would mean that their machines would have been offline for an extended period of time. So if a policy change had been made, the 4-day TTL may have expired while they were away, and the machines would have missed the set-config message.

    ------------------------------------------

    I do have one or two machines that do not fit these criteria though. They are in a child group that has a different "Device Control Policy" from its parent. I suspect that this could be causing an issue with the machine receiving the set-config message. The two machines in question appear consistently, and they are not machines that are turned off on a regular basis. They might require additional investigation to find out why the set-config message is not resetting the GUIDs (hopefully my terminology here is correct).

    ------------------------------------------

    The registry entries you suggest look interesting!

    Sorry but I didn't quite follow the downside you mentioned. Why would the "envelopes" directory be more likely to fill? And what is the envelopes directory? - Sorry for the additional questions!

    If the TTL value is in seconds, I am guessing that the default value would be 4 days expressed in seconds, right?

    Would I find the TTL value in the same place where I found your example log? Or is the ".msg" file contained some place else? I wouldn't mind seeing this value and maybe we can confirm!

    Sorry for the long-winded response; I wanted to make sure I covered as much of the information you gave me as I could!

    thank you again,

    Cheers

    :16287
  • Hi,

    Well the theory sounds promising based on the users of machines reporting this state being away recently.

    The "Envelopes" directory is used by the Sophos Message Router in order to persist messages until they are delivered.  It exists wherever there is a message router.  It can be found in either:

    "\Programdata\Sophos\Remote Management System\3\Router\Envelopes\"

    or

    "\Documents and settings\all users\application data\sophos\Remote Management System\3\Router\Envelopes\"

    depending on the OS.

    So for example, if you're in SEC and you change a SAV policy, the management service will create a message for each client contained in the groups affected by the policy in question.  These messages will then be given to the server message router to send to the clients. For those clients that are logged on to the server router (connected) they should be delivered almost immediately.  However for the machines that are disconnected the messages will be queued in the "Envelopes" directory as .msg files.  

    At this point one of two things can happen: either the previously "disconnected" client comes online, logs on to the server router and picks up the policies, or it doesn't do so before the message times out. Messages have to time out, as there is no way SEC can distinguish between a machine that will come back at some point and one that will not. If there was no persistence, you'd only ever be able to send policies to machines that were online, which would be a real administrative burden. Message persistence also guarantees that messages will reach their intended destination even under failure, but this is more important with upstream alerts.

    So for SEC 4.7 the Time To Live (TTL) values for downstream messages (set-config and do-action) are set by the management service to 4 days (96 hours) at the point the messages are created. (I tested this just now by performing a set-config to a disconnected machine; I then found the .msg file that represented this message, grabbed the TTL and converted it.) If you take a .msg file and open it in a text editor, you can see a TTL line followed by a number. This number is a timestamp in Unix epoch time, so if you paste it into something like http://www.epochconverter.com/ you can see when the message will expire.
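The same conversion can be done locally. A minimal Python sketch, assuming the TTL number has already been copied out of the .msg file by hand (the exact layout of the file is not parsed here); the function names are hypothetical, and the example value is the TTL quoted by the original poster further down this thread.

```python
from datetime import datetime, timezone


def ttl_to_utc(ttl_epoch_seconds):
    """Convert a TTL given in Unix epoch seconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(ttl_epoch_seconds, tz=timezone.utc)


def is_expired(ttl_epoch_seconds, now=None):
    """Return True if the message's expiry time has already passed."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now >= ttl_to_utc(ttl_epoch_seconds)


example_ttl = 1315687075  # TTL value quoted later in this thread
print(ttl_to_utc(example_ttl).isoformat())
print(is_expired(example_ttl))
```

Subtracting 4 days from the converted time gives a rough idea of when the management service created the message.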

    The reason the number of .msg files in the envelopes directory will increase if you increase the TTL is that messages intended for disconnected endpoints remain in the system longer should those machines not check back in in a timely manner.

    So with the default being 4 days, or 96 hours, or 5760 minutes, I tested adding the MessagingSetConfigurationTimeout value and setting it to 24 * 60 * 60 = 86400 decimal (1 day in seconds).  Sure enough, the TTL set the expiry time to 1 day in the future.

    So 604800 would be 7 days in seconds (24*60*60*7).  This might be something you could try.
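The arithmetic above as a small Python helper, in case other durations are needed. It only computes the number to enter for the DWORD value; it does not touch the registry. The function name and the range check are my additions (a DWORD is a 32-bit unsigned value).

```python
SECONDS_PER_DAY = 24 * 60 * 60


def ttl_dword(days):
    """Return the TTL in seconds for a whole number of days, checked against the DWORD range."""
    seconds = days * SECONDS_PER_DAY
    if not 0 <= seconds <= 0xFFFFFFFF:
        raise ValueError("value does not fit in a 32-bit DWORD")
    return seconds


for days in (1, 4, 7):
    value = ttl_dword(days)
    # Registry editors usually accept either decimal or hexadecimal input.
    print(f"{days} day(s): {value} decimal, 0x{value:08X} hex")
```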

    Regards,

    Jak



     

    :16289
  • Hi Jak,

    This has been a very informative topic, and I've learnt a great deal.  I have also been getting these messages and couldn't figure out why, especially since the machines had reported in!  Now it all makes sense.  I wish the Sophos Support Team could have explained that, rather than just telling me to comply with the necessary policy each time!

    Sorry to hijack the topic and slightly change its direction (I hope you don't mind), though my next question is related, and you seem very knowledgeable in this area.

    I have certain "IT Staff" on my network who are part of our Networks Division and thus are required to be in the local Sophos Administrators group, in order to temporarily disable the firewall, etc., for testing purposes.  Unfortunately they keep forgetting to turn it back on.  As a result, I do occasional checks within SEC, find that their machines differ from policy, and notice that "firewall enable" is set to no.  I naturally tell those machines to comply with policy, and sure enough everything changes as it should.

    I asked Sophos Support if there was any way to change the SEC setup so that if a machine differs from policy, after a set period of time that machine would re-comply.  That way, my Networks Team could make changes for testing purposes, and should they forget to switch back, their machines would do so automatically.

    Naturally Sophos Support said there wasn't any feature, however I was wondering if you know of any?

    Kind Regards,

    Jon

    :16313
  • Hi,

     

    Glad you've found the information in this thread insightful.

     

    I can imagine an "Auto-Policy Comply" feature could generate a lot of messages, and therefore network traffic, if it went wrong or something got into a loop.  I suppose there is also the chance it could kick in just after the "local admin" changed the setting, which could be a bit annoying.  I suppose that if the user had gone through the "Authenticate user" process of tamper protection, the auto-comply messages could be dropped or queued until that admin "session" ended?

     

    Anyway, as far as I can tell it's ultimately the management service that needs to be prodded into sending out the set-configuration messages.  It appears that the conditions that would initiate that are:
     

    1. Poke it from SEC with a comply with policy, a modify of a linked policy, or moving the machine to a new group with a different policy.  This however all requires manual actions in SEC.

     

    2. A client requests a policy. This will take place if the client sends back a "no-ref" for the policy in question.

    The no-ref state is essentially a method the client can use to report that it has no locally cached policy to compare its current configuration against, and therefore it can't establish whether it's in policy or not.

    The cached policies are stored on the client in:
    "\Programdata\Sophos\Remote Management System\3\Agent\AdapterStorage\"

    or

    "\documents and settings\all users\application data\Sophos\Remote Management System\3\Agent\AdapterStorage\"

    As you can see, there is a sub-directory for each managed component, with the policy file or files contained within.

     

    When the management agent on the client performs a status check, for SAV for example, the SAVadapter.dll loaded into the management agent must, I assume, have information on how to call into SAV to get the running policy. It can then make the comparison and send back the status as mentioned above in the thread.  I suppose that on the initial install of the client these missing adapter storage files lead to the no-ref, which in turn leads to the set-configuration from the server.

     

    So, that being said, one method to force a policy would be to delete the adapter storage file for the component required and restart the Sophos Agent service.  Within a few moments I would expect SEC to send the policy down in response to the no-ref status for the policy. I'm not sure if this could be made into a practical workable solution but it seems that this would be the only way short of writing something that can talk to the management service to invoke a policy send.
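Before trying this, it may be worth listing what would be removed. The Python sketch below is a dry run only: it lists the cached policy files for one component and deletes nothing, and it does not restart the Sophos Agent service. The function name is hypothetical, and the "SAV" sub-directory name and "policy.xml" file name used in the demonstration are placeholders, since the actual names under AdapterStorage are not given in this thread.

```python
import tempfile
from pathlib import Path


def cached_policy_files(adapter_storage, component):
    """List cached policy files for one component without deleting anything."""
    component_dir = Path(adapter_storage) / component
    if not component_dir.is_dir():
        return []
    return sorted(p for p in component_dir.iterdir() if p.is_file())


# Demonstration against a throwaway directory, not a real AdapterStorage folder.
with tempfile.TemporaryDirectory() as tmp:
    fake_component = Path(tmp) / "SAV"  # placeholder sub-directory name
    fake_component.mkdir()
    (fake_component / "policy.xml").write_text("<placeholder/>")  # placeholder file name

    for path in cached_policy_files(tmp, "SAV"):
        print("would delete:", path.name)
```

On a real client the first argument would be the AdapterStorage path given above. Any deletion and the service restart are probably best done by hand on a test machine first.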

     

    Hope this helps.

     

    Regards,

    :16325
  • Hi,

    Thank you for explaining the "Envelopes" directory.

    I went in and checked the directory and sure enough it was full of ".msg" files waiting to be sent out.

    I checked inside one and did find the TTL value ("1315687075" as an example).

    Now I understand how the system works and why "Message Persistence" is in play here.

    I also now understand why that directory could potentially fill up! Thank you for explaining all of that.

    I am going to keep note of those registry values. Having a better understanding of what is going on has made me feel more comfortable about the message itself. Now that I know how to properly manage the message, I am not sure changing the values is really needed. Although changing the TTL to a week might help me better manage things when people are away on vacation, as most take a week at a time.

    There are still a few machines that are a bit of a mystery, but having the knowledge to manage them correctly makes me feel better about seeing the message!

    Thanks again for all your help! I wish your phone support was as helpful as your email and forum support!

    Cheers

    :16357
  • Thanks for the feedback.  I think this is one for the Sophos Development Team.

    :16465