Guest User!

You are not Sophos Staff.

This discussion has been locked.
You can no longer post new replies to this discussion. If you have a question you can start a new discussion

Antivirus update breaks ssh logins

Over the past month a number of our CentOS 7 machines have stopped accepting logins over SSH. Open sessions freeze but otherwise processes, web servers, etc keep running. It requires a reboot to gain access again. This problem seems to coincide with Sophos updates. Not all updates cause this problem but every time the problem occurs it appears to be immediately after an antivirus update.

Any ideas why Sophos updates would interfere with SSH logins or with SE Linux? Or ways I could further investigate this to demonstrate that there's a real link or that it's just a coincidence?

We recently switched from using Talpa to fanotify. Could this be related?

Here's an example from /opt/sophos-av/log/savd.log. I manually converted the Unix timestamps in order to interpret them.

<log><category>update.updated</category><level>NOTICE</level><domain>savupdate</domain><msg>UPDATING_FROM_VERSION %s %s %s</msg><time>1596516042</time><arg>10.5.2</arg><arg>3.79.0</arg><arg>5.76</arg></log>
<log><category>update.updated</category><level>NOTICE</level><domain>savupdate</domain><msg>%s</msg><time>1596516042</time><arg>Updating Sophos Anti-Virus....
Updating SAVScan on-demand scanner
Updating Virus Engine and Data
Updating Manifest
Update completed.</arg></log>
<log><category>update.updated</category><level>NOTICE</level><domain>savupdate</domain><msg>UPDATED_TO_VERSION %s %s %s</msg><time>1596516042</time><arg>10.5.2</arg><arg>3.79.0</arg><arg>5.76</arg></log>
<log><category>update.updated</category><level>NOTICE</level><domain>savupdate</domain><msg>SUCCESSFULLY_UPDATED_FROM %s</msg><time>1596516042</time><arg>sdds:SOPHOS</arg></log>


The same snippet is repeated several times during the day with the same version numbers which is odd but probably not related to the issue at hand.

/var/log/audit/audit.log (SELinux) shows the antivirus update and then 20 minutes later we start seeing that systemd-logind is unhappy (I don't really understand it all but it's obviously linked to supporting log in).

type=SERVICE_START msg=audit(04/08/20 05:40:42.285:2578) : pid=1 uid=root auid=unset ses=unset subj=system_u:system_r:init_t:s0 msg='unit=sav-update comm=systemd exe=/usr/lib/systemd/systemd hostname=? addr=? terminal=? res=success'
type=SERVICE_STOP msg=audit(04/08/20 05:40:42.285:2579) : pid=1 uid=root auid=unset ses=unset subj=system_u:system_r:init_t:s0 msg='unit=sav-update comm=systemd exe=/usr/lib/systemd/systemd hostname=? addr=? terminal=? res=success'
type=USER_ACCT msg=audit(04/08/20 05:50:01.761:2580) : pid=21416 uid=root auid=unset ses=unset subj=system_u:system_r:crond_t:s0-s0:c0.c1023 msg='op=PAM:accounting grantors=pam_access,pam_unix,pam_localuser acct=root exe=/usr/sbin/crond hostname=? addr=? terminal=cron re
s=success'
type=CRED_ACQ msg=audit(04/08/20 05:50:01.763:2581) : pid=21416 uid=root auid=unset ses=unset subj=system_u:system_r:crond_t:s0-s0:c0.c1023 msg='op=PAM:setcred grantors=pam_env,pam_fprintd acct=root exe=/usr/sbin/crond hostname=? addr=? terminal=cron res=success'
type=LOGIN msg=audit(04/08/20 05:50:01.765:2582) : pid=21416 uid=root subj=system_u:system_r:crond_t:s0-s0:c0.c1023 old-auid=unset auid=root tty=(none) old-ses=4294967295 ses=175 res=yes
type=USER_START msg=audit(04/08/20 05:50:26.796:2583) : pid=21416 uid=root auid=root ses=175 subj=system_u:system_r:crond_t:s0-s0:c0.c1023 msg='op=PAM:session_open grantors=pam_loginuid,pam_keyinit,pam_limits acct=root exe=/usr/sbin/crond hostname=? addr=? terminal=cron
res=success'
type=USER_ACCT msg=audit(04/08/20 06:00:01.676:2584) : pid=23854 uid=root auid=unset ses=unset subj=system_u:system_r:crond_t:s0-s0:c0.c1023 msg='op=PAM:accounting grantors=pam_access,pam_unix,pam_localuser acct=root exe=/usr/sbin/crond hostname=? addr=? terminal=cron re
s=success'
type=CRED_ACQ msg=audit(04/08/20 06:00:01.676:2585) : pid=23854 uid=root auid=unset ses=unset subj=system_u:system_r:crond_t:s0-s0:c0.c1023 msg='op=PAM:setcred grantors=pam_env,pam_fprintd acct=root exe=/usr/sbin/crond hostname=? addr=? terminal=cron res=success'
type=LOGIN msg=audit(04/08/20 06:00:01.678:2586) : pid=23854 uid=root subj=system_u:system_r:crond_t:s0-s0:c0.c1023 old-auid=unset auid=root tty=(none) old-ses=4294967295 ses=176 res=yes
type=USER_START msg=audit(04/08/20 06:00:26.712:2587) : pid=23854 uid=root auid=root ses=176 subj=system_u:system_r:crond_t:s0-s0:c0.c1023 msg='op=PAM:session_open grantors=pam_loginuid,pam_keyinit,pam_limits acct=root exe=/usr/sbin/crond hostname=? addr=? terminal=cron
res=success'
type=SERVICE_STOP msg=audit(04/08/20 06:00:33.045:2588) : pid=1 uid=root auid=unset ses=unset subj=system_u:system_r:init_t:s0 msg='unit=systemd-logind comm=systemd exe=/usr/lib/systemd/systemd hostname=? addr=? terminal=? res=failed'
type=SERVICE_START msg=audit(04/08/20 06:00:33.293:2589) : pid=1 uid=root auid=unset ses=unset subj=system_u:system_r:init_t:s0 msg='unit=systemd-logind comm=systemd exe=/usr/lib/systemd/systemd hostname=? addr=? terminal=? res=success'
type=SERVICE_STOP msg=audit(04/08/20 06:00:33.293:2590) : pid=1 uid=root auid=unset ses=unset subj=system_u:system_r:init_t:s0 msg='unit=systemd-logind comm=systemd exe=/usr/lib/systemd/systemd hostname=? addr=? terminal=? res=success'


This thread was automatically locked due to age.
Parents
  • Here's an excerpt from /var/log/messages, it's easier to read. I think the dockerd errors are a consequence of whatever's going on rather than a cause but I've left them in just in case they provide a hint. I suspect the import bit is the timeout message.

     

    Aug 4 05:40:32 myhost systemd: Starting "Sophos Anti-Virus update"...
    Aug 4 05:40:42 myhost savd: update.updated: Updating from versions - SAV: 10.5.2, Engine: 3.79.0, Data: 5.76
    Aug 4 05:40:42 myhost savd: update.updated: Updating Sophos Anti-Virus....#012Updating SAVScan on-demand scanner#012Updating Virus Engine and Data#012Updating Manifest#012Update completed.
    Aug 4 05:40:42 myhost savd: update.updated: Updated to versions - SAV: 10.5.2, Engine: 3.79.0, Data: 5.76
    Aug 4 05:40:42 myhost savd: update.updated: Successfully updated Sophos Anti-Virus from sdds:SOPHOS
    Aug 4 05:40:42 myhost systemd: Started "Sophos Anti-Virus update".
    Aug 4 05:41:18 myhost dockerd: time="2020-08-04T05:41:18.255222990+01:00" level=error msg="stream copy error: reading from a closed fifo"
    Aug 4 05:41:18 myhost dockerd: time="2020-08-04T05:41:18.255245866+01:00" level=error msg="stream copy error: reading from a closed fifo"
    Aug 4 05:42:03 myhost dockerd: time="2020-08-04T05:42:03.257145700+01:00" level=warning msg="Health check for container eec7810e9a1c9785dbb9682b371f51443081bbe3adfe4b7562698a9aec5be2a5 error: context deadline exceeded: unknown"
    Aug 4 05:50:01 myhost systemd: Created slice User Slice of root.
    Aug 4 05:53:01 myhost systemd: systemd-logind.service watchdog timeout (limit 3min)!
    Aug 4 05:41:18 myhost dockerd: time="2020-08-04T05:41:18.255222990+01:00" level=error msg="stream copy error: reading from a closed fifo"
    Aug 4 05:41:18 myhost dockerd: time="2020-08-04T05:41:18.255245866+01:00" level=error msg="stream copy error: reading from a closed fifo"
    Aug 4 05:42:03 myhost dockerd: time="2020-08-04T05:42:03.257145700+01:00" level=warning msg="Health check for container eec7810e9a1c9785dbb9682b371f51443081bbe3adfe4b7562698a9aec5be2a5 error: context deadline exceeded: unknown"
    Aug 4 05:50:01 myhost systemd: Created slice User Slice of root.
    Aug 4 05:53:01 myhost systemd: systemd-logind.service watchdog timeout (limit 3min)!
    Aug 4 05:54:32 myhost systemd: systemd-logind.service stop-sigabrt timed out. Terminating.
    Aug 4 05:56:02 myhost systemd: systemd-logind.service stop-sigterm timed out. Killing.
    Aug 4 05:57:32 myhost systemd: systemd-logind.service still around after SIGKILL. Ignoring.
    Aug 4 05:59:02 myhost systemd: systemd-logind.service stop-final-sigterm timed out. Killing.
    Aug 4 06:00:33 myhost systemd: systemd-logind.service still around after final SIGKILL. Entering failed mode.
    Aug 4 06:00:33 myhost systemd: Unit systemd-logind.service entered failed state.
    Aug 4 06:00:33 myhost systemd: systemd-logind.service failed.
    Aug 4 06:00:33 myhost systemd: systemd-logind.service has no holdoff time, scheduling restart.
    Aug 4 06:00:33 myhost systemd: Stopped Login Service.
    Aug 4 06:00:33 myhost systemd: Starting Login Service...
  • Hi,

     

    I'm afraid this looks like a more complicated issue than can be resolved here - I haven't ever seen anything like this before.

     

    I think you'll have to raise a support ticket, so that proper time can be allocated to replicate the scenario. 

     

    I will note that fanotify doesn't provide timeouts, so might be blocking logind for arbitrary periods of time. Maybe check all of your mounts are responding ok?

     

    Thanks,

    Douglas.

  • I'm waiting for our IT team to raise a support ticket but in the meantime I've found that some other antivirus vendors have recorded a similar issue when using fanotify. Looks like something that has been fixed in recent Linux kernels > 3.14 but we're on kernel 3.10 (CentOS 7). I'll see if we can upgrade to a more recent kernel as that seems to be the only solution.

    Here are a few relevant links.

Reply Children
No Data