Sophos XG: all endpoints showing missing heartbeat

Good afternoon,

I have an XG 230 on firmware 18.0.5 in an HA pair.

Yesterday it decided that all endpoints would show as missing heartbeats. I de-registered and re-registered in Central and now they are all showing connected. Why would that have happened?

Possibly related: yesterday HTTPS and SSH access went offline as well. The only way we could regain access was to power cycle both units in the HA pair. It happened again this AM, so I am working on that. There are no errors in the log and no errors with HA.

Thoughts?

Thanks, Brent

This thread was automatically locked due to age.

BrentMagnant over 4 years ago

Happened again this AM. The firewalls were down. I turned them both off and turned only one back on, so there is no HA at the moment.

ctsyncd.log shows:

[Wed Jul 28 07:06:56 2021] (pid=2528) [notice] using user-space event filtering
[Wed Jul 28 07:06:56 2021] (pid=2528) [notice] netlink event socket buffer size has been set to 4194304 bytes
[Wed Jul 28 07:06:56 2021] (pid=2528) [notice] initialization completed
[Wed Jul 28 07:06:56 2021] (pid=2530) [notice] binded on cpu 0
[Wed Jul 28 07:06:56 2021] (pid=2530) [notice] -- starting in daemon mode --
[Wed Jul 28 07:06:56 2021] (pid=2530) [ERROR] no dedicated links available!
[Wed Jul 28 07:06:57 2021] (pid=2530) [ERROR] no dedicated links available!
[Wed Jul 28 07:06:57 2021] (pid=2530) [ERROR] no dedicated links available!
[Wed Jul 28 07:06:59 2021] (pid=2530) [ERROR] no dedicated links available!
[Wed Jul 28 07:06:59 2021] (pid=2530) [ERROR] no dedicated links available!
[Wed Jul 28 07:07:00 2021] (pid=2530) [ERROR] no dedicated links available!
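To see at a glance how often ctsyncd repeated an error and over what time window, the excerpt can be summarized with a short script. This is a minimal Python sketch that assumes the line format shown above ("[timestamp] (pid=N) [LEVEL] message"); `summarize_errors` is a hypothetical helper, not part of any Sophos tooling, and it makes no claim about what the "no dedicated links available!" error means.

```python
import re
from collections import Counter
from datetime import datetime

# Matches ctsyncd.log lines such as:
#   [Wed Jul 28 07:06:56 2021] (pid=2530) [ERROR] no dedicated links available!
# ASSUMPTION: every line follows this layout; other lines are ignored.
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] \(pid=(?P<pid>\d+)\) \[(?P<level>\w+)\] (?P<msg>.*)$"
)
TS_FORMAT = "%a %b %d %H:%M:%S %Y"


def summarize_errors(lines):
    """Count [ERROR] messages and return (counts, first_seen, last_seen)."""
    counts = Counter()
    stamps = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None or m.group("level") != "ERROR":
            continue  # skip [notice] lines and anything unparseable
        counts[m.group("msg")] += 1
        stamps.append(datetime.strptime(m.group("ts"), TS_FORMAT))
    if not stamps:
        return counts, None, None
    return counts, min(stamps), max(stamps)


if __name__ == "__main__":
    sample = [
        "[Wed Jul 28 07:06:56 2021] (pid=2530) [notice] -- starting in daemon mode --",
        "[Wed Jul 28 07:06:56 2021] (pid=2530) [ERROR] no dedicated links available!",
        "[Wed Jul 28 07:06:57 2021] (pid=2530) [ERROR] no dedicated links available!",
        "[Wed Jul 28 07:07:00 2021] (pid=2530) [ERROR] no dedicated links available!",
    ]
    counts, first, last = summarize_errors(sample)
    for msg, n in counts.items():
        print(f"{n}x {msg!r} between {first} and {last}")
```

Against a full log copied off the appliance, you could pass an open file object instead of the sample list, e.g. `summarize_errors(open("ctsyncd.log"))`; the `strip()` call takes care of trailing newlines.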

msync.log: it just stopped at 18:29 yesterday and didn't log anything again until the reboot this AM.

Tue Jul 27 18:29:13 2021:222634:1372:MAST:MAST:DEBUG:event.c:492 ses_cnt :3
Tue Jul 27 18:29:13 2021:222661:1372:MAST:MAST:DEBUG:sync_entity.c:951sesid:33486: cmd ipset -D hostset fqdn,580,0,52.22
Tue Jul 27 18:29:13 2021:225053:1372:MAST:MAST:DEBUG:sync.c:921sesid:33486:ipset -D hostset fqdn,580,0,52.22
Tue Jul 27 18:29:13 2021:225079:1372:MAST:MAST:DEBUG:sync.c:903sesid:33486 ipset -D hostset fqdn,580,0,52.22 3
Tue Jul 27 18:29:13 2021:996260:1372:MAST:MAST:DEBUG:event.c:492 ses_cnt :3
Tue Jul 27 18:29:13 2021:996370:1372:MAST:MAST:DEBUG:sync_entity.c:938sesid:33487: opcode HBAddEacEpRel
Tue Jul 27 18:29:14 2021:132740:1372:MAST:MAST:DEBUG:sync.c:921sesid:33487:HBAddEacEpRel
Tue Jul 27 18:29:14 2021:132779:1372:MAST:MAST:DEBUG:sync.c:903sesid:33487 HBAddEacEpRel 3
Tue Jul 27 18:29:15 2021:357960:1372:MAST:MAST:DEBUG:event.c:492 ses_cnt :3
Tue Jul 27 18:29:15 2021:357995:1372:MAST:MAST:DEBUG:sync_entity.c:951sesid:33488: cmd ipset -D hostset fqdn,752,0,142.2
Tue Jul 27 18:29:15 2021:361405:1372:MAST:MAST:DEBUG:sync.c:921sesid:33488:ipset -D hostset fqdn,752,0,142.2
Tue Jul 27 18:29:15 2021:361439:1372:MAST:MAST:DEBUG:sync.c:903sesid:33488 ipset -D hostset fqdn,752,0,142.2 3
Tue Jul 27 18:29:17 2021:251772:1372:MAST:MAST:DEBUG:event.c:492 ses_cnt :3
Tue Jul 27 18:29:17 2021:251838:1372:MAST:MAST:DEBUG:sync_entity.c:938sesid:33489: opcode HBAddEacEpRel
Tue Jul 27 18:29:17 2021:371736:1372:MAST:MAST:DEBUG:sync.c:921sesid:33489:HBAddEacEpRel
Tue Jul 27 18:29:17 2021:371774:1372:MAST:MAST:DEBUG:sync.c:903sesid:33489 HBAddEacEpRel 3
Tue Jul 27 18:29:20 2021:500211:1372:MAST:MAST:DEBUG:event.c:492 ses_cnt :3
Tue Jul 27 18:29:20 2021:500277:1372:MAST:MAST:DEBUG:sync_entity.c:938sesid:33490: opcode HBAddEacEpRel
Tue Jul 27 18:29:20 2021:621209:1372:MAST:MAST:DEBUG:sync.c:921sesid:33490:HBAddEacEpRel
Tue Jul 27 18:29:20 2021:621244:1372:MAST:MAST:DEBUG:sync.c:903sesid:33490 HBAddEacEpRel 3
T:DEBUG:sync.c:903sesid:33497 /scripts/ha/managetimeropcodes.sh 3
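Because msync.log appears to go silent between its last entry and the restart, the length of that gap can be computed from the timestamps. This is a minimal Python sketch assuming the timestamp layout shown in the excerpt; `last_timestamp` is a hypothetical helper, and treating the number after the year as microseconds is an inference from the excerpt, not documented Sophos behavior.

```python
import re
from datetime import datetime

# Matches the leading timestamp of msync.log lines such as:
#   Tue Jul 27 18:29:20 2021:621244:1372:MAST:MAST:DEBUG:...
# ASSUMPTION: the number after the year is microseconds; this is inferred
# from the excerpt, not from Sophos documentation.
TS_RE = re.compile(r"^(\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4}):(\d{1,6}):")


def last_timestamp(lines):
    """Return the latest parseable timestamp in an msync.log excerpt, or None."""
    latest = None
    for line in lines:
        m = TS_RE.match(line)
        if m is None:
            continue  # skip truncated lines such as "T:DEBUG:sync.c:903..."
        ts = datetime.strptime(m.group(1), "%a %b %d %H:%M:%S %Y")
        ts = ts.replace(microsecond=int(m.group(2)))
        if latest is None or ts > latest:
            latest = ts
    return latest


if __name__ == "__main__":
    excerpt = [
        "Tue Jul 27 18:29:13 2021:222634:1372:MAST:MAST:DEBUG:event.c:492 ses_cnt :3",
        "Tue Jul 27 18:29:20 2021:621244:1372:MAST:MAST:DEBUG:sync.c:903sesid:33490 HBAddEacEpRel 3",
        "T:DEBUG:sync.c:903sesid:33497 /scripts/ha/managetimeropcodes.sh 3",
    ]
    # Restart time taken from the ctsyncd.log excerpt above.
    restart = datetime(2021, 7, 28, 7, 6, 56)
    last = last_timestamp(excerpt)
    print(f"msync.log silent from {last} until {restart} ({restart - last})")
```

Only the parseable lines contribute, so a truncated final line does not hide the real last entry; for the excerpt above, the silence works out to roughly twelve and a half hours.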

LHerzog over 4 years ago in reply to BrentMagnant

Did you or someone else change anything on a RED device before this began? Check the Admin log.

And please open a support case: having to reboot your entire HA pair to get it working again is a no-go for a firewall setup.

BrentMagnant over 4 years ago in reply to LHerzog

There were NO changes to the environment. They don't use REDs at all. It just decided to start failing in the middle of the night... every night! I agree nightly reboots should not be required!

I am leaving HA off for now to see if it stabilizes.

I have a ticket open with support, but I usually get better answers here! :)

LHerzog over 4 years ago in reply to BrentMagnant

If this happens without changes, I think only support can figure it out.

Is Jul 28 07:06:56 the time when it died?

And are they really down, with no traffic flowing, or do they just stop responding to HTTPS and SSH management? Support will probably ask you to connect the units to a serial console to capture output in case they crash. You could already prepare some notebooks with serial cables.

BTW, I fully agree with your last sentence.

BrentMagnant over 4 years ago in reply to LHerzog

7:06 is when it was restarted and started logging again.

emmosophos over 4 years ago in reply to BrentMagnant

Hello Brent,

I would recommend opening a case with Support to get this investigated; you can share the Case ID with me.

When the issue happened this morning, were you unable to access both the GUI and SSH on the XG?

Do you see anything under /var/cores?

If this is happening every night, try disabling firewall acceleration:

console > system firewall-acceleration disable

Additionally, as LHerzog suggested, once you have the serial connection going (console logging), do the following:

Using PuTTY, go to 'Session' > 'Logging'.

Here, select 'All session output' and set the log file name to a folder and file name you can retrieve later.

Configure the serial connection to use the correct COM port on your PC and a speed of 38400.

Start the session and log in to make sure everything works.

Once logged in, you can leave it there or log out and leave the session at the password prompt. Either way, leave the session active and allow it to capture the output from the next reboot.

Once that reboot occurs, you can end the serial connection and provide the logs to Support for further investigation.

Whether or not this "solves" the issue, open a case with Support.

Regards,

Emmanuel (EmmoSophos)

Technical Team Lead, Global Community Support
Sophos Support Videos | Product Documentation | @SophosSupport | Sign up for SMS Alerts
If a post solves your question use the 'Verify Answer' link.

BrentMagnant over 4 years ago in reply to emmosophos

I have opened ticket 04264810.

I got some updated info from the client this AM. They rebooted before I was able to log in and check things out, but this morning's outage was a little different than the last few days. It sounds like VoIP was working, but HTTPS was failing. I have seen something like that in the past when A/V was failing.

emmosophos over 4 years ago in reply to BrentMagnant

Hello Brent,

Thank you for the Case ID and the additional notes.

If you haven't already, I'd put csc in debug mode:

# csc custom debug

To disable it, run the same command again; it is also disabled automatically when the device reboots.

Regards,

Emmanuel (EmmoSophos)
