Hi,
since updating to SFOS 17.1.2 MR-2, a customer's XG HA cluster in A/P mode is running at ~90% CPU load.
It's a small XG125 setup.
Has anybody seen this before?
Nothing, that's what makes me so helpless.
I can also delete the HA config, but the processes are still there.
A reboot won't help. It seems I have to go back to MR1.
EDIT:
After some investigation it seems the msyncd service is causing this behavior.
I have never seen this before.
The log shows the following (a quick way to tally these opcodes is sketched after the excerpt):
Thu Sep 6 15:33:49 2018:519367:26908:MAST:MAST:DEBUG:sync.c:829sesid:901:readApiInterface
Thu Sep 6 15:33:49 2018:519446:26908:MAST:MAST:DEBUG:sync.c:811sesid:901 readApiInterface 3
Thu Sep 6 15:33:53 2018:626892:26908:MAST:MAST:DEBUG:event.c:492 ses_cnt :1
Thu Sep 6 15:33:53 2018:627055:26908:MAST:MAST:DEBUG:sync_entity.c:938sesid:902: opcode getappkey
Thu Sep 6 15:33:53 2018:895767:26908:MAST:MAST:DEBUG:sync.c:829sesid:902:getappkey
Thu Sep 6 15:33:53 2018:895852:26908:MAST:MAST:DEBUG:sync.c:811sesid:902 getappkey 3
Thu Sep 6 15:33:55 2018:572904:26908:MAST:MAST:DEBUG:event.c:492 ses_cnt :1
Thu Sep 6 15:33:55 2018:573073:26908:MAST:MAST:DEBUG:sync_entity.c:938sesid:903: opcode lic_substatus
Thu Sep 6 15:33:55 2018:834550:26864:MAST:MAST:DEBUG:worker.c:587 idle workers 10.
Thu Sep 6 15:33:55 2018:834592:26864:MAST:MAST:DEBUG:worker.c:628 worker_num 14.
Thu Sep 6 15:33:55 2018:841181:26908:MAST:MAST:DEBUG:sync.c:829sesid:903:lic_substatus
Thu Sep 6 15:33:55 2018:841236:26908:MAST:MAST:DEBUG:sync.c:811sesid:903 lic_substatus 3
Thu Sep 6 15:33:55 2018:867882:26908:MAST:MAST:DEBUG:event.c:492 ses_cnt :1
Thu Sep 6 15:33:55 2018:868009:26908:MAST:MAST:DEBUG:sync_entity.c:938sesid:904: opcode readApiInterface
Thu Sep 6 15:33:56 2018:478815:26908:MAST:MAST:DEBUG:sync.c:829sesid:904:readApiInterface
Thu Sep 6 15:33:56 2018:478890:26908:MAST:MAST:DEBUG:sync.c:811sesid:904 readApiInterface 3
Thu Sep 6 15:34:01 2018:015950:26908:MAST:MAST:DEBUG:event.c:492 ses_cnt :1
Thu Sep 6 15:34:01 2018:194002:26908:MAST:MAST:DEBUG:event.c:492 ses_cnt :2
Thu Sep 6 15:34:01 2018:194154:26908:MAST:MAST:DEBUG:sync_entity.c:938sesid:906: opcode garnerevent
Thu Sep 6 15:34:01 2018:267592:26908:MAST:MAST:DEBUG:event.c:492 ses_cnt :3
Thu Sep 6 15:34:01 2018:279379:26908:MAST:MAST:DEBUG:sync.c:829sesid:905:
Thu Sep 6 15:34:01 2018:279438:26908:MAST:MAST:DEBUG:sync.c:811sesid:905 3
Thu Sep 6 15:34:01 2018:457749:26908:MAST:MAST:DEBUG:sync.c:829sesid:906:garnerevent
Thu Sep 6 15:34:01 2018:457814:26908:MAST:MAST:DEBUG:sync.c:811sesid:906 garnerevent 3
Thu Sep 6 15:34:01 2018:530231:26908:MAST:MAST:DEBUG:sync.c:829sesid:907:
Thu Sep 6 15:34:01 2018:530296:26908:MAST:MAST:DEBUG:sync.c:811sesid:907 3
Thu Sep 6 15:34:07 2018:713782:26908:MAST:MAST:DEBUG:event.c:492 ses_cnt :1
Thu Sep 6 15:34:07 2018:713929:26908:MAST:MAST:DEBUG:sync_entity.c:938sesid:908: opcode lic_substatus
Thu Sep 6 15:34:07 2018:981884:26908:MAST:MAST:DEBUG:sync.c:829sesid:908:lic_substatus
Thu Sep 6 15:34:07 2018:981959:26908:MAST:MAST:DEBUG:sync.c:811sesid:908 lic_substatus 3
Thu Sep 6 15:34:08 2018:001467:26908:MAST:MAST:DEBUG:event.c:492 ses_cnt :1
Thu Sep 6 15:34:08 2018:001626:26908:MAST:MAST:DEBUG:sync_entity.c:938sesid:909: opcode readApiInterface
Thu Sep 6 15:34:08 2018:613194:26908:MAST:MAST:DEBUG:sync.c:829sesid:909:readApiInterface
Thu Sep 6 15:34:08 2018:613264:26908:MAST:MAST:DEBUG:sync.c:811sesid:909 readApiInterface 3
Thu Sep 6 15:34:13 2018:679027:26908:MAST:MAST:DEBUG:event.c:492 ses_cnt :1
Thu Sep 6 15:34:13 2018:946962:26908:MAST:MAST:DEBUG:sync.c:829sesid:910:
Thu Sep 6 15:34:13 2018:947032:26908:MAST:MAST:DEBUG:sync.c:811sesid:910 3
Thu Sep 6 15:34:15 2018:874547:26864:MAST:MAST:DEBUG:worker.c:587 idle workers 10.
Thu Sep 6 15:34:15 2018:874601:26864:MAST:MAST:DEBUG:worker.c:628 worker_num 14.
Thu Sep 6 15:34:17 2018:002205:26908:MAST:MAST:DEBUG:event.c:492 ses_cnt :1
Thu Sep 6 15:34:19 2018:705657:26908:MAST:MAST:DEBUG:sync.c:829sesid:911:
Thu Sep 6 15:34:19 2018:705723:26908:MAST:MAST:DEBUG:sync.c:811sesid:911 3
Thu Sep 6 15:34:19 2018:985850:26908:MAST:MAST:DEBUG:event.c:492 ses_cnt :1
Thu Sep 6 15:34:19 2018:985982:26908:MAST:MAST:DEBUG:sync_entity.c:938sesid:912: opcode lic_substatus
Thu Sep 6 15:34:20 2018:253087:26908:MAST:MAST:DEBUG:sync.c:829sesid:912:lic_substatus
Thu Sep 6 15:34:20 2018:253165:26908:MAST:MAST:DEBUG:sync.c:811sesid:912 lic_substatus 3
Thu Sep 6 15:34:20 2018:284612:26908:MAST:MAST:DEBUG:event.c:492 ses_cnt :1
Thu Sep 6 15:34:20 2018:284731:26908:MAST:MAST:DEBUG:sync_entity.c:938sesid:913: opcode readApiInterface
Thu Sep 6 15:34:20 2018:899570:26908:MAST:MAST:DEBUG:sync.c:829sesid:913:readApiInterface
Thu Sep 6 15:34:20 2018:899614:26908:MAST:MAST:DEBUG:sync.c:811sesid:913 readApiInterface 3
Thu Sep 6 15:34:23 2018:635882:26908:MAST:MAST:DEBUG:event.c:492 ses_cnt :1
Thu Sep 6 15:34:23 2018:636027:26908:MAST:MAST:DEBUG:sync_entity.c:938sesid:914: opcode getappkey
Thu Sep 6 15:34:23 2018:903331:26908:MAST:MAST:DEBUG:sync.c:829sesid:914:getappkey
Thu Sep 6 15:34:23 2018:903372:26908:MAST:MAST:DEBUG:sync.c:811sesid:914 getappkey 3
Thu Sep 6 15:34:24 2018:642114:26908:MAST:MAST:DEBUG:event.c:492 ses_cnt :1
Thu Sep 6 15:34:24 2018:642254:26908:MAST:MAST:DEBUG:sync_entity.c:938sesid:915: opcode lic_substatus
Thu Sep 6 15:34:24 2018:908636:26908:MAST:MAST:DEBUG:sync.c:829sesid:915:lic_substatus
Thu Sep 6 15:34:24 2018:908705:26908:MAST:MAST:DEBUG:sync.c:811sesid:915 lic_substatus 3
Thu Sep 6 15:34:24 2018:927888:26908:MAST:MAST:DEBUG:event.c:492 ses_cnt :1
Thu Sep 6 15:34:24 2018:928042:26908:MAST:MAST:DEBUG:sync_entity.c:938sesid:916: opcode readApiInterface
Thu Sep 6 15:34:25 2018:538136:26908:MAST:MAST:DEBUG:sync.c:829sesid:916:readApiInterface
Thu Sep 6 15:34:25 2018:538223:26908:MAST:MAST:DEBUG:sync.c:811sesid:916 readApiInterface 3
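For anyone who wants to see what msyncd is actually looping on, here is a minimal sketch that tallies the opcodes in an exported copy of this debug log (the file name msyncd.log is a placeholder and the log format is assumed from the excerpt above; this is not an official Sophos tool):

# Tally opcode frequency in an exported msyncd debug log (format assumed from the excerpt above).
import re
from collections import Counter

opcodes = Counter()
with open("msyncd.log") as log:                     # placeholder path to an exported copy of the log
    for line in log:
        match = re.search(r"opcode\s+(\w+)", line)  # matches lines like "... sesid:902: opcode getappkey"
        if match:
            opcodes[match.group(1)] += 1

for opcode, count in opcodes.most_common():
    print(f"{count:6d}  {opcode}")

On this excerpt it boils down to readApiInterface, lic_substatus and getappkey (plus the odd garnerevent) repeating every few seconds.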
EDIT AGAIN:
Tested with an XG210 cluster -> no problems, no high CPU load.
Tested SFOS 17.0.8-209: the performance indicator shows the load is still high, but not 100% CPU like on 17.1.2 MR-2.
Maybe something like a firmware update came with the latest MR2 that now slows down the appliances?
It's really nice to see that the GETHAINFO process is so hungry in MR2; in MR1 it uses about 30% less.
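If anyone wants to put a number on that comparison, here is a rough sketch of how a single process's CPU can be sampled from /proc (that /proc is readable from the appliance shell is an assumption on my side, and the PID is just an example taken from the msyncd log above):

# Sample one PID's CPU usage over an interval by reading /proc/<pid>/stat twice.
import os
import time

def cpu_seconds(pid):
    with open(f"/proc/{pid}/stat") as f:
        fields = f.read().split()
    ticks = int(fields[13]) + int(fields[14])    # utime + stime in clock ticks
    return ticks / os.sysconf("SC_CLK_TCK")

pid = 26908                                      # example PID taken from the msyncd log above
interval = 10.0
before = cpu_seconds(pid)
time.sleep(interval)
after = cpu_seconds(pid)
print(f"pid {pid}: {100.0 * (after - before) / interval:.1f}% CPU over {interval:.0f}s")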
This is the result on a freshly installed XG125 Rev3 with the default config:
Now it gets really funny. This is the detailed load view of the newly installed machine:
EDIT:
Found the process, but I can't kill it because the monitor process keeps restarting it again and again.
After I killed the PPID, the load is ~0-2.
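For reference, this is roughly what I did, written out as a small Python sketch instead of the commands I actually ran on the box (the process name gethainfo comes from the posts above; walking /proc like this is my own workaround, not an official procedure, and stopping the parent is at your own risk):

# Find processes by name via /proc and report their parent PIDs, so the
# monitor (parent) can be stopped instead of the endlessly respawned child.
import os
import signal

def find_by_name(name):
    hits = []
    for pid in filter(str.isdigit, os.listdir("/proc")):
        try:
            with open(f"/proc/{pid}/comm") as f:
                comm = f.read().strip()
            with open(f"/proc/{pid}/stat") as f:
                ppid = int(f.read().split()[3])
        except OSError:
            continue                             # process exited while we were reading it
        if comm == name:
            hits.append((int(pid), ppid))
    return hits

for pid, ppid in find_by_name("gethainfo"):
    print(f"gethainfo pid={pid} ppid={ppid}")
    # Killing the child alone just gets it respawned by the monitor;
    # uncomment to stop the parent instead:
    # os.kill(ppid, signal.SIGKILL)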
Confirming the same high CPU problem here on XG330 Rev2 HA pairs after updating from 17.0.6 MR-6 to 17.1.2 MR-2. The "gethainfo" and "worker" processes seem to use the most. Based on some specific issues logged during the SFOS updates in our environment that might not apply to anyone else's, Sophos Support suggested rolling back, or disabling HA and recreating it and checking the results, as part of troubleshooting. 17.1.2 MR-2 fixes NC-30288 and NC-30520 for us, so rolling back is not an option we want. I haven't had an opportunity to try the HA recreation after hours yet, so I'm interested in what others have accomplished.
Hi,
Same …
Valentin GUEGNARD
SOPHOS ING/ARCH
Hi onward,
Apologies for the continued inconvenience; this high CPU issue caused by the gethainfo process could be related to NC-36927. Have you already raised a support case for further investigation? If so, please PM me with your support case ID for follow-up.
To our community,
If you are also experiencing similar symptoms, please raise a support case and PM me with your case ID for follow-up.
Regards,
Hi,
So, for my case, the engineer assigned to it asked GES SOPHOS for help. My case ID is 8343418.
@FLO, can you give us the details of bug NC-36927?
Valentin
Hi Valentin,
FloSupport said: this high CPU issue caused by the gethainfo process could be related to NC-36927.
This is the available info for this ID; more investigation by our GES engineers would be required to confirm.