This discussion has been locked.
You can no longer post new replies to this discussion. If you have a question you can start a new discussion

High CPU-load with SFOS 17.1.2 MR-2 and HA

Hi,

since updating to SFOS 17.1.2 MR-2 a customers XG HA-cluster in A/P mode is ~90% cpu load.

Its a small XG125 setup. 

Anybody seen this before?:

 



This thread was automatically locked due to age.
  • do the logs provide any details?

  • Nothing, thats what makes me so helpless.

    I can also delete HA config, but the processes are still there.

    Reboot wont't help. Seems if got to go back to MR1.

     

     

     

    EDIT:

    After some investigation in seems like the Msyncd service causing this behavior.

    I never seen this before. 

     

    Log showing:

    Thu Sep 6 15:33:49 2018:519367:26908:MAST:MAST:DEBUG:sync.c:829sesid:901:readApiInterface
    Thu Sep 6 15:33:49 2018:519446:26908:MAST:MAST:DEBUG:sync.c:811sesid:901 readApiInterface 3
    Thu Sep 6 15:33:53 2018:626892:26908:MAST:MAST:DEBUG:event.c:492 ses_cnt :1
    Thu Sep 6 15:33:53 2018:627055:26908:MAST:MAST:DEBUG:sync_entity.c:938sesid:902: opcode getappkey
    Thu Sep 6 15:33:53 2018:895767:26908:MAST:MAST:DEBUG:sync.c:829sesid:902:getappkey
    Thu Sep 6 15:33:53 2018:895852:26908:MAST:MAST:DEBUG:sync.c:811sesid:902 getappkey 3
    Thu Sep 6 15:33:55 2018:572904:26908:MAST:MAST:DEBUG:event.c:492 ses_cnt :1
    Thu Sep 6 15:33:55 2018:573073:26908:MAST:MAST:DEBUG:sync_entity.c:938sesid:903: opcode lic_substatus
    Thu Sep 6 15:33:55 2018:834550:26864:MAST:MAST:DEBUG:worker.c:587 idle workers 10.
    Thu Sep 6 15:33:55 2018:834592:26864:MAST:MAST:DEBUG:worker.c:628 worker_num 14.
    Thu Sep 6 15:33:55 2018:841181:26908:MAST:MAST:DEBUG:sync.c:829sesid:903:lic_substatus
    Thu Sep 6 15:33:55 2018:841236:26908:MAST:MAST:DEBUG:sync.c:811sesid:903 lic_substatus 3
    Thu Sep 6 15:33:55 2018:867882:26908:MAST:MAST:DEBUG:event.c:492 ses_cnt :1
    Thu Sep 6 15:33:55 2018:868009:26908:MAST:MAST:DEBUG:sync_entity.c:938sesid:904: opcode readApiInterface
    Thu Sep 6 15:33:56 2018:478815:26908:MAST:MAST:DEBUG:sync.c:829sesid:904:readApiInterface
    Thu Sep 6 15:33:56 2018:478890:26908:MAST:MAST:DEBUG:sync.c:811sesid:904 readApiInterface 3
    Thu Sep 6 15:34:01 2018:015950:26908:MAST:MAST:DEBUG:event.c:492 ses_cnt :1
    Thu Sep 6 15:34:01 2018:194002:26908:MAST:MAST:DEBUG:event.c:492 ses_cnt :2
    Thu Sep 6 15:34:01 2018:194154:26908:MAST:MAST:DEBUG:sync_entity.c:938sesid:906: opcode garnerevent
    Thu Sep 6 15:34:01 2018:267592:26908:MAST:MAST:DEBUG:event.c:492 ses_cnt :3
    Thu Sep 6 15:34:01 2018:279379:26908:MAST:MAST:DEBUG:sync.c:829sesid:905:
    Thu Sep 6 15:34:01 2018:279438:26908:MAST:MAST:DEBUG:sync.c:811sesid:905 3
    Thu Sep 6 15:34:01 2018:457749:26908:MAST:MAST:DEBUG:sync.c:829sesid:906:garnerevent
    Thu Sep 6 15:34:01 2018:457814:26908:MAST:MAST:DEBUG:sync.c:811sesid:906 garnerevent 3
    Thu Sep 6 15:34:01 2018:530231:26908:MAST:MAST:DEBUG:sync.c:829sesid:907:
    Thu Sep 6 15:34:01 2018:530296:26908:MAST:MAST:DEBUG:sync.c:811sesid:907 3
    Thu Sep 6 15:34:07 2018:713782:26908:MAST:MAST:DEBUG:event.c:492 ses_cnt :1
    Thu Sep 6 15:34:07 2018:713929:26908:MAST:MAST:DEBUG:sync_entity.c:938sesid:908: opcode lic_substatus
    Thu Sep 6 15:34:07 2018:981884:26908:MAST:MAST:DEBUG:sync.c:829sesid:908:lic_substatus
    Thu Sep 6 15:34:07 2018:981959:26908:MAST:MAST:DEBUG:sync.c:811sesid:908 lic_substatus 3
    Thu Sep 6 15:34:08 2018:001467:26908:MAST:MAST:DEBUG:event.c:492 ses_cnt :1
    Thu Sep 6 15:34:08 2018:001626:26908:MAST:MAST:DEBUG:sync_entity.c:938sesid:909: opcode readApiInterface
    Thu Sep 6 15:34:08 2018:613194:26908:MAST:MAST:DEBUG:sync.c:829sesid:909:readApiInterface
    Thu Sep 6 15:34:08 2018:613264:26908:MAST:MAST:DEBUG:sync.c:811sesid:909 readApiInterface 3
    Thu Sep 6 15:34:13 2018:679027:26908:MAST:MAST:DEBUG:event.c:492 ses_cnt :1
    Thu Sep 6 15:34:13 2018:946962:26908:MAST:MAST:DEBUG:sync.c:829sesid:910:
    Thu Sep 6 15:34:13 2018:947032:26908:MAST:MAST:DEBUG:sync.c:811sesid:910 3
    Thu Sep 6 15:34:15 2018:874547:26864:MAST:MAST:DEBUG:worker.c:587 idle workers 10.
    Thu Sep 6 15:34:15 2018:874601:26864:MAST:MAST:DEBUG:worker.c:628 worker_num 14.
    Thu Sep 6 15:34:17 2018:002205:26908:MAST:MAST:DEBUG:event.c:492 ses_cnt :1
    Thu Sep 6 15:34:19 2018:705657:26908:MAST:MAST:DEBUG:sync.c:829sesid:911:
    Thu Sep 6 15:34:19 2018:705723:26908:MAST:MAST:DEBUG:sync.c:811sesid:911 3
    Thu Sep 6 15:34:19 2018:985850:26908:MAST:MAST:DEBUG:event.c:492 ses_cnt :1
    Thu Sep 6 15:34:19 2018:985982:26908:MAST:MAST:DEBUG:sync_entity.c:938sesid:912: opcode lic_substatus
    Thu Sep 6 15:34:20 2018:253087:26908:MAST:MAST:DEBUG:sync.c:829sesid:912:lic_substatus
    Thu Sep 6 15:34:20 2018:253165:26908:MAST:MAST:DEBUG:sync.c:811sesid:912 lic_substatus 3
    Thu Sep 6 15:34:20 2018:284612:26908:MAST:MAST:DEBUG:event.c:492 ses_cnt :1
    Thu Sep 6 15:34:20 2018:284731:26908:MAST:MAST:DEBUG:sync_entity.c:938sesid:913: opcode readApiInterface
    Thu Sep 6 15:34:20 2018:899570:26908:MAST:MAST:DEBUG:sync.c:829sesid:913:readApiInterface
    Thu Sep 6 15:34:20 2018:899614:26908:MAST:MAST:DEBUG:sync.c:811sesid:913 readApiInterface 3
    Thu Sep 6 15:34:23 2018:635882:26908:MAST:MAST:DEBUG:event.c:492 ses_cnt :1
    Thu Sep 6 15:34:23 2018:636027:26908:MAST:MAST:DEBUG:sync_entity.c:938sesid:914: opcode getappkey
    Thu Sep 6 15:34:23 2018:903331:26908:MAST:MAST:DEBUG:sync.c:829sesid:914:getappkey
    Thu Sep 6 15:34:23 2018:903372:26908:MAST:MAST:DEBUG:sync.c:811sesid:914 getappkey 3
    Thu Sep 6 15:34:24 2018:642114:26908:MAST:MAST:DEBUG:event.c:492 ses_cnt :1
    Thu Sep 6 15:34:24 2018:642254:26908:MAST:MAST:DEBUG:sync_entity.c:938sesid:915: opcode lic_substatus
    Thu Sep 6 15:34:24 2018:908636:26908:MAST:MAST:DEBUG:sync.c:829sesid:915:lic_substatus
    Thu Sep 6 15:34:24 2018:908705:26908:MAST:MAST:DEBUG:sync.c:811sesid:915 lic_substatus 3
    Thu Sep 6 15:34:24 2018:927888:26908:MAST:MAST:DEBUG:event.c:492 ses_cnt :1
    Thu Sep 6 15:34:24 2018:928042:26908:MAST:MAST:DEBUG:sync_entity.c:938sesid:916: opcode readApiInterface
    Thu Sep 6 15:34:25 2018:538136:26908:MAST:MAST:DEBUG:sync.c:829sesid:916:readApiInterface
    Thu Sep 6 15:34:25 2018:538223:26908:MAST:MAST:DEBUG:sync.c:811sesid:916 readApiInterface 3

     

     

     

    EDIT AGAIN:

    Tested with XG210 cluster -> no problems. No high cpu load.

    Tested SFOS 17.0.8-209 performance indicator and load is still high, but not 100% cpu like 17.1.2MR2

    Maybe something like a firmware update came with the latest MR2 and now slows down the appliances? 

    It really nice to see process GETHAINFO is so hungry in MR2 and in MR1 it's abot 30% less.

     

     

     

    This is the result of a fresh installed XG125Rev3 with default config:

     

    Now it gets really funny. this is the detailed load view of the newly installed machine:

     

     

     

    EDIT:

    Found the process, can't kill it because the monitor process is restarting it again and again.

    After I killed the PPID is load is ~0-2

  • confirming same high cpu problem here on XG330 rev2 HA pairs after updating from 17.0.6 MR-6 to 17.1.2 MR-2. "gethainfo" and "worker" processes seem to use the most.  based on some specific issues logged during the sfos updates in our environment that might not apply to anyone else's, sophos support suggested rolling back or checking results after disabling HA and recreating it as part of troubleshooting .  17.1.2 MR-2 fixes NC-30288 and NC-30520 for us so rolling back will be unwanted.  I haven't had an opportunity to try HA recreation after hours yet so I'm interested in what others have accomplished.

     

     

  • Same problem here with a ha cluster of 2 Sophos XG 125

  • Hey  

    Apologies for this inconvenience, have you already raised a support case to investigate this further with our team?

    If so, please PM me with your case ID so I can follow up.

    Thanks,

  • Hi,

     

    Same …

     

    Valentin GUEGNARD

    SOPHOS ING/ARCH

  • Hi  

    Apologies for this continued inconvenience, this high CPU issue caused by gethainfo process could be related to NC-36927. Have you already raised a support case for further investigation? If so, please PM me with your support case ID for follow up.

    To our community,

    If you are also experiencing similar symptoms, please raise a support case and PM me with your ID for follow up.

    Regards,

  • Hi,

    So, for my case, my engineer assigned to asked for help by GES SOPHOS. My case is 8343418.

    @FLO, can you give us the details of the bug NC-36927

     

    Valentin

  • Hi Valentin,

    FloSupport said:

    this high CPU issue caused by gethainfo process could be related to NC-36927.

    This is the available info for this ID, more investigation by our GES engineers would be required to confirm.

  • This issue occurs only if the RED feature is enabled and at least one RED interface is connected.

    The more RED interfaces are used, the heavier the issue occurs. Also happens on non-HA setups.

    It will be fixed in v17.1.MR3.