This discussion has been locked.

XG HA: Kernel Panic on Auxiliary Appliance 18.0.1 MR-1-Build396 Tainted Module winbindd

Hello Community,

This is my first post here. We updated our cluster this weekend to 18.0.1 MR-1-Build396.

After the update both devices restarted. One took the master role and the other the auxiliary role, and all was fine... for about a minute.

Then the auxiliary device dropped to the Faulty state. Checking the local console showed the following:

BUG: unable to handle kernel NULL pointer dereference at
[  163.527682] IP:           (null)
[  163.537399] PGD 800000039b846067 P4D 800000039b846067 PUD 0
[  163.554381] Oops: 0010 [#1] SMP PTI
[  163.564855] Modules linked in: nf_conntrack_ipslb nfnetmap_queue(O) xt_master [...] YN_DATA ip6t_ADVERTISEMENT ip6t_SOLICITATION xt_LBS ip6table_filter iptable_filt [...] onntrack_tftp nf_nat_h323 nf_conntrack_h323 nf_nat_pptp
[  163.777028]  nf_conntrack_pptp cfg80211 usbhid hid_generic hid ohci_pci ohci_ [...] ort xfrm4_mode_tunnel xfrm4_tunnel xfrm_user af_key xfrm_algo aesni_intel glue_h [...] er ipt_rpfilter ebt_nflog ebt_pkttype xt_serviceset
[  163.988262]  xt_appset xt_hostset xt_pkttype xt_recent xt_state xt_status xt_ [...] et ip_set_bitmap_fwrule ip_set_bitmap_ctrxss ip_set_bitmap_user sp2fp_api ip_set [...] ip6_udp_tunnel ptp pps_core mdio i2c_i801 i2c_dev i2c_core
[  164.201353]  netmap(O) ip6table_nat nf_nat_ipv6 ip6table_mangle ip6table_raw [...] es nfnetlink button evdev [last unloaded: nfnetmap_queue]
[  164.315505] CPU: 5 PID: 21751 Comm: winbindd Tainted: G           O    4.14.3
[  164.337970] Hardware name: Sophos XG/XG, BIOS 5.11 06/01/2018
[  164.355246] task: ffff8803b0257080 task.stack: ffffc90008304000
[  164.373031] RIP: 0010:          (null)
[  164.384308] RSP: 0000:ffff88046dd43e18 EFLAGS: 00010202
[  164.400016] RAX: ffffffffa083f700 RBX: ffff8803e5c0e780 RCX: ffff88044dde0400
[  164.421468] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8803e5c0e780
[  164.442919] RBP: ffff88044dde0410 R08: 0000000000000001 R09: 0000000000000001
[  164.464343] R10: 0000000000000000 R11: ffffc90008307bf0 R12: ffff8804546b2000
[  164.485767] R13: ffff8804546b2078 R14: ffff8804546b20a0 R15: 0000000000000008
[  164.507190] FS:  0000000000000000(0000) GS:ffff88046dd40000(0063) knlGS:00000
[  164.531487] CS:  0010 DS: 002b ES: 002b CR0: 0000000080050033
[  164.548737] CR2: 0000000000000000 CR3: 000000039b924003 CR4: 00000000001606e0
[  164.570164] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  164.591585] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  164.613049] Call Trace:
[  164.620439]  <IRQ>
[  164.626523]  ? ip_rcv+0x316/0x4c0
[  164.636509]  ? ip_local_deliver_finish+0x1d0/0x1d0
[  164.650941]  ? __netif_receive_skb_core+0x3ec/0xac0
[  164.665632]  ? enqueue_task_fair+0x320/0x440
[  164.678475]  ? process_backlog+0x86/0x120
[  164.690537]  ? process_backlog+0x86/0x120
[  164.702601]  ? net_rx_action+0xcc/0x270
[  164.714148]  ? __do_softirq+0xc5/0x1ec
[  164.725428]  ? do_softirq_own_stack+0x2a/0x40
[  164.738533]  </IRQ>
[  164.744901]  ? do_softirq.part.2+0x3c/0x40
[  164.757227]  ? netif_rx_ni+0x1d/0x30
[  164.767992]  ? dev_loopback_xmit+0xa3/0xc0
[  164.780317]  ? ip_mc_output+0x176/0x240
[  164.791860]  ? ip_finish_output2+0x3b0/0x3b0
[  164.804704]  ? ip_send_skb+0x10/0x40
[  164.815469]  ? udp_send_skb+0x94/0x240
[  164.826750]  ? udp_sendmsg+0x2f8/0x8c0
[  164.838037]  ? release_sock+0x3b/0x90
[  164.849059]  ? sock_sendmsg+0xe/0x20
[  164.859822]  ? SyS_sendto+0xad/0x150
[  164.870587]  ? ep_poll_wakeup_proc+0x20/0x20
[  164.883432]  ? compat_SyS_socketcall+0x12c/0x210
[  164.897319]  ? do_int80_syscall_32+0x58/0x110
[  164.910421]  ? entry_INT80_compat+0x48/0x50
[  164.923002] Code:  Bad RIP value.
[  164.933012] RIP:           (null) RSP: ffff88046dd43e18
[  164.948719] CR2: 0000000000000000
[  164.958727] ---[ end trace 0c3cc4f11b5d6136 ]---
[  164.958728] BUG: unable to handle kernel NULL pointer dereference at
[  164.958729] IP:           (null)
[  164.958729] PGD 80000003a7ba4067 P4D 80000003a7ba4067 PUD 0
[  164.958731] Oops: 0010 [#2] SMP PTI
[  164.958732] Modules linked in: nf_conntrack_ipslb nfnetmap_queue(O) xt_master [...] YN_DATA ip6t_ADVERTISEMENT ip6t_SOLICITATION xt_LBS ip6table_filter iptable_filt [...] onntrack_tftp nf_nat_h323 nf_conntrack_h323 nf_nat_pptp
[  164.958744]  nf_conntrack_pptp cfg80211 usbhid hid_generic hid ohci_pci ohci_ [...] ort xfrm4_mode_tunnel xfrm4_tunnel xfrm_user af_key xfrm_algo aesni_intel glue_h [...] er ipt_rpfilter ebt_nflog ebt_pkttype xt_serviceset
[  164.958758]  xt_appset xt_hostset xt_pkttype xt_recent xt_state xt_status xt_ [...] et ip_set_bitmap_fwrule ip_set_bitmap_ctrxss ip_set_bitmap_user sp2fp_api ip_set [...] ip6_udp_tunnel ptp pps_core mdio i2c_i801 i2c_dev i2c_core
[  164.958771]  netmap(O) ip6table_nat nf_nat_ipv6 ip6table_mangle ip6table_raw [...] es nfnetlink button evdev [last unloaded: nfnetmap_queue]
[  164.958780] CPU: 6 PID: 21752 Comm: winbindd Tainted: G      D    O    4.14.3
[  164.958780] Hardware name: Sophos XG/XG, BIOS 5.11 06/01/2018
[  164.958780] task: ffff8803b0250000 task.stack: ffffc9000830c000
[  164.958781] RIP: 0010:          (null)
[  164.958781] RSP: 0000:ffff88046dd83e18 EFLAGS: 00010202
[  164.958782] RAX: ffffffffa083f700 RBX: ffff88039b9a03c0 RCX: ffff8803ae590200
[  164.958782] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff88039b9a03c0
[  164.958782] RBP: ffff8803ae590210 R08: 0000000000000001 R09: 0000000000000001
[  164.958783] R10: 0000000000000000 R11: ffffc9000830fbf0 R12: ffff8804546b2000
[  164.958783] R13: ffff8804546b2078 R14: ffff8804546b20a0 R15: 0000000000000008
[  164.958784] FS:  0000000000000000(0000) GS
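
For anyone reading a console capture like the one above, the lines that matter most can be pulled out offline with a short script. The following is only a sketch in Python, meant to be run on a saved copy of the console output rather than on the appliance. The function names `oops_headers` and `trace_functions` are made up for illustration, and the taint table covers only the three letters that appear in this log.

```python
import re

# Taint letters seen in this log, as described in the Linux kernel's
# "Tainted kernels" documentation. Other letters exist and are not listed.
TAINT_FLAGS = {
    "G": "no proprietary module loaded",
    "O": "out-of-tree module loaded",
    "D": "kernel already hit an oops or BUG",
}

# Header line of each oops: CPU, PID, process name, taint letters.
HEADER_RE = re.compile(
    r"CPU: (?P<cpu>\d+) PID: (?P<pid>\d+) Comm: (?P<comm>\S+) "
    r"Tainted: (?P<taint>(?:[A-Z] *)+)"
)
# Call-trace entries such as "? ip_rcv+0x316/0x4c0".
FRAME_RE = re.compile(r"\?\s+([\w.]+)\+0x")


def oops_headers(text):
    """Return one dict per 'CPU: ... Comm: ... Tainted: ...' line."""
    headers = []
    for m in HEADER_RE.finditer(text):
        letters = m.group("taint").split()
        headers.append({
            "cpu": int(m.group("cpu")),
            "pid": int(m.group("pid")),
            "comm": m.group("comm"),
            "taint": {c: TAINT_FLAGS.get(c, "not decoded here") for c in letters},
        })
    return headers


def trace_functions(text):
    """Return function names from the call trace, innermost frame first."""
    return FRAME_RE.findall(text)
```

Applied to this capture, both oopses are reported in the context of `winbindd`, and the second one carries the extra `D` letter because the kernel had already oopsed once. The `G` and `O` letters describe the loaded kernel modules, not the `winbindd` process itself. The `?` in front of each trace entry means the unwinder could not confirm that frame, so the function list is a hint rather than an exact call chain.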

console> system ha show details
HA status : Enabled
Current Appliance Key : C430-------------
Peer Appliance Key : C430-----------
Current HA state : Standalone
Peer HA state : Fault
HA Config Mode : Active-Passive
Load Balancing : Not Applicable
Dedicated Port : Port4
Current Dedicated IP : 5.5.5.1
Peer Dedicated IP : 5.5.5.2
Monitoring Port :
Auxiliary Admin Port : bond1
Auxiliary Admin IP : 10.0.5.28
Auxiliary Admin IPv6 :
HA Cluster ID : 10
Keepalive request interval : 250
Keepalive attempts : 16
Hypervisor assigned MAC addresses : Disabled
HA preemption : Disabled

less /log/applog.log | grep ha:

Sep 29 13:07:27 ha_port_down_notification: message_id : log_data : Interface Port4 went down. Appliance HA state MAST
Sep 29 13:07:28 ha: handle_stat_change: 2:3 [ NA=0 AUX=1 STAND=2 PRIM=3 FAULT=4 READY=5 GOTO_PRIM=6 ]
Sep 29 13:07:28 ha: handle_stat_change: g_ha_hsc=1 is set.
Sep 29 13:07:28 ha: g_ha_transmode=0 [ CONFIG=1 INIT=2 EVENT=0 ]
Sep 29 13:07:28 ha: start tracking the device
Sep 29 13:07:28 ha: fwm:disablearpha successfully done
Sep 29 13:07:28 ha: msync:applyha: no network changes reqd
Sep 29 13:07:28 ha: fwm:applyha successfully done
Sep 29 13:07:28 ha: fwm:enablearpha successfully done
Sep 29 13:07:29 ha: mail sent successfully
Sep 29 13:07:30 ha: syncing conntracks
Sep 29 13:07:30 ha: handle_stat_change: 2:3 done.
Sep 29 13:07:30 ha: handle_stat_change: g_ha_hsc=0 is set.
Sep 29 13:08:07 ha: handle_stat_change: 3:2 [ NA=0 AUX=1 STAND=2 PRIM=3 FAULT=4 READY=5 GOTO_PRIM=6 ]
Sep 29 13:08:07 ha: handle_stat_change: g_ha_hsc=1 is set.
Sep 29 13:08:07 ha: g_ha_transmode=0 [ CONFIG=1 INIT=2 EVENT=0 ]
Sep 29 13:08:07 ha: start tracking the device
Sep 29 13:08:07 ha: fwm:disablearpha successfully done
Sep 29 13:08:07 ha: ctsyncd commited
Sep 29 13:08:07 ha: ctsyncd external cache flushed
Sep 29 13:08:07 ha: msync:applyha: prim->stand, so no network changes reqd
Sep 29 13:08:08 ha: fwm:applyha successfully done
Sep 29 13:08:10 ha: msync:garpha: send_arp 

<lots of arps here>
Sep 29 13:08:10 ha: fwm:enablearpha successfully done
Sep 29 13:08:11 ha: mail sent successfully
Sep 29 13:08:11 ha: syncing conntracks
Sep 29 13:08:11 ha: handle_stat_change: 3:2 done.
Sep 29 13:08:11 ha: handle_stat_change: g_ha_hsc=0 is set.
Sep 29 13:09:15 ha: appcached_ha_sync function is called...!!!!
Sep 29 13:13:20 ha: redis DB dump file sync is done !!
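
The `handle_stat_change: 2:3` and `3:2` entries can be decoded with the legend that the log prints next to them. Here is a small Python sketch for doing that on a saved copy of applog.log. It assumes the two numbers are ordered old state, then new state, which matches the `prim->stand` message logged during the 3:2 change. The function name `ha_transitions` is hypothetical.

```python
import re

# Legend copied from the handle_stat_change log lines themselves.
HA_STATES = {
    0: "NA", 1: "AUX", 2: "STAND", 3: "PRIM",
    4: "FAULT", 5: "READY", 6: "GOTO_PRIM",
}

# Matches the line that opens a transition ("... handle_stat_change: 2:3 [ NA=0 ...")
# but not the closing "... handle_stat_change: 2:3 done." line.
# Not anchored to line start, so it still works if two log lines got fused.
TRANSITION_RE = re.compile(
    r"(?P<ts>[A-Z][a-z]{2} +\d+ \d{2}:\d{2}:\d{2}) ha: "
    r"handle_stat_change: (?P<old>\d+):(?P<new>\d+) \["
)


def state_name(code):
    """Translate a numeric HA state, keeping unknown codes visible."""
    return HA_STATES.get(code, f"UNKNOWN({code})")


def ha_transitions(log_text):
    """Return (timestamp, old_state, new_state) for each state change."""
    return [
        (m.group("ts"), state_name(int(m.group("old"))), state_name(int(m.group("new"))))
        for m in TRANSITION_RE.finditer(log_text)
    ]
```

On the excerpt above this gives STAND to PRIM at 13:07:28, right after Port4 (the dedicated HA port) went down, and PRIM back to STAND at 13:08:07.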

If anyone has managed to fix this on their cluster, any help would be appreciated.

Kind regards,

Sascha



  • Hello Sascha,

    Thank you for contacting the Sophos Community!

    I think the issue, as you pointed out in the title, is with winbindd (Tainted: G).

    Are you able to SSH to the AUX device, or is it still in this failed state?

    On the master, can you check whether there are any core dumps?

    # cd /var/cores 

    # ls -lh

    Regards,

  • Hello Emmanuel,

    Thank you for replying.

    Unfortunately, the Aux device is completely frozen after the kernel panic. Not even the hardware buttons or the LCD respond to input.

    The only way to get access, for about one minute, is to cold boot it.

    I found some core dumps on the master:

    XG450_WP02_SFOS 18.0.1 MR-1-Build396# cd /var/cores
    XG450_WP02_SFOS 18.0.1 MR-1-Build396# ls -lh
    -rw-------    1 root     0          35.3M Jun 27 20:28 core.awed
    -rw-------    1 root     0          61.1M Sep 26 14:17 core.garner
    -rw-------    1 root     nasm       21.7M Sep 26 11:55 core.nasm
    -rw-------    1 root     0           2.2M Jan 27  2020 core.syncfile
    XG450_WP02_SFOS 18.0.1 MR-1-Build396#

    That makes sense to me; we actually had problems with logging and wireless too. Syncfile makes sense too. I don't know what nasm does.

    Regards,

    Sascha

  • Hi Emmanuel,

    Any idea what is going on?

    Kind regards,

    Sascha