[9.171][BUG] Intel NIC hang

Hi,

I was load testing my LAN with Iperf on my new hardware, and Iperf suddenly reported a network error.

I looked in the UTM's kernel log and found a corresponding error there for eth1.

eth1 is an Intel PCIe card with an Intel 82572EI chip, in a PCIe x16 slot.

The motherboard is a Gigabyte GA-Z87N-WiFi with the Intel Z87 chipset.

Iperf was running on separate machines, not on the firewall.

IPS and ATP were running, all other features were disabled.

2013:11:28-19:39:07 fw kernel: [ 2245.063734] e1000e 0000:01:00.0 eth1: Detected Hardware Unit Hang:
2013:11:28-19:39:07 fw kernel: [ 2245.063734]   TDH                  
2013:11:28-19:39:07 fw kernel: [ 2245.063734]   TDT                  
2013:11:28-19:39:07 fw kernel: [ 2245.063734]   next_to_use          
2013:11:28-19:39:07 fw kernel: [ 2245.063734]   next_to_clean        
2013:11:28-19:39:07 fw kernel: [ 2245.063734] buffer_info[next_to_clean]:
2013:11:28-19:39:07 fw kernel: [ 2245.063734]   time_stamp           
2013:11:28-19:39:07 fw kernel: [ 2245.063734]   next_to_watch        
2013:11:28-19:39:07 fw kernel: [ 2245.063734]   jiffies              
2013:11:28-19:39:07 fw kernel: [ 2245.063734]   next_to_watch.status 
2013:11:28-19:39:07 fw kernel: [ 2245.063734] MAC Status             
2013:11:28-19:39:07 fw kernel: [ 2245.063734] PHY Status             
2013:11:28-19:39:07 fw kernel: [ 2245.063734] PHY 1000BASE-T Status  
2013:11:28-19:39:07 fw kernel: [ 2245.063734] PHY Extended Status    
2013:11:28-19:39:07 fw kernel: [ 2245.063734] PCI Status             
2013:11:28-19:39:11 fw kernel: [ 2249.059583] e1000e 0000:01:00.0 eth1: Detected Hardware Unit Hang:
2013:11:28-19:39:11 fw kernel: [ 2249.059583]   TDH                  
2013:11:28-19:39:11 fw kernel: [ 2249.059583]   TDT                  
2013:11:28-19:39:11 fw kernel: [ 2249.059583]   next_to_use          
2013:11:28-19:39:11 fw kernel: [ 2249.059583]   next_to_clean        
2013:11:28-19:39:11 fw kernel: [ 2249.059583] buffer_info[next_to_clean]:
2013:11:28-19:39:11 fw kernel: [ 2249.059583]   time_stamp           
2013:11:28-19:39:11 fw kernel: [ 2249.059583]   next_to_watch        
2013:11:28-19:39:11 fw kernel: [ 2249.059583]   jiffies              
2013:11:28-19:39:11 fw kernel: [ 2249.059583]   next_to_watch.status 
2013:11:28-19:39:11 fw kernel: [ 2249.059583] MAC Status             
2013:11:28-19:39:11 fw kernel: [ 2249.059583] PHY Status             
2013:11:28-19:39:11 fw kernel: [ 2249.059583] PHY 1000BASE-T Status  
2013:11:28-19:39:11 fw kernel: [ 2249.059583] PHY Extended Status    
2013:11:28-19:39:11 fw kernel: [ 2249.059583] PCI Status             
2013:11:28-19:39:13 fw kernel: [ 2251.068899] ------------[ cut here ]------------
2013:11:28-19:39:13 fw kernel: [ 2251.068904] WARNING: at net/sched/sch_generic.c:254 dev_watchdog+0xe7/0x182()
2013:11:28-19:39:13 fw kernel: [ 2251.068905] Hardware name: Z87N-WIFI
2013:11:28-19:39:13 fw kernel: [ 2251.068906] NETDEV WATCHDOG: eth1 (e1000e): transmit queue 0 timed out
2013:11:28-19:39:13 fw kernel: [ 2251.068906] Modules linked in: sr_mod cdrom ipt_MASQUERADE xt_policy xt_hashlimit xt_connlabel xt_NFQUEUE xt_connmark xt_mark xt_tcpudp xt_set xt_multiport xt_addrtype ip_set_hash_ip nf_nat_pptp nf_nat_proto_gre nf_nat_irc nf_nat_ftp nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_irc nf_conntrack_ftp ip_set_hash_net nfnetlink_queue ebtable_filter ebtables redv2_netlink af_packet ip6table_ips ip6table_mangle ip6table_nat nf_nat_ipv6 iptable_ips iptable_mangle iptable_nat nf_nat_ipv4 nf_nat xt_NFLOG xt_condition(O) xt_logmark xt_confirmed xt_owner ip6t_REJECT ipt_REJECT xt_state ip_set red2 ip_scheduler red nfnetlink_log nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6table_raw nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack iptable_filter iptable_raw xt_CT nf_conntrack_netlink nfnetlink nf_conntrack ip6_tables ip_tables x_tables ipv6 loop acpi_cpufreq mperf crc32c_intel aesni_intel ablk_helper cryptd lrw aes_x86_64 aes_generic xts gf128mul coretemp i2c_i801 pcspkr e1000e(O) microcode evdev rtc_cmos ehci_pci ehci_hcd button sg sd_mod xhci_hcd thermal fan processor thermal_sys hwmon edd ahci libahci libata scsi_mod hid_generic usbhid
2013:11:28-19:39:13 fw kernel: [ 2251.068938] Pid: 0, comm: swapper/0 Tainted: G           O 3.8.13.6-18.g9aea9e6-smp64 #1
2013:11:28-19:39:13 fw kernel: [ 2251.068939] Call Trace:
2013:11:28-19:39:13 fw kernel: [ 2251.068939]    [] ? dev_watchdog+0xe7/0x182
2013:11:28-19:39:13 fw kernel: [ 2251.068943]  [] ? warn_slowpath_common+0x78/0x8d
2013:11:28-19:39:13 fw kernel: [ 2251.068945]  [] ? netif_tx_lock+0x7e/0x7e
2013:11:28-19:39:13 fw kernel: [ 2251.068946]  [] ? warn_slowpath_fmt+0x45/0x4a
2013:11:28-19:39:13 fw kernel: [ 2251.068948]  [] ? netif_tx_lock+0x43/0x7e
2013:11:28-19:39:13 fw kernel: [ 2251.068950]  [] ? dev_watchdog+0xe7/0x182
2013:11:28-19:39:13 fw kernel: [ 2251.068952]  [] ? call_timer_fn+0x1b/0x6e
2013:11:28-19:39:13 fw kernel: [ 2251.068953]  [] ? run_timer_softirq+0x16c/0x1b3
2013:11:28-19:39:13 fw kernel: [ 2251.068955]  [] ? __do_softirq+0x9d/0x15f
2013:11:28-19:39:13 fw kernel: [ 2251.068959]  [] ? disable_cpuidle+0xb/0xb
2013:11:28-19:39:13 fw kernel: [ 2251.068960]  [] ? call_softirq+0x1c/0x30
2013:11:28-19:39:13 fw kernel: [ 2251.068962]  [] ? do_softirq+0x3f/0x79
2013:11:28-19:39:13 fw kernel: [ 2251.068965]  [] ? irq_exit+0x43/0xb1
2013:11:28-19:39:13 fw kernel: [ 2251.068966]  [] ? disable_cpuidle+0xb/0xb
2013:11:28-19:39:13 fw kernel: [ 2251.068967]  [] ? reschedule_interrupt+0x6d/0x80
2013:11:28-19:39:13 fw kernel: [ 2251.068968]    [] ? __hrtimer_start_range_ns+0x271/0x284
2013:11:28-19:39:13 fw kernel: [ 2251.068972]  [] ? cpuidle_wrap_enter+0x3c/0x71
2013:11:28-19:39:13 fw kernel: [ 2251.068973]  [] ? cpuidle_wrap_enter+0x32/0x71
2013:11:28-19:39:13 fw kernel: [ 2251.068975]  [] ? cpuidle_enter_state+0xa/0x33
2013:11:28-19:39:13 fw kernel: [ 2251.068976]  [] ? cpuidle_idle_call+0x9e/0xcc
2013:11:28-19:39:13 fw kernel: [ 2251.068977]  [] ? cpu_idle+0x61/0xa9
2013:11:28-19:39:13 fw kernel: [ 2251.068979]  [] ? early_idt_handlers+0x120/0x120
2013:11:28-19:39:13 fw kernel: [ 2251.068980]  [] ? start_kernel+0x372/0x37e
2013:11:28-19:39:13 fw kernel: [ 2251.068982]  [] ? repair_env_string+0x5d/0x5d
2013:11:28-19:39:13 fw kernel: [ 2251.068983]  [] ? x86_64_start_kernel+0x102/0x10f
2013:11:28-19:39:13 fw kernel: [ 2251.068984] ---[ end trace 918e7f6542337fab ]---
2013:11:28-19:39:13 fw kernel: [ 2251.068987] e1000e 0000:01:00.0 eth1: Reset adapter unexpectedly
2013:11:28-19:39:16 fw kernel: [ 2254.158485] e1000e: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
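
For anyone who wants to check their own kernel log for the same pattern, here is a minimal Python sketch that counts "Detected Hardware Unit Hang" and "Reset adapter unexpectedly" messages per interface. The function and pattern names are made up for illustration, and the regular expressions assume the exact line format shown in the log above; other UTM or kernel versions may format these lines differently.

```python
import re
from collections import Counter

# Patterns assume the log format pasted above, e.g.
# "... fw kernel: [ 2245.063734] e1000e 0000:01:00.0 eth1: Detected Hardware Unit Hang:"
HANG_RE = re.compile(
    r"kernel: \[\s*\d+\.\d+\] e1000e \S+ (?P<iface>\w+): Detected Hardware Unit Hang"
)
RESET_RE = re.compile(
    r"e1000e \S+ (?P<iface>\w+): Reset adapter unexpectedly"
)


def summarize_hangs(lines):
    """Return (hangs, resets): dicts mapping interface name to message count."""
    hangs = Counter()
    resets = Counter()
    for line in lines:
        match = HANG_RE.search(line)
        if match:
            hangs[match.group("iface")] += 1
            continue
        match = RESET_RE.search(line)
        if match:
            resets[match.group("iface")] += 1
    return dict(hangs), dict(resets)


# A few lines copied from the log above.
sample = [
    "2013:11:28-19:39:07 fw kernel: [ 2245.063734] e1000e 0000:01:00.0 eth1: Detected Hardware Unit Hang:",
    "2013:11:28-19:39:11 fw kernel: [ 2249.059583] e1000e 0000:01:00.0 eth1: Detected Hardware Unit Hang:",
    "2013:11:28-19:39:13 fw kernel: [ 2251.068987] e1000e 0000:01:00.0 eth1: Reset adapter unexpectedly",
    "2013:11:28-19:39:16 fw kernel: [ 2254.158485] e1000e: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx",
]

hangs, resets = summarize_hangs(sample)
print("hangs:", hangs)
print("resets:", resets)
```

To run it against a real log, pass an open file object instead of `sample`; the function only needs an iterable of lines.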


I also checked the logs on the Fedora Iperf client; there were no hardware or driver errors.

Please let me know if any other information is needed.

Barry
  • Hi, does the 9.180 update have this kernel?

    Hi Barry,
    First of all, many thanks for testing.
    The kernel used in 9.180 is kernel-smp-3.8.13.6-18.g9aea9e6.

    We have created an ID for updating the kernel to 3.8.13.13 and the Intel drivers to the latest stable version; this should fix the crash that occurred with your type of NIC. It is targeted for release 9.190.
    Please feel free to update to 9.180. [:D]

    Best regards, 
    Bianca
  • Hi Bianca,

    I've upgraded to 9.180, but the custom kernel from HolgerE is still active for some reason.


    loginuser@fw:/var/log > version

    Current software version...: 9.180021
    Hardware type..............: Software Appliance
    Installation image.........: 9.165-15.1
    Installation type..........: asg
    Installed pattern version..: 54366
    Downloaded pattern version.: 54366
    Up2Dates applied...........: 2 (see below)
                                 sys-9.165-9.171-15.2.1.tgz (Nov 28 18:53)
                                 sys-9.171-9.180-2.21.2.tgz (Dec  4 13:28)
    Up2Dates available.........: 0
    Factory resets.............: 0
    Timewarps detected.........: 1

    loginuser@fw:/var/log > uname -a
    Linux fw.x.net 3.8.13.6-4.g9916619-smp64 #1 SMP Wed Oct 9 16:18:24 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

    loginuser@fw:/var/log > rpm -qa|grep kernel
    kernel-firmware-20131105-1.0.149074468.g67d87ca
    kernel-smp64-3.8.13.6-18.g9aea9e6
    kernel-smp64-3.8.13.6-4.g9916619


    I would have expected the .6-18 kernel to be considered newer than the .6-4.

    Anyways, I'm fixing it by changing the default entry in GRUB.

    Thanks,
    Barry
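
On the expectation that the .6-18 kernel counts as newer than .6-4: the Python sketch below compares the two version-release strings from the `rpm -qa` output segment by segment, treating numeric segments as integers. It is a simplified approximation of RPM-style ordering, not rpm's actual implementation, and the function names are made up for illustration. It says nothing about how the Up2Date process or GRUB picks the boot entry.

```python
import re


def split_segments(text):
    """Split a version string into runs of digits and runs of letters."""
    return re.findall(r"\d+|[a-zA-Z]+", text)


def compare_part(a, b):
    """Return 1 if a is newer than b, -1 if older, 0 if equal (simplified)."""
    seg_a, seg_b = split_segments(a), split_segments(b)
    for x, y in zip(seg_a, seg_b):
        x_num, y_num = x.isdigit(), y.isdigit()
        if x_num and y_num:
            # Numeric segments compare as integers, so 18 > 4.
            if int(x) != int(y):
                return 1 if int(x) > int(y) else -1
        elif x_num != y_num:
            # Assumption: a numeric segment is treated as newer than an alphabetic one.
            return 1 if x_num else -1
        elif x != y:
            return 1 if x > y else -1
    if len(seg_a) != len(seg_b):
        return 1 if len(seg_a) > len(seg_b) else -1
    return 0


def compare_version_release(a, b):
    """Compare 'version-release' strings: version first, then release."""
    ver_a, rel_a = a.rsplit("-", 1)
    ver_b, rel_b = b.rsplit("-", 1)
    return compare_part(ver_a, ver_b) or compare_part(rel_a, rel_b)


stock = "3.8.13.6-18.g9aea9e6"   # kernel shipped with 9.180
custom = "3.8.13.6-4.g9916619"   # kernel still booted after the upgrade

print(compare_version_release(stock, custom))  # segment-wise comparison
print(stock > custom)                          # plain string comparison
```

Under this segment-wise comparison the -18 release is newer than -4, while a plain character-by-character string comparison orders them the other way because "1" sorts before "4".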