[9.194-5][BUG] Intel NIC crashes under load

This is possibly a duplicate report.
I have been trying to do some performance reporting using two different sets of hardware built with the latest ISO.

On one, the NIC goes offline under load and the kernel log shows a problem. BarryG advises there is an issue with the current driver.

I would provide the log entries, but that period of testing is now full of * (wrong UTM). I will need to bring the other UTM up to extract the log entries.

I changed the NIC for another of the same type and model and got the same results.

Ian


2014:02:08-14:37:56 Cats-speed kernel: [ 2178.936536] e1000e 0000:00:19.0 eth0: Detected Hardware Unit Hang:
2014:02:08-14:37:56 Cats-speed kernel: [ 2178.936536]   TDH                  
2014:02:08-14:37:56 Cats-speed kernel: [ 2178.936536]   TDT                  
2014:02:08-14:37:56 Cats-speed kernel: [ 2178.936536]   next_to_use          
2014:02:08-14:37:56 Cats-speed kernel: [ 2178.936536]   next_to_clean        
2014:02:08-14:37:56 Cats-speed kernel: [ 2178.936536] buffer_info[next_to_clean]:
2014:02:08-14:37:56 Cats-speed kernel: [ 2178.936536]   time_stamp           
2014:02:08-14:37:56 Cats-speed kernel: [ 2178.936536]   next_to_watch        
2014:02:08-14:37:56 Cats-speed kernel: [ 2178.936536]   jiffies              
2014:02:08-14:37:56 Cats-speed kernel: [ 2178.936536]   next_to_watch.status 
2014:02:08-14:37:56 Cats-speed kernel: [ 2178.936536] MAC Status             
2014:02:08-14:37:56 Cats-speed kernel: [ 2178.936536] PHY Status             
2014:02:08-14:37:56 Cats-speed kernel: [ 2178.936536] PHY 1000BASE-T Status  
2014:02:08-14:37:56 Cats-speed kernel: [ 2178.936536] PHY Extended Status    
2014:02:08-14:37:56 Cats-speed kernel: [ 2178.936536] PCI Status             
2014:02:08-14:37:57 Cats-speed kernel: [ 2179.944779] e1000e 0000:00:19.0 eth0: Reset adapter unexpectedly
2014:02:08-14:38:00 Cats-speed kernel: [ 2183.235199] e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx

  • Hi Barry,

    Thanks for the info. Did running traffic work for you after disabling the offloading parameters?
    Best,
    Bianca

    @Ian, sorry, but your posts are confusing. Please just do the steps mentioned in the thread and post the result (the purpose here is not to have the adapter crash). Note that disabling TSO might lead to lower throughput.
  • Hi Bianca,

    I ran the ethtool command to disable the offloading features...
    After about 15 minutes of iperf testing (at 485 Mbps) on 9.194005, I got another NIC error, although it's different and shorter than before:

    2014:02:18-19:32:11 fw kernel: [ 3134.990189] e1000e 0000:01:00.0 eth1: Reset adapter unexpectedly
    2014:02:18-19:32:14 fw kernel: [ 3138.061307] e1000e: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx


    Another identical one appeared while writing this message.


    Also, I am now seeing eth0 (i217-V) errors, even with light network load. I have NOT seen these before:

    2014:02:18-19:05:54 fw kernel: [ 1558.922609] e1000e 0000:00:19.0 eth0: Detected Hardware Unit Hang:
    2014:02:18-19:05:54 fw kernel: [ 1558.922609]   TDH                  
    2014:02:18-19:05:54 fw kernel: [ 1558.922609]   TDT                  
    2014:02:18-19:05:54 fw kernel: [ 1558.922609]   next_to_use          
    2014:02:18-19:05:54 fw kernel: [ 1558.922609]   next_to_clean        
    2014:02:18-19:05:54 fw kernel: [ 1558.922609] buffer_info[next_to_clean]:
    2014:02:18-19:05:54 fw kernel: [ 1558.922609]   time_stamp           
    2014:02:18-19:05:54 fw kernel: [ 1558.922609]   next_to_watch        
    2014:02:18-19:05:54 fw kernel: [ 1558.922609]   jiffies              
    2014:02:18-19:05:54 fw kernel: [ 1558.922609]   next_to_watch.status 
    2014:02:18-19:05:54 fw kernel: [ 1558.922609] MAC Status             
    2014:02:18-19:05:54 fw kernel: [ 1558.922609] PHY Status             
    2014:02:18-19:05:54 fw kernel: [ 1558.922609] PHY 1000BASE-T Status  
    2014:02:18-19:05:54 fw kernel: [ 1558.922609] PHY Extended Status    
    2014:02:18-19:05:54 fw kernel: [ 1558.922609] PCI Status             
    2014:02:18-19:05:56 fw kernel: [ 1560.921406] e1000e 0000:00:19.0 eth0: Detected Hardware Unit Hang:
    2014:02:18-19:05:56 fw kernel: [ 1560.921406]   TDH                  
    2014:02:18-19:05:56 fw kernel: [ 1560.921406]   TDT                  
    2014:02:18-19:05:56 fw kernel: [ 1560.921406]   next_to_use          
    2014:02:18-19:05:56 fw kernel: [ 1560.921406]   next_to_clean        
    2014:02:18-19:05:56 fw kernel: [ 1560.921406] buffer_info[next_to_clean]:
    2014:02:18-19:05:56 fw kernel: [ 1560.921406]   time_stamp           
    2014:02:18-19:05:56 fw kernel: [ 1560.921406]   next_to_watch        
    2014:02:18-19:05:56 fw kernel: [ 1560.921406]   jiffies              
    2014:02:18-19:05:56 fw kernel: [ 1560.921406]   next_to_watch.status 
    2014:02:18-19:05:56 fw kernel: [ 1560.921406] MAC Status             
    2014:02:18-19:05:56 fw kernel: [ 1560.921406] PHY Status             
    2014:02:18-19:05:56 fw kernel: [ 1560.921406] PHY 1000BASE-T Status  
    2014:02:18-19:05:56 fw kernel: [ 1560.921406] PHY Extended Status    
    2014:02:18-19:05:56 fw kernel: [ 1560.921406] PCI Status             
    2014:02:18-19:05:58 fw kernel: [ 1562.920205] e1000e 0000:00:19.0 eth0: Detected Hardware Unit Hang:
    2014:02:18-19:05:58 fw kernel: [ 1562.920205]   TDH                  
    2014:02:18-19:05:58 fw kernel: [ 1562.920205]   TDT                  
    2014:02:18-19:05:58 fw kernel: [ 1562.920205]   next_to_use          
    2014:02:18-19:05:58 fw kernel: [ 1562.920205]   next_to_clean        
    2014:02:18-19:05:58 fw kernel: [ 1562.920205] buffer_info[next_to_clean]:
    2014:02:18-19:05:58 fw kernel: [ 1562.920205]   time_stamp           
    2014:02:18-19:05:58 fw kernel: [ 1562.920205]   next_to_watch        
    2014:02:18-19:05:58 fw kernel: [ 1562.920205]   jiffies              
    2014:02:18-19:05:58 fw kernel: [ 1562.920205]   next_to_watch.status 
    2014:02:18-19:05:58 fw kernel: [ 1562.920205] MAC Status             
    2014:02:18-19:05:58 fw kernel: [ 1562.920205] PHY Status             
    2014:02:18-19:05:58 fw kernel: [ 1562.920205] PHY 1000BASE-T Status  
    2014:02:18-19:05:58 fw kernel: [ 1562.920205] PHY Extended Status    
    2014:02:18-19:05:58 fw kernel: [ 1562.920205] PCI Status             
    2014:02:18-19:05:59 fw kernel: [ 1563.931252] ------------[ cut here ]------------
    2014:02:18-19:05:59 fw kernel: [ 1563.931267] WARNING: at net/sched/sch_generic.c:254 dev_watchdog+0xe7/0x182()
    2014:02:18-19:05:59 fw kernel: [ 1563.931267] Hardware name: Z87N-WIFI
    2014:02:18-19:05:59 fw kernel: [ 1563.931268] NETDEV WATCHDOG: eth0 (e1000e): transmit queue 0 timed out
    2014:02:18-19:05:59 fw kernel: [ 1563.931269] Modules linked in: sr_mod cdrom ipt_MASQUERADE xt_policy xt_hashlimit xt_connlabel xt_NFQUEUE xt_connmark xt_mark xt_limit xt_tcpudp xt_set xt_multiport xt_psd(O) xt_addrtype ip_set_hash_ip nf_nat_pptp nf_nat_proto_gre nf_nat_irc nf_nat_ftp nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_irc nf_conntrack_ftp ip_set_hash_net ebtable_filter ebtables redv2_netlink nfnetlink_queue ip6table_ips ip6table_mangle ip6table_nat nf_nat_ipv6 iptable_ips iptable_mangle iptable_nat nf_nat_ipv4 nf_nat xt_NFLOG xt_condition(O) xt_logmark xt_confirmed xt_owner af_packet ip6t_REJECT ipt_REJECT xt_state ip_set red2 ip_scheduler red nfnetlink_log nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6table_raw nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack iptable_filter iptable_raw xt_CT nf_conntrack_netlink nfnetlink nf_conntrack ip6_tables ip_tables x_tables ipv6 loop mperf crc32c_intel aesni_intel ablk_helper cryptd lrw aes_x86_64 aes_generic xts gf128mul i2c_i801 coretemp evdev pcspkr rtc_cmos ehci_pci ehci_hcd e1000e(O) sg microcode button sd_mod xhci_hcd thermal fan processor thermal_sys hwmon edd ahci libahci libata scsi_mod hid_generic usbhid
    2014:02:18-19:05:59 fw kernel: [ 1563.931301] Pid: 0, comm: swapper/0 Tainted: G           O 3.8.13.15-106.g58c11e3-smp64 #1
    2014:02:18-19:05:59 fw kernel: [ 1563.931302] Call Trace:
    2014:02:18-19:05:59 fw kernel: [ 1563.931303]    [] ? dev_watchdog+0xe7/0x182
    2014:02:18-19:05:59 fw kernel: [ 1563.931306]  [] ? warn_slowpath_common+0x78/0x8d
    2014:02:18-19:05:59 fw kernel: [ 1563.931308]  [] ? netif_tx_lock+0x7e/0x7e
    2014:02:18-19:05:59 fw kernel: [ 1563.931309]  [] ? warn_slowpath_fmt+0x45/0x4a
    2014:02:18-19:05:59 fw kernel: [ 1563.931311]  [] ? netif_tx_lock+0x43/0x7e
    2014:02:18-19:05:59 fw kernel: [ 1563.931314]  [] ? dev_watchdog+0xe7/0x182
    2014:02:18-19:05:59 fw kernel: [ 1563.931315]  [] ? call_timer_fn+0x1b/0x6e
    2014:02:18-19:05:59 fw kernel: [ 1563.931316]  [] ? run_timer_softirq+0x16c/0x1b3
    2014:02:18-19:05:59 fw kernel: [ 1563.931319]  [] ? timekeeping_get_ns+0x12/0x35
    2014:02:18-19:05:59 fw kernel: [ 1563.931321]  [] ? __do_softirq+0x9d/0x15f
    2014:02:18-19:05:59 fw kernel: [ 1563.931323]  [] ? clockevents_program_event+0x9a/0xb9
    2014:02:18-19:05:59 fw kernel: [ 1563.931326]  [] ? disable_cpuidle+0xb/0xb
    2014:02:18-19:05:59 fw kernel: [ 1563.931327]  [] ? call_softirq+0x1c/0x30
    2014:02:18-19:05:59 fw kernel: [ 1563.931329]  [] ? do_softirq+0x3f/0x79
    2014:02:18-19:05:59 fw kernel: [ 1563.931331]  [] ? irq_exit+0x43/0xb1
    2014:02:18-19:05:59 fw kernel: [ 1563.931333]  [] ? smp_apic_timer_interrupt+0x85/0x93
    2014:02:18-19:05:59 fw kernel: [ 1563.931336]  [] ? apic_timer_interrupt+0x6d/0x80
    2014:02:18-19:05:59 fw kernel: [ 1563.931336]    [] ? __hrtimer_start_range_ns+0x271/0x284
    2014:02:18-19:05:59 fw kernel: [ 1563.931340]  [] ? cpuidle_wrap_enter+0x3c/0x71
    2014:02:18-19:05:59 fw kernel: [ 1563.931342]  [] ? cpuidle_wrap_enter+0x32/0x71
    2014:02:18-19:05:59 fw kernel: [ 1563.931343]  [] ? cpuidle_enter_state+0xa/0x33
    2014:02:18-19:05:59 fw kernel: [ 1563.931345]  [] ? cpuidle_idle_call+0x9e/0xcc
    2014:02:18-19:05:59 fw kernel: [ 1563.931346]  [] ? cpu_idle+0x61/0xa9
    2014:02:18-19:05:59 fw kernel: [ 1563.931348]  [] ? early_idt_handlers+0x120/0x120
    2014:02:18-19:05:59 fw kernel: [ 1563.931349]  [] ? start_kernel+0x372/0x37e
    2014:02:18-19:05:59 fw kernel: [ 1563.931350]  [] ? repair_env_string+0x5d/0x5d
    2014:02:18-19:05:59 fw kernel: [ 1563.931352]  [] ? x86_64_start_kernel+0x102/0x10f
    2014:02:18-19:05:59 fw kernel: [ 1563.931353] ---[ end trace 26e9c1b718e04c35 ]---
    2014:02:18-19:05:59 fw kernel: [ 1563.931356] e1000e 0000:00:19.0 eth0: Reset adapter unexpectedly
    2014:02:18-19:06:03 fw kernel: [ 1568.250310] e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
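
For anyone comparing runs, the hang and reset events can be tallied per interface with a small helper like the one below. This is only a sketch: the function name `summarize_e1000e` is made up for illustration, and the matching assumes the two message strings that appear in the excerpts in this thread ("Detected Hardware Unit Hang" and "Reset adapter unexpectedly").

```shell
#!/bin/sh
# Hypothetical helper (not part of UTM): count e1000e hang and reset
# messages per interface in a saved kernel log.
summarize_e1000e() {
    awk '
        # Return the interface name (e.g. eth0) found on the current line.
        function iface(    i, name) {
            for (i = 1; i <= NF; i++)
                if ($i ~ /^eth[0-9]+:$/) {
                    name = $i
                    sub(/:$/, "", name)
                    return name
                }
            return "unknown"
        }
        /e1000e/ && /Detected Hardware Unit Hang/ { n = iface(); hang[n]++;  seen[n] = 1 }
        /e1000e/ && /Reset adapter unexpectedly/  { n = iface(); reset[n]++; seen[n] = 1 }
        END {
            for (n in seen)
                printf "%s hangs=%d resets=%d\n", n, hang[n], reset[n]
        }
    ' "$1" | sort
}
```

It would be called as `summarize_e1000e <logfile>`, where the log file is whatever copy of the kernel log has been saved from the UTM. It prints one line per interface in the form `ethN hangs=<count> resets=<count>`.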



    After the ethtool command, eth1 (82572EI):

    # /sbin/ethtool -k eth1
    Features for eth1:
    rx-checksumming: on
    tx-checksumming: on
    tx-checksum-ipv4: off [fixed]
    tx-checksum-ip-generic: on
    tx-checksum-ipv6: off [fixed]
    tx-checksum-fcoe-crc: off [fixed]
    tx-checksum-sctp: off [fixed]
    scatter-gather: on
    tx-scatter-gather: on
    tx-scatter-gather-fraglist: off [fixed]
    tcp-segmentation-offload: off
    tx-tcp-segmentation: off
    tx-tcp-ecn-segmentation: off [fixed]
    tx-tcp6-segmentation: off
    udp-fragmentation-offload: off [fixed]
    generic-segmentation-offload: off
    generic-receive-offload: off
    large-receive-offload: off [fixed]
    rx-vlan-offload: on
    tx-vlan-offload: on
    ntuple-filters: off [fixed]
    receive-hashing: on
    highdma: on [fixed]
    rx-vlan-filter: on [fixed]
    vlan-challenged: off [fixed]
    tx-lockless: off [fixed]
    netns-local: off [fixed]
    tx-gso-robust: off [fixed]
    tx-fcoe-segmentation: off [fixed]
    fcoe-mtu: off [fixed]
    tx-nocache-copy: on
    loopback: off [fixed]
    rx-fcs: off
    rx-all: off
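
The exact ethtool command is not quoted in this thread, so the following is only a guess at an invocation that would give the eth1 state listed above (TSO, GSO and GRO all off). The wrapper name `disable_offloads` and the `DRY_RUN` switch are made up for illustration. `-K` with `tso`, `gso` and `gro` are standard ethtool options.

```shell
#!/bin/sh
# Hypothetical wrapper: turn off TSO, GSO and GRO on one interface.
# Assumption: this matches the steps referred to in the thread; the
# original command is not shown there.
# With DRY_RUN=1 the command is printed instead of executed.
disable_offloads() {
    iface=$1
    set -- /sbin/ethtool -K "$iface" tso off gso off gro off
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"    # needs root; the change does not persist across a reboot
    fi
}
```

Running `disable_offloads eth1` as root and then `/sbin/ethtool -k eth1` should show whether the settings took effect. As noted earlier in the thread, turning TSO off may lower throughput.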


    eth0, i217-V:

    # /sbin/ethtool -k eth0
    Features for eth0:
    rx-checksumming: on
    tx-checksumming: on
    tx-checksum-ipv4: off [fixed]
    tx-checksum-ip-generic: on
    tx-checksum-ipv6: off [fixed]
    tx-checksum-fcoe-crc: off [fixed]
    tx-checksum-sctp: off [fixed]
    scatter-gather: on
    tx-scatter-gather: on
    tx-scatter-gather-fraglist: off [fixed]
    tcp-segmentation-offload: on
    tx-tcp-segmentation: on
    tx-tcp-ecn-segmentation: off [fixed]
    tx-tcp6-segmentation: on
    udp-fragmentation-offload: off [fixed]
    generic-segmentation-offload: on
    generic-receive-offload: on
    large-receive-offload: off [fixed]
    rx-vlan-offload: on
    tx-vlan-offload: on
    ntuple-filters: off [fixed]
    receive-hashing: on
    highdma: on [fixed]
    rx-vlan-filter: off [fixed]
    vlan-challenged: off [fixed]
    tx-lockless: off [fixed]
    netns-local: off [fixed]
    tx-gso-robust: off [fixed]
    tx-fcoe-segmentation: off [fixed]
    fcoe-mtu: off [fixed]
    tx-nocache-copy: on
    loopback: off [fixed]
    rx-fcs: off
    rx-all: off
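
Comparing the two listings by eye, eth0 still has tcp-segmentation-offload, tx-tcp-segmentation, tx-tcp6-segmentation, generic-segmentation-offload and generic-receive-offload on, while they are off on eth1. rx-vlan-filter also differs, but it is marked [fixed]. A comparison like this can be scripted if both outputs are saved to files. The sketch below assumes that. The function name `diff_features` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list features whose state differs between two
# saved `ethtool -k` outputs (first file versus second file).
diff_features() {
    awk -F': ' '
        # Trim indentation and trailing blanks, as found in pasted output.
        { sub(/^[ \t]+/, ""); sub(/[ \t]+$/, "") }
        # Skip headers, prompts and blank lines (no "name: value" pair).
        NF < 2 { next }
        # First file: remember each feature state.
        NR == FNR { first[$1] = $2; next }
        # Second file: report features whose state changed.
        ($1 in first) && first[$1] != $2 {
            printf "%s: %s -> %s\n", $1, first[$1], $2
        }
    ' "$1" "$2"
}
```

For example, `diff_features eth1.txt eth0.txt` (both file names are placeholders) prints one line per differing feature, in the order of the second file.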



    # lspci -nn|grep Ether
    00:19.0 Ethernet controller [0200]: Intel Corporation Ethernet Connection I217-V [8086:153b] (rev 05)
    01:00.0 Ethernet controller [0200]: Intel Corporation 82572EI Gigabit Ethernet Controller (Copper) [8086:107d] (rev 06)
    (eth0 and eth1, respectively)

    Thanks,
    Barry