Hello all,
I have been having trouble with the HA/Cluster interface on our Dell PowerEdge R420 (two onboard Broadcom NetXtreme BCM5720 ports and an additional Intel I350 4P card) since we updated from 9.106-17 to version 9.111-007.
We use one port of the Intel card as the HA/Cluster interface with all "automatic" settings.
Both machines are identical to each other.
What we have done so far:
1. replaced the cross-link cable
2. reinstalled UTM 9.111-007 from ISO directly without updating from previous versions
3. replaced the Intel network card on one machine, which gave an I/O error with mii-diag -s eth4
4. updated the firmware of the network card on both machines from 14.5.9 to 15.0.28
5. replaced the cable again
6. reinstalled the other machine from the ISO
7. hard-set the speed of the interface in the BIOS
The result is always the same: after a while (45-90 minutes) the dashboard shows:
Interface: eth4 Name: HA/Cluster Type: Ethernet Status: On Link: Down
ethtool eth4: established no / link speed unknown (on both machines)
mii-diag -s eth4: Link not established OR SIOCGMIIREG on eth4 failed: Input/output error
Sometimes this error occurs on node 1, sometimes on node 2 but NEVER on both nodes at the same time.
Also, this error only occurs on the cross-linked interface/port of the Intel card and not on the other interfaces, which always carry traffic.
lsmod shows that the modules igb and tg3 are loaded and the driver version of the Intel card is 5.0.6
I have no idea what happened, but before the update everything worked fine.
I have searched for other topics about this problem and found that there was an Intel network driver update in one of the versions we had not applied before. I also found Mantis ID #30669 at https://community.sophos.com/products/unified-threat-management/astaroorg/f/81/t/65555, which sounds similar to our problem, but I have no idea where I can get this patch to try it out.
Does anyone have any hints for me? I could post millions of logs, but none of them seemed plausible to me. The only thing I can see in the system logs of one of the nodes is that the auto-negotiation switches from 1000 Mbps to 100 to 10 to 100 to 10 to down. The other node just reports "down".
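To pin down when the link drops on each node, something like the following could be left running on both machines. This is only a sketch: it assumes the standard Linux sysfs files /sys/class/net/<iface>/carrier and /sys/class/net/<iface>/speed are readable from the UTM shell, and the function names and the SYSFS_ROOT variable are made up for illustration.

```shell
#!/bin/sh
# Sketch: print carrier and speed of one interface, read from sysfs.
# SYSFS_ROOT is a hypothetical override so the function can be pointed
# at a different directory; by default it reads /sys/class/net.

link_state() {
    iface="$1"
    root="${SYSFS_ROOT:-/sys/class/net}"
    # Reading these files can fail (for example when the interface is
    # administratively down or missing), so fall back to "?".
    carrier=$(cat "$root/$iface/carrier" 2>/dev/null || echo "?")
    speed=$(cat "$root/$iface/speed" 2>/dev/null || echo "?")
    printf '%s carrier=%s speed=%s\n' "$iface" "$carrier" "$speed"
}

# Poll the interface forever and prefix each line with a timestamp.
# Usage: watch_link eth4 10 >> /tmp/eth4-link.log
watch_link() {
    iface="$1"
    interval="${2:-10}"
    while true; do
        printf '%s ' "$(date '+%Y-%m-%d %H:%M:%S')"
        link_state "$iface"
        sleep "$interval"
    done
}
```

Comparing the timestamps from both nodes should show which side loses the carrier first and whether the speed steps down before the link goes away.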
Please help: this cluster is not in production at the moment but has to be in one or two weeks, so we need a working failover.
Kind regards