Bug #1745
closed
Various kernel panics with 2 identical NICs
Description
We have two identical HP servers (DL180 G6), same hardware and all, which originally came
with one 4-port NIC and one 2-port NIC each, both Intel. We recently upgraded both machines
by replacing the 2-port NIC with another 4-port NIC, same brand and model as the
4-port NIC already installed.
Since then, the machines keep crashing with either kernel trap 9 or 12.
The configuration did not change, other than upgrading the hardware and
updating the firmware to the latest snapshot to see whether the crashes would disappear.
A crash usually occurs shortly after both nodes have come online, which
makes me think it might be related to CARP VIPs. If the machines
keep running for more than a few minutes, I switch CARP off and on again
repeatedly to trigger the issue. Usually another crash occurs after 1-3 cycles.
It's always one node crashing, the other keeps running - not always the
same node, though.
I attached a few screenshots of trap messages with different trap numbers
and "current process" lines, including the related backtrace (if available).
There have been more than those four occurrences; I just picked a few at random to show.
The interface mentioned in "current process" varies, too: I remember
seeing em5 and em7, both on the newly added 4-port NIC. Sorry there's no backtrace
for the first trap.
Thought hardware info might be a good idea, too, here's the output of pciconf -l | grep em
(identical on both machines):
em0@pci0:11:0:0: class=0x020000 card=0x704b103c chip=0x10bc8086 rev=0x06 hdr=0x00
em1@pci0:11:0:1: class=0x020000 card=0x704b103c chip=0x10bc8086 rev=0x06 hdr=0x00
em2@pci0:10:0:0: class=0x020000 card=0x704b103c chip=0x10bc8086 rev=0x06 hdr=0x00
em3@pci0:10:0:1: class=0x020000 card=0x704b103c chip=0x10bc8086 rev=0x06 hdr=0x00
em4@pci0:7:0:0: class=0x020000 card=0x704b103c chip=0x10bc8086 rev=0x06 hdr=0x00
em5@pci0:7:0:1: class=0x020000 card=0x704b103c chip=0x10bc8086 rev=0x06 hdr=0x00
em6@pci0:6:0:0: class=0x020000 card=0x704b103c chip=0x10bc8086 rev=0x06 hdr=0x00
em7@pci0:6:0:1: class=0x020000 card=0x704b103c chip=0x10bc8086 rev=0x06 hdr=0x00
em0-3 are on the NIC that has been there from the start; em4-7 are on the newly installed one.
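As a rough aid for reading the pciconf listing, here is a small Python sketch that groups the em devices by PCI bus number. The helper name group_by_bus and the regular expression are illustrative assumptions, not part of pciconf; the sample lines are copied from this report. It only parses text, so it can be run on any machine against saved output.

```python
import re
from collections import defaultdict

# Sample lines copied from the pciconf output in this report (em0 to em7).
SAMPLE = """\
em0@pci0:11:0:0: class=0x020000 card=0x704b103c chip=0x10bc8086 rev=0x06 hdr=0x00
em1@pci0:11:0:1: class=0x020000 card=0x704b103c chip=0x10bc8086 rev=0x06 hdr=0x00
em2@pci0:10:0:0: class=0x020000 card=0x704b103c chip=0x10bc8086 rev=0x06 hdr=0x00
em3@pci0:10:0:1: class=0x020000 card=0x704b103c chip=0x10bc8086 rev=0x06 hdr=0x00
em4@pci0:7:0:0: class=0x020000 card=0x704b103c chip=0x10bc8086 rev=0x06 hdr=0x00
em5@pci0:7:0:1: class=0x020000 card=0x704b103c chip=0x10bc8086 rev=0x06 hdr=0x00
em6@pci0:6:0:0: class=0x020000 card=0x704b103c chip=0x10bc8086 rev=0x06 hdr=0x00
em7@pci0:6:0:1: class=0x020000 card=0x704b103c chip=0x10bc8086 rev=0x06 hdr=0x00
"""

# Illustrative pattern (an assumption about the line layout shown above):
# "name@pciDOMAIN:BUS:SLOT:FUNCTION:" at the start of each line.
SELECTOR = re.compile(r"^(\w+)@pci(\d+):(\d+):(\d+):(\d+):")


def group_by_bus(text):
    """Return {bus_number: [device names]} for every line that matches."""
    groups = defaultdict(list)
    for line in text.splitlines():
        match = SELECTOR.match(line.strip())
        if match:
            name, _domain, bus, _slot, _function = match.groups()
            groups[int(bus)].append(name)
    return dict(groups)


if __name__ == "__main__":
    for bus, names in sorted(group_by_bus(SAMPLE).items()):
        print(f"bus {bus}: {', '.join(names)}")
```

Each bus number carries two ports here, so the interfaces seen in the trap messages (em5 and em7) sit on buses 7 and 6 respectively.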
This issue is very severe to us, as it keeps us from using the machines at all.
Files
Updated by Jim Pingle over 13 years ago
- Status changed from New to Rejected
Please post on the forum for help diagnosing such issues, since there are often configuration tweaks that can mitigate such panics. There isn't necessarily anything we can do about them at the driver level.
See also http://doc.pfsense.org/index.php/Tuning_and_Troubleshooting_Network_Cards - especially the part about em/igb queues.
Updated by Andreas Bochem over 13 years ago
Sorry for posting in the wrong place. And thanks for providing the pointer to more info!