Bug #1745

closed

various Kernel panics with 2 identical NICs

Added by Andreas Bochem almost 13 years ago. Updated over 12 years ago.

Status: Rejected
Priority: Normal
Assignee: -
Category: -
Target version: -
Start date: 08/04/2011
Due date:
% Done: 0%

Estimated time:
Plus Target Version:
Release Notes:
Affected Version:
Affected Architecture:

Description

We have two identical HP servers (DL180 G6), same hardware and all, which originally each came
with one 4-port NIC and one 2-port NIC, both Intel. We recently upgraded both machines
by replacing the 2-port NIC with a second 4-port NIC, same brand and model as the
4-port NIC already installed.

Since then, the machines keep crashing with either kernel trap 9 or 12.
The configuration did not change other than upgrading the hardware and
updating the firmware to the latest snapshot to see if the crashes would disappear.

A crash usually occurs shortly after both nodes have come online, which
makes me think it might be related to CARP VIPs. If the machines
keep running for more than a few minutes, I switch CARP off and on again
repeatedly to trigger the issue. Usually another crash occurs after 1-3 cycles.

It's always one node crashing, the other keeps running - not always the
same node, though.

I attached a few screenshots of trap messages with different trap numbers
and "current process" lines, including the related backtrace (if available).
There have been more than these four occurrences; I just picked a few at random.
The interface mentioned in "current process" varies, too: I remember
seeing em5 and em7, both on the newly added 4-port NIC. Sorry there's no backtrace
for the first trap.

Hardware info might be useful, too, so here's the output of pciconf -l | grep em
(identical on both machines):

em0@pci0:11:0:0:        class=0x020000 card=0x704b103c chip=0x10bc8086 rev=0x06 hdr=0x00                                                                               
em1@pci0:11:0:1:        class=0x020000 card=0x704b103c chip=0x10bc8086 rev=0x06 hdr=0x00                                                                               
em2@pci0:10:0:0:        class=0x020000 card=0x704b103c chip=0x10bc8086 rev=0x06 hdr=0x00                                                                               
em3@pci0:10:0:1:        class=0x020000 card=0x704b103c chip=0x10bc8086 rev=0x06 hdr=0x00                                                                               
em4@pci0:7:0:0: class=0x020000 card=0x704b103c chip=0x10bc8086 rev=0x06 hdr=0x00                                                                                       
em5@pci0:7:0:1: class=0x020000 card=0x704b103c chip=0x10bc8086 rev=0x06 hdr=0x00                                                                                       
em6@pci0:6:0:0: class=0x020000 card=0x704b103c chip=0x10bc8086 rev=0x06 hdr=0x00                                                                                       
em7@pci0:6:0:1: class=0x020000 card=0x704b103c chip=0x10bc8086 rev=0x06 hdr=0x00  

em0-3 are the ports of the NIC that has been in the machine from the start; em4-7 are the ports of the newly installed one.

This issue is severe for us, as it keeps us from using the machines at all.


Files

01_pfsense2-trap12.jpg (112 KB) Andreas Bochem, 08/04/2011 08:14 AM
02_pfsense1-trap9.png (94.8 KB) Andreas Bochem, 08/04/2011 08:14 AM
03_pfsense2-trap9.png (93.2 KB) Andreas Bochem, 08/04/2011 08:14 AM
04_pfsense2-trap12.png (96.7 KB) Andreas Bochem, 08/04/2011 08:14 AM
Actions #1

Updated by Jim Pingle almost 13 years ago

  • Status changed from New to Rejected

Please post on the forum for help diagnosing such issues, since there are often configuration tweaks that can mitigate such panics. There isn't necessarily anything we can do about them at the driver level.

See also http://doc.pfsense.org/index.php/Tuning_and_Troubleshooting_Network_Cards - especially the part about em/igb queues.

Actions #2

Updated by Andreas Bochem over 12 years ago

Sorry for posting in the wrong place. And thanks for providing the pointer to more info!
