Bug #1425

closed

pfSense stops receiving traffic on 'bge' driven interface

Added by Chris Smith over 13 years ago. Updated over 12 years ago.

Status: Closed
Priority: Normal
Assignee: -
Category: Operating System
Target version: -
Start date: 04/07/2011
Due date:
% Done: 0%
Estimated time:
Plus Target Version:
Release Notes:
Affected Version: 2.0
Affected Architecture:

Description

Hi guys,

This bug has now hit our installation twice. Seemingly at random, the bge0 interface (we have bge0 and bge1) loses the ability to receive traffic, as confirmed with tcpdump.

bge0 is defined as this in the dmesg:

bge0: <HP NC107i PCIe Gigabit Server Adapter, ASIC rev. 0x5784100> mem 0xdf600000-0xdf60ffff irq 16 at device 0.0 on pci32

This bug was previously reported to the FreeBSD team, who believe it is fixed in stable/8 (after 8.1-RELEASE, which is what pfSense 2.0 is using); the bug submitter confirms this. The fix, whatever it is, is also present in 7.3-RELEASE.

The FreeBSD bug is here:
http://www.freebsd.org/cgi/query-pr.cgi?pr=152295

This is obviously a show-stopper (for us at least; it's an increasingly popular chipset found in servers), as there is no workaround until it fails.

Will pfSense 2.0 (release) be based on 8.2? Or will it be based on 8.1-RELEASE and therefore lack the bugfix for this chipset driver? Is a backport of the specific fix possible?

We're currently running pfSense2-BETA4 and have no plans to upgrade before the official release. However, if the fix is included in a release candidate we will be happy to upgrade (after some testing) :)

Any help appreciated,
Cheers,
Chris.

Actions #1

Updated by Chris Buechler over 13 years ago

  • Category set to Operating System
  • Target version set to 2.0
  • Affected Version set to 2.0

It will be 8.1. Will see if we can easily back port the 8.2 driver if that is the fix.

Actions #2

Updated by Chris Smith over 13 years ago

The FreeBSD bug indicates that the bug is triggered by high traffic/bandwidth via the interface.

If someone can recommend to me a way of generating large amounts of traffic between two systems so I can reproduce the bug at-will, I would be happy to confirm and test any bugfixes that you might come up with using my secondary slave pfSense system.

Actions #3

Updated by Slaygon Censor over 13 years ago

We are seeing this error as well. We can safely push some 200-300 Mbit/s of traffic, but going beyond that will stop traffic on the interface.
Any update regarding the back port from the 8.2 driver?

Actions #4

Updated by Evgeny Yurchenko over 13 years ago

Running multiple instances of iperf with UDP traffic is a very good way to generate substantial load.
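For anyone without iperf at hand, the same idea (blasting UDP datagrams at the far side of the interface under test) can be sketched in a few lines of Python. This is only an illustration, not a replacement for iperf, which will push far more traffic per process. The function name `send_udp_load` and its parameters are made up for this example.

```python
import errno
import socket
import time


def send_udp_load(host: str, port: int, seconds: float, payload_size: int = 1400) -> int:
    """Send UDP datagrams to (host, port) as fast as possible for `seconds`.

    Returns the number of payload bytes handed to the kernel. Nothing needs
    to answer on the far side; UDP is fire-and-forget.
    """
    payload = b"\x00" * payload_size
    sent = 0
    deadline = time.monotonic() + seconds
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        while time.monotonic() < deadline:
            try:
                sent += sock.sendto(payload, (host, port))
            except OSError as exc:
                # Some systems report ENOBUFS when the send queue is full;
                # back off briefly and keep going. Anything else is fatal.
                if exc.errno != errno.ENOBUFS:
                    raise
                time.sleep(0.001)
    return sent
```

To approximate "multiple instances", start several copies in parallel, each calling something like `send_udp_load("192.0.2.10", 5001, seconds=60)`, where the address and port are placeholders for a host behind the bge interface.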

Actions #5

Updated by Ermal Luçi over 13 years ago

Can you please try disabling msix and tso on the bge interfaces?

Actions #6

Updated by Ermal Luçi over 13 years ago

  • Target version deleted (2.0)

I am removing the dependency on 2.0 since this is a driver/hardware issue and out of our control.

Actions #7

Updated by Chris Smith over 13 years ago

I have had TSO disabled for two weeks now and have not experienced any crashes, but the systems could literally go months before exhibiting the problem. I'm not sure how to disable msix, or even what it is.

We've got an identical system on the way for a new installation and I was planning to do some testing with generated network traffic to try to reproduce the problem, which I won't bother to do if you plan to give up on this bug.

As mentioned earlier in the ticket, the bug has already been fixed in FreeBSD, just not in the version that pfSense 2.0 is built on. We just need someone with some experience and knowledge to help us out with cherry-picking the fix back to the version pfSense is built on, to close this bug and fix the problem users are experiencing on production hardware.

Actions #8

Updated by Ermal Luçi over 13 years ago

The simplest way to do that is to get a FreeBSD 8.1 box and build the latest bge driver from FreeBSD HEAD.
Then load the module at boot. It should use that.

Moving this ticket does not mean that we are giving up. With msix and TSO both disabled, systems have behaved, so there is no need to keep this targeted at 2.0.

Actions #9

Updated by Chris Smith over 13 years ago

Okay, fair enough. Can you please help me with the process of disabling 'msix'? I'm finding it difficult to discover any resources on it.

Actions #10

Updated by Ermal Luçi over 13 years ago

hw.pci.enable_msix
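The name above is a FreeBSD boot-time tunable, so it is normally set to 0 in the loader configuration and takes effect after a reboot. As a rough sketch, assuming the setting lives in `/boot/loader.conf` (or `/boot/loader.conf.local` on pfSense; both paths are assumptions here, not something stated in this ticket), a small helper can add or update the line without duplicating it. The helper name `ensure_tunable` is hypothetical.

```python
def ensure_tunable(conf_text: str, name: str, value: str) -> str:
    """Return loader.conf-style text with `name="value"` set exactly once.

    Existing assignments of `name` are replaced; commented-out lines and
    all other settings are left untouched.
    """
    new_line = f'{name}="{value}"'
    out = []
    replaced = False
    for line in conf_text.splitlines():
        key = line.split("=", 1)[0].strip()
        if key == name:
            if not replaced:
                out.append(new_line)
                replaced = True
            continue  # drop any duplicate assignments
        out.append(line)
    if not replaced:
        out.append(new_line)
    return "\n".join(out) + "\n"
```

Calling `ensure_tunable(existing_text, "hw.pci.enable_msix", "0")` yields text containing the line `hw.pci.enable_msix="0"`. Note that this tunable affects every PCI device in the system, not only the bge interfaces.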

Actions #12

Updated by Chris Smith over 13 years ago

Disabling msix on our system caused serious problems (we also have igb Intel Pro cards in these systems which I suspect require this setting).

Symptoms included high CPU load and dropped/hanging connections so this is something that we're not going to be doing on our systems.

Since our Broadcom interfaces are driven by the bge driver (and not bce), the setting suggested from the link given by Jim P above does not work. There is no tunable with the name "hw.bge.tso_enable", so we were unable to disable TSO using this method. I instead used "ifconfig bgeX -tso", which I assume does the same thing. We experienced a failure as described in this bug with TSO disabled.

As it appears that this bug cannot be fixed before the next stable release, and the workarounds are either not applicable or have side effects that make them unusable, we will simply have to stop using these (onboard) interfaces on these systems and replace them with something else. Luckily our systems have an extra expansion slot, in which we will install Intel igb cards identical to the ones we already have, and we will disable the onboard Broadcom interfaces completely.

I suggest that others having issues with these interfaces in high-traffic systems do the same, especially as this bug will often not trigger fail-over in paired systems when it occurs, and if it does, it could break network communication on both hosts, as it did to us last week.
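Because the failure is receive-only and may not trip fail-over by itself, one way to notice it is to watch the interface packet counters: transmit keeps advancing while receive stays frozen. The sketch below shows only that decision logic. How the counters are sampled is left out, and the function name `rx_stalled` and its threshold are assumptions made for illustration.

```python
from typing import Iterable, Tuple


def rx_stalled(samples: Iterable[Tuple[int, int]], min_intervals: int = 3) -> bool:
    """Decide whether an interface looks receive-dead.

    `samples` are cumulative (rx_packets, tx_packets) readings, oldest first.
    Returns True only if, in each of the last `min_intervals` intervals,
    the tx counter advanced while the rx counter did not move at all.
    """
    history = list(samples)
    if len(history) < min_intervals + 1:
        return False  # not enough data to judge
    recent = history[-(min_intervals + 1):]
    for (rx0, tx0), (rx1, tx1) in zip(recent, recent[1:]):
        if rx1 != rx0 or tx1 <= tx0:
            return False
    return True
```

An idle link also shows a frozen receive counter, which is why the check additionally requires outgoing traffic. On a quiet network that can still produce false positives, so treat a positive result as a hint to look at tcpdump rather than as proof.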

Actions #13

Updated by Chris Smith over 13 years ago

Also, just as an aside, I was unable to trigger the issue using generated traffic from iperf. I tried to generate both UDP and TCP traffic from single and multiple systems, saturating the 1Gb network connection for up to 30 minutes without experiencing failure. If anybody has any further advice on this I'd be happy to try it.

Actions #14

Updated by Jim Pingle over 12 years ago

  • Status changed from New to Feedback

Marking this as Feedback for now; it should be tried again on 8.3-based snapshots.

Aside from that, there are several customers using bge adapters without many issues, though they do generally need to disable msix/tso and set higher nmbclusters.

Actions #15

Updated by Chris Buechler over 12 years ago

  • Status changed from Feedback to Closed

fixed upstream
