Bug #1221

igb driver mbuf allocation problems on multicore machines aka Could not setup receive structures

Added by R M over 7 years ago. Updated about 3 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: Interfaces
Target version:
Start date: 01/21/2011
Due date:
% Done: 0%
Estimated time:
Affected Version: 2.0
Affected Architecture: All

Description

I'm creating this ticket in relation to the following forum topic since I don't think a bug was submitted by the OP:
http://forum.pfsense.org/index.php/topic,30108.msg156719.html

The issue stems from the igb driver autoconfiguring the number of queues based on the number of cpu cores. On systems with many cores, the queues end up outstripping the mbuf resource pool. On processors with Hyper Threading, the allocation doubles. See http://unix.derkeiler.com/Mailing-Lists/FreeBSD/stable/2010-10/msg00267.html
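
As a rough illustration of the arithmetic behind this, the sketch below estimates how many mbuf clusters the receive rings alone would need as the queue count follows the core count. The function name, the one-cluster-per-descriptor model, the ring size of 1024 descriptors, and the 4-port machine are assumptions made for illustration, not values taken from the driver source.

```python
def rx_clusters_needed(ports, queues_per_port, rx_descriptors=1024):
    """Estimate mbuf clusters consumed by receive rings.

    Assumption (not from the driver source): one cluster per receive
    descriptor and one receive ring per queue.
    """
    return ports * queues_per_port * rx_descriptors


# Hypothetical 4-port machine where the queue count equals the core count.
for cores in (1, 2, 4, 8):
    print(cores, "cores ->", rx_clusters_needed(ports=4, queues_per_port=cores))
```

Under this model the demand grows linearly with the core count, which is why a pool sized for a small machine can be exhausted on a larger one.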

A temporary workaround was to neuter the system by using a uniprocessor kernel, disabling Hyper Threading, and maxing out the mbuf size with settings defined in /boot/loader.conf

In light of bug #560 (http://redmine.pfsense.org/issues/560) values set in /boot/loader.conf get wiped by updates, leaving a multicore system with igb interfaces completely inaccessible over the network.

History

#1 Updated by R M over 7 years ago

Apologies. Doesn't look like I set the target version of the bug correctly which means it doesn't show up in the custom queries for high or new issues.

#2 Updated by Chris Buechler over 7 years ago

  • Target version set to 2.0

#3 Updated by Jim Pingle over 7 years ago

Bug #560 isn't really relevant to this. You should store your personal customizations in loader.conf.local; that file should be left alone by the upgrade process.

#4 Updated by R M over 7 years ago

Thanks for the response Jim.

Since there are no man pages available in pfSense, my reference regarding the matter was the following from an existing FreeBSD 8.1-RELEASE machine:

LOADER.CONF (5)
---- SNIP ----
/boot/loader.conf user defined settings.
/boot/loader.conf.local machine-specific settings for sites with a common loader.conf.
---- /SNIP ----

Consider that my pfSense install is a standalone machine, and thus not using a common loader.conf used on any other machines. By the reference provided from man, I have placed my user defined settings in what was understood to be the correct place.

On a stock pfSense install:
  • No comments in the header of loader.conf that suggest the use of loader.conf.local over loader.conf
  • No empty loader.conf.local with comments that suggest that it should be used over loader.conf

Comparing loader.4th, support.4th, boot* etc to a stock FreeBSD 8.1-RELEASE, diffs provide no noticeable differences between the pfSense and stock versions.

What is different, however, is that pfSense writes its own values to loader.conf without checking for or reading back any values that may already exist there, which is an entirely different problem from people using loader.conf as it was intended.

On the face of it, I can understand why it is easy to dismiss the reported issue as being unrelated to #560. But consider that people like me, who look up kern.ipc.nmbclusters, machdep.hyperthreading_allowed, or any other setting required at boot time, will find their reference material from people who have most likely dealt with these settings safely on stock FreeBSD systems. They will repeat my mistake of using loader.conf on pfSense and end up being gutted by #560.

There are only 2 references on the forum to using loader.conf.local in combination with kern.ipc.nmbclusters, one of which notes it as the location for a modification required for squid package tuning. They were only found by explicitly searching for loader.conf.local and kern.ipc.nmbclusters, and did not turn up in the top results for just loader.conf and kern.ipc.nmbclusters.

References from http://doc.pfsense.org/index.php/Booting_Options indicate the use of loader.conf. It's not until http://doc.pfsense.org/index.php/Boot_Troubleshooting that loader.conf.local is indicated, and then without explanation as to why one may want to do that.

For all intents and purposes, it may be safer for pfSense to leave loader.conf alone and write its machine-specific settings to loader.conf.local instead of breaking existing conventions. Even better, it could have a pfSense-specific conf that gets pulled in through updated loader and support scripts that don't interfere with the intended use of loader.conf and loader.conf.local.

Another consideration may be to have a new section under advanced that lets people provide any additional boot options they require and have them written out to loader.conf along with the pfSense-specific values, so that it's not necessary to introduce those changes with manual edits.
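
To make the "don't wipe existing values" idea concrete, here is a minimal sketch of merging system-managed tunables into existing loader.conf-style text instead of overwriting the file. The function name merge_loader_conf is hypothetical, the key="value" line handling is a simplification of the real format, and hw.igb.num_queues merely stands in for whatever values pfSense actually writes.

```python
def merge_loader_conf(existing_text, overrides):
    """Return loader.conf-style text with overrides applied.

    Lines for keys in `overrides` are rewritten in place; every other
    line (comments, unrelated tunables) is kept untouched, and keys
    not yet present are appended at the end.
    """
    remaining = dict(overrides)
    out = []
    for line in existing_text.splitlines():
        stripped = line.strip()
        key = stripped.split("=", 1)[0].strip() if "=" in stripped else None
        if key in remaining and not stripped.startswith("#"):
            out.append(f'{key}="{remaining.pop(key)}"')
        else:
            out.append(line)
    for key, value in remaining.items():
        out.append(f'{key}="{value}"')
    return "\n".join(out) + "\n"


# User-defined settings survive; the system-managed key is added.
user_conf = 'kern.ipc.nmbclusters="65536"\nmachdep.hyperthreading_allowed="0"\n'
print(merge_loader_conf(user_conf, {"hw.igb.num_queues": "2"}), end="")
```

The point of the design is only that an update reads the file back before writing, so user-supplied boot options are preserved.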

#5 Updated by Jim Pingle over 7 years ago

All that has been discussed at length (and not on an unrelated ticket). It was much easier to leave loader.conf.local for users to use. It's been mentioned in several areas, but perhaps the docs need updating. If you want to discuss it further, use the other ticket. It's still not related to this one.

#6 Updated by Chris Buechler over 7 years ago

  • Priority changed from High to Normal

#7 Updated by Ermal Luçi over 7 years ago

  • Status changed from New to Feedback

This is a tunable that can be recommended as a workaround or shipped by default!
hw.igb.num_queues

#8 Updated by R M over 7 years ago

I've been testing hw.igb.num_queues="4" for the last week, and so far it seems to be working with no problems with HT disabled on a uniprocessor kernel.

/boot/loader.conf.local
kern.ipc.nmbclusters="655356"
machdep.hyperthreading_allowed="0"
hw.igb.num_queues="4"

Also tested briefly with HT turned on and saw no problems either.

hw.igb.num_queues="2" may be more than reasonable as well, but I have not had an opportunity to test that yet.

#9 Updated by Chris Buechler over 7 years ago

  • Status changed from Feedback to Resolved

#10 Updated by Kai Poggensee over 4 years ago

Please re-open this issue, it is by no means resolved.
We have a system here that exposes the very same problem.

System details:
Barebone Superserver 5018A-FTN4 19"
feat. an Intel® Atom processor C2758, SoC (no Hyper-Threading! [1])
and 32GB of RAM
6x Ethernet

Running:
pfSense 2.1.3

and we see the infamous
"igb[N]: Could not setup receive structures"
directly when booting.

On a side note, it could be that there is a typo in the above comment.
It states
kern.ipc.nmbclusters="655356"
but should probably read
kern.ipc.nmbclusters="65536"
no?

As this problem still exists in pfSense, I would suggest implementing
a workaround for next release. Why not increase the nmbclusters default?
Wouldn't hurt for a system that is firewall/router/gateway etc. anyway, no?

We have implemented only the following two parameters in /boot/loader.conf.local:

kern.ipc.nmbclusters="65536" 
hw.igb.num_queues="2" 

[1]
http://ark.intel.com/de/products/77987/Intel-Atom-Processor-C2750-4M-Cache-2_40-GHz

#11 Updated by Chris Buechler over 4 years ago

The original issue of this ticket is resolved. Any specific combination of hardware may require tuning outside the defaults.

#12 Updated by Denis Kozlov over 3 years ago

This is not resolved!

A fresh install of the latest pfSense 2.2.1 (FreeBSD 10.1) on hardware with 8 CPU cores, 8 GB RAM and 4 Gigabit ports reaches the MBUF limit immediately upon first boot, making the device inaccessible right from the start. MBUF usage is 100% (26584/26584).

Default interfaces not found -- Running interface assignment option.
[zone: mbuf_cluster] kern.ipc.nmbclusters limit reached
igb3: Could not setup receive structures
igb3: Could not setup receive structures
Valid interfaces are...
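
For anyone trying to confirm they are hitting the same condition, the following is a small sketch that scans saved console output for the two messages shown above. The function name diagnose and the exact regular expressions are assumptions based only on the lines quoted in this comment; other driver versions may word the messages differently.

```python
import re

MBUF_LIMIT = re.compile(r"kern\.ipc\.nmbclusters limit reached")
RX_SETUP = re.compile(r"^(igb\d+): Could not setup receive structures")


def diagnose(console_text):
    """Return (limit_hit, sorted list of igb interfaces that failed)."""
    limit_hit = False
    failed = set()
    for line in console_text.splitlines():
        line = line.strip()
        if MBUF_LIMIT.search(line):
            limit_hit = True
        match = RX_SETUP.match(line)
        if match:
            failed.add(match.group(1))
    return limit_hit, sorted(failed)


sample = """[zone: mbuf_cluster] kern.ipc.nmbclusters limit reached
igb3: Could not setup receive structures
igb3: Could not setup receive structures"""
print(diagnose(sample))
```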

I'm aware that manual tweaking of kernel parameters "kern.ipc.nmbclusters" and "hw.igb.num_queues" works around the issue - but this is in no way a resolution. There was a hope that the praised MBUFs auto-scaling in FreeBSD 10.x would solve this issue, but it doesn't seem to do anything at all to improve it, at least not for the type of hardware in question.

This raises a question... Why can't this be addressed in pfSense? Why not automatically scale (or at least offer to scale) MBUF limits according to the hardware? It should be simple enough to get the number of available CPU cores and the number of network interfaces and use these variables to scale MBUF limits accordingly.
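
A minimal sketch of the kind of scaling being asked for might look like the following. Everything here is an assumption for illustration: the function name suggest_nmbclusters, the model of one queue per core with 1024 receive descriptors each, and the headroom factor of 2. It is not how pfSense or FreeBSD computes the limit.

```python
import os


def suggest_nmbclusters(cpu_cores, nic_ports, current_limit,
                        rx_descriptors=1024, headroom=2):
    """Suggest a kern.ipc.nmbclusters value for the given hardware.

    Assumed model: one receive queue per core on every port, one
    cluster per descriptor, multiplied by a headroom factor so that
    traffic still has buffers left. Never lowers the current limit.
    """
    rx_clusters = nic_ports * cpu_cores * rx_descriptors
    return max(current_limit, rx_clusters * headroom)


# Use the core count of the machine running this sketch; the port
# count and current limit are supplied by hand here.
cores = os.cpu_count() or 1
print(f'kern.ipc.nmbclusters="{suggest_nmbclusters(cores, 4, 26584)}"')
```

With these assumed defaults, 8 cores and 4 ports against a current limit of 26584 give 65536; the result is only as good as the assumptions behind it.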

The goal here is to make pfSense usable right from the start on non-primitive hardware.

--
System: SuperServer 5018A-FTN4
Motherboard: A1SRi-2758F
Processor: Intel Atom C2758, 8 Cores
Network interfaces: 4 x Intel Gigabit
Memory: 8 GB

#13 Updated by Chris Buechler over 3 years ago

you need to follow the guidance here:
https://doc.pfsense.org/index.php/Tuning_and_Troubleshooting_Network_Cards

as appropriate for your hardware. Or buy from store.pfsense.org, where we set all that appropriately for the combination of hardware you're purchasing. This issue is as resolved as it can be.

#14 Updated by Denis Kozlov over 3 years ago

The same problem occurs even with hardware purchased from the pfSense Store. A clean (re)install of pfSense is unreachable.

I don't understand why this can't be addressed in pfSense?

It will save hours (otherwise wasted debugging and researching) for users who install pfSense on anything with a few CPU cores and a few NICs. This is not a simple tunable parameter that improves performance; this is a bug that prevents pfSense from operating on a clean install.

Going forward, as networking demands keep increasing, it will only be getting worse unless addressed.

#15 Updated by Chris Buechler over 3 years ago

if you need to reinstall something purchased from us, you need to get in touch with us to get the proper image to reinstall that system.

#16 Updated by Denis Kozlov over 3 years ago

Why can't this be addressed in pfSense?

#17 Updated by Emanuel Somosan about 3 years ago

Still on 2.2.3 this bug is for sure not resolved. Yes, there is a manual workaround that needs to be applied on every fresh install, but this is ridiculous. Is it meant that way, that professional hardware is not supported by the "community version", as fixes for these kinds of bugs are reserved for the "non-community version" only?

#18 Updated by Chris Buechler about 3 years ago

The original problem here from years back has nothing to do with anything current. That was a 4 year old driver problem that was resolved 4 years ago. With some combinations of hardware, where you don't buy it from us (so we don't know what it is), you may need to tune things accordingly if you hit the message in the subject. That has nothing whatsoever to do with this old bug. The default configuration has to work for everything. When we know what hardware you have because it's something we sell, it's automatically set appropriately. You don't have to apply it on install; it can be set in the config as a system tunable.

#19 Updated by Denis Kozlov about 3 years ago

Once again, why can't this be addressed in pfSense?

I mean, scale the MBUF according to the number of cores and network cards. Job done.

Do you really expect people to forever run pfSense on 1 core processors and 1 network card, because that is what pfSense is "tuned" for? As a consequence, we are forced to fix an inaccessible pfSense on every new install.

People will keep coming back to this issue until it is fixed.

#20 Updated by Kill Bill about 3 years ago

Denis Kozlov wrote:

I mean, scale the MBUF according to the number of cores and network cards. Job done.

That's not the FreeBSD way of doing things. It's much more fun to fiddle with some cryptic shit tunables. This is the most recent bitchfest I could find regarding the igb issue: http://marc.info/?t=130143011800001&r=1&w=2
