igb driver mbuf allocation problems on multicore machines aka Could not setup receive structures
I'm creating this ticket in relation to the following forum topic since I don't think a bug was submitted by the OP:
The issue stems from the igb driver autoconfiguring the number of queues based on the number of cpu cores. On systems with many cores, the queues end up outstripping the mbuf resource pool. On processors with Hyper Threading, the allocation doubles. See http://unix.derkeiler.com/Mailing-Lists/FreeBSD/stable/2010-10/msg00267.html
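To make the arithmetic behind this concrete, here is a rough sketch of how queue count multiplies into mbuf cluster demand. It is only an estimate under stated assumptions: that each RX queue pre-allocates one cluster per RX descriptor, and that each queue has 1024 RX descriptors. Neither figure comes from this ticket. The function names are made up for illustration; the 26584 limit and the 4-port, 8-core box are taken from a later report in this thread.

```python
# Back-of-the-envelope estimate of RX mbuf cluster demand for igb NICs.
# ASSUMPTIONS (not from this ticket): each RX queue pre-allocates one
# cluster per RX descriptor, and each queue has 1024 RX descriptors.

def estimate_rx_clusters(nics, queues_per_nic, rx_descriptors=1024):
    """Clusters needed just to fill every RX ring once."""
    return nics * queues_per_nic * rx_descriptors


def fits(demand, nmbclusters_limit):
    """True if the estimated demand stays within the cluster limit."""
    return demand <= nmbclusters_limit


limit = 26584  # mbuf cluster limit reported later in this thread
for queues in (8, 4, 2):
    demand = estimate_rx_clusters(nics=4, queues_per_nic=queues)
    print(f"queues={queues} demand={demand} fits={fits(demand, limit)}")
```

Under these assumptions, 8 queues on each of 4 ports would want 32768 clusters, which is more than the reported limit of 26584, while 4 queues per port would want 16384. Real usage will differ, since TX buffers and the rest of the network stack also draw from the same pool.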
A temporary workaround was to neuter the system by using a uniprocessor kernel, disabling Hyper Threading, and maxing out the mbuf size with settings defined in /boot/loader.conf.
In light of bug #560 (http://redmine.pfsense.org/issues/560), values set in /boot/loader.conf get wiped by updates, leaving a multicore system with igb interfaces completely inaccessible over the network.
#4 Updated by R M about 7 years ago
Thanks for the response Jim.
Since there are no man pages available in pfSense, my reference regarding the matter was the following from an existing FreeBSD 8.1-RELEASE machine:
---- SNIP ----
/boot/loader.conf user defined settings.
/boot/loader.conf.local machine-specific settings for sites with a common loader.conf.
---- /SNIP ----
Consider that my pfSense install is a standalone machine, and thus does not share a common loader.conf with any other machines. Going by the reference provided from man, I placed my user defined settings in what I understood to be the correct place.
On a stock pfSense install:
- No comments in the header of loader.conf suggest the use of loader.conf.local over loader.conf
- No empty loader.conf.local exists with comments suggesting that it should be used over loader.conf
Comparing loader.4th, support.4th, boot* etc. to a stock FreeBSD 8.1-RELEASE, diffs show no noticeable differences between the pfSense and stock versions.
What is different, however, is that pfSense writes its own values to loader.conf without checking for or reading back in any values that may already exist there, which is an entirely different problem from people using loader.conf as it was intended.
On the face of it I can understand why it is easy to dismiss the reported issue as being unrelated to #560. But consider that those who, like me, look up kern.ipc.nmbclusters, machdep.hyperthreading_allowed, or any other setting required at boot time are going to find their reference material from people who have most likely dealt with these settings safely on stock FreeBSD systems. They will repeat my mistake by using loader.conf on pfSense and end up being gutted by #560.
There are only 2 references to the use of loader.conf.local in combination with kern.ipc.nmbclusters on the forum, and the other reference notes the location as a modification required for squid package tuning. They were only found by explicitly searching for loader.conf.local and kern.ipc.nmbclusters, and did not turn up in the top results for just loader.conf and kern.ipc.nmbclusters.
References from http://doc.pfsense.org/index.php/Booting_Options indicate the use of loader.conf, and it's not until http://doc.pfsense.org/index.php/Boot_Troubleshooting that loader.conf.local is indicated to be used, but without explanation as to why one may want to do that.
For all intents and purposes, it may be safer for pfSense to leave loader.conf alone and write its machine-specific settings to loader.conf.local instead of breaking existing conventions. Even better, it could have a pfSense-specific conf that gets read in through updated loader and support scripts that don't interfere with the intended use of loader.conf and loader.conf.local.
Another consideration may be to have a new section under advanced that lets people provide the additional boot options they require and have them written out to loader.conf along with the pfSense-specific values, so that it's not necessary to introduce those changes with manual edits.
#5 Updated by Jim Pingle about 7 years ago
All that has been discussed at length (and not on an unrelated ticket); it was much easier to leave loader.conf.local for users to use. It's been mentioned in several areas, but perhaps the docs need updating. If you want to discuss it further, use the other ticket. It's still not related to this one.
#8 Updated by R M about 7 years ago
Been testing hw.igb.num_queues="4" for the last week and so far it seems to be working with no problems, with HT disabled on a uniprocessor kernel.
Also tested briefly with HT turned on and saw no problems either.
hw.igb.num_queues="2" may be more than reasonable as well, but I have not had an opportunity to test that yet.
#10 Updated by Kai Poggensee almost 4 years ago
Please re-open this issue, it is by no means resolved.
We have a system here that exposes the very same problem.
Barebone Superserver 5018A-FTN4 19"
feat. an Intel® Atom processor C2758, SoC (no Hyper-Threading!)
and 32GB of RAM
and we see the infamous
"igb[N]: Could not setup receive structures"
directly when booting.
On a sidenote, it could be there is a typo in the above comment.
but should probably read
As this problem still exists in pfSense, I would suggest implementing
a workaround for next release. Why not increase the nmbclusters default?
Wouldn't hurt for a system that is firewall/router/gateway etc. anyway, no?
We have implemented only the following two parameters in /boot/loader.conf.local:
#12 Updated by Denis Kozlov about 3 years ago
This is not resolved!
A fresh install of latest pfSense 2.2.1 (FreeBSD 10.1) on hardware with 8 CPU cores, 8 GB RAM and 4 Gigabit ports reaches MBUF limit immediately upon first boot, making the device inaccessible right from the start. MBUF usage is 100% (26584/26584).
Default interfaces not found -- Running interface assignment option.
[zone: mbuf_cluster] kern.ipc.nmbclusters limit reached
igb3: Could not setup receive structures
igb3: Could not setup receive structures
Valid interfaces are...
I'm aware that manual tweaking of kernel parameters "kern.ipc.nmbclusters" and "hw.igb.num_queues" works around the issue - but this is in no way a resolution. There was a hope that the praised MBUFs auto-scaling in FreeBSD 10.x would solve this issue, but it doesn't seem to do anything at all to improve it, at least not for the type of hardware in question.
This raises a question... Why can't this be addressed in pfSense? Why not automatically scale MBUF limits according to the hardware, or at least offer to? It should be simple enough to get the number of available CPU cores and the number of network interfaces and use these variables to scale MBUF limits accordingly?
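As a rough illustration of what such scaling might look like, here is a minimal sketch. It is not how pfSense or FreeBSD computes anything. The 1024 RX descriptors per queue, the 2x headroom factor and the helper names are all assumptions made up for the example; only the tunable names kern.ipc.nmbclusters and hw.igb.num_queues and the 26584 figure come from this thread.

```python
import math

# Hypothetical scaling rule: size the cluster limit from NIC and queue counts.
# ASSUMPTIONS (not from this ticket): one cluster per RX descriptor,
# 1024 RX descriptors per queue, and a 2x headroom factor for everything else.

def suggest_nmbclusters(nics, queues_per_nic, rx_descriptors=1024,
                        headroom=2.0, floor=26584):
    """Suggest a kern.ipc.nmbclusters value, never below the given floor."""
    demand = nics * queues_per_nic * rx_descriptors
    return max(floor, math.ceil(demand * headroom))


def loader_conf_local_lines(nmbclusters, num_queues=None):
    """Render the tunables as lines suitable for /boot/loader.conf.local."""
    lines = [f'kern.ipc.nmbclusters="{nmbclusters}"']
    if num_queues is not None:
        lines.append(f'hw.igb.num_queues="{num_queues}"')
    return lines


# Example: 4 igb ports with one queue per core on an 8-core box.
suggested = suggest_nmbclusters(nics=4, queues_per_nic=8)
for line in loader_conf_local_lines(suggested):
    print(line)
```

The floor is there so the suggestion never drops below whatever limit the system already has; the 26584 default is just the value observed on this hardware and would need to be read from the running system in practice.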
The goal here is to make pfSense usable right from the start on non-primitive hardware.
System: SuperServer 5018A-FTN4
Processor: Intel Atom C2758, 8 Cores
Network interfaces: 4 x Intel Gigabit
Memory: 8 GB
#13 Updated by Chris Buechler about 3 years ago
You need to follow the guidance here:
as appropriate for your hardware. Or buy from store.pfsense.org, where we set all that appropriately for the combination of hardware you're purchasing. This issue is as resolved as it can be.
#14 Updated by Denis Kozlov about 3 years ago
The same problem occurs even when purchased from the pfSense Store. A clean (re)install of pfSense is unreachable.
I don't understand why this can't be addressed in pfSense?
It will save hours (otherwise wasted debugging and researching) for users who install pfSense on anything with a few CPU cores and a few NICs. This is not a simple tunable parameter that improves performance; this is a bug that prevents pfSense from operating on a clean install.
Going forward, as networking demands keep increasing, it will only be getting worse unless addressed.
#17 Updated by Emanuel Somosan almost 3 years ago
As of 2.2.3 this bug is for sure still not resolved. Yes, there is a manual workaround that needs to be applied on every fresh install, but this is ridiculous. Is it meant to be that way, that professional hardware is not supported by the "community version", as fixes for these kinds of bugs are reserved for the "non community version" only?
#18 Updated by Chris Buechler almost 3 years ago
The original problem here from years back has nothing to do with anything current, that was a 4 year old driver problem that was resolved 4 years ago. With some combinations of hardware, where you don't buy it from us (so we don't know what it is), you may need to tune things accordingly if you hit the message in the subject. That has nothing whatsoever to do with this old bug. The default configuration has to work for everything. When we know what hardware you have because it's something we sell, it's automatically set appropriately. You don't have to apply it on install, it can be set in the config as a system tunable.
#19 Updated by Denis Kozlov almost 3 years ago
Once again, why can't this be addressed in pfSense?
I mean, scale the MBUF according to the number of cores and network cards. Job done.
Do you really expect people to forever run pfSense on 1 core processors and 1 network card, because that is what pfSense is "tuned" for? The consequence is that we are forced to fix a non-accessible pfSense on every new install.
People will keep coming back to this issue until it is fixed.
#20 Updated by Kill Bill almost 3 years ago
Denis Kozlov wrote:
I mean, scale the MBUF according to the number of cores and network cards. Job done.
That's not the FreeBSD way of doing things. It's much more fun to fiddle with some cryptic shit tunables. This is the most recent bitchfest I could find regarding the igb issue: http://marc.info/?t=130143011800001&r=1&w=2