Bug #7606

Using limiters and VLANs on Supermicro Xeon D boards crashes with kernel panic

Added by Collateral Fortune over 3 years ago. Updated almost 3 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: Limiters
Target version: -
Start date: 05/26/2017
Due date:
% Done: 0%
Estimated time:
Affected Version: All
Affected Architecture:

Description

Confirmed on three different Supermicro Xeon D boards, 1508/1518/1521. With the similarities between these boards that I've seen, I wouldn't be surprised if it affected the whole line.
Confirmed on 2.3.4 RELEASE and 2.4 beta.

Was NOT able to reproduce on a VM.

The boards use the igb and ix drivers.

Used "pfSense-CE-2.3.4-RELEASE-amd64.iso".
1. Plugged "WAN" into port 1, plugged "LAN/Laptop" into port 2. Both are the gigabit ports, not the 10Gb ports, but I've seen it happen on both.
2. Install
3. Run through the basic setup
4. Set up a new random VLAN, attached to igb1, used 99
5. Set up a new interface using VLAN_99, OPT.
6. Set up static IP and DHCP on OPT, 10.23.1.1, 10.23.1.10-20
7. Use the multi-LAN/WAN wizard to create a new limiter. Used LAN and OPT1. Used CBQ. All options in steps 2-8 are default.

The server now runs fine, but it instantly implodes the second any traffic traverses the VLAN interface. The instant the laptop grabs a DHCP address on the VLAN network and tries to connect to the Internet, the kernel panics.

At this point the server is locked and does not respond to keyboard input, including Ctrl-Alt-Del, so it needs a hard power cycle.

After rebooting, one of two things happens, depending on the timing of the traffic over the VLAN interface:

1. Kernel panics and bootloops when it tries to activate the firewall. On a busy VLAN, this is generally what happens.
2. Hard locks, without even the kernel panic, once traffic moves over the VLAN interface. With just one client on the VLAN, this is what happened through a couple of reboots.

IMG_0480.jpg (700 KB), Kernel panic, Collateral Fortune, 05/26/2017 12:34 PM
1.jpg (112 KB), putzomatic none, 06/01/2017 05:45 PM
2.jpg (109 KB), putzomatic none, 06/01/2017 05:45 PM
4.jpg (110 KB), putzomatic none, 06/01/2017 05:45 PM
3.jpg (114 KB), putzomatic none, 06/01/2017 05:45 PM
5.jpg (114 KB), putzomatic none, 06/01/2017 05:45 PM
6.jpg (103 KB), putzomatic none, 06/01/2017 05:45 PM
7.jpg (94.7 KB), putzomatic none, 06/01/2017 05:45 PM
8.jpg (78.6 KB), putzomatic none, 06/01/2017 05:45 PM
begin-crash.JPG (67.7 KB), putzomatic none, 06/01/2017 06:19 PM

History

#1 Updated by Jim Pingle over 3 years ago

  • Status changed from New to Feedback

Please test against a 2.4 snapshot. Attach crash dump data here as well, as the report has very little use without it, and what little shows in the picture doesn't help much.

Also, if you used the shaper wizard, that is ALTQ and not Limiters. They are two completely different worlds, and the distinction is critical.

#2 Updated by putzomatic none over 3 years ago

Sorry, I'm not sure if this is the right place for this, but this issue seems almost identical to what I am experiencing on 2.4 (2.4.0.b.20170523.0819) with my Supermicro Atom C2578 board.
My LAN is set as VLAN 99_LAGG0.
Traffic shaper enabled with the wizard (HFSC).
LAN firewall rule enabled that puts traffic into a specific queue.
Once traffic starts flowing, the whole system crashes and reboots, and I have about 1-2 minutes to log in when it's back up to disable the LAN rule and shaper queues, otherwise it repeats.

Oddly, I upgraded from 2.3.4, and on 2.3.4 there were no problems.

Is there a way I can submit a crash dump if my build doesn't have swap?

#3 Updated by Collateral Fortune over 3 years ago

Installed:

pfSense-CE-2.4.0-BETA-amd64-20170526-0955.iso

Installed on 500GB hard drive. Swap exists.

Problem occurs immediately again.

No crash logs are generated. Entire content of /var/crash is "minfree -> 2048".

The system completely hard freezes as soon as traffic hits the VLAN interface.

2.3.4 at least generated kernel panics, but there is nothing in /var/crash either.

If there is any other advice for troubleshooting, let me know.
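
For anyone repeating the /var/crash check described above, here is a minimal sketch in Python. It assumes that saved dumps show up under names beginning with `vmcore.`, `textdump.tar.`, or `info.` (the names FreeBSD's savecore(8) normally writes) and that `minfree` is only a bookkeeping file. The function name `find_crash_dumps` and the prefix list are illustrative, not part of pfSense.

```python
from pathlib import Path

# Assumption: name prefixes that savecore(8) typically uses for saved dumps.
DUMP_PREFIXES = ("vmcore.", "textdump.tar.", "info.")


def find_crash_dumps(crash_dir="/var/crash"):
    """Return the sorted names of files in crash_dir that look like saved dumps."""
    directory = Path(crash_dir)
    if not directory.is_dir():
        # Directory missing or unreadable as a directory: nothing to report.
        return []
    return sorted(
        entry.name
        for entry in directory.iterdir()
        if entry.is_file() and entry.name.startswith(DUMP_PREFIXES)
    )


if __name__ == "__main__":
    dumps = find_crash_dumps()
    if dumps:
        print("Possible crash dumps:", ", ".join(dumps))
    else:
        print("No crash dumps found (only bookkeeping files such as minfree, if any).")
```

A directory that holds only `minfree`, as reported here, yields an empty list, which matches the "no crash logs are generated" observation.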

#4 Updated by putzomatic none over 3 years ago

I don't think I can provide a crash dump without swap on my build, but here are some screencaps of the console once the system shits the bed. It happens a couple of minutes after enabling the traffic shaper queues on the interfaces. Basically, all of that fills the screen for about 15-20 seconds and then the system reboots.

#5 Updated by putzomatic none over 3 years ago

I was able to capture the very beginning of the crash; see the attached picture (begin-crash.JPG).

#6 Updated by putzomatic none over 3 years ago

I'm curious whether the information I posted is useful enough to determine what might be happening, since I haven't seen any updates to this bug. I know I didn't open this bug report, but the issue is so similar to what I am experiencing that there has to be something going on in 2.4 that wasn't there in 2.3.4. It would be nice to be able to use the shaper again. I have since updated my pfSense 2.4 to build 2.4.0.b.20170615.0722, removed the shaper, and recreated it, and I get exactly the same results once traffic starts hitting my shaper queue.

#7 Updated by putzomatic none almost 3 years ago

For now it appears my issue has been resolved on 2.4.0.r.20170926.1006.

Side note, though: a floating rule still doesn't seem to put traffic in the queue. I had to create a LAN rule to pass the traffic and set the queue for it to work. But good news, no crashes yet.

Perhaps OP's issue could be resolved as well with the newer release?

#8 Updated by Jim Pingle almost 3 years ago

  • Status changed from Feedback to Resolved

This appears to be working fine on current versions, and there has been no additional feedback from the user. Closing.
