Bug #15601: Routes with IPv6 Address as Next Hop for IPv4 Destination Causes Kernel Panic - pfSense - pfSense bugtracker

Bug #15601

closed

Routes with IPv6 Address as Next Hop for IPv4 Destination Causes Kernel Panic

Added by Kris Phillips about 2 years ago. Updated over 1 year ago.

Status: Resolved
Priority: Normal
Assignee: Mateusz Guzik
Category: Routing
Target version: 2.8.0
% Done: 100%
Plus Target Version: 24.11
Release Notes: Default
Affected Version: All
Affected Architecture: All

Description

If a route can be installed that sends IPv4 traffic to an IPv6 next hop, forwarding over that route can trigger a page fault kernel panic and crash the system.

History

Updated by Jim Pingle about 2 years ago

Project changed from pfSense Plus to pfSense
Category changed from SNMP to Routing
Status changed from New to Feedback
Affected Plus Version deleted (24.03)

How exactly is someone making that sort of entry? It can't be made in the GUI via static routes, input validation rejects it. It can't be made at the CLI, the route command rejects it.

Updated by Kristof Provost about 2 years ago

The relevant part of the (private) crash dump is this:

db:0:kdb.enter.default>  run pfs
db:1:pfs> bt
Tracing pid 12 tid 100120 td 0xfffff80005b91000
kdb_enter() at kdb_enter+0x33/frame 0xfffffe0106720800
panic() at panic+0x43/frame 0xfffffe0106720860
trap_fatal() at trap_fatal+0x40f/frame 0xfffffe01067208c0
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe0106720920
calltrap() at calltrap+0x8/frame 0xfffffe0106720920
--- trap 0xc, rip = 0xffffffff80d5ab70, rsp = 0xfffffe01067209f0, rbp = 0xfffffe0106720a00 ---
turnstile_broadcast() at turnstile_broadcast+0x40/frame 0xfffffe0106720a00
__rw_wunlock_hard() at __rw_wunlock_hard+0x9e/frame 0xfffffe0106720a30
nd6_resolve_slow() at nd6_resolve_slow+0x2d7/frame 0xfffffe0106720aa0
nd6_resolve() at nd6_resolve+0x125/frame 0xfffffe0106720b10
ether_output() at ether_output+0x4e7/frame 0xfffffe0106720ba0
ip_output_send() at ip_output_send+0xdc/frame 0xfffffe0106720be0
ip_output() at ip_output+0x1295/frame 0xfffffe0106720ce0
ip_forward() at ip_forward+0x3c2/frame 0xfffffe0106720d90
ip_input() at ip_input+0x705/frame 0xfffffe0106720df0
swi_net() at swi_net+0x138/frame 0xfffffe0106720e60
ithread_loop() at ithread_loop+0x257/frame 0xfffffe0106720ef0
fork_exit() at fork_exit+0x7f/frame 0xfffffe0106720f30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0106720f30
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---

addr2line points us to this code section in nd6_resolve_slow():

2455   │     /* If we have child lle, switch to the parent to send NS */
2456   │     if (lle->la_flags & LLE_CHILD) {
2457   │         struct llentry *lle_parent = lle->lle_parent;
2458   │         LLE_WUNLOCK(lle);
2459   │         lle = lle_parent;
2460   │         LLE_WLOCK(lle);
2461   │     }

The crash happens on the lock of the parent lle on line 2460. The most probable reason for this is a race between this code and the unlinking of the child/parent lle. I believe we should be acquiring the parent lock before we release the child lock.
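
The lock ordering suggested here (take the parent lock before dropping the child lock) can be sketched in userland C. This is only an illustration under stated assumptions: `struct entry` and `switch_to_parent()` are hypothetical names, pthread mutexes stand in for the kernel's lle locks, and none of this is the actual kernel code.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for struct llentry; only the fields needed here. */
struct entry {
    pthread_mutex_t lock;
    struct entry *parent;   /* NULL when the entry has no parent */
};

/*
 * Switch from a locked child entry to its parent.
 * The parent lock is acquired while the child lock is still held, so
 * there is no window in which neither entry is locked.
 * Returns the entry whose lock is held on return.
 */
static struct entry *
switch_to_parent(struct entry *child)
{
    struct entry *parent;

    assert(child != NULL);
    parent = child->parent;
    if (parent == NULL)
        return (child);                     /* nothing to switch to */

    pthread_mutex_lock(&parent->lock);      /* take the parent first */
    pthread_mutex_unlock(&child->lock);     /* only then drop the child */
    return (parent);
}
```

One design consequence of this pattern: every other code path that needs both locks has to take them in the same child-then-parent order, otherwise two threads can deadlock against each other.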

Updated by Kristof Provost about 2 years ago

Jim Pingle wrote in #note-2:

How exactly is someone making that sort of entry? It can't be made in the GUI via static routes, input validation rejects it. It can't be made at the CLI, the route command rejects it.

I have this in my test case to at least run the relevant code path:

route add -6 -net -inet 0.0.0.0/0 -inet6 2001:db8::1

The customer's routing table also has entries like this:

10.0.0.0/24        2001:db8:42::3 UG1     21   1500   lagg0.10
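
To spot entries like the one above in a routing table, the destination and gateway address families can be compared. The following is a minimal sketch, assuming the destination is written as a full dotted-quad or IPv6 address with an optional `/prefix` and the gateway as a plain address; `addr_family()` and `is_v4_via_v6()` are hypothetical helper names, not part of any existing tool.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Return AF_INET or AF_INET6 for a textual address with an optional
 * "/prefix" suffix, or -1 if it parses as neither.
 */
static int
addr_family(const char *text)
{
    char buf[INET6_ADDRSTRLEN];
    unsigned char bin[sizeof(struct in6_addr)];
    size_t len = strcspn(text, "/");    /* length up to any "/prefix" */

    if (len == 0 || len >= sizeof(buf))
        return (-1);
    memcpy(buf, text, len);
    buf[len] = '\0';

    if (inet_pton(AF_INET, buf, bin) == 1)
        return (AF_INET);
    if (inet_pton(AF_INET6, buf, bin) == 1)
        return (AF_INET6);
    return (-1);
}

/* Return 1 for an IPv4 destination routed via an IPv6 gateway, else 0. */
static int
is_v4_via_v6(const char *dest, const char *gateway)
{
    return (addr_family(dest) == AF_INET &&
        addr_family(gateway) == AF_INET6);
}
```

For the entry above, `is_v4_via_v6("10.0.0.0/24", "2001:db8:42::3")` returns 1. Abbreviated destinations (such as `default`) and scoped addresses (with a `%ifname` suffix) are not handled by this sketch and are reported as unknown.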

Updated by Kristof Provost about 2 years ago

I've proposed this upstream: https://reviews.freebsd.org/D45913 and copied the original author of the relevant code.

Updated by Jim Pingle about 2 years ago

Status changed from Feedback to In Progress
Assignee set to Kristof Provost

Updated by Jim Pingle about 2 years ago

Target version set to 2.8.0
Plus Target Version set to 24.08
Affected Version set to All

Updated by Kris Phillips about 2 years ago

Jim Pingle wrote in #note-2:

How exactly is someone making that sort of entry? It can't be made in the GUI via static routes, input validation rejects it. It can't be made at the CLI, the route command rejects it.

This route was added by FRR BGP learning a route.

Updated by Mateusz Guzik about 2 years ago

Note that the instruction pointers in these traces tend to be one instruction off. The call to __rw_wunlock_hard comes just before the reported line and operates on the child; the parent has not been looked at yet. Therefore it is the child which failed to unlock.

Normally a panic like this means the value of the lock itself is corrupted -- the fast path fails and the fallback expects there are blocked threads waiting to be woken up. The crash stems from failing to find any.

For the buggy state to occur something had to damage the lock or there is a bug in locking primitives (I'm ruling out the latter though).
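
The failure mode described above can be illustrated with a toy lock in C11. This is a sketch under stated assumptions, not FreeBSD's rwlock implementation: `struct toy_lock`, `toy_wunlock()`, the flag layout, and the `nwaiters` counter (a stand-in for the queue of blocked threads) are all hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define TOY_UNLOCKED ((uintptr_t)0)
#define TOY_WAITERS  ((uintptr_t)1)  /* low bit: blocked threads exist */

struct toy_lock {
    _Atomic uintptr_t word; /* owner id, with flag bits in the low bits */
    int nwaiters;           /* stand-in for the queue of blocked threads */
};

enum unlock_result { UNLOCK_FAST, UNLOCK_SLOW, UNLOCK_INCONSISTENT };

/*
 * Release a write lock held by `owner` (an id whose low bit is clear).
 * Not thread safe with respect to nwaiters; this only models the logic.
 */
static enum unlock_result
toy_wunlock(struct toy_lock *l, uintptr_t owner)
{
    uintptr_t expected = owner;

    /* Fast path: the word is exactly "owned by me, no flags set". */
    if (atomic_compare_exchange_strong(&l->word, &expected, TOY_UNLOCKED))
        return (UNLOCK_FAST);

    /*
     * Slow path: the word was not what the fast path expected, so the
     * code assumes there are blocked threads to wake up. If the word
     * was damaged instead, there are none, and code that uses the
     * waiter queue unconditionally would fault at this point.
     */
    if (l->nwaiters == 0)
        return (UNLOCK_INCONSISTENT);

    l->nwaiters = 0;                        /* "wake" all waiters */
    atomic_store(&l->word, TOY_UNLOCKED);
    return (UNLOCK_SLOW);
}
```

With a healthy lock word the function returns `UNLOCK_FAST` or `UNLOCK_SLOW`. With a damaged word and no waiters it returns `UNLOCK_INCONSISTENT`, which models the state where the fallback looks for blocked threads and finds none.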

Would the customer be willing to run a kernel with certain debug facilities added? Performance should be about the same, and it should shed some light on what's going on here.

I can prep everything tomorrow. Dropping in a new kernel is very easy, but I don't know whether there is a blessed procedure for this or whether the customer will need some hand-holding. I'm counting on the support team here.

#10

Updated by Mateusz Guzik about 2 years ago

Assignee changed from Kristof Provost to Mateusz Guzik

#12

Updated by Jim Pingle almost 2 years ago

Plus Target Version changed from 24.08 to 24.11

#13

Updated by Mateusz Guzik over 1 year ago

The customer was shipped two kernels. The first added some debugging and the second added a workaround for the suspected issue.

The customer claims the crashes stopped and it was confirmed they are running the kernel variant which was expected to fix the issue.

However, they had a period of time where they were running the debug kernel which was expected to crash and did not (it did crash eventually).

This means we don't know for sure whether the problem is mitigated. The good news is that the mitigation is harmless, so it landed for the time being: https://gitlab.netgate.com/pfSense/FreeBSD-src/-/commit/5b6ba89cd18f370f42c72e09c750e6ae5bc9a0a6 . It will report in dmesg whenever it had to be used.

#14

Updated by Jim Pingle over 1 year ago

Status changed from In Progress to Feedback
% Done changed from 0 to 100

#15

Updated by Jim Pingle over 1 year ago

Status changed from Feedback to Resolved
