Bug #15601

open

Routes with IPv6 Address as Next Hop for IPv4 Destination Causes Kernel Panic

Added by Kris Phillips 5 months ago. Updated 29 days ago.

Status:
Feedback
Priority:
Normal
Assignee:
Category:
Routing
Target version:
Start date:
Due date:
% Done:

100%

Estimated time:
Plus Target Version:
24.11
Release Notes:
Default
Affected Version:
All
Affected Architecture:
All

Description

If a route can be created that sends IPv4 traffic to an IPv6 next hop, it can cause a page-fault kernel panic and crash.

Actions #2

Updated by Jim Pingle 5 months ago

  • Project changed from pfSense Plus to pfSense
  • Category changed from SNMP to Routing
  • Status changed from New to Feedback
  • Affected Plus Version deleted (24.03)

How exactly is someone making that sort of entry? It can't be made in the GUI via static routes, input validation rejects it. It can't be made at the CLI, the route command rejects it.

Actions #3

Updated by Kristof Provost 5 months ago

The relevant part of the (private) crash dump is this:

db:0:kdb.enter.default>  run pfs
db:1:pfs> bt
Tracing pid 12 tid 100120 td 0xfffff80005b91000
kdb_enter() at kdb_enter+0x33/frame 0xfffffe0106720800
panic() at panic+0x43/frame 0xfffffe0106720860
trap_fatal() at trap_fatal+0x40f/frame 0xfffffe01067208c0
trap_pfault() at trap_pfault+0x4f/frame 0xfffffe0106720920
calltrap() at calltrap+0x8/frame 0xfffffe0106720920
--- trap 0xc, rip = 0xffffffff80d5ab70, rsp = 0xfffffe01067209f0, rbp = 0xfffffe0106720a00 ---
turnstile_broadcast() at turnstile_broadcast+0x40/frame 0xfffffe0106720a00
__rw_wunlock_hard() at __rw_wunlock_hard+0x9e/frame 0xfffffe0106720a30
nd6_resolve_slow() at nd6_resolve_slow+0x2d7/frame 0xfffffe0106720aa0
nd6_resolve() at nd6_resolve+0x125/frame 0xfffffe0106720b10
ether_output() at ether_output+0x4e7/frame 0xfffffe0106720ba0
ip_output_send() at ip_output_send+0xdc/frame 0xfffffe0106720be0
ip_output() at ip_output+0x1295/frame 0xfffffe0106720ce0
ip_forward() at ip_forward+0x3c2/frame 0xfffffe0106720d90
ip_input() at ip_input+0x705/frame 0xfffffe0106720df0
swi_net() at swi_net+0x138/frame 0xfffffe0106720e60
ithread_loop() at ithread_loop+0x257/frame 0xfffffe0106720ef0
fork_exit() at fork_exit+0x7f/frame 0xfffffe0106720f30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0106720f30
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---

addr2line points us to this code section in nd6_resolve_slow():

2455   │     /* If we have child lle, switch to the parent to send NS */
2456   │     if (lle->la_flags & LLE_CHILD) {
2457   │         struct llentry *lle_parent = lle->lle_parent;
2458   │         LLE_WUNLOCK(lle);
2459   │         lle = lle_parent;
2460   │         LLE_WLOCK(lle);
2461   │     }

The crash happens on the lock of the parent lle on line 2460. The most probable reason for this is a race between this code and the unlinking of the child/parent lle. I believe we should acquire the parent lock before we release the child lock.
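The ordering suggested above (take the parent lock before dropping the child lock) can be sketched in userland C. This is only an illustration of the idea, not the kernel code and not necessarily what any patch does: `struct entry`, `switch_to_parent()` and the use of pthread mutexes are hypothetical stand-ins for `struct llentry` and the LLE_WLOCK/LLE_WUNLOCK macros.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for struct llentry; not the kernel definition. */
struct entry {
    pthread_mutex_t lock;
    struct entry *parent;   /* NULL for a parent entry */
    int is_child;           /* stands in for the LLE_CHILD flag */
};

/*
 * Must be called with e->lock held. Returns the entry whose lock is
 * held on return. For a child, the parent lock is acquired while the
 * child lock is still held, so the child->parent link is read and
 * followed under the child lock, and only then is the child released.
 */
static struct entry *
switch_to_parent(struct entry *e)
{
    if (e->is_child) {
        struct entry *parent = e->parent;

        assert(parent != NULL);
        pthread_mutex_lock(&parent->lock);   /* take parent first */
        pthread_mutex_unlock(&e->lock);      /* then drop the child */
        e = parent;
    }
    return e;
}
```

Holding both locks briefly only works if every other code path that needs both also takes them in child-then-parent order; otherwise this trades a race for a possible deadlock.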

Actions #4

Updated by Kristof Provost 5 months ago

Jim Pingle wrote in #note-2:

How exactly is someone making that sort of entry? It can't be made in the GUI via static routes, input validation rejects it. It can't be made at the CLI, the route command rejects it.

I have this in my test case to at least run the relevant code path:

route add -6 -net -inet 0.0.0.0/0 -inet6 2001:db8::1

The customer's routing table also has entries like this:

10.0.0.0/24        2001:db8:42::3 UG1     21   1500   lagg0.10
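For anyone wanting to spot routes of this shape (IPv4 destination, IPv6 gateway) in a dump of a routing table, a minimal sketch of the check follows. The helper names `addr_family()` and `is_v4_via_v6()` are made up for illustration, and the destination is assumed to be passed without its prefix length (for example "10.0.0.0", not "10.0.0.0/24").

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>

/* Returns AF_INET, AF_INET6, or -1 if the string parses as neither. */
static int
addr_family(const char *s)
{
    struct in_addr a4;
    struct in6_addr a6;

    assert(s != NULL);
    if (inet_pton(AF_INET, s, &a4) == 1)
        return AF_INET;
    if (inet_pton(AF_INET6, s, &a6) == 1)
        return AF_INET6;
    return -1;
}

/*
 * Returns 1 when the destination is IPv4 and the gateway is IPv6,
 * which is the combination discussed in this issue; 0 otherwise.
 * dst must not carry a "/len" suffix.
 */
static int
is_v4_via_v6(const char *dst, const char *gw)
{
    return addr_family(dst) == AF_INET && addr_family(gw) == AF_INET6;
}
```

With the entry above, `is_v4_via_v6("10.0.0.0", "2001:db8:42::3")` reports a match, while an ordinary IPv4 route via an IPv4 gateway does not.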

Actions #5

Updated by Kristof Provost 5 months ago

I've proposed this upstream: https://reviews.freebsd.org/D45913 and copied the original author of the relevant code.

Actions #6

Updated by Jim Pingle 5 months ago

  • Status changed from Feedback to In Progress
  • Assignee set to Kristof Provost
Actions #7

Updated by Jim Pingle 4 months ago

  • Target version set to 2.8.0
  • Plus Target Version set to 24.08
  • Affected Version set to All
Actions #8

Updated by Kris Phillips 4 months ago

Jim Pingle wrote in #note-2:

How exactly is someone making that sort of entry? It can't be made in the GUI via static routes, input validation rejects it. It can't be made at the CLI, the route command rejects it.

This route was added by FRR BGP learning a route.

Actions #9

Updated by Mateusz Guzik 4 months ago

Note that the instruction pointers in a backtrace like to be one instruction off. The __rw_wunlock_hard call comes just prior and operates on the child; the parent was not looked at yet. Therefore it is the child which failed to unlock.

Normally a panic like this means the value of the lock itself is corrupted -- the fast path fails and the fallback expects there are blocked threads waiting to be woken up. The crash stems from failing to find any.
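The failure mode described above can be shown with a toy model. This is a deliberately simplified sketch and not FreeBSD's rwlock implementation: the lock word layout, `LOCK_WAITERS`, `struct toy_lock` and `toy_unlock()` are all invented for illustration. It only shows why a damaged lock word sends the unlock into the fallback path, which then finds nobody to wake.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define LOCK_UNLOCKED 0u
#define LOCK_WAITERS  1u   /* hypothetical "threads are blocked" flag bit */

enum unlock_path {
    UNLOCK_FAST,            /* word matched the owner exactly */
    UNLOCK_SLOW_WAKE,       /* fallback found queued threads to wake */
    UNLOCK_SLOW_NO_WAITERS  /* fallback found nobody: the buggy state */
};

struct toy_lock {
    _Atomic uintptr_t word; /* owner id, optionally OR'ed with LOCK_WAITERS */
    int queued;             /* threads actually waiting (toy wait queue) */
};

/* owner is an id with the low bit clear, so it cannot collide with the flag. */
static enum unlock_path
toy_unlock(struct toy_lock *l, uintptr_t owner)
{
    uintptr_t expected = owner;

    assert((owner & LOCK_WAITERS) == 0);

    /* Fast path: succeeds only if the word is exactly the owner id. */
    if (atomic_compare_exchange_strong(&l->word, &expected, LOCK_UNLOCKED))
        return UNLOCK_FAST;

    /* Fallback: the word differed, so waiters are assumed to exist. */
    atomic_store(&l->word, LOCK_UNLOCKED);
    if (l->queued > 0) {
        l->queued = 0;      /* "wake" everyone */
        return UNLOCK_SLOW_WAKE;
    }
    /* A real kernel would touch a wait queue that is not there. */
    return UNLOCK_SLOW_NO_WAITERS;
}
```

A word of `owner | LOCK_WAITERS` with a queued thread takes the normal fallback, while a corrupted word with an empty queue ends in `UNLOCK_SLOW_NO_WAITERS`, the analogue of the crash in the trace.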

For the buggy state to occur, either something had to damage the lock or there is a bug in the locking primitives (I'm ruling out the latter though).

Would the customer be willing to run a kernel with certain debug facilities added? Performance should be about the same, and it should shed some light on what's going on here.

I can prep everything tomorrow. It is very easy to plop a new kernel in, but I don't know whether there is a blessed way to do it here or whether the customer needs some hand-holding. I'm counting on the support team here.

Actions #10

Updated by Mateusz Guzik 4 months ago

  • Assignee changed from Kristof Provost to Mateusz Guzik
Actions #12

Updated by Jim Pingle about 1 month ago

  • Plus Target Version changed from 24.08 to 24.11
Actions #13

Updated by Mateusz Guzik 29 days ago

The customer was shipped two kernels. The first added some debugging and the second added a workaround for the suspected issue.

The customer claims the crashes stopped and it was confirmed they are running the kernel variant which was expected to fix the issue.

However, there was a period of time during which they were running the debug kernel, which was expected to crash, and it did not (it did crash eventually).

This means we don't know for sure whether the problem is mitigated. The good news is that the mitigation is harmless, so it has landed for the time being: https://gitlab.netgate.com/pfSense/FreeBSD-src/-/commit/5b6ba89cd18f370f42c72e09c750e6ae5bc9a0a6 . It will point out in dmesg whenever it had to be used.

Actions #14

Updated by Jim Pingle 29 days ago

  • Status changed from In Progress to Feedback
  • % Done changed from 0 to 100