Bug #11836

open

FRR ACCEPTFILTER unstable

Added by Gavin Owen 6 months ago. Updated 6 months ago.

Status: New
Priority: Normal
Assignee: -
Category: FRR
Target version: -
Start date: 04/22/2021
Due date:
% Done: 0%
Estimated time:
Plus Target Version:
Affected Version: 2.5.x
Affected Plus Version:
Affected Architecture: All

Description

Adding entries to the ACCEPTFILTER prefix-list creates erratic behavior within the FRR running configuration.

Have a look at my notes in "show running-configuration output.txt" and you'll see the configuration is constantly changing.
Leave it for hours and it'll still keep cycling.
This makes the FRR process unstable and causes routes that were previously active to go inactive sporadically. Example:

firewall1.home.arpa#  show ip route
Codes: K - kernel route, C - connected, S - static, R - RIP,
       O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
       T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
       F - PBR, f - OpenFabric,
       > - selected route, * - FIB route, q - queued, r - rejected, b - backup

K>* 0.0.0.0/0 [0/0] via 100.0.1.1, em0, 04:20:57
K * 10.1.194.0/24 [0/0] via 10.1.194.2 inactive, 04:20:57
C>* 10.1.194.0/24 [0/1] is directly connected, ovpns1, 04:20:57
O   10.2.194.0/24 [110/20] via 10.255.1.2, ovpns2 inactive onlink, weight 1, 04:20:36       ### should not be inactive!!! VPN users behind firewall2
C>* 10.255.1.2/32 [0/1] is directly connected, ovpns2, 04:20:57
C>* 10.255.2.2/32 [0/1] is directly connected, ovpns3, 04:20:57
C>* 100.0.1.0/30 [0/1] is directly connected, em0, 04:20:57
C>* 100.0.2.0/30 [0/1] is directly connected, em2, 04:20:57
O   192.168.1.0/24 [110/4] is directly connected, em1 inactive, weight 1, 04:20:57          ## this is fine - LAN of firewall1
C>* 192.168.1.0/24 [0/1] is directly connected, em1, 04:20:57
O   192.168.2.0/24 [110/8004] via 10.255.1.2, ovpns2 inactive onlink, weight 1, 04:20:37    ## should not be inactive!! LAN of firewall2
C>* 192.168.57.0/24 [0/1] is directly connected, em3, 04:20:57
firewall1.home.arpa#

The topology diagram for this lab is logged in https://redmine.pfsense.org/issues/11835.

I'm using the exact same topology as in that issue, but with the options enabled that trigger this ACCEPTFILTER problem.

In my production environment I have the same basic topology (simplified for fault analysis), but I also have downstream OSPF routers behind "firewall1" LAN.
I thus generate a default route on firewall1 and flood that out the LAN.
My "firewall2" is an Internet firewall with its own default route, so it doesn't need firewall1's default. So, I use the option #2 above of filtering 0.0.0.0/0 in the global settings - see screenshot "FRR Global ACCEPTFILTER.png".

This all seemed to work on older versions of pfSense such as 2.4.5p1, so it looks to be a regression (feel free to confirm; I don't have a non-production 2.4.5p1 lab setup anymore).

Enabling both of these filtering methods at the same time (interface "Accept Filter" and global configuration "Routes: Do Not Accept") is pretty catastrophic - for me, vtysh locks up completely. Even the new option "Force Service Restart" does not get FRR running again. The only fix then is to reboot the whole pfSense node, and then undo those settings - followed by another reboot for good measure.

This may account for some of the horror stories told to me on the pfsense subreddit by OSPF users upgrading to the 2.5.x releases (issue #11835 was occurring in earlier releases).


#1

Updated by Jim Pingle 6 months ago

  • Project changed from pfSense to pfSense Packages
  • Category changed from Routing to FRR
  • Release Notes deleted (Default)