Bug #11835
openFRR OSPF redistributed connected routes disappearing
Description
pfSense/FRR is flushing and repropagating certain OSPF routes unnecessarily, causing outages.
The scenario is two firewalls with two WANs each. One OpenVPN tunnel runs WAN1 to WAN1, and a second runs WAN2 to WAN2. The WAN1 tunnel is preferred based on OSPF costs. See my network diagram attached.
The OpenVPN work-from-home RoadWarrior users use the "topology subnet" of a /24 which is a connected network (see attachment for "netstat -rn"). This /24 connected route is then redistributed into OSPF (using prefix list and route map), so that the other firewall can learn the routes.
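For readers unfamiliar with this kind of setup, here is a minimal sketch of what such a redistribution looks like in FRR syntax. The names RW-CONNECTED and RW-CONNECTED-TO-OSPF and the prefix 192.0.2.0/24 are placeholders chosen for illustration, not taken from my firewalls, and the config that the pfSense FRR package actually generates may differ.

```
! Illustrative only: names and prefix are hypothetical placeholders.
! Match just the RoadWarrior /24 (192.0.2.0/24 stands in for the real subnet).
ip prefix-list RW-CONNECTED seq 10 permit 192.0.2.0/24
!
! Permit routes matching the prefix list; everything else hits the
! implicit deny at the end of the route map.
route-map RW-CONNECTED-TO-OSPF permit 10
 match ip address prefix-list RW-CONNECTED
!
! Redistribute connected routes into OSPF, filtered by the route map.
router ospf
 redistribute connected route-map RW-CONNECTED-TO-OSPF
```

With a filter like this, only the RoadWarrior /24 should be injected into OSPF as an external route, and other connected networks are left out.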
Now imagine the backup tunnel drops out (WAN2 is 4G and is having some interference issues; WAN1 is fibre). The OpenVPN RoadWarrior /24 route still propagates fine over the primary tunnel, so at this point there is no impact.
The issue is when the backup WAN tunnel (TUNNEL2) re-establishes. pfSense flushes all known redistributed connected routes from OSPF, and they stay out of OSPF for approx. 8 to 10 seconds (in my testing) before reappearing on the far-end firewall.
This shouldn't happen. It doesn't matter what the backup tunnel does - goes up, goes down, goes outside for a coffee and a smoke (joking) - this connected route SHOULD STAY PINNED UP, because it is learned over the primary tunnel and nothing at all has happened to the primary tunnel.
I have lab-tested this on two other implementations of FRR (VyOS being one) and they don't exhibit the problem - only pfSense does.
The impact is huge for my environment. Users work remotely and VPN into either firewall1 or firewall2, and they often need resources from the LANs behind the other firewall: a file copy, RDP, an interactive web page, etc. As soon as that backup link re-establishes, they get kicked out of all these things and then lodge support calls and emails. Then there are the system impacts: gaps in our SMTP monitoring and traffic monitoring, and havoc with remote site backups failing.
Now I can't get rid of the poor quality WAN2 links for something better (as much as I'd love to, at least not in the short term), but I can raise this fault for investigation. Further testing I did on 2.4.5p1 several months ago showed that the backup link didn't have to connect back to the firewall losing the route(s): I saw that the trigger could be a tunnel to a third firewall going down. If there are multiple connected routes, then all of them get flushed. I also tested back then with IPsec, and that setup behaved the same.
I was hoping at the time it might be an upstream FRR issue, but even with FRR 7.5 it's still a problem. I hope this can get some traction, because at the moment I can't run dynamic routing over tunnels, and any issue with WAN1 means I have to manually flick the routes over to WAN2, which is a lot slower than a dynamic routing protocol.
Let me know if you need anything further. Thanks.
Files