Bug #6305
closed
Quagga problems updating routes / mistakenly showing "kernel"-routes while they are not
Updated by Chris Buechler over 8 years ago
- Project changed from pfSense to pfSense Packages
- Category set to Quagga OSPF
- Target version deleted (2.3.1)
Updated by jeroen van breedam over 8 years ago
For me, downgrading to an older version seems to solve all issues. There is no confirmation yet whether this is also the case for the OP of the forum post.
working version: http://pkg.freebsd.org/freebsd:10:x86:64/release_3/All/quagga-0.99.24.1_2.txz
Updated by jeroen van breedam over 8 years ago
The OP hasn't found the time to respond to the post.
A different forum member has confirmed that reverting to the version above solves it.
Could anyone revert the update to before 1.x?
Updated by jeroen van breedam over 8 years ago
A different forum member has come across this issue and has confirmed that reverting to 0.99.24.1 fixes the problem.
Updated by jeroen van breedam over 8 years ago
Have any of the core devs been able to replicate this?
Updated by Reqlez Guy over 8 years ago
jeroen van breedam wrote:
Have any of the core devs been able to replicate this?
I have (but I'm not a dev), and so has everybody else who actually uses OSPF and has tried to fail over links.
This is a core functionality bug and it needs more attention.
Updated by Jim Pingle over 8 years ago
- Status changed from New to Feedback
I see the routes sometimes (but not always) marked as Kernel routes in the Zebra routing table, but I have not seen this be an actual problem for routing. So it's possible the K routes are not actually the source of your problems.
I have a test setup here with a central router and two clients that each have two WANs, so four OpenVPN instances in total on the "server" (two on each client), and each of them runs OSPF. If I kill a preferred WAN on a client, the K route goes inactive, quagga selects the O route, and traffic flows again. When the WAN recovers, it continues to work as well, but with the expected traffic hiccup as OSPF switches things around.
It's possible there is some misconfiguration happening that is handled differently in quagga 1.0.x vs 0.99.x, but as far as I can see, it works when configured properly. Continue the discussion on the forum thread but please post the contents of the zebra.conf and quagga.conf files (masking/removing passwords), and preferably post the entire output of the status tab in an attachment on the forum thread.
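To check whether a K route is actually shadowing an OSPF route for the same prefix, the Zebra routing table from the status tab can be scanned mechanically. The following is a minimal sketch, assuming the usual Quagga `show ip route` layout (route code in the first column, `>` for the selected route, `*` for the FIB route). The function name and the sample table are hypothetical and are not taken from any router in this thread.

```python
import re
from collections import defaultdict

# Matches lines such as "K>* 192.168.8.0/24 via ..." or "O   192.168.8.0/24 [110/20] via ...".
ROUTE_RE = re.compile(r"^([A-Z])([> ])([* ])\s+(\d+\.\d+\.\d+\.\d+/\d+)")


def shadowed_ospf_prefixes(show_ip_route_text):
    """Return prefixes where a kernel (K) route is selected although an OSPF (O) route exists."""
    by_prefix = defaultdict(list)
    for line in show_ip_route_text.splitlines():
        match = ROUTE_RE.match(line)
        if match:
            code, selected, _fib, prefix = match.groups()
            by_prefix[prefix].append((code, selected == ">"))
    shadowed = []
    for prefix, entries in by_prefix.items():
        kernel_selected = ("K", True) in entries
        has_ospf = any(code == "O" for code, _ in entries)
        if kernel_selected and has_ospf:
            shadowed.append(prefix)
    return sorted(shadowed)


# Hypothetical sample output, for illustration only.
SAMPLE = """\
Codes: K - kernel route, C - connected, S - static, R - RIP,
       O - OSPF, I - IS-IS, B - BGP, P - PIM, A - Babel,
       > - selected route, * - FIB route

K>* 0.0.0.0/0 via 192.0.2.1, em0
O   10.0.8.0/24 [110/10] is directly connected, ovpns1, 00:10:12
C>* 10.0.8.0/24 is directly connected, ovpns1
K>* 192.168.8.0/24 via 10.0.8.2, ovpns1
O   192.168.8.0/24 [110/20] via 10.1.8.2, ovpns2, 00:01:02
"""

print(shadowed_ospf_prefixes(SAMPLE))  # ['192.168.8.0/24']
```

Running this on the before and after status output would show whether the set of shadowed prefixes changes after a failover.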
Updated by jeroen van breedam over 8 years ago
I've updated reply #7 to include the config of the client side and the server side.
The before/after status was there already.
I'm currently not in a position to dump the entire contents of the status before/after, because employees and bosses don't like it much when things stop working =)
At least 4 people have reported similar issues. It's odd that it doesn't occur in the test setup - we must be missing something here ...
If you need more data, I'll try to provide it.
Updated by jeroen van breedam over 8 years ago
It seems someone has found a way to reproduce this consistently. (This has not yet been verified by others.)
https://forum.pfsense.org/index.php?topic=111108.msg630396#msg630396
Updated by Reqlez Guy over 8 years ago
Okay ... I already have a setup where, if I upgrade the package back to the new one, the issue will happen. Jim ... can I just privately send you the config files of the routers somehow so you can take a look? Or what info do you need that I can safely post on this bug tracker?
Updated by Jim Pingle over 8 years ago
Someone who can reproduce it reliably needs to get the details of how to reproduce it reported to the Quagga project directly.
We can't reproduce it reliably here, and it does not appear to be a bug in any of our code, but in the current version of Quagga on FreeBSD.
Updated by Reqlez Guy about 8 years ago
Jim Pingle wrote:
Someone who can reproduce it reliably needs to get the details of how to reproduce it reported to the Quagga project directly.
We can't reproduce it reliably here, and it does not appear to be a bug in any of our code, but in the current version of Quagga on FreeBSD.
I emailed the quagga-users list and got a response: https://lists.quagga.net/pipermail/quagga-users/2016-October/014474.html
Above is the thread regarding this. Also ... I know Jim Pingle has provided a "no routing packages restart" patch for 2.3.1 ... but ... it will not survive pfSense updates ... is it possible for this "no routing packages restart" behavior to be made into an option under advanced settings in pfSense? I have an issue with unstable links that bring down the network even if those links are lower priority, because it seems that every time zebra gets restarted the routes are wiped out, and there is a period of a few seconds with no traffic while zebra restarts and learns the routes again. Very annoying.
Updated by Nate Baker about 8 years ago
Jim Pingle wrote:
Someone who can reproduce it reliably needs to get the details of how to reproduce it reported to the Quagga project directly.
We can't reproduce it reliably here, and it does not appear to be a bug in any of our code, but in the current version of Quagga on FreeBSD.
We are having this issue as well. It looks like to reproduce it, the quagga services (probably just zebra) need to be restarted. When that happens the kernel routes show up in the zebra routes, and from that point on things don't work properly. So it seems like there are two problems:
1) Every time a change is made to OSPF the services are restarted with the new config. This can be disruptive, and the Quagga team says it shouldn't be necessary. Also it triggers the problem with Quagga.
2) When Quagga is restarted, the kernel routes (which it put there before it was restarted) are pulled into Zebra, and will always take precedence until the firewall is restarted.
If we restart the firewall and never touch the Quagga settings things work fine. So to fix number 1, is it possible to write the configuration files and change the Quagga configuration by connecting to the Quagga VTYs, instead of restarting it? It seems like number 2 needs to be fixed by Quagga.
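The idea in number 1, pushing a change into the running daemon instead of restarting it, could look roughly like the sketch below. It assumes that Quagga's `vtysh` is installed and can reach the running daemons, which this thread does not confirm for the pfSense package. The helper name and the example network statement are hypothetical. The function only builds the command line; it does not execute anything.

```python
def vtysh_argv(config_lines, context="router ospf"):
    """Build a vtysh command line that enters config mode and applies the given lines."""
    argv = ["vtysh", "-c", "configure terminal", "-c", context]
    for line in config_lines:
        argv += ["-c", line]
    return argv


# Hypothetical change: announce a second tunnel network without restarting ospfd.
argv = vtysh_argv(["network 10.1.8.0/24 area 0.0.0.1"])
print(argv)
```

The resulting list can be passed to `subprocess.run(argv, check=True)`. The configuration files would still have to be written separately so the change survives a later restart.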
Updated by Reqlez Guy about 8 years ago
Nate Baker wrote:
Jim Pingle wrote:
Someone who can reproduce it reliably needs to get the details of how to reproduce it reported to the Quagga project directly.
We can't reproduce it reliably here, and it does not appear to be a bug in any of our code, but in the current version of Quagga on FreeBSD.
We are having this issue as well. It looks like to reproduce it, the quagga services (probably just zebra) need to be restarted. When that happens the kernel routes show up in the zebra routes, and from that point on things don't work properly. So it seems like there are two problems:
1) Every time a change is made to OSPF the services are restarted with the new config. This can be disruptive, and the Quagga team says it shouldn't be necessary. Also it triggers the problem with Quagga.
2) When Quagga is restarted, the kernel routes (which it put there before it was restarted) are pulled into Zebra, and will always take precedence until the firewall is restarted.
If we restart the firewall and never touch the Quagga settings things work fine. So to fix number 1, is it possible to write the configuration files and change the Quagga configuration by connecting to the Quagga VTYs, instead of restarting it? It seems like number 2 needs to be fixed by Quagga.
I'm working with Martin from Quagga and collecting debug logs this weekend. He thinks that "Quagga is being restarted in some hard way that won't allow it to clean up routes", but he also says he doesn't understand why Quagga needs to be restarted in pfSense in the first place when links change.
Updated by Reqlez Guy about 8 years ago
So far the only thing I got from Martin was that -9 is not a nice way to stop quagga and could cause the issues ... Also, I saw that the 1.1 release of quagga has this commit; I'm not sure whether it is related:
commit 7e73eb740f3c52a5b7c0ae9c2cd33b486d885552
Author: Timo Teräs <timo.teras@iki.fi>
Date: Sat Apr 9 17:22:32 2016 +0300
zebra: handle multihop nexthop changes properly
The rib entries are normally added and deleted when they are
changed. However, they are modified in placae when the nexthop
reachability changes. This fixes to:
- properly detect nexthop changes from nexthop_active_update()
calls from rib_process()
- rib_update_kernel() to not reset FIB flags when a RIB entry
is being modifed (old and new RIB are same)
- improves the "show ip route <prefix>" output to display
both ACTIVE and FIB flags for each nexthop
Fixes: 325823a5 "zebra: support FIB override routes"
Signed-off-by: Timo Teräs <timo.teras@iki.fi>
Reported-By: Igor Ryzhov <iryzhov@nfware.com>
Tested-by: NetDEF CI System <cisystem@netdef.org>
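On the remark that -9 is not a nice way to stop quagga: a gentler stop sends SIGTERM first and escalates to SIGKILL only if the process is still running after a timeout. The sketch below assumes that the daemon removes its routes when it receives SIGTERM, which this thread suggests but does not confirm. The function names, the timeout values, and the PID in the usage note are hypothetical.

```python
import os
import signal
import time


def _pid_alive(pid):
    """Return True if a process with this PID still exists."""
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def stop_gracefully(pid, timeout=10.0, poll=0.2,
                    kill=os.kill, alive=_pid_alive, sleep=time.sleep):
    """Send SIGTERM, wait up to `timeout` seconds, then fall back to SIGKILL.

    Returns "TERM" if the process exited after SIGTERM, "KILL" otherwise.
    The kill/alive/sleep parameters exist only so the logic can be tested.
    """
    kill(pid, signal.SIGTERM)
    waited = 0.0
    while waited < timeout:
        if not alive(pid):
            return "TERM"
        sleep(poll)
        waited += poll
    if not alive(pid):
        return "TERM"
    kill(pid, signal.SIGKILL)  # last resort, the daemon gets no chance to clean up
    return "KILL"
```

A restart script would call something like `stop_gracefully(zebra_pid)`, where `zebra_pid` is read from the daemon's PID file, and would log the cases in which the fallback was needed.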
Updated by winmasta winmasta almost 8 years ago
This affected me too. I tested with an OpenVPN server and an OpenVPN client.
Both:
Pfsense 2.3.2-RELEASE-p1
Quagga_OSPF 0.6.16 (quagga-1.0.20160315)
Server:
Quagga ospfd.conf
# This file was created by the pfSense package manager. Do not edit!
password ***
log syslog
interface ovpns1
 ip ospf cost 10
interface ovpns2
 ip ospf cost 20
router ospf
 ospf router-id 192.168.3.7
 log-adjacency-changes detail
 redistribute connected
 network 10.0.8.0/24 area 0.0.0.1
Quagga zebra.conf
# This file was created by the pfSense package manager. Do not edit!
password ***
log syslog
sudo cat /var/etc/openvpn/server1.conf
dev ovpns1
verb 1
dev-type tun
tun-ipv6
dev-node /dev/tun1
writepid /var/run/openvpn_server1.pid
#user nobody
#group nobody
script-security 3
daemon
keepalive 10 60
ping-timer-rem
persist-tun
persist-key
proto udp
cipher BF-CBC
auth SHA1
up /usr/local/sbin/ovpn-linkup
down /usr/local/sbin/ovpn-linkdown
local 1.2.3.4
tls-server
server 10.0.8.0 255.255.255.0
client-config-dir /var/etc/openvpn-csc/server1
tls-verify "/usr/local/sbin/ovpn_auth_verify tls 'server' 1"
lport 1194
management /var/etc/openvpn/server1.sock unix
push "route 192.168.3.0 255.255.255.0"
client-to-client
ca /var/etc/openvpn/server1.ca
cert /var/etc/openvpn/server1.cert
key /var/etc/openvpn/server1.key
dh /etc/dh-parameters.1024
tls-auth /var/etc/openvpn/server1.tls-auth 0
comp-lzo adaptive
passtos
persist-remote-ip
float
topology subnet
route 192.168.1.0 255.255.255.0 10.0.8.1
route 192.168.0.0 255.255.255.0 10.0.8.1
route 192.168.5.0 255.255.255.0 10.0.8.1
route 192.168.8.0 255.255.255.0 10.0.8.1
route 192.168.9.0 255.255.255.0 10.0.8.1
route 192.168.10.0 255.255.255.0 10.0.8.1
sudo cat /var/etc/openvpn/server2.conf
dev ovpns2
verb 1
dev-type tun
tun-ipv6
dev-node /dev/tun2
writepid /var/run/openvpn_server2.pid
#user nobody
#group nobody
script-security 3
daemon
keepalive 10 60
ping-timer-rem
persist-tun
persist-key
proto udp
cipher BF-CBC
auth SHA1
up /usr/local/sbin/ovpn-linkup
down /usr/local/sbin/ovpn-linkdown
local 5.6.7.8
tls-server
server 10.1.8.0 255.255.255.0
client-config-dir /var/etc/openvpn-csc/server2
tls-verify "/usr/local/sbin/ovpn_auth_verify tls 'server' 1"
lport 1194
management /var/etc/openvpn/server2.sock unix
push "route 192.168.3.0 255.255.255.0"
client-to-client
ca /var/etc/openvpn/server2.ca
cert /var/etc/openvpn/server2.cert
key /var/etc/openvpn/server2.key
dh /etc/dh-parameters.1024
tls-auth /var/etc/openvpn/server2.tls-auth 0
passtos
persist-remote-ip
float
topology subnet
Client:
Quagga ospfd.conf
# This file was created by the pfSense package manager. Do not edit!
password ***
log syslog
interface ovpnc1
 ip ospf cost 10
interface ovpnc2
 ip ospf cost 20
router ospf
 ospf router-id 192.168.8.1
 log-adjacency-changes detail
 redistribute connected
 timers throttle spf 200 2 20
 network 10.0.8.0/24 area 0.0.0.1
Quagga zebra.conf
# This file was created by the pfSense package manager. Do not edit!
password ***
log syslog
sudo cat /var/etc/openvpn/client1.conf
dev ovpnc1
verb 1
dev-type tun
tun-ipv6
dev-node /dev/tun1
writepid /var/run/openvpn_client1.pid
#user nobody
#group nobody
script-security 3
daemon
keepalive 10 60
ping-timer-rem
persist-tun
persist-key
proto udp
cipher BF-CBC
auth SHA1
up /usr/local/sbin/ovpn-linkup
down /usr/local/sbin/ovpn-linkdown
local 3.21.7.21
tls-client
client
lport 0
management /var/etc/openvpn/client1.sock unix
remote 1.2.3.4 1194
route 192.168.3.0 255.255.255.0
ca /var/etc/openvpn/client1.ca
cert /var/etc/openvpn/client1.cert
key /var/etc/openvpn/client1.key
tls-auth /var/etc/openvpn/client1.tls-auth 1
comp-lzo adaptive
passtos
resolv-retry infinite
sudo cat /var/etc/openvpn/client2.conf
dev ovpnc2
verb 1
dev-type tun
tun-ipv6
dev-node /dev/tun2
writepid /var/run/openvpn_client2.pid
#user nobody
#group nobody
script-security 3
daemon
keepalive 10 60
ping-timer-rem
persist-tun
persist-key
proto udp
cipher BF-CBC
auth SHA1
up /usr/local/sbin/ovpn-linkup
down /usr/local/sbin/ovpn-linkdown
local 3.21.7.21
tls-client
client
lport 0
management /var/etc/openvpn/client2.sock unix
remote 5.6.7.8 1194
route 192.168.3.0 255.255.255.0
ca /var/etc/openvpn/client2.ca
cert /var/etc/openvpn/client2.cert
key /var/etc/openvpn/client2.key
tls-auth /var/etc/openvpn/client2.tls-auth 1
passtos
resolv-retry infinite
Updated by Kill Bill almost 8 years ago
https://github.com/pfsense/FreeBSD-ports/pull/265 - that's not a real solution obviously, so kindly leave this bug open even if merged.
Updated by Hanno Stock almost 8 years ago
It looks like Zebra sets the RTF_PROTO1 flag on the routes it installs in the routing table.
So I assume that, to get the old behavior, the restart script would need to flush the routes marked with RTF_PROTO1?
I still think configuration changes are better handled by connecting to the daemons; however, for a last-resort kind of restart, this could help.
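The flush suggested above could be sketched as follows. The sketch assumes the FreeBSD `netstat -rn` layout, where the third column holds the route flags and the character `1` stands for RTF_PROTO1. The function name and the sample table are hypothetical. The function only returns the `route delete` commands and does not run them.

```python
import re

# Flag strings look like "UGS" or "UG1"; this keeps out gateways and header words.
FLAGS_RE = re.compile(r"^[A-Za-z0-9]+$")


def proto1_route_deletes(netstat_text):
    """Return one 'route delete' argv per route whose flags include '1' (RTF_PROTO1)."""
    commands = []
    for line in netstat_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # blank lines and section titles
        destination, _gateway, flags = fields[:3]
        if destination == "Destination" or not FLAGS_RE.match(flags):
            continue  # column header or unexpected layout
        if "1" in flags:
            commands.append(["route", "delete", destination])
    return commands


# Hypothetical `netstat -rn -f inet` output, for illustration only.
SAMPLE = """\
Routing tables

Internet:
Destination        Gateway            Flags      Netif Expire
default            192.0.2.1          UGS         em0
10.0.8.0/24        10.0.8.1           UGS      ovpns1
192.168.8.0/24     10.0.8.2           UG1      ovpns1
192.168.9.0/24     10.1.8.2           UG1      ovpns2
127.0.0.1          link#2             UH          lo0
"""

for argv in proto1_route_deletes(SAMPLE):
    print(" ".join(argv))
# route delete 192.168.8.0/24
# route delete 192.168.9.0/24
```

A restart script would run these commands after zebra has stopped and before it starts again, so the new zebra instance does not import its own leftover routes as kernel routes.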
Updated by Jim Pingle about 7 years ago
If this still happens with Quagga, give FRR a try instead.