Bug #6305

Quagga problems updating routes / mistakenly showing "kernel"-routes while they are not

Added by jeroen van breedam over 3 years ago. Updated about 1 month ago.

Status: Closed
Priority: Normal
Assignee: -
Category: Quagga OSPF
Target version: -
Start date: 05/03/2016
Due date:
% Done: 0%
Estimated time:
Affected Version: 2.3
Affected Architecture:

History

#1 Updated by Chris Buechler over 3 years ago

  • Project changed from pfSense to pfSense Packages
  • Category set to Quagga OSPF
  • Target version deleted (2.3.1)

#2 Updated by jeroen van breedam over 3 years ago

For me, downgrading to the older version seems to solve all issues. There is no confirmation yet whether this is also the case for the OP of the forum post.

Working version: http://pkg.freebsd.org/freebsd:10:x86:64/release_3/All/quagga-0.99.24.1_2.txz

#3 Updated by jeroen van breedam over 3 years ago

The OP hasn't found the time to respond to the post, but a different forum member has confirmed that reverting to the version above solves it.

Could anyone revert the package to a version before 1.x?

#4 Updated by jeroen van breedam over 3 years ago

A different forum member has come across this issue and has confirmed that reverting to 0.99.24.1 fixes the problem.

#5 Updated by jeroen van breedam over 3 years ago

Have any of the core devs been able to replicate this?

#6 Updated by Reqlez Guy over 3 years ago

jeroen van breedam wrote:

Have any of the core devs been able to replicate this?

I have (though I'm not a dev), and so has everybody else who actually uses OSPF and has tried to fail over links.

This is a core functionality bug and it needs more attention.

#7 Updated by Jim Pingle over 3 years ago

  • Status changed from New to Feedback

I see the routes sometimes (but not always) marked as Kernel routes in the Zebra routing table, but I have not seen this be an actual problem for routing. So it's possible the K routes are not actually the source of your problems.

I have a test setup here with a central router and two clients that each have two WANs, so there are four OpenVPN instances in total on the "server" (two for each client), and each of them has OSPF. If I kill the preferred WAN on a client, the K route goes inactive, Quagga selects the O route and traffic flows again. When the WAN recovers, it continues to work as well, with the expected traffic hiccup as OSPF switches things around.
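To make the K-versus-O behaviour concrete: for each prefix, zebra selects the usable candidate route with the lowest administrative distance. Quagga's defaults give kernel (K) routes a distance of 0 and OSPF (O) routes 110, so an active K route wins over an O route for the same prefix regardless of OSPF cost. The sketch below is a simplified model of that selection, not zebra's actual code; the prefixes, nexthops and the `active` flag are hypothetical stand-ins for what the Zebra routing table shows.

```python
from dataclasses import dataclass
from typing import Optional

# Default administrative distances in Quagga's zebra (lower wins).
# Simplified model: real zebra has extra special cases not shown here.
DEFAULT_DISTANCE = {"K": 0, "C": 0, "S": 1, "O": 110}


@dataclass
class Route:
    prefix: str
    rtype: str           # "K" kernel, "C" connected, "S" static, "O" OSPF
    nexthop: str
    metric: int = 0
    active: bool = True  # e.g. False once the nexthop's tunnel goes down


def select_best(candidates: list[Route]) -> Optional[Route]:
    """Pick the usable route with the lowest (distance, metric) pair."""
    usable = [r for r in candidates if r.active]
    if not usable:
        return None
    return min(usable, key=lambda r: (DEFAULT_DISTANCE[r.rtype], r.metric))


if __name__ == "__main__":
    k = Route("192.168.1.0/24", "K", "10.0.8.2")
    o = Route("192.168.1.0/24", "O", "10.1.8.2", metric=20)
    print(select_best([k, o]).rtype)  # K route wins while it is active
    k.active = False                  # preferred WAN / tunnel goes down
    print(select_best([k, o]).rtype)  # falls back to the O route
```

In this model a K route that stays marked active keeps shadowing the OSPF route for that prefix indefinitely, whatever the OSPF costs are.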

It's possible there is some misconfiguration happening that is handled differently in quagga 1.0.x vs 0.99.x, but as far as I can see, it works when configured properly. Continue the discussion on the forum thread but please post the contents of the zebra.conf and quagga.conf files (masking/removing passwords), and preferably post the entire output of the status tab in an attachment on the forum thread.
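Before posting the config files as requested, the secrets need masking. A minimal sketch that replaces the secret argument with `***` (the same form the configs later in this thread use). It covers `password`, `enable password`, `ip ospf authentication-key` and `ip ospf message-digest-key N md5 KEY` lines; that list is an assumption, so check the output by eye before posting, especially for other daemons such as bgpd.

```python
import re
import sys

# Lines whose trailing token is a secret in zebra/ospfd configs.
# This list is an assumption; extend it for other Quagga daemons.
_SECRET_PATTERNS = [
    re.compile(r"^(\s*(?:enable\s+)?password\s+)\S+", re.M),
    re.compile(r"^(\s*ip ospf authentication-key\s+)\S+", re.M),
    re.compile(r"^(\s*ip ospf message-digest-key\s+\d+\s+md5\s+)\S+", re.M),
]


def mask_secrets(config_text: str) -> str:
    """Return config_text with each secret argument replaced by ***."""
    for pattern in _SECRET_PATTERNS:
        config_text = pattern.sub(r"\g<1>***", config_text)
    return config_text


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python mask_quagga.py ospfd.conf > ospfd-masked.conf
    with open(sys.argv[1]) as fh:
        print(mask_secrets(fh.read()), end="")
```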

#8 Updated by jeroen van breedam over 3 years ago

Jim Pingle

I've updated reply #7 to include the configs of the client side and the server side.

The status before/after was there already.
I'm currently not in a position to dump the entire contents of the status tab before/after, because employees and bosses don't like it much when things stop working =)

At least four people have reported similar issues. It's odd that it doesn't occur in the test setup; we must be missing something here...

If you need more data, I'll try to provide it.

#9 Updated by jeroen van breedam over 3 years ago

It seems someone has found a way to reproduce it consistently (this has not yet been verified by others).

https://forum.pfsense.org/index.php?topic=111108.msg630396#msg630396

#10 Updated by Reqlez Guy about 3 years ago

Okay... I already have a setup where the issue happens if I upgrade the package back to the new version. Jim, can I send you the router config files privately so you can take a look? Or what information do you need that I can safely post on this bug tracker?

#11 Updated by Juri Dmitrijev about 3 years ago

Any update on the topic?

#12 Updated by Jim Pingle about 3 years ago

Someone who can reproduce it reliably needs to get the details of how to reproduce it reported to the Quagga project directly.

We can't reproduce it reliably here, and it does not appear to be a bug in any of our code, but in the current version of Quagga on FreeBSD.

#13 Updated by Reqlez Guy almost 3 years ago

Jim Pingle wrote:

Someone who can reproduce it reliably needs to get the details of how to reproduce it reported to the Quagga project directly.

We can't reproduce it reliably here, and it does not appear to be a bug in any of our code, but in the current version of Quagga on FreeBSD.

I emailed the quagga users list and got a response https://lists.quagga.net/pipermail/quagga-users/2016-October/014474.html

Above is the thread regarding this. Also, I know Jim Pingle has provided a "no routing packages restart" patch for 2.3.1, but it will be lost every time we update pfSense. Is it possible to make this "no routing packages restart" behaviour an option under Advanced settings in pfSense? I have an issue with unstable links that bring down the network even though those links are lower priority: it seems that every time zebra is restarted the routes are wiped out, and there is a period of a few seconds with no traffic while zebra restarts and relearns the routes. Very annoying.

#14 Updated by Nate Baker almost 3 years ago

Jim Pingle wrote:

Someone who can reproduce it reliably needs to get the details of how to reproduce it reported to the Quagga project directly.

We can't reproduce it reliably here, and it does not appear to be a bug in any of our code, but in the current version of Quagga on FreeBSD.

We are having this issue as well. It looks like reproducing it requires restarting the Quagga services (probably just zebra). When that happens, the kernel routes show up in the zebra routes, and from that point on things don't work properly. So it seems like there are two problems:

1) Every time a change is made to OSPF the services are restarted with the new config. This can be disruptive, and the Quagga team says it shouldn't be necessary. Also it triggers the problem with Quagga.
2) When Quagga is restarted, the kernel routes (which it put there before it was restarted) are pulled into Zebra, and will always take precedence until the firewall is restarted.

If we restart the firewall and never touch the Quagga settings things work fine. So to fix number 1, is it possible to write the configuration files and change the Quagga configuration by connecting to the Quagga VTYs, instead of restarting it? It seems like number 2 needs to be fixed by Quagga.
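Point 1 could be sketched like this: instead of rewriting the config and restarting the daemons, push only the changed statements into the running daemons through vtysh, Quagga's integrated shell, which accepts several `-c` commands per invocation. This is a minimal sketch under assumptions: that vtysh is installed and allowed to talk to zebra/ospfd on the pfSense box (untested with the pfSense package), and that `ospf_network_change` and `apply` are hypothetical helpers, not part of any existing package.

```python
import shlex
import subprocess


def vtysh_args(commands: list[str]) -> list[str]:
    """Build one vtysh invocation that runs `commands` in a single session."""
    args = ["vtysh"]
    for cmd in commands:
        args += ["-c", cmd]
    return args


def ospf_network_change(add: list[str], remove: list[str], area: str) -> list[str]:
    """Config-mode commands that add/remove OSPF `network` statements live."""
    cmds = ["configure terminal", "router ospf"]
    cmds += [f"no network {p} area {area}" for p in remove]
    cmds += [f"network {p} area {area}" for p in add]
    return cmds


def apply(commands: list[str]) -> None:
    # Hypothetical helper: assumes vtysh is on PATH. check=True raises
    # CalledProcessError if vtysh exits with a non-zero status.
    subprocess.run(vtysh_args(commands), check=True)


if __name__ == "__main__":
    cmds = ospf_network_change(add=["10.1.8.0/24"], remove=[], area="0.0.0.1")
    print(shlex.join(vtysh_args(cmds)))  # printed for review, not executed
```

Because the change is applied in place, adjacencies on unaffected interfaces should not need to be torn down, which is the disruption point 1 describes.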

#15 Updated by Reqlez Guy almost 3 years ago

Nate Baker wrote:

We are having this issue as well. It looks like reproducing it requires restarting the Quagga services (probably just zebra). When that happens, the kernel routes show up in the zebra routes, and from that point on things don't work properly. So it seems like there are two problems:

1) Every time a change is made to OSPF the services are restarted with the new config. This can be disruptive, and the Quagga team says it shouldn't be necessary. Also it triggers the problem with Quagga.
2) When Quagga is restarted, the kernel routes (which it put there before it was restarted) are pulled into Zebra, and will always take precedence until the firewall is restarted.

If we restart the firewall and never touch the Quagga settings things work fine. So to fix number 1, is it possible to write the configuration files and change the Quagga configuration by connecting to the Quagga VTYs, instead of restarting it? It seems like number 2 needs to be fixed by Quagga.

I'm working with Martin from Quagga and collecting debug logs this weekend. He thinks that "Quagga is being restarted in some hard way that won't allow it to clean up routes", but he also says he doesn't understand why Quagga needs to be restarted in pfSense in the first place when links change.

#16 Updated by Reqlez Guy almost 3 years ago

So far the only thing I got from Martin is that -9 (SIGKILL) is not a nice way to stop Quagga and could cause these issues. Also, I saw the Quagga 1.1 release had this commit; not sure if it is related:

commit 7e73eb740f3c52a5b7c0ae9c2cd33b486d885552
Author: Timo Teräs <>
Date:   Sat Apr 9 17:22:32 2016 +0300

    zebra: handle multihop nexthop changes properly

    The rib entries are normally added and deleted when they are
    changed. However, they are modified in placae when the nexthop
    reachability changes. This fixes to:
    - properly detect nexthop changes from nexthop_active_update()
      calls from rib_process()
    - rib_update_kernel() to not reset FIB flags when a RIB entry
      is being modifed (old and new RIB are same)
    - improves the "show ip route" output to display
      both ACTIVE and FIB flags for each nexthop

    Fixes: 325823a5 "zebra: support FIB override routes"
    Signed-off-by: Timo Teräs <timo.teras@iki.fi>
    Reported-By: Igor Ryzhov <iryzhov@nfware.com>
    Tested-by: NetDEF CI System <cisystem@netdef.org>

#17 Updated by winmasta winmasta over 2 years ago

This affected me too. I tried a setup with an OpenVPN server and an OpenVPN client.

Both:
pfSense 2.3.2-RELEASE-p1
Quagga_OSPF 0.6.16 (quagga-1.0.20160315)

Server:

Quagga ospfd.conf

# This file was created by the pfSense package manager.  Do not edit!

password ***
log syslog
interface ovpns1
  ip ospf cost 10
interface ovpns2
  ip ospf cost 20

router ospf
  ospf router-id 192.168.3.7
  log-adjacency-changes detail
  redistribute connected
  network 10.0.8.0/24 area 0.0.0.1


Quagga zebra.conf
# This file was created by the pfSense package manager.  Do not edit!

password ***
log syslog

sudo cat /var/etc/openvpn/server1.conf

dev ovpns1
verb 1
dev-type tun
tun-ipv6
dev-node /dev/tun1
writepid /var/run/openvpn_server1.pid
#user nobody
#group nobody
script-security 3
daemon
keepalive 10 60
ping-timer-rem
persist-tun
persist-key
proto udp
cipher BF-CBC
auth SHA1
up /usr/local/sbin/ovpn-linkup
down /usr/local/sbin/ovpn-linkdown
local 1.2.3.4
tls-server
server 10.0.8.0 255.255.255.0
client-config-dir /var/etc/openvpn-csc/server1
tls-verify "/usr/local/sbin/ovpn_auth_verify tls 'server' 1" 
lport 1194
management /var/etc/openvpn/server1.sock unix
push "route 192.168.3.0 255.255.255.0" 
client-to-client
ca /var/etc/openvpn/server1.ca
cert /var/etc/openvpn/server1.cert
key /var/etc/openvpn/server1.key
dh /etc/dh-parameters.1024
tls-auth /var/etc/openvpn/server1.tls-auth 0
comp-lzo adaptive
passtos
persist-remote-ip
float
topology subnet
route 192.168.1.0 255.255.255.0 10.0.8.1
route 192.168.0.0 255.255.255.0 10.0.8.1
route 192.168.5.0 255.255.255.0 10.0.8.1
route 192.168.8.0 255.255.255.0 10.0.8.1
route 192.168.9.0 255.255.255.0 10.0.8.1
route 192.168.10.0 255.255.255.0 10.0.8.1

sudo cat /var/etc/openvpn/server2.conf

dev ovpns2
verb 1
dev-type tun
tun-ipv6
dev-node /dev/tun2
writepid /var/run/openvpn_server2.pid
#user nobody
#group nobody
script-security 3
daemon
keepalive 10 60
ping-timer-rem
persist-tun
persist-key
proto udp
cipher BF-CBC
auth SHA1
up /usr/local/sbin/ovpn-linkup
down /usr/local/sbin/ovpn-linkdown
local 5.6.7.8
tls-server
server 10.1.8.0 255.255.255.0
client-config-dir /var/etc/openvpn-csc/server2
tls-verify "/usr/local/sbin/ovpn_auth_verify tls 'server' 1" 
lport 1194
management /var/etc/openvpn/server2.sock unix
push "route 192.168.3.0 255.255.255.0" 
client-to-client
ca /var/etc/openvpn/server2.ca
cert /var/etc/openvpn/server2.cert
key /var/etc/openvpn/server2.key
dh /etc/dh-parameters.1024
tls-auth /var/etc/openvpn/server2.tls-auth 0
passtos
persist-remote-ip
float
topology subnet

Client:

Quagga ospfd.conf


# This file was created by the pfSense package manager.  Do not edit!

password ***
log syslog
interface ovpnc1
  ip ospf cost 10
interface ovpnc2
  ip ospf cost 20

router ospf
  ospf router-id 192.168.8.1
  log-adjacency-changes detail
  redistribute connected
  timers throttle spf 200 2 20
  network 10.0.8.0/24 area 0.0.0.1

Quagga zebra.conf

# This file was created by the pfSense package manager.  Do not edit!

password ***
log syslog

sudo cat /var/etc/openvpn/client1.conf

dev ovpnc1
verb 1
dev-type tun
tun-ipv6
dev-node /dev/tun1
writepid /var/run/openvpn_client1.pid
#user nobody
#group nobody
script-security 3
daemon
keepalive 10 60
ping-timer-rem
persist-tun
persist-key
proto udp
cipher BF-CBC
auth SHA1
up /usr/local/sbin/ovpn-linkup
down /usr/local/sbin/ovpn-linkdown
local 3.21.7.21
tls-client
client
lport 0
management /var/etc/openvpn/client1.sock unix
remote 1.2.3.4 1194
route 192.168.3.0 255.255.255.0
ca /var/etc/openvpn/client1.ca
cert /var/etc/openvpn/client1.cert
key /var/etc/openvpn/client1.key
tls-auth /var/etc/openvpn/client1.tls-auth 1
comp-lzo adaptive
passtos
resolv-retry infinite

sudo cat /var/etc/openvpn/client2.conf

dev ovpnc2
verb 1
dev-type tun
tun-ipv6
dev-node /dev/tun2
writepid /var/run/openvpn_client2.pid
#user nobody
#group nobody
script-security 3
daemon
keepalive 10 60
ping-timer-rem
persist-tun
persist-key
proto udp
cipher BF-CBC
auth SHA1
up /usr/local/sbin/ovpn-linkup
down /usr/local/sbin/ovpn-linkdown
local 3.21.7.21
tls-client
client
lport 0
management /var/etc/openvpn/client2.sock unix
remote 5.6.7.8 1194
route 192.168.3.0 255.255.255.0
ca /var/etc/openvpn/client2.ca
cert /var/etc/openvpn/client2.cert
key /var/etc/openvpn/client2.key
tls-auth /var/etc/openvpn/client2.tls-auth 1
passtos
resolv-retry infinite
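One detail that may be worth checking in configs like these (not confirmed to be related to this bug): ospfd only enables OSPF on interfaces whose address falls inside a `network` statement. The server's ospfd.conf above has only `network 10.0.8.0/24`, while server2 (ovpns2) uses 10.1.8.0/24, so the `ip ospf cost 20` on ovpns2 might never come into play. A simplified sketch of that check using Python's ipaddress module; the interface addresses are assumed from the OpenVPN `server` lines (the server normally takes the first host address of its subnet), not read from the box.

```python
import ipaddress

# `network` statements from the server's ospfd.conf above.
OSPF_NETWORKS = [ipaddress.ip_network("10.0.8.0/24")]

# Interface addresses ASSUMED from the OpenVPN `server` lines.
INTERFACES = {
    "ovpns1": ipaddress.ip_interface("10.0.8.1/24"),
    "ovpns2": ipaddress.ip_interface("10.1.8.1/24"),
}


def ospf_enabled(iface, networks) -> bool:
    """Simplified ospfd rule: enabled if the address is inside any `network`."""
    return any(iface.ip in net for net in networks)


if __name__ == "__main__":
    for name, iface in INTERFACES.items():
        state = "OSPF enabled" if ospf_enabled(iface, OSPF_NETWORKS) else "NOT covered"
        print(f"{name} {iface}: {state}")
```

The same check applies to the client side, whose ospfd.conf also only lists 10.0.8.0/24.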

#18 Updated by Kill Bill over 2 years ago

https://github.com/pfsense/FreeBSD-ports/pull/265 - that's obviously not a real solution, so kindly leave this bug open even if it is merged.

#19 Updated by Hanno Stock over 2 years ago

Looks like Zebra sets RTF_PROTO1 flag on the routes it installs in the routing table.

So I assume that, to get the old behavior, the restart script would need to flush the routes marked with RTF_PROTO1?

I still think configuration changes are better handled by connecting to the daemons; however, for a last-resort kind of restart, this could help.
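Hanno's idea could look roughly like this. On FreeBSD, `netstat -rn` shows RTF_PROTO1 as flag `1` in the Flags column, so a restart script could list those routes and delete them before starting zebra again. A sketch only: it parses a captured `netstat -rn -f inet` listing (the sample below is made up), and it prints `route delete` commands instead of running them. That every route flagged `1` on a pfSense box belongs to zebra is an assumption to verify first.

```python
def proto1_routes(netstat_output: str) -> list[str]:
    """Destinations whose Flags column contains '1' (RTF_PROTO1)."""
    routes = []
    flags_col = None
    for line in netstat_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "Destination":
            flags_col = fields.index("Flags")  # locate the column from the header
            continue
        if flags_col is None or len(fields) <= flags_col:
            continue
        if "1" in fields[flags_col]:
            routes.append(fields[0])
    return routes


def delete_command(dest: str) -> str:
    # Simplification: entries with a prefix length are networks,
    # bare addresses are treated as host routes.
    if dest == "default":
        return "route delete default"
    kind = "-net" if "/" in dest else "-host"
    return f"route delete {kind} {dest}"


SAMPLE = """\
Routing tables

Internet:
Destination        Gateway            Flags      Netif Expire
default            1.2.3.1            UGS          em0
10.0.8.0/24        10.0.8.1           UGS       ovpns1
10.0.8.1           link#7             UHS          lo0
192.168.1.0/24     10.0.8.2           UG1       ovpns1
192.168.5.0/24     10.0.8.2           UG1       ovpns1
"""

if __name__ == "__main__":
    for dest in proto1_routes(SAMPLE):
        print(delete_command(dest))  # review before running on a live box
```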

#20 Updated by Jim Pingle almost 2 years ago

If this still happens with Quagga, give FRR a try instead.

#21 Updated by Jim Pingle about 1 month ago

  • Status changed from Feedback to Closed
