Feature #10789

FRR integrated configuration and hitless reloads

Added by Ben Hughes 6 months ago. Updated 5 days ago.

Status:
Feedback
Priority:
Normal
Assignee:
Category:
FRR
Target version:
-
Start date:
07/27/2020
Due date:
% Done:

100%

Estimated time:

Description

Convert FRR to use an integrated configuration file and use frr-reload where possible for hitless configuration changes.

1. Update the frr7 port to 7.4.
2. Add the frr7-pythontools port to provide frr-reload.py.
3. Refactor the FRR configuration generator to create an integrated configuration file.
3.1. The generated file is in the same format as the one produced by a vtysh write mem.
4. Refactor the rc file/generator to use reload unless a service restart is required (change of enabled daemons).
5. Accumulate interface descriptions into the FRR configuration for cross-reference to the pfSense configuration under vtysh.
6. Include the watchfrr FRR watchdog service.
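
The reload-or-restart rule from the rc file/generator item can be sketched as follows. This is only a minimal Python illustration of the decision, not the package's actual code; the function name choose_action and the example daemon sets are hypothetical.

```python
def choose_action(previous_daemons, current_daemons):
    """Return "restart" if the set of enabled daemons changed, else "reload"."""
    # Per the description, a change of enabled daemons is the case that
    # needs a service restart; any other change can go through a reload.
    if set(previous_daemons) != set(current_daemons):
        return "restart"
    return "reload"


# Hypothetical daemon sets, for illustration only.
print(choose_action({"zebra", "bgpd"}, {"zebra", "bgpd"}))           # reload
print(choose_action({"zebra", "bgpd"}, {"zebra", "bgpd", "ospfd"}))  # restart
```

Comparing sets rather than lists keeps the ordering of the daemon list from triggering a restart by accident.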

Tested with 2.4.5_p1.

PR: https://github.com/pfsense/FreeBSD-ports/pull/914

History

#1 Updated by Renato Botelho 4 months ago

  • Status changed from New to Feedback
  • % Done changed from 0 to 100

PR has been merged only on 2.5.0 branch for now so we can get it properly tested

#2 Updated by Steve Wheeler 4 months ago

After upgrading to today's snap with this change I am seeing this error:

PHP Errors:
[22-Sep-2020 14:13:22 Europe/London] PHP Warning:  file_get_contents(/tmp/frr_start_command_last.txt): failed to open stream: No such file or directory in /usr/local/pkg/frr.inc on line 332

The console hung at 'Writing configuration...' at boot after the update, requiring me to Ctrl+C past it.

The file /tmp/frr_start_command_last.txt does exist once boot is complete but that error is present at every boot.

This system is running a ram disk for /tmp.

#3 Updated by Jim Pingle 4 months ago

Steve Wheeler wrote:

The console hung at 'Writing configuration...' at boot after the update, requiring me to Ctrl+C past it.

That is likely still #10610 (not yet solved, but affects multiple packages since it's a problem in pkg)

#4 Updated by Ben Hughes 4 months ago

Steve Wheeler wrote:

After upgrading to today's snap with this change I am seeing this error:
[...]

The console hung at 'Writing configuration...' at boot after the update, requiring me to Ctrl+C past it.

The file /tmp/frr_start_command_last.txt does exist once boot is complete but that error is present at every boot.

This system is running a ram disk for /tmp.

It's a warning rather than an error, so I'd presume the hang is #10610 as stated. There is a chicken-and-egg situation with the file's existence, but I'm preparing another PR with a few small changes/UI fixups and can wrap this in a file_exists call to stop the warning.
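
The guard described above amounts to checking for the file before reading it. A minimal sketch of the same idea in Python (the package itself is PHP and would use file_exists); the helper name read_last_start_command is hypothetical and not taken from frr.inc.

```python
import os


def read_last_start_command(path):
    """Return the saved start command, or "" if the file does not exist yet."""
    # Hypothetical helper: checking for existence first avoids the
    # "No such file or directory" warning on the first run after boot,
    # when a RAM-disk /tmp has not been populated yet.
    if not os.path.exists(path):
        return ""
    with open(path) as handle:
        return handle.read().strip()


# Path taken from the warning quoted earlier in the thread.
print(repr(read_last_start_command("/tmp/frr_start_command_last.txt")))
```

Returning an empty string lets the caller treat "no previous command" the same way as "command changed", which is one plausible way to handle the first boot.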

#6 Updated by Jim Pingle 4 months ago

  • Status changed from Feedback to Pull Request Review

#7 Updated by Renato Botelho 4 months ago

  • Status changed from Pull Request Review to Feedback

PR has been merged. Thanks!

#8 Updated by Ben Hughes 4 months ago

Fixed case of the fat fingers in frr_bgp.xml: https://github.com/pfsense/FreeBSD-ports/pull/950

#9 Updated by Jim Pingle 4 months ago

  • Status changed from Feedback to Pull Request Review

#10 Updated by Ben Hughes 4 months ago

While looking into firewalling between two VRFs, I discovered that FRR, contrary to the documentation (http://docs.frrouting.org/en/latest/ospf6d.html#ospf6-area), does actually support OSPFv3 areas (https://github.com/FRRouting/frr/issues/2453). I wanted this so that I could set a separate stub area that only injects a default route into the VRF.

So PR incoming to make it work like OSPFv2.

#12 Updated by Renato Botelho 4 months ago

  • Status changed from Pull Request Review to Feedback

PRs #950 and #955 are now merged. Thanks!

#13 Updated by Chris Evans 4 months ago

I'm still seeing BGP neighbor resets when changes are being made, I believed this effort was to make it so full reloads aren't required?

#14 Updated by Ben Hughes 4 months ago

Chris Evans wrote:

I'm still seeing BGP neighbor resets when changes are being made, I believed this effort was to make it so full reloads aren't required?

Not necessarily; this was done to stop the FRR processes being restarted on every configuration change. I'm only seeing neighbour resets when changing a setting relevant to the specific neighbour(s), which is most likely due to how FRR/frr-reload.py (re)applies the configuration sections. Running frr-reload.py in test mode shows that it sometimes needs to reapply a whole config node because of how it diffs the on-disk vs running configuration.

What changes are you making to see this?

My motivation for this was more to do with OSPF and not having to wait for SPF to re-run; with normal timers I was getting away with it BGP-wise, but add BFD into the mix and it's not so easy.
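
The test mode mentioned above can be driven from a script. A minimal Python sketch, assuming frr-reload.py is on the PATH and takes --test or --reload followed by a configuration filename, as the upstream FRR script does; the helper names and the /var/etc/frr/frr.conf path are hypothetical and chosen only for illustration.

```python
import subprocess


def reload_command(config_path, script="frr-reload.py", test_only=True):
    """Build the argument list for a dry run (--test) or a real run (--reload)."""
    mode = "--test" if test_only else "--reload"
    return [script, mode, config_path]


def run_reload(config_path, **kwargs):
    """Run the command and return its exit status (needs FRR installed)."""
    return subprocess.run(reload_command(config_path, **kwargs)).returncode


# Assumed path for the integrated configuration file; adjust as needed.
print(" ".join(reload_command("/var/etc/frr/frr.conf")))
```

Reviewing the test-mode output first shows which configuration nodes would be removed and re-added before anything touches the running daemons.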

#15 Updated by Chris Evans 3 months ago

Ben Hughes wrote:

Chris Evans wrote:

I'm still seeing BGP neighbor resets when changes are being made, I believed this effort was to make it so full reloads aren't required?

Not necessarily; this was done to stop the FRR processes being restarted on every configuration change. I'm only seeing neighbour resets when changing a setting relevant to the specific neighbour(s), which is most likely due to how FRR/frr-reload.py (re)applies the configuration sections. Running frr-reload.py in test mode shows that it sometimes needs to reapply a whole config node because of how it diffs the on-disk vs running configuration.

What changes are you making to see this?

My motivation for this was more to do with OSPF and not having to wait for SPF to re-run; with normal timers I was getting away with it BGP-wise, but add BFD into the mix and it's not so easy.

I'm just going in and adding/removing a fake neighbor to see if it would cause my valid BGP neighbors to be reset. Unfortunately, it is causing that to happen still.

#16 Updated by Ben Hughes 3 months ago

Chris Evans wrote:

I'm just going in and adding/removing a fake neighbor to see if it would cause my valid BGP neighbors to be reset. Unfortunately, it is causing that to happen still.

That's strange, as I can't reproduce that. Between two test 2.5.0 dev VMs, adding/removing another peer doesn't cause a reset for me, and this is with BFD as well.

#17 Updated by Renato Botelho 3 months ago

PR has been merged. Thanks!

#18 Updated by Chris Evans 3 months ago

Ben Hughes wrote:

Chris Evans wrote:

I'm just going in and adding/removing a fake neighbor to see if it would cause my valid BGP neighbors to be reset. Unfortunately, it is causing that to happen still.

That's strange, as I can't reproduce that. Between two test 2.5.0 dev VMs, adding/removing another peer doesn't cause a reset for me, and this is with BFD as well.

It appears this really only happens when adding a new neighbor. I took a packet capture, and when I add a new fake random neighbor, FRR sends a BGP open message to the valid peer to rebuild the peering.

Also, now that I look more closely at the packet capture, I see FRR sending a peer de-configured message, then an administrative shutdown message, to the peer and closing the session. Then the BGP open message comes in to re-establish.

Based on the process times, FRR is being restarted, so is this to be expected? My system is showing FRR 7.3.1 and no other updates are available, so I believe I'm on the correct version?

[2.4.5-RELEASE][root@pfsense]/root: ps ax | grep frr
46837 - Ss 0:00.02 /usr/local/sbin/zebra -d -f /var/etc/frr/zebra.conf
47006 - Ss 0:00.15 /usr/local/sbin/bgpd -d -f /var/etc/frr/bgpd.conf
40834 0 S+ 0:00.00 grep frr

[2.4.5-RELEASE][root@pfsense]/root: ps ax | grep frr
47977 - Ss 0:00.00 /usr/local/sbin/zebra -d -f /var/etc/frr/zebra.conf
48244 - Ss 0:00.01 /usr/local/sbin/bgpd -d -f /var/etc/frr/bgpd.conf
48518 0 S+ 0:00.00 grep frr

Aug 12 09:09:59 pfsense zebra83295: Terminating on signal
Aug 12 09:09:59 pfsense zebra83295: release_daemon_table_chunks: Released 0 table chunks
Aug 12 09:09:59 pfsense zebra83295: zebra/zebra_ptm.c:1345 failed to find process pid registration
Aug 12 09:09:59 pfsense zebra83295: client 10 disconnected. 17 bgp routes removed from the rib
Aug 12 09:09:59 pfsense bgpd85000: Terminating on signal
Aug 12 09:09:59 pfsense bgpd85000: %NOTIFICATION: sent to neighbor 172.16.2.6 6/3 (Cease/Peer Unconfigured) 0 bytes
Aug 12 09:09:59 pfsense bgpd85000: %NOTIFICATION: sent to neighbor 172.16.255.1 6/3 (Cease/Peer Unconfigured) 0 bytes
Aug 12 09:09:59 pfsense bgpd85000: %NOTIFICATION: sent to neighbor 172.16.2.6 6/2 (Cease/Administratively Shutdown) 0 bytes
Aug 12 09:09:59 pfsense bgpd85000: %NOTIFICATION: sent to neighbor 172.16.255.1 6/2 (Cease/Administratively Shutdown) 0 bytes
Aug 12 09:09:59 pfsense bgpd85000: %ADJCHANGE: neighbor 172.16.2.6(Unknown) in vrf default Down Neighbor deleted
Aug 12 09:09:59 pfsense zebra83295: release_daemon_table_chunks: Released 0 table chunks
Aug 12 09:09:59 pfsense zebra83295: client 26 disconnected. 0 vnc routes removed from the rib
Aug 12 09:09:59 pfsense bgpd85000: [EC 33554499] sendmsg_nexthop: zclient_send_message() failed
Aug 12 09:09:59 pfsense zebra83295: Zebra final shutdown
Aug 12 09:09:59 pfsense bgpd85000: %ADJCHANGE: neighbor 172.16.255.1(Unknown) in vrf default Down Neighbor deleted
Aug 12 09:09:59 pfsense bgpd85000: [EC 33554499] sendmsg_nexthop: zclient_send_message() failed
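
The restart can also be confirmed by comparing daemon PIDs across the two ps listings above. A minimal Python sketch using those listings as input; the helper names daemon_pids and restarted are hypothetical, and the /usr/local/sbin/ prefix is taken from the output shown.

```python
def daemon_pids(ps_output):
    """Map daemon name -> PID for FRR daemons listed in `ps ax` output."""
    pids = {}
    for line in ps_output.splitlines():
        fields = line.split()
        # Expected layout: PID TTY STAT TIME COMMAND [ARGS...]
        if len(fields) < 5 or not fields[0].isdigit():
            continue
        command = fields[4]
        # Skips the "grep frr" line, whose command is just "grep".
        if command.startswith("/usr/local/sbin/"):
            pids[command.rsplit("/", 1)[-1]] = int(fields[0])
    return pids


def restarted(before, after):
    """Return the sorted names of daemons whose PID differs between snapshots."""
    old, new = daemon_pids(before), daemon_pids(after)
    return sorted(name for name in old if name in new and old[name] != new[name])


before = """46837 - Ss 0:00.02 /usr/local/sbin/zebra -d -f /var/etc/frr/zebra.conf
47006 - Ss 0:00.15 /usr/local/sbin/bgpd -d -f /var/etc/frr/bgpd.conf
40834 0 S+ 0:00.00 grep frr"""

after = """47977 - Ss 0:00.00 /usr/local/sbin/zebra -d -f /var/etc/frr/zebra.conf
48244 - Ss 0:00.01 /usr/local/sbin/bgpd -d -f /var/etc/frr/bgpd.conf
48518 0 S+ 0:00.00 grep frr"""

print(restarted(before, after))  # ['bgpd', 'zebra']
```

A changed PID for every daemon indicates a full service restart rather than a reload, which matches the "Terminating on signal" log lines.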

#19 Updated by Ben Hughes 3 months ago

Just a thought, which version of pfSense is this with? And which version of the FRR plugin is installed? It should be 0.6.8_5 for this to be included.

These changes have only been merged to the devel branch of the pfSense version of the port tree, so unless you're running a 2.5.0 dev build you'll need to manually download and install the relevant packages via SSH as I don't believe there is a way to jump ahead via the GUI.

You'll need the latest frr7, frr7-pythontools and pfSense-pkg-frr packages.

#20 Updated by Chris Evans 3 months ago

Ben Hughes wrote:

Just a thought, which version of pfSense is this with? And which version of the FRR plugin is installed? It should be 0.6.8_5 for this to be included.

These changes have only been merged to the devel branch of the pfSense version of the port tree, so unless you're running a 2.5.0 dev build you'll need to manually download and install the relevant packages via SSH as I don't believe there is a way to jump ahead via the GUI.

You'll need the latest frr7, frr7-pythontools and pfSense-pkg-frr packages.

OK, that's my problem for sure then! Something wasn't adding up. When I saw this was 'complete', I had assumed that meant in the mainstream release.

#21 Updated by Ben Hughes 3 months ago

Always like a nice easy fix!

I've only just started doing anything with pfSense dev-wise, but from GitHub it seems that devel gets merged to RELENG_x_x_x once the feature has been tested enough, so feel free to grab the devel version and test it out.

I've got another PR in the works to tidy up the BGP neighbor family generation some more but it's fully workable as is with 2.4.5 and it'd be nice to get more eyes on it to work out any other lurking bugs.

You should be able to download from here: https://beta.pfsense.org/packages/pfSense_master_amd64-pfSense_devel/All/

#22 Updated by Chris Evans 3 months ago

Ben Hughes wrote:

Always like a nice easy fix!

I've only just started doing anything with pfSense dev-wise, but from GitHub it seems that devel gets merged to RELENG_x_x_x once the feature has been tested enough, so feel free to grab the devel version and test it out.

I've got another PR in the works to tidy up the BGP neighbor family generation some more but it's fully workable as is with 2.4.5 and it'd be nice to get more eyes on it to work out any other lurking bugs.

You should be able to download from here: https://beta.pfsense.org/packages/pfSense_master_amd64-pfSense_devel/All/

Which package do I need to update to get this functioning? There's a few json packages, but I don't want to install a random one and break things :)

[2.4.5-RELEASE][root@pfsense]/root: vtysh
/usr/local/lib/libjson-c.so.5: version JSONC_0.14 required by /usr/local/lib/libfrr.so.0 not defined

#23 Updated by Ben Hughes 3 months ago

pkg install json-c should do the job; 0.14 is in the 2.4.5 repos. I would have expected pkg to pick that up itself, as it's in the package dependencies.

#24 Updated by Jim Pingle 3 months ago

Do not install packages across versions like that. Either upgrade to 2.5.0 completely or wait for it to be merged. Anything else is an invalid test and is unlikely to work properly.

#25 Updated by Ben Hughes 3 months ago

Yes, Jim's right. I was forgetting that I've built my test packages for 2.4.5 on a FreeBSD 11 build VM with matching libraries, hence the dependency problems with the devel ones.

#26 Updated by Chris Evans 3 months ago

Ben Hughes wrote:

Yes, Jim's right. I was forgetting that I've built my test packages for 2.4.5 on a FreeBSD 11 build VM with matching libraries, hence the dependency problems with the devel ones.

Yep, I'm going to agree here. I went back to the older versions.

Thanks!

#27 Updated by Ben Hughes 3 months ago

Found a few more things to fixup: https://github.com/pfsense/FreeBSD-ports/pull/970

#28 Updated by Jim Pingle 3 months ago

  • Status changed from Feedback to Pull Request Review

#29 Updated by Viktor Gurov 5 days ago

  • Status changed from Pull Request Review to Feedback

Merged
