Bug #6850

Parent task: Bug #6828: Patch for "route change" is not present on 2.4 builds using FreeBSD 11

FreeBSD 11.0 Route Syntax Change For Non-Local Gateway

Added by Ken Sim 12 months ago. Updated 10 months ago.

Status: Resolved
Priority: Normal
Category: Gateways
Start date: 10/11/2016
% Done: 100%
Affected version: 2.4

Description

While testing one of the 2.4 snapshots a few weeks ago, I was unable to get network connectivity using a gateway that is outside the WAN subnet. After some research and help from the FreeBSD community, it appears that the route syntax for non-local gateways changed in FreeBSD 11.0.

Working commands on 10.3:

/sbin/route add -net <gateway ip>/32 <wan/failover ip>
/sbin/route add default <gateway ip>

Working commands on 11.0:

/sbin/route add -net <gateway ip>/32 -iface <wan interface>
/sbin/route add default <gateway ip>
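The two variants differ only in the first command: on 10.3 the gateway's /32 host route goes via the WAN (or failover) IP address, while on 11.0 it is bound to the WAN interface with -iface. A minimal shell sketch of that difference, based only on the commands reported above and not on the pfSense code that actually applies routes; the helper name nonlocal_gw_cmds and its argument order are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper (not part of pfSense): print, without running, the
# route commands for a gateway outside the WAN subnet, choosing the syntax
# by FreeBSD major version as reported in this issue.
# Usage: nonlocal_gw_cmds <freebsd_major> <gateway_ip> <wan_ip> <wan_if>
nonlocal_gw_cmds() {
    major=$1
    gw=$2
    wan_ip=$3
    wan_if=$4
    if [ "$major" -ge 11 ]; then
        # 11.0 form: attach the gateway's /32 host route to the WAN interface
        echo "/sbin/route add -net ${gw}/32 -iface ${wan_if}"
    else
        # 10.3 form: reach the gateway's /32 host route via the WAN IP address
        echo "/sbin/route add -net ${gw}/32 ${wan_ip}"
    fi
    # Both versions then point the default route at the gateway
    echo "/sbin/route add default ${gw}"
}
```

For example, nonlocal_gw_cmds 11 198.51.100.1 192.0.2.55 em0 prints the two 11.0-style commands, and passing 10 instead prints the 10.3 form; the WAN IP argument is only used by the 10.x branch. Printing instead of executing keeps the sketch safe to try on a live firewall.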

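As you can see, the syntax changed slightly in 11.0: the host route to the gateway now points at the WAN interface (-iface) instead of the WAN IP address. If pfSense 2.4 could be updated to use this syntax so that the "Use non-local gateway" option works again, it would be greatly appreciated.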

Thanks

redmine-6850-config-pfSense.localdomain-20161207160722.xml (13.2 KB) Jim Pingle, 12/07/2016 10:08 AM

History

#1 Updated by Jim Pingle 12 months ago

  • Parent task set to #6828

It's also possible you were hitting #6828, which needs to be solved before any other routing issues.

#2 Updated by Ken Sim 12 months ago

I am not sure whether this is related. All I know is that on 2.4 the option does not work, and the only way I can get IPv4 connectivity is by manually entering the two FreeBSD 11.0 route commands above.

#3 Updated by Renato Botelho 11 months ago

  • Status changed from New to Feedback
  • Assignee set to Renato Botelho

I couldn't replicate it after fixes I pushed for #6828. Can you try the next round of snapshots?

#4 Updated by Ken Sim 11 months ago

On the current snapshot, any time I try to change a gateway that has the non-local option checked, pfSense locks up after 2-3 minutes. After rebooting the VM, the settings do stick and it now works without manually entering the route commands; I'm just not sure why the whole thing locks up. I can't find anything indicating why it's locking up, nothing in the logs or anywhere else.

#5 Updated by Renato Botelho 11 months ago

I was able to reproduce this issue; I'll take a look.

#6 Updated by Renato Botelho 11 months ago

  • Status changed from Feedback to Confirmed

#7 Updated by Ken Sim 11 months ago

Still seeing a system lockup on 2.4.0-BETA when working with non-local gateways.

#8 Updated by Renato Botelho 10 months ago

  • Status changed from Confirmed to Feedback

I've tried to reproduce it again on today's snapshot without success. Can you please upgrade your system and try again? If you can reproduce it, please provide the exact steps so I can compare them with what I'm doing here and try to reproduce it myself.

I have a system with extensive debugging enabled to collect more details if the problem happens, but I need to reproduce it here to be able to fix it.

Thanks

#9 Updated by Ken Sim 10 months ago

I'm still seeing some issues. If I edit anything on the non-local gateway, even just the description, and click Apply Changes, the web UI hangs and I can't click on anything else. Restarting the php-fpm service does not resolve it, and neither does restarting the webConfigurator; the only way I can get the web UI to respond again is a reboot. At first it looked like only the web UI hung and not the whole system, but I take that back: after waiting, the system still locks up, and it seems to take up to 5 minutes of the web UI being hung before the system fully locks up. I'm not sure what else to provide, as all I did was try to edit the description of the current non-local gateway. If you need any information from me, just let me know and I will happily provide it to get this issue resolved. Thanks.

EDIT: pfSense version: 2.4.0.b.20161118.0635

#10 Updated by Ken Sim 10 months ago

Steps: go into System -> Routing -> Gateways, click Edit on the current gateway outside the subnet, and without changing anything, click Save, then Apply. The web UI crashes 1 minute after clicking Apply, and the full system locks up within 2 minutes. The lockups are happening faster and faster. At this point you guys might as well just rip out this option; no one seems to be working on it or to care enough to try to resolve it. Rip it out and let everyone go back to using shellcmd, because this is getting ridiculous, even for alpha/beta software.

Current Version: 2.4.0.b.20161123.1004

#11 Updated by Jim Pingle 10 months ago

  • Status changed from Feedback to Confirmed

I was finally able to reproduce this reliably today, and in one of five failures I managed to catch what was consuming CPU.

When it happens, network access is cut off and more often than not, the console is unresponsive.

Config:

The client is set with a WAN IP address of 192.0.2.55/32 and a gateway of 198.51.100.1, marked as default with the non-local gateway option set:

        <gateways>
                <gateway_item>
                        <interface>wan</interface>
                        <gateway>198.51.100.1</gateway>
                        <name>WANGW</name>
                        <weight>1</weight>
                        <ipprotocol>inet</ipprotocol>
                        <descr></descr>
                        <nonlocalgateway></nonlocalgateway>
                        <defaultgw></defaultgw>
                </gateway_item>
        </gateways>

The routing and ARP tables look as expected:

: netstat -rn4
Routing tables

Internet:
Destination        Gateway            Flags     Netif Expire
default            198.51.100.1       UGS         em0
127.0.0.1          link#4             UH          lo0
192.0.2.55         link#1             UHS         lo0
192.0.2.55/32      link#1             U           em0
192.168.1.0/24     link#2             U           em1
192.168.1.1        link#2             UHS         lo0
198.51.100.1       00:0c:xx:xx:xx:7b  UHS         em0
: arp -na | grep 198.51
? (198.51.100.1) at 00:90:xx:xx:xx:24 on em0 expires in 624 seconds [ethernet]

On the upstream box (also pfSense 2.4), I added the following manually to make it work (not shown: the traffic is also passed by firewall rules and has outbound NAT):

/sbin/route add -net 192.0.2.55/32 -iface igb0_vlan40
: netstat -rWn4 | grep '192.0.2'
192.0.2.55/32      00:90:xx:xx:xx:24  US        81633   1500 igb0_vlan40
: arp -na | grep '192.0.2'
? (192.0.2.55) at 00:0c:xx:xx:xx:7b on igb0_vlan40 expires in 1158 seconds [vlan]

If I leave it alone, it stays up indefinitely: clients behind it can access the Internet and there are no problems. The moment I change the gateway entry and press 'Apply' on the page, it drops off the network and the console becomes unresponsive.

The one time I still had a responsive console for a couple of moments, I could see that a "route change" process was consuming large amounts of CPU:

root    98227 95.2  0.2   8220  1980  -  R    15:35    0:23.15 /sbin/route change -inet default 198.51.100.1

Running that command by hand reproduces the lockup every time. I ran the command under truss, and it stops at the same place each time:

: truss -f /sbin/route change -inet default 198.51.100.1
54278: mmap(0x0,32768,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 34366181376 (0x800625000)
54278: issetugid()                 = 0 (0x0)
54278: lstat("/etc",{ mode=drwxr-xr-x ,inode=1605120,size=4096,blksize=32768 }) = 0 (0x0)
54278: lstat("/etc/libmap.conf",{ mode=-rw-r--r-- ,inode=1606014,size=47,blksize=32768 }) = 0 (0x0)
54278: openat(AT_FDCWD,"/etc/libmap.conf",O_CLOEXEC,00) = 3 (0x3)
54278: fstat(3,{ mode=-rw-r--r-- ,inode=1606014,size=47,blksize=32768 }) = 0 (0x0)
54278: mmap(0x0,47,PROT_READ,MAP_PRIVATE,3,0x0)     = 34366214144 (0x80062d000)
54278: close(3)                     = 0 (0x0)
54278: lstat("/usr",{ mode=drwxr-xr-x ,inode=1364352,size=512,blksize=32768 }) = 0 (0x0)
54278: lstat("/usr/local",{ mode=drwxr-xr-x ,inode=1364356,size=512,blksize=32768 }) = 0 (0x0)
54278: lstat("/usr/local/etc",{ mode=drwxr-xr-x ,inode=1685386,size=1536,blksize=32768 }) = 0 (0x0)
54278: lstat("/usr/local/etc/libmap.d",0x7fffffffcac8) ERR#2 'No such file or directory'
54278: munmap(0x80062d000,47)             = 0 (0x0)
54278: openat(AT_FDCWD,"/var/run/ld-elf.so.hints",O_CLOEXEC,00) = 3 (0x3)
54278: read(3,"Ehnt\^A\0\0\0\M^@\0\0\0f\0\0\0\0"...,128) = 128 (0x80)
54278: fstat(3,{ mode=-r--r--r-- ,inode=6,size=230,blksize=32768 }) = 0 (0x0)
54278: lseek(3,0x80,SEEK_SET)             = 128 (0x80)
54278: read(3,"/lib:/usr/lib:/usr/lib/compat:/u"...,102) = 102 (0x66)
54278: close(3)                     = 0 (0x0)
54278: access("/lib/libc.so.7",F_OK)         = 0 (0x0)
54278: openat(AT_FDCWD,"/lib/libc.so.7",O_CLOEXEC|O_VERIFY,00) = 3 (0x3)
54278: fstat(3,{ mode=-r--r--r-- ,inode=321144,size=1587696,blksize=32768 }) = 0 (0x0)
54278: mmap(0x0,4096,PROT_READ,MAP_PRIVATE|MAP_PREFAULT_READ,3,0x0) = 34366214144 (0x80062d000)
54278: mmap(0x0,3784704,PROT_NONE,MAP_PRIVATE|MAP_ANON|MAP_NOCORE,-1,0x0) = 34368286720 (0x800827000)
54278: mmap(0x800827000,1540096,PROT_READ|PROT_EXEC,MAP_PRIVATE|MAP_FIXED|MAP_NOCORE|MAP_PREFAULT_READ,3,0x0) = 34368286720 (0x800827000)
54278: mmap(0x800b9f000,45056,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_FIXED|MAP_PREFAULT_READ,3,0x178000) = 34371923968 (0x800b9f000)
54278: mmap(0x800baa000,102400,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_FIXED|MAP_ANON,-1,0x0) = 34371969024 (0x800baa000)
54278: munmap(0x80062d000,4096)             = 0 (0x0)
54278: close(3)                     = 0 (0x0)
54278: munmap(0x80062c000,4096)             = 0 (0x0)
54278: mmap(0x0,102400,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 34366210048 (0x80062c000)
54278: sysarch(AMD64_SET_FSBASE,0x7fffffffe498)     = 0 (0x0)
54278: sigprocmask(SIG_BLOCK,{ SIGHUP|SIGINT|SIGQUIT|SIGKILL|SIGPIPE|SIGALRM|SIGTERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFSZ|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2 },{ }) = 0 (0x0)
54278: sigprocmask(SIG_SETMASK,{ },0x0)         = 0 (0x0)
54278: readlink("/etc/malloc.conf",0x7fffffffdb90,1024) ERR#2 'No such file or directory'
54278: issetugid()                 = 0 (0x0)
54278: __sysctl(0x7fffffffda00,0x2,0x7fffffffda50,0x7fffffffda48,0x800973a13,0xd) = 0 (0x0)
54278: __sysctl(0x7fffffffda50,0x2,0x7fffffffdb14,0x7fffffffdb08,0x0,0x0) = 0 (0x0)
54278: mmap(0x0,2097152,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 34372071424 (0x800bc3000)
54278: munmap(0x800bc3000,2097152)         = 0 (0x0)
54278: mmap(0x0,4190208,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 34372071424 (0x800bc3000)
54278: munmap(0x800bc3000,249856)         = 0 (0x0)
54278: munmap(0x800e00000,1843200)         = 0 (0x0)
54278: sigprocmask(SIG_BLOCK,{ SIGHUP|SIGINT|SIGQUIT|SIGKILL|SIGPIPE|SIGALRM|SIGTERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFSZ|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2 },{ }) = 0 (0x0)
54278: sigprocmask(SIG_SETMASK,{ },0x0)         = 0 (0x0)
54278: sigprocmask(SIG_BLOCK,{ SIGHUP|SIGINT|SIGQUIT|SIGKILL|SIGPIPE|SIGALRM|SIGTERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFSZ|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2 },{ }) = 0 (0x0)
54278: sigprocmask(SIG_SETMASK,{ },0x0)         = 0 (0x0)
54278: getpid()                     = 54278 (0xd406)
54278: geteuid()                 = 0 (0x0)
54278: socket(PF_ROUTE,SOCK_RAW,0)         = 3 (0x3)
54278: __sysctl(0x7fffffffe100,0x2,0x7fffffffe150,0x7fffffffe148,0x405cc2,0x8) = 0 (0x0)
54278: __sysctl(0x7fffffffe150,0x2,0x606dec,0x7fffffffe250,0x0,0x0) = 0 (0x0)
54278: __sysctl(0x7fffffffe100,0x2,0x7fffffffe150,0x7fffffffe148,0x405ccb,0xd) = 0 (0x0)
54278: __sysctl(0x7fffffffe150,0x2,0x606df0,0x7fffffffe250,0x0,0x0) = 0 (0x0)
54278: shutdown(3,SHUT_RD)             = 0 (0x0)
54278: mmap(0x0,2097152,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 34374418432 (0x800e00000)
54278: setsockopt(0x3,0xffff,0x1014,0x7fffffffe260,0x4) = 0 (0x0)

The console will only respond to Ctrl-T at that point, which prints a variation of this (CPU usage varies, time increases):

load: 1.31  cmd: route 97129 [runnable] 33.60r 0.00u 33.50s 100% 1964k

#12 Updated by Jim Pingle 10 months ago

Full config attached, but it's nothing special - default config + static address on WAN + off-subnet gateway.

#14 Updated by Renato Botelho 10 months ago

  • Status changed from Confirmed to Feedback
  • % Done changed from 0 to 100

Luiz pushed a fix for that deadlock. The next round of 2.4.0 snapshots will include it so we can test.

https://github.com/pfsense/FreeBSD-src/commit/4627301691bb818abae4e82bda1a5ef38d52a68f

#15 Updated by Jim Pingle 10 months ago

  • Assignee changed from Renato Botelho to Ken Sim

It works well for me now, I can run the route command by hand and also apply settings in the GUI. Assigning back to the original reporter for more feedback.

#16 Updated by Ken Sim 10 months ago

Everything seems to be working as expected now with that patch applied. I have played around with the gateways for about an hour, trying all sorts of different things, and it reacted as it should without any delay or lockup. Thanks for all the help getting this issue resolved; I'm grateful, and I'm sure others who use this feature are too.

#17 Updated by Jim Pingle 10 months ago

  • Status changed from Feedback to Resolved
