Bug #6850
Bug #6828 (closed): Patch for "route change" is not present on 2.4 builds using FreeBSD 11
FreeBSD 11.0 Route Syntax Change For Non-Local Gateway
100%
Description
While testing one of the 2.4 snapshots a few weeks ago, I was unable to get network connectivity with a gateway that is outside the WAN subnet. After some research and help from the FreeBSD community, it seems that the route syntax for non-local gateways has changed in FreeBSD 11.0.
Working commands on 10.3:
/sbin/route add -net <gateway ip>/32 <wan/failover ip>
/sbin/route add default <gateway ip>
Working commands on 11.0:
/sbin/route add -net <gateway ip>/32 -iface <wan interface>
/sbin/route add default <gateway ip>
As you can see, the syntax changed slightly in 11.0: the host route to the gateway now takes -iface <wan interface> instead of the WAN address. It would be greatly appreciated if pfSense 2.4 could be updated accordingly so that the "Use non-local gateway" option works again.
Thanks
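The syntax difference described above can be sketched as a small shell helper. This is only an illustration, not how pfSense generates its routes: the function name nonlocal_gw_routes and its argument order are hypothetical, and it prints the commands rather than running them. The addresses and interface in the example call are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: print (not execute) the two route commands needed for
# a gateway outside the WAN subnet, depending on the FreeBSD major version.
# Arguments: <major version> <gateway ip> <wan ip> <wan interface>
nonlocal_gw_routes() {
    major=$1
    gw=$2
    wan_ip=$3
    wan_if=$4
    if [ "$major" -ge 11 ]; then
        # FreeBSD 11.0 form: host route to the gateway via the interface
        echo "/sbin/route add -net ${gw}/32 -iface ${wan_if}"
    else
        # FreeBSD 10.3 form: host route to the gateway via the WAN address
        echo "/sbin/route add -net ${gw}/32 ${wan_ip}"
    fi
    echo "/sbin/route add default ${gw}"
}

# On a FreeBSD host the major version could be taken from uname, e.g.:
#   major=$(uname -r | cut -d. -f1)
nonlocal_gw_routes 11 198.51.100.1 192.0.2.55 em0
```

Printing the commands keeps the sketch safe to run anywhere; on a real system the output would still need to be reviewed before being executed as root.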
Files
Updated by Jim Pingle about 8 years ago
- Parent task set to #6828
It's also possible you were hitting #6828 which needs to be solved first before other routing issues.
Updated by Ken Sim about 8 years ago
I am not sure whether this is related. All I know is that on 2.4 the option does not work, and the only way I am able to get IPv4 connectivity is by manually entering the two route commands above for FreeBSD 11.0.
Updated by Renato Botelho about 8 years ago
- Status changed from New to Feedback
- Assignee set to Renato Botelho
I couldn't replicate it after fixes I pushed for #6828. Can you try the next round of snapshots?
Updated by Ken Sim about 8 years ago
Any time I try to change any of the gateways that are marked non-local on the current snapshot, pfSense locks up after 2-3 minutes. After rebooting the VM the settings do stick, and it now works without having to manually enter the route commands. I'm just not sure why the whole thing locks up. I can't find anything to indicate the cause, and there is nothing in the logs.
Updated by Renato Botelho about 8 years ago
Ken Sim wrote:
Any time I try to change any of the gateways that are marked non-local on the current snapshot, pfSense locks up after 2-3 minutes. After rebooting the VM the settings do stick, and it now works without having to manually enter the route commands. I'm just not sure why the whole thing locks up. I can't find anything to indicate the cause, and there is nothing in the logs.
I was able to reproduce this issue, I'll take a look.
Updated by Renato Botelho about 8 years ago
- Status changed from Feedback to Confirmed
Updated by Ken Sim about 8 years ago
Still seeing system lockup on 2.4.0-BETA when dealing with non-local gateways.
Updated by Renato Botelho about 8 years ago
- Status changed from Confirmed to Feedback
Ken Sim wrote:
Still seeing system lockup on 2.4.0-BETA when dealing with non-local gateways.
I've tried to reproduce it again on today's snapshot without success. Can you please upgrade your system and try again? If you can still reproduce it, please provide the exact steps so I can compare them with what I'm doing here and try to trigger it myself.
I have a system with a lot of debugging enabled to collect more details when the problem happens, but I need to reproduce it here to be able to fix it.
Thanks
Updated by Ken Sim about 8 years ago
Still seeing some issues. If I edit anything on the non-local gateway, even just the description, and click apply changes, the web UI hangs and I can't click on anything else. Restarting the php-fpm service does not resolve it. Restarting the webconfigurator does not resolve it. The only way I am able to get the web UI to respond again is with a reboot. At first it looked like only the web UI hangs and not the system, but I take that back: after waiting, the system still locks up. It seems to take up to 5 minutes of the web UI being hung before the system fully locks up. I am not sure what else to provide, as I haven't done anything but try to edit the description of the current non-local gateway. If you need any information from me, just let me know and I will happily provide it to get this issue resolved. Thanks.
EDIT: pfSense version: 2.4.0.b.20161118.0635
Updated by Ken Sim about 8 years ago
Steps: go to System -> Routing -> Gateways, click edit on the current gateway outside the subnet (no need to change anything), click save, then apply. The web UI crashes 1 minute after clicking apply, and the system fully locks up within 2 minutes. The lockup is happening faster and faster. At this point you guys might as well just rip out this option, since no one seems to be working on it or to care enough to try to resolve it. Rip it out and let everyone go back to using shellcmd, because this is getting ridiculous, even for alpha/beta software.
Current Version: 2.4.0.b.20161123.1004
Updated by Jim Pingle about 8 years ago
- Status changed from Feedback to Confirmed
I was finally able to reproduce this reliably today, and in one of the 5 failures I was able to catch what was consuming CPU.
When it happens, network access is cut off and more often than not, the console is unresponsive.
Config:
Client set for a WAN IP address of 192.0.2.55/32, gateway of 198.51.100.1 marked as default with non-local gateway set:
<gateways>
    <gateway_item>
        <interface>wan</interface>
        <gateway>198.51.100.1</gateway>
        <name>WANGW</name>
        <weight>1</weight>
        <ipprotocol>inet</ipprotocol>
        <descr></descr>
        <nonlocalgateway></nonlocalgateway>
        <defaultgw></defaultgw>
    </gateway_item>
</gateways>
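To make "non-local" concrete for this configuration, here is a minimal sketch that checks whether an IPv4 address falls inside a given network. The function names ip_to_int and in_subnet are hypothetical and are not part of pfSense; the sketch assumes a POSIX shell with 64-bit arithmetic and dotted-quad input without leading zeros.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to an integer.
# The function body runs in a subshell so the IFS change does not leak out.
ip_to_int() (
    IFS=.
    set -- $1
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
)

# Return 0 if <ip> lies inside <network>/<prefix>, 1 otherwise.
# Arguments: <ip> <network address> <prefix length>
in_subnet() {
    ip=$(ip_to_int "$1")
    net=$(ip_to_int "$2")
    prefix=$3
    if [ "$prefix" -eq 0 ]; then
        mask=0
    else
        # 4294967295 is 0xFFFFFFFF; keep only the top <prefix> bits
        mask=$(( (4294967295 << (32 - prefix)) & 4294967295 ))
    fi
    [ $(( ip & mask )) -eq $(( net & mask )) ]
}

# The setup from this ticket: WAN is 192.0.2.55/32, gateway is 198.51.100.1
if in_subnet 198.51.100.1 192.0.2.55 32; then
    echo "local gateway"
else
    echo "non-local gateway"
fi
```

With a /32 on WAN, no other address can be on-link, so the gateway is necessarily non-local and needs the explicit interface route before the default route can be added.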
The routing and ARP tables look as expected:
: netstat -rn4
Routing tables

Internet:
Destination        Gateway            Flags     Netif Expire
default            198.51.100.1       UGS         em0
127.0.0.1          link#4             UH          lo0
192.0.2.55         link#1             UHS         lo0
192.0.2.55/32      link#1             U           em0
192.168.1.0/24     link#2             U           em1
192.168.1.1        link#2             UHS         lo0
198.51.100.1       00:0c:xx:xx:xx:7b  UHS         em0

: arp -na | grep 198.51
? (198.51.100.1) at 00:90:xx:xx:xx:24 on em0 expires in 624 seconds [ethernet]
On the upstream box (also pfSense 2.4), I added settings manually to allow it to work (not pictured: the traffic is also passed in firewall rules and has outbound NAT):
/sbin/route add -net 192.0.2.55/32 -iface igb0_vlan40

: netstat -rWn4 | grep '192.0.2'
192.0.2.55/32      00:90:xx:xx:xx:24  US        81633   1500 igb0_vlan40

: arp -na | grep '192.0.2'
? (192.0.2.55) at 00:0c:xx:xx:xx:7b on igb0_vlan40 expires in 1158 seconds [vlan]
If I leave it alone it'll stay up indefinitely, clients behind can access the Internet and there are no problems. The moment I make a change to the gateway entry and press 'apply' on the page, it drops off the network and the console goes unresponsive.
The one time I still had a responsive console for a couple moments, I could see that a "route change" process was consuming large amounts of CPU:
root 98227 95.2 0.2 8220 1980 - R 15:35 0:23.15 /sbin/route change -inet default 198.51.100.1
Running that command by hand will reproduce the lockup each time. I ran truss on the command and it stops at the same place each time:
: truss -f /sbin/route change -inet default 198.51.100.1
54278: mmap(0x0,32768,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 34366181376 (0x800625000)
54278: issetugid() = 0 (0x0)
54278: lstat("/etc",{ mode=drwxr-xr-x ,inode=1605120,size=4096,blksize=32768 }) = 0 (0x0)
54278: lstat("/etc/libmap.conf",{ mode=-rw-r--r-- ,inode=1606014,size=47,blksize=32768 }) = 0 (0x0)
54278: openat(AT_FDCWD,"/etc/libmap.conf",O_CLOEXEC,00) = 3 (0x3)
54278: fstat(3,{ mode=-rw-r--r-- ,inode=1606014,size=47,blksize=32768 }) = 0 (0x0)
54278: mmap(0x0,47,PROT_READ,MAP_PRIVATE,3,0x0) = 34366214144 (0x80062d000)
54278: close(3) = 0 (0x0)
54278: lstat("/usr",{ mode=drwxr-xr-x ,inode=1364352,size=512,blksize=32768 }) = 0 (0x0)
54278: lstat("/usr/local",{ mode=drwxr-xr-x ,inode=1364356,size=512,blksize=32768 }) = 0 (0x0)
54278: lstat("/usr/local/etc",{ mode=drwxr-xr-x ,inode=1685386,size=1536,blksize=32768 }) = 0 (0x0)
54278: lstat("/usr/local/etc/libmap.d",0x7fffffffcac8) ERR#2 'No such file or directory'
54278: munmap(0x80062d000,47) = 0 (0x0)
54278: openat(AT_FDCWD,"/var/run/ld-elf.so.hints",O_CLOEXEC,00) = 3 (0x3)
54278: read(3,"Ehnt\^A\0\0\0\M^@\0\0\0f\0\0\0\0"...,128) = 128 (0x80)
54278: fstat(3,{ mode=-r--r--r-- ,inode=6,size=230,blksize=32768 }) = 0 (0x0)
54278: lseek(3,0x80,SEEK_SET) = 128 (0x80)
54278: read(3,"/lib:/usr/lib:/usr/lib/compat:/u"...,102) = 102 (0x66)
54278: close(3) = 0 (0x0)
54278: access("/lib/libc.so.7",F_OK) = 0 (0x0)
54278: openat(AT_FDCWD,"/lib/libc.so.7",O_CLOEXEC|O_VERIFY,00) = 3 (0x3)
54278: fstat(3,{ mode=-r--r--r-- ,inode=321144,size=1587696,blksize=32768 }) = 0 (0x0)
54278: mmap(0x0,4096,PROT_READ,MAP_PRIVATE|MAP_PREFAULT_READ,3,0x0) = 34366214144 (0x80062d000)
54278: mmap(0x0,3784704,PROT_NONE,MAP_PRIVATE|MAP_ANON|MAP_NOCORE,-1,0x0) = 34368286720 (0x800827000)
54278: mmap(0x800827000,1540096,PROT_READ|PROT_EXEC,MAP_PRIVATE|MAP_FIXED|MAP_NOCORE|MAP_PREFAULT_READ,3,0x0) = 34368286720 (0x800827000)
54278: mmap(0x800b9f000,45056,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_FIXED|MAP_PREFAULT_READ,3,0x178000) = 34371923968 (0x800b9f000)
54278: mmap(0x800baa000,102400,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_FIXED|MAP_ANON,-1,0x0) = 34371969024 (0x800baa000)
54278: munmap(0x80062d000,4096) = 0 (0x0)
54278: close(3) = 0 (0x0)
54278: munmap(0x80062c000,4096) = 0 (0x0)
54278: mmap(0x0,102400,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 34366210048 (0x80062c000)
54278: sysarch(AMD64_SET_FSBASE,0x7fffffffe498) = 0 (0x0)
54278: sigprocmask(SIG_BLOCK,{ SIGHUP|SIGINT|SIGQUIT|SIGKILL|SIGPIPE|SIGALRM|SIGTERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFSZ|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2 },{ }) = 0 (0x0)
54278: sigprocmask(SIG_SETMASK,{ },0x0) = 0 (0x0)
54278: readlink("/etc/malloc.conf",0x7fffffffdb90,1024) ERR#2 'No such file or directory'
54278: issetugid() = 0 (0x0)
54278: __sysctl(0x7fffffffda00,0x2,0x7fffffffda50,0x7fffffffda48,0x800973a13,0xd) = 0 (0x0)
54278: __sysctl(0x7fffffffda50,0x2,0x7fffffffdb14,0x7fffffffdb08,0x0,0x0) = 0 (0x0)
54278: mmap(0x0,2097152,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 34372071424 (0x800bc3000)
54278: munmap(0x800bc3000,2097152) = 0 (0x0)
54278: mmap(0x0,4190208,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 34372071424 (0x800bc3000)
54278: munmap(0x800bc3000,249856) = 0 (0x0)
54278: munmap(0x800e00000,1843200) = 0 (0x0)
54278: sigprocmask(SIG_BLOCK,{ SIGHUP|SIGINT|SIGQUIT|SIGKILL|SIGPIPE|SIGALRM|SIGTERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFSZ|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2 },{ }) = 0 (0x0)
54278: sigprocmask(SIG_SETMASK,{ },0x0) = 0 (0x0)
54278: sigprocmask(SIG_BLOCK,{ SIGHUP|SIGINT|SIGQUIT|SIGKILL|SIGPIPE|SIGALRM|SIGTERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFSZ|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2 },{ }) = 0 (0x0)
54278: sigprocmask(SIG_SETMASK,{ },0x0) = 0 (0x0)
54278: getpid() = 54278 (0xd406)
54278: geteuid() = 0 (0x0)
54278: socket(PF_ROUTE,SOCK_RAW,0) = 3 (0x3)
54278: __sysctl(0x7fffffffe100,0x2,0x7fffffffe150,0x7fffffffe148,0x405cc2,0x8) = 0 (0x0)
54278: __sysctl(0x7fffffffe150,0x2,0x606dec,0x7fffffffe250,0x0,0x0) = 0 (0x0)
54278: __sysctl(0x7fffffffe100,0x2,0x7fffffffe150,0x7fffffffe148,0x405ccb,0xd) = 0 (0x0)
54278: __sysctl(0x7fffffffe150,0x2,0x606df0,0x7fffffffe250,0x0,0x0) = 0 (0x0)
54278: shutdown(3,SHUT_RD) = 0 (0x0)
54278: mmap(0x0,2097152,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 34374418432 (0x800e00000)
54278: setsockopt(0x3,0xffff,0x1014,0x7fffffffe260,0x4) = 0 (0x0)
The console will only respond to Ctrl-T at that point, which prints a variation of this (CPU usage varies, time increases):
load: 1.31 cmd: route 97129 [runnable] 33.60r 0.00u 33.50s 100% 1964k
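Since the console only stayed responsive for a few moments, a filter like the following could help spot a runaway /sbin/route process quickly. It is a rough sketch, not something used in this ticket: the function name busy_route_procs and the 50% default threshold are assumptions, and it assumes `ps aux`-style columns (USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND) with no spaces inside the STARTED field.

```shell
#!/bin/sh
# Hypothetical filter: read `ps aux`-style lines on stdin and print the PID,
# %CPU and full command of /sbin/route processes above a CPU threshold.
# Argument: <threshold in percent> (defaults to 50)
busy_route_procs() {
    awk -v limit="${1:-50}" '
        $11 == "/sbin/route" && $3 + 0 > limit + 0 {
            cmd = $11
            for (i = 12; i <= NF; i++) cmd = cmd " " $i
            print $2, $3, cmd
        }'
}

# Demonstration using the process line captured in this ticket
printf '%s\n' \
    'root 98227 95.2 0.2 8220 1980 - R 15:35 0:23.15 /sbin/route change -inet default 198.51.100.1' \
    | busy_route_procs 50
```

On a live system the input would come from something like `ps auxww | busy_route_procs 50`. Because the hang here was spent almost entirely in system time, the process may not respond to signals, so this only identifies the culprit and does not recover the box.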
Updated by Jim Pingle about 8 years ago
- File redmine-6850-config-pfSense.localdomain-20161207160722.xml added
Full config attached, but it's nothing special - default config + static address on WAN + off-subnet gateway.
Updated by Renato Botelho about 8 years ago
Opened a ticket upstream:
Updated by Renato Botelho about 8 years ago
- Status changed from Confirmed to Feedback
- % Done changed from 0 to 100
Luiz pushed a fix for that deadlock. The next round of 2.4.0 snapshots will have it applied so we can test:
https://github.com/pfsense/FreeBSD-src/commit/4627301691bb818abae4e82bda1a5ef38d52a68f
Updated by Jim Pingle about 8 years ago
- Assignee changed from Renato Botelho to Ken Sim
It works well for me now, I can run the route command by hand and also apply settings in the GUI. Assigning back to the original reporter for more feedback.
Updated by Ken Sim about 8 years ago
Everything seems to be working as expected now with that patch applied. I have played around with the gateways for about an hour, trying all sorts of different things, and it reacted as it should without any delay or lockup. Thanks for all the help getting this issue resolved. I am grateful, and I am sure others who use the feature are too.
Updated by Jim Pingle about 8 years ago
- Status changed from Feedback to Resolved