Bug #8145
closedRecurring deadlock during normal operation.
0%
Description
At seemingly random intervals during normal operation, intervals as long as several hours and as short as several minutes, pfsense experiences what appears to be a deadlock, though could be the crash of a component while it holds a network related lock.
When the apparent deadlock occurs:
acpi reboot signals do shut down many processes, but not all and the reboot does not happen.
the serial console functions until a network related command is given in a shell, then it stops accepting characters. For example,
[2.4.2-RELEASE.../root]: netstat -4nl
Active Internet connections
Proto Recv-Q Send-Q Local Address Foreign Address (state)
tcp4 0 0 10.1.0.2.56695 10.1.0.3.520 SYN_SENT
tcp4 0 0 10.12.159.253.20853 10.12.159.252.520 SYN_SENT
tcp4 0 24 10.12.159.253.19066 10.12.159.252.520 FIN_WAIT_1
tcp4 0 36 10.1.0.2.5698 10.1.0.3.520 FIN_WAIT_1
udp4 0 0 10.12.159.254.41293 .*
udp4 0 0 173.29.66.105.41155 *.
udp4 0 0 10.1.0.1.123 .*
udp4 0 0 10.1.0.2.123 *.
udp4 0 0 127.0.0.1.123 .*
udp4 0 0 172.16.201.94.123 *.
udp4 0 0 172.29.22.254.123 .*
udp4 0 0 172.29.21.254.123 *.
udp4 0 0 10.12.159.254.123 .*
udp4 0 0 192.168.24.1.123 *.
udp4 0 0 192.168.23.254.123 .*
udp4 0 0 192.168.22.254.123 *.
udp4 0 0 192.168.23.252.123 .*
udp4 0 0 192.168.22.252.123 *.
udp4 0 0 192.168.29.1.123 .*
udp4 0 0 10.12.159.253.123 *.
udp4 0 0 127.0.0.1.6969 .*
udp4 16640 0 173.29.66.105.24048 *.
<no further characters appear, the serial console is locked at this point. ^C has no effect. >
The acpi shutdown signal results in no further serial console activity, on the vga screen nothing appears. However, in the vm running pfsense there is occasional periodic processor activity, maybe a 15% bump lasting 1 sec every 20 sec or so.
the vga console, running the pfsense menu otherwise idle, does not respond to keystrokes at all the moment the deadlock occurs.
ssh clients, previously connected to pfsense and otherwise idle at a command prompt time out and break the connection at the client side.
It is my best guess the deadlock arises from a process that holds a lock (like carp or openvpn client) which upon filter reload or reconfig then calls ifconfig, a process that hangs forever awaiting the lock that will never be available as it is held by the parent process. When enough of the system was yet working under conditions of deadlock while the serial console yet operated, I see 'grep' and 'wc' under ifconfig's process with L status. My bet is that ifconfig or something it fires off is polling ucarp for primary/backup status info, but the whole thing is fired off by an openvpn client based on an outgoing ucarp interface-- which holds a ucarp related lock needed by the reconfig process.
I'll try to tease out a ps output when next it locks up.
Updated by Harry Coin about 7 years ago
So here's the output of ps that *still works* in the serial console while the rest of the system is locked as noted above. Notice the relationship between processes waiting and holding locks. /root: ps -axHdwwo "pid ppid %cpu tt stat systime blocked state mwchan command" PID PPID %CPU TT STAT SYSTIME BLOCKED STAT MWCHAN COMMAND 0 0 0.0 - DLs 8588:17.70 0 DLs swapin [kernel/swapper] 1 0 0.0 - ILs 0:00.03 9e7c9014 ILs wait - /sbin/init -- 307 1 0.0 - Ss 0:01.26 0 Ss kqread |-- php-fpm: master process (/usr/local/lib/php-fpm.conf) (php-fpm) 72116 307 0.0 - I 0:00.01 0 I accept | `-- php-fpm: pool nginx (php-fpm) 321 1 0.0 - INs 0:00.05 7ff6befe INs kqread |-- /usr/local/sbin/check_reload_status 323 321 0.0 - IN 0:00.00 7ff6befe IN kqread | `-- check_reload_status: Monitoring daemon of check_reload_status 336 1 0.0 - Is 0:00.06 0 Is select |-- /sbin/devd -q -f /etc/pfSense-devd.conf 7031 1 0.0 - Is 0:00.00 0 Is uwait |-- /usr/local/bin/dpinger -S -r 0 -i <far side of vpn> 7031 1 0.0 - Is 0:00.14 0 Is sbwait |-- /usr/local/bin/dpinger -S -r 0 -i <far side of vpn> 7031 1 0.0 - Ls 0:00.33 0 Ls if_bridg |-- /usr/local/bin/dpinger -S -r 0 -i <far side of vpn> 7031 1 0.0 - Ss 0:00.09 0 Ss nanslp |-- /usr/local/bin/dpinger -S -r 0 -i <far side of vpn> 7031 1 0.0 - Is 0:00.03 0 Is accept |-- /usr/local/bin/dpinger -S -r 0 -i <far side of vpn> 7325 1 0.0 - Is 0:00.00 0 Is uwait |-- /usr/local/bin/dpinger -S -r 0 -i <far side of vpn> 7325 1 0.0 - Is 0:00.48 0 Is sbwait |-- /usr/local/bin/dpinger -S -r 0 -i <far side of vpn> 7325 1 0.0 - Ls 0:01.43 0 Ls if_bridg |-- /usr/local/bin/dpinger -S -r 0 -i <far side of vpn> 7325 1 0.0 - Ss 0:00.61 0 Ss nanslp |-- /usr/local/bin/dpinger -S -r 0 -i <far side of vpn> 7325 1 0.0 - Is 0:00.01 0 Is accept |-- /usr/local/bin/dpinger -S -r 0 -i <far side of vpn> 7653 1 0.0 - Is 0:00.00 0 Is uwait |-- /usr/local/bin/dpinger -S -r 0 -i <far side of vpn> 7653 1 0.0 - Ss 0:00.92 0 Ss sbwait |-- /usr/local/bin/dpinger -S -r 0 -i <far side of vpn> 7653 1 0.0 - Ss 0:02.27 0 Ss nanslp |-- /usr/local/bin/dpinger -S -r 0 -i <far side of vpn> 7653 1 0.0 - Ss 0:00.62 0 Ss nanslp |-- /usr/local/bin/dpinger -S -r 0 -i <far side of vpn> 7653 1 0.0 - Is 0:00.02 0 Is accept |-- /usr/local/bin/dpinger -S -r 0 -i <far side of vpn> 8067 1 0.0 - Is 0:00.00 0 Is uwait |-- /usr/local/bin/dpinger -S -r 0 -i <far side of vpn> 8067 1 0.0 - Ss 0:00.91 0 Ss sbwait |-- /usr/local/bin/dpinger -S -r 0 -i <far side of vpn> 8067 1 0.0 - Ss 0:02.51 0 Ss nanslp |-- /usr/local/bin/dpinger -S -r 0 -i <far side of vpn> 8067 1 0.0 - Ss 0:00.62 0 Ss nanslp |-- /usr/local/bin/dpinger -S -r 0 -i <far side of vpn> 8067 1 0.0 - Is 0:00.03 0 Is accept |-- /usr/local/bin/dpinger -S -r 0 -i <far side of vpn> 8326 1 0.0 - Is 0:00.00 0 Is uwait |-- /usr/local/bin/dpinger -S -r 0 -i <isp> 8326 1 0.0 - Ss 0:01.00 0 Ss sbwait |-- /usr/local/bin/dpinger -S -r 0 -i <isp> 8326 1 0.0 - Ss 0:02.29 0 Ss nanslp |-- /usr/local/bin/dpinger -S -r 0 -i <isp> 8326 1 0.0 - Ss 0:00.67 0 Ss nanslp |-- /usr/local/bin/dpinger -S -r 0 -i <isp> 8326 1 0.0 - Is 0:00.02 0 Is accept |-- /usr/local/bin/dpinger -S -r 0 -i <isp> 9373 1 0.0 - Is 0:00.06 0 Is select |-- dhclient: vtnet1 [priv] (dhclient) 13403 1 0.0 - Ss 0:00.12 0 Ss select |-- dhclient: vtnet1 (dhclient) 14509 1 0.0 - Is 0:00.00 0 Is select |-- /usr/sbin/sshd 28061 1 0.0 - Ss 0:00.51 0 Ss select |-- /usr/local/sbin/openvpn --config /var/etc/openvpn/client2.conf 30047 1 0.0 - Ss 0:19.31 0 Ss select |-- /usr/local/sbin/openvpn --config /var/etc/openvpn/client3.conf 42540 1 0.0 - Ss 0:06.95 0 Ss select |-- /usr/local/sbin/dhcpd -user dhcpd -group _dhcp -chroot /var/dhcpd -cf /etc/dhcpd.conf -pf /var/run/dhcpd.pid vtnet0 vtnet0.30 43814 1 0.0 - Ss 0:04.97 0 Ss select |-- /usr/local/sbin/dhcpd -6 -user dhcpd -group _dhcp -chroot /var/dhcpd -cf /etc/dhcpdv6.conf -pf /var/run/dhcpdv6.pid vtnet0 44137 1 0.0 - Is 0:00.00 0 Is kqread |-- /usr/local/sbin/dhcpleases6 -c /usr/local/bin/php-cgi -f /usr/local/sbin/prefixes.php|/bin/sh -l /var/dhcpd/var/db/dhcpd6.leases 46990 1 0.0 - IN 0:16.69 0 IN wait |-- /bin/sh /var/db/rrd/updaterrd.sh 61474 46990 0.0 - IN 0:00.00 0 IN nanslp | `-- sleep 60 55338 1 0.0 - Ss 0:00.82 0 Ss select |-- /usr/local/sbin/radvd -p /var/run/radvd.pid -C /var/etc/radvd.conf -m syslog 59394 1 0.0 - Ss 0:00.69 0 Ss select |-- /usr/local/sbin/openvpn --config /var/etc/openvpn/client4.conf 69070 1 0.0 - Ls 0:07.47 0 Ls carp_sof |-- /usr/local/sbin/openvpn --config /var/etc/openvpn/client1.conf 79855 1 0.0 - Is 0:00.35 0 Is select |-- /usr/sbin/syslogd -s -c -c -l /var/dhcpd/var/run/log -P /var/run/syslog.pid -f /etc/syslog.conf 33891 79855 0.0 - Is 0:00.01 2001 Is piperd | |-- /usr/local/sbin/sshlockout_pf 15 33891 79855 0.0 - Is 0:00.00 2001 Is nanslp | `-- /usr/local/sbin/sshlockout_pf 15 84053 1 0.0 - Ss 0:00.82 0 Ss bpf |-- /usr/local/sbin/filterlog -i pflog0 -p /var/run/filterlog.pid 84114 1 0.0 - Is 0:00.00 0 Is wait |-- /usr/local/bin/minicron 240 /var/run/ping_hosts.pid /usr/local/bin/ping_hosts.sh 84433 84114 0.0 - I 0:00.01 80006 I wait | `-- minicron: helper /usr/local/bin/ping_hosts.sh (minicron) 56555 84433 0.0 - I 0:00.00 0 I piperd | `-- /bin/sh /usr/local/bin/ping_hosts.sh 56661 56555 0.0 - I 0:00.00 0 I wait | `-- /bin/sh /usr/local/bin/ping_hosts.sh 56696 56661 0.0 - L 0:00.00 0 L carp_if | |-- ifconfig 56959 56661 0.0 - I 0:00.01 0 I piperd | |-- grep carp: BACKUP vhid 57227 56661 0.0 - I 0:00.01 0 I piperd | `-- wc -l 84776 1 0.0 - Is 0:00.00 0 Is select |-- /usr/local/sbin/xinetd -syslog daemon -f /var/etc/xinetd.conf -pidfile /var/run/xinetd.pid 84821 1 0.0 - Is 0:00.00 0 Is wait |-- /usr/local/bin/minicron 3600 /var/run/expire_accounts.pid /usr/local/sbin/fcgicli -f /etc/rc.expireaccounts 85313 84821 0.0 - I 0:00.00 0 I nanslp | `-- minicron: helper /usr/local/sbin/fcgicli -f /etc/rc.expireaccounts (minicron) 85571 1 0.0 - Is 0:00.00 0 Is wait |-- /usr/local/bin/minicron 86400 /var/run/update_alias_url_data.pid /usr/local/sbin/fcgicli -f /etc/rc.update_alias_url_data 86081 85571 0.0 - I 0:00.00 0 I nanslp | `-- minicron: helper /usr/local/sbin/fcgicli -f /etc/rc.update_alias_url_data (minicron) 95019 1 0.0 - S 0:00.33 0 S select |-- /usr/local/sbin/dnsmasq --all-servers -C /dev/null --rebind-localhost-ok --stop-dns-rebind --dns-forward-max=5000 --cache-size=10000 --local-ttl=1 97966 1 0.0 - Is 0:00.01 0 Is pause |-- nginx: master process /usr/local/sbin/nginx -c /var/etc/nginx-webConfigurator.conf (nginx) 98108 97966 0.0 - I 0:00.03 0 I kqread | |-- nginx: worker process (nginx) 98295 97966 0.0 - I 0:00.06 0 I kqread | `-- nginx: worker process (nginx) 98523 1 0.0 - Is 0:00.11 0 Is nanslp |-- /usr/sbin/cron -s 99086 1 0.0 - Ss 0:01.69 0 Ss select |-- /usr/local/sbin/ntpd -g -c /var/etc/ntpd.conf -p /var/run/ntpd.pid 32113 1 0.0 u0 Is 0:00.02 0 Is wait |-- login [pam] (login) 34228 32113 0.0 u0 I 0:00.00 0 I wait | `-- -sh (sh) 34720 34228 0.0 u0 I 0:00.02 0 I wait | `-- /bin/sh /etc/rc.initial 85320 34720 0.0 u0 S 0:00.46 2 S pause | `-- /bin/tcsh 61714 85320 0.0 u0 R+ 0:00.01 0 R+ - | `-- ps -axHdwwo pid ppid %cpu tt stat systime blocked state mwchan command 30815 1 0.0 v0 Is 0:00.01 0 Is wait |-- login [pam] (login) 34064 30815 0.0 v0 I 0:00.01 0 I wait | `-- -sh (sh) 34985 34064 0.0 v0 I+ 0:00.01 0 I+ ttyin | `-- /bin/sh /etc/rc.initial 30878 1 0.0 v1 Is+ 0:00.01 0 Is+ ttyin |-- /usr/libexec/getty Pc ttyv1 31007 1 0.0 v2 Is+ 0:00.01 0 Is+ ttyin |-- /usr/libexec/getty Pc ttyv2 31330 1 0.0 v3 Is+ 0:00.00 0 Is+ ttyin |-- /usr/libexec/getty Pc ttyv3 31518 1 0.0 v4 Is+ 0:00.01 0 Is+ ttyin |-- /usr/libexec/getty Pc ttyv4 31772 1 0.0 v5 Is+ 0:00.01 0 Is+ ttyin |-- /usr/libexec/getty Pc ttyv5 31831 1 0.0 v6 Is+ 0:00.01 0 Is+ ttyin |-- /usr/libexec/getty Pc ttyv6 31993 1 0.0 v7 Is+ 0:00.00 0 Is+ ttyin `-- /usr/libexec/getty Pc ttyv7 2 0 0.0 - DL 0:00.00 0 DL crypto_w - [crypto] 3 0 0.0 - DL 0:00.00 0 DL crypto_r - [crypto returns] 4 0 0.0 - DL 0:00.00 0 DL - - [cam/doneq0] 4 0 0.0 - DL 0:00.22 0 DL - - [cam/scanner] 5 0 0.0 - DL 0:00.01 0 DL - - [soaiod1] 6 0 0.0 - DL 0:00.01 0 DL - - [soaiod2] 7 0 0.0 - DL 0:00.01 0 DL - - [soaiod3] 8 0 0.0 - DL 0:00.01 0 DL - - [soaiod4] 9 0 0.0 - DL 0:00.00 0 DL waiting_ - [sctp_iterator] 10 0 0.0 - DL 0:00.00 0 DL audit_wo - [audit] 11 0 100.0 - RL 227:37.09 0 RL - - [idle/idle: cpu0] 11 0 100.0 - RL 227:21.51 0 RL - - [idle/idle: cpu1] 12 0 0.0 - WL 0:00.02 0 WL - - [intr/swi1: netisr 0] 12 0 0.0 - WL 0:00.01 0 WL - - [intr/swi1: netisr 1] 12 0 0.0 - WL 0:00.00 0 WL - - [intr/swi3: vm] 12 0 0.0 - LL 0:00.00 0 LL if_bridg - [intr/swi4: clock (0] 12 0 0.0 - WL 0:00.00 0 WL - - [intr/swi4: clock (1] 12 0 0.0 - WL 0:00.00 0 WL - - [intr/swi6: task que] 12 0 0.0 - WL 0:00.00 0 WL - - [intr/swi6: Giant ta] 12 0 0.0 - WL 0:00.00 0 WL - - [intr/swi5: fast tas] 12 0 0.0 - WL 0:00.00 0 WL - - [intr/irq14: ata0] 12 0 0.0 - WL 0:00.00 0 WL - - [intr/irq15: ata1] 12 0 0.0 - WL 0:00.00 0 WL - - [intr/irq256: virtio] 12 0 0.0 - LL 0:00.00 0 LL if_bridg - [intr/irq257: virtio] 12 0 0.0 - WL 0:00.00 0 WL - - [intr/irq258: virtio] 12 0 0.0 - WL 0:00.00 0 WL - - [intr/irq259: virtio] 12 0 0.0 - WL 0:00.00 0 WL - - [intr/irq260: virtio] 12 0 0.0 - WL 0:00.00 0 WL - - [intr/irq261: virtio] 12 0 0.0 - WL 0:00.00 0 WL - - [intr/irq262: virtio] 12 0 0.0 - WL 0:00.00 0 WL - - [intr/irq263: virtio] 12 0 0.0 - WL 0:00.00 0 WL - - [intr/irq264: virtio] 12 0 0.0 - WL 0:00.00 0 WL - - [intr/irq11: uhci0 u] 12 0 0.0 - WL 0:00.00 0 WL - - [intr/irq10: uhci2 e] 12 0 0.0 - WL 0:00.00 0 WL - - [intr/irq265: virtio] 12 0 0.0 - WL 0:00.00 0 WL - - [intr/irq266: virtio] 12 0 0.0 - WL 0:00.00 0 WL - - [intr/irq1: atkbd0] 12 0 0.0 - WL 0:00.00 0 WL - - [intr/irq12: psm0] 12 0 0.0 - WL 0:00.00 0 WL - - [intr/swi0: uart] 12 0 0.0 - WL 0:00.00 0 WL - - [intr/swi1: pf send] 12 0 0.0 - WL 0:00.00 0 WL - - [intr/swi1: pfsync] 13 0 0.0 - DL 0:00.00 0 DL sleep - [ng_queue/ng_queue0] 13 0 0.0 - DL 0:00.00 0 DL sleep - [ng_queue/ng_queue1] 14 0 0.0 - DL 0:00.01 0 DL - - [geom/g_event] 14 0 0.0 - DL 0:00.00 0 DL - - [geom/g_up] 14 0 0.0 - DL 0:00.01 0 DL - - [geom/g_down] 15 0 0.0 - DL 0:00.00 0 DL - - [usb/usbus0] 15 0 0.0 - DL 0:00.00 0 DL - - [usb/usbus0] 15 0 0.0 - DL 0:00.00 0 DL - - [usb/usbus0] 15 0 0.0 - DL 0:00.21 0 DL - - [usb/usbus0] 15 0 0.0 - DL 0:00.00 0 DL - - [usb/usbus0] 15 0 0.0 - DL 0:00.00 0 DL - - [usb/usbus1] 15 0 0.0 - DL 0:00.00 0 DL - - [usb/usbus1] 15 0 0.0 - DL 0:00.00 0 DL - - [usb/usbus1] 15 0 0.0 - DL 0:00.21 0 DL - - [usb/usbus1] 15 0 0.0 - DL 0:00.00 0 DL - - [usb/usbus1] 15 0 0.0 - DL 0:00.00 0 DL - - [usb/usbus2] 15 0 0.0 - DL 0:00.00 0 DL - - [usb/usbus2] 15 0 0.0 - DL 0:00.00 0 DL - - [usb/usbus2] 15 0 0.0 - DL 0:00.20 0 DL - - [usb/usbus2] 15 0 0.0 - DL 0:00.00 0 DL - - [usb/usbus2] 15 0 0.0 - DL 0:00.00 0 DL - - [usb/usbus3] 15 0 0.0 - DL 0:00.00 0 DL - - [usb/usbus3] 15 0 0.0 - DL 0:00.00 0 DL - - [usb/usbus3] 15 0 0.0 - DL 0:00.39 0 DL - - [usb/usbus3] 15 0 0.0 - DL 0:00.00 0 DL - - [usb/usbus3] 16 0 0.0 - DL 0:06.75 0 DL pftm - [pf purge] 17 0 0.0 - DL 0:02.72 0 DL - - [rand_harvestq] 18 0 0.0 - DL 0:00.67 0 DL psleep - [pagedaemon/pagedaem] 18 0 0.0 - DL 0:00.00 0 DL launds - [pagedaemon/laundry:] 18 0 0.0 - DL 0:00.00 0 DL umarcl - [pagedaemon/uma] 19 0 0.0 - DL 0:00.00 0 DL psleep - [vmdaemon] 20 0 0.0 - DL 0:00.00 0 DL pgzero - [pagezero] 21 0 0.0 - DL 0:00.31 0 DL - - [bufspacedaemon] 22 0 0.0 - DL 0:00.30 0 DL psleep - [bufdaemon/bufdaemon] 22 0 0.0 - DL 0:00.82 0 DL sdflush - [bufdaemon// worker] 23 0 0.0 - DL 0:00.31 0 DL vlruwt - [vnlru] 24 0 0.0 - DL 0:03.75 0 DL syncer - [syncer] 57 0 0.0 - DL 0:00.08 0 DL mdwait - [md0] 0 0 0.0 - DLs 0:00.00 0 DLs - [kernel/if_config_tq] 0 0 0.0 - DLs 0:00.00 0 DLs - [kernel/if_io_tqg_0] 0 0 0.0 - DLs 0:00.00 0 DLs - [kernel/if_io_tqg_1] 0 0 0.0 - DLs 0:00.00 0 DLs - [kernel/kqueue_ctx t] 0 0 0.0 - DLs 0:00.00 0 DLs - [kernel/aiod_kick ta] 0 0 0.0 - DLs 0:01.02 0 DLs - [kernel/thread taskq] 0 0 0.0 - DLs 0:00.00 0 DLs - [kernel/firmware tas] 0 0 0.0 - DLs 0:00.00 0 DLs - [kernel/vtnet0 rxq 0] 0 0 0.0 - DLs 0:00.00 0 DLs - [kernel/vtnet0 txq 0] 0 0 0.0 - DLs 0:00.00 0 DLs - [kernel/vtnet1 rxq 0] 0 0 0.0 - DLs 0:00.00 0 DLs - [kernel/vtnet1 txq 0] 0 0 0.0 - DLs 0:00.00 0 DLs - [kernel/vtnet2 rxq 0] 0 0 0.0 - DLs 0:00.00 0 DLs - [kernel/vtnet2 txq 0] 0 0 0.0 - DLs 0:00.00 0 DLs vtbslp [kernel/virtio_ballo] 0 0 0.0 - DLs 0:00.00 0 DLs - [kernel/mca taskq] 0 0 0.0 - DLs 0:00.00 0 DLs - [kernel/acpi_task_0] 0 0 0.0 - DLs 0:00.00 0 DLs - [kernel/acpi_task_1] 0 0 0.0 - DLs 0:00.00 0 DLs - [kernel/acpi_task_2] 0 0 0.0 - DLs 0:00.00 0 DLs - [kernel/CAM taskq]
Updated by Harry Coin about 7 years ago
Note also the web interface is not responsive during the deadlock. Basically, any process that doesn't touch the networking lock world still can run.
Updated by Harry Coin about 7 years ago
Harry Coin wrote:
Note also the web interface is not responsive during the deadlock. Basically, any process that doesn't touch the networking lock world still can run.
Looks like there is a known freebsd bug: carp vs. bridged interface.
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=200319