Bug #8145
Recurring deadlock during normal operation.
Status: Closed
Description
At seemingly random intervals during normal operation, ranging from several minutes to several hours, pfSense experiences what appears to be a deadlock, though it could instead be a component crashing while it holds a network-related lock.
When the apparent deadlock occurs:
ACPI reboot signals shut down many processes, but not all of them, and the reboot never happens.
The serial console keeps working until a network-related command is run in a shell; then it stops accepting characters. For example:
[2.4.2-RELEASE.../root]: netstat -4nl
Active Internet connections
Proto Recv-Q Send-Q Local Address          Foreign Address        (state)
tcp4       0      0 10.1.0.2.56695         10.1.0.3.520           SYN_SENT
tcp4       0      0 10.12.159.253.20853    10.12.159.252.520      SYN_SENT
tcp4       0     24 10.12.159.253.19066    10.12.159.252.520      FIN_WAIT_1
tcp4       0     36 10.1.0.2.5698          10.1.0.3.520           FIN_WAIT_1
udp4       0      0 10.12.159.254.41293    *.*
udp4       0      0 173.29.66.105.41155    *.*
udp4       0      0 10.1.0.1.123           *.*
udp4       0      0 10.1.0.2.123           *.*
udp4       0      0 127.0.0.1.123          *.*
udp4       0      0 172.16.201.94.123      *.*
udp4       0      0 172.29.22.254.123      *.*
udp4       0      0 172.29.21.254.123      *.*
udp4       0      0 10.12.159.254.123      *.*
udp4       0      0 192.168.24.1.123       *.*
udp4       0      0 192.168.23.254.123     *.*
udp4       0      0 192.168.22.254.123     *.*
udp4       0      0 192.168.23.252.123     *.*
udp4       0      0 192.168.22.252.123     *.*
udp4       0      0 192.168.29.1.123       *.*
udp4       0      0 10.12.159.253.123      *.*
udp4       0      0 127.0.0.1.6969         *.*
udp4   16640      0 173.29.66.105.24048    *.*
<No further characters appear; the serial console is locked at this point. ^C has no effect.>
The ACPI shutdown signal produces no further serial console activity, and nothing appears on the VGA screen. However, the VM running pfSense still shows occasional periodic processor activity, roughly a 15% bump lasting 1 second every 20 seconds or so.
The VGA console, otherwise idle at the pfSense menu, stops responding to keystrokes entirely the moment the deadlock occurs.
SSH clients that were previously connected to pfSense and sitting idle at a command prompt time out, and the connection is dropped on the client side.
My best guess is that the deadlock arises from a process that holds a lock (such as CARP or the OpenVPN client) and then, on a filter reload or reconfiguration, calls ifconfig; ifconfig hangs forever waiting for a lock that will never become available, because its parent process holds it. While the system was deadlocked but still working well enough for the serial console to respond, I saw 'grep' and 'wc' running under the ifconfig process with L status (in FreeBSD ps, waiting to acquire a lock). My bet is that ifconfig, or something it spawns, is polling ucarp for primary/backup status, but the whole chain is started by an OpenVPN client bound to an outgoing ucarp interface, and that client holds a ucarp-related lock the reconfiguration process needs.
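The suspected failure mode is a classic parent/child circular wait: the parent takes a lock, then blocks waiting for a child that needs the same lock. The following is a minimal sketch of that pattern only, assuming a hypothetical lock file guarded with flock(); it is not pfSense's actual locking code, and the "OpenVPN/CARP handler" and "ifconfig" roles in the comments are stand-ins. A timeout is used so the demonstration terminates instead of hanging the way the real system does.

```python
import fcntl
import subprocess
import sys
import tempfile

# Hypothetical child program standing in for ifconfig: it takes an exclusive
# flock() on the same lock file. The call blocks for as long as another open
# file description (here, the parent's) holds the lock.
CHILD_SRC = (
    "import fcntl, sys\n"
    "with open(sys.argv[1], 'a') as f:\n"
    "    fcntl.flock(f, fcntl.LOCK_EX)\n"
)


def parent_waits_on_child(hold_lock: bool = True, timeout: float = 2.0) -> bool:
    """Return True if the child hung waiting for the lock (circular wait),
    False if it acquired the lock and exited normally."""
    with tempfile.NamedTemporaryFile() as lockfile:
        if hold_lock:
            # The parent (stand-in for the OpenVPN/CARP event handler)
            # takes the lock first...
            fcntl.flock(lockfile, fcntl.LOCK_EX)
        try:
            # ...then blocks until its child finishes. Without the timeout,
            # neither side could ever make progress.
            subprocess.run([sys.executable, "-c", CHILD_SRC, lockfile.name],
                           timeout=timeout, check=True)
        except subprocess.TimeoutExpired:
            # run() kills the hung child before re-raising TimeoutExpired.
            return True
        finally:
            fcntl.flock(lockfile, fcntl.LOCK_UN)
        return False
```

Calling `parent_waits_on_child()` returns True after the timeout, while `parent_waits_on_child(hold_lock=False)` returns False because the child gets the lock immediately. The only difference between the two runs is whether the parent holds the lock while waiting, which is the condition the hypothesis above blames.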
I'll try to capture ps output the next time it locks up.
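To make a future ps capture easier to read, a small filter can list processes whose state is L together with their parent command, which is the grep/wc-under-ifconfig pattern described above. This is a sketch that assumes FreeBSD ps(1) option syntax (`-ax -o pid,ppid,stat,command`); the function names are hypothetical, and it has not been tested on a pfSense box.

```python
from __future__ import annotations

import subprocess


def snapshot() -> str:
    """Capture the process table. Flags assume FreeBSD ps(1)."""
    return subprocess.run(
        ["ps", "-ax", "-o", "pid,ppid,stat,command"],
        capture_output=True, text=True, check=True,
    ).stdout


def parse_ps(text: str) -> list[dict]:
    """Parse `ps -o pid,ppid,stat,command` output (header line first)."""
    rows = []
    for line in text.splitlines()[1:]:
        parts = line.split(None, 3)  # COMMAND may itself contain spaces
        if len(parts) < 4:
            continue
        pid, ppid, stat, command = parts
        rows.append({"pid": int(pid), "ppid": int(ppid),
                     "stat": stat, "command": command})
    return rows


def lock_waiters(rows: list[dict]) -> list[tuple[int, str, str]]:
    """Return (pid, command, parent command) for processes whose state letter
    is L. In FreeBSD ps(1), a leading L means waiting to acquire a lock; an L
    in a later position means pages locked in memory, so only the first
    character is checked."""
    by_pid = {r["pid"]: r for r in rows}
    result = []
    for r in rows:
        if r["stat"].startswith("L"):
            parent = by_pid.get(r["ppid"])
            result.append((r["pid"], r["command"],
                           parent["command"] if parent else "?"))
    return result
```

Usage: `for pid, cmd, parent in lock_waiters(parse_ps(snapshot())): print(pid, cmd, "<-", parent)`. Since the console may stop responding once the lockup starts, running this from an already-open session, or periodically before the hang, is probably the only way to catch the state.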