Bug #7144
Intel NIC losing connection until reboot (closed)
Description
Hi
I have a pfSense appliance from applianceshop.eu (https://www.applianceshop.eu/security-appliances/19-rack-appliances/pfsense-based-42/sense-dual-a10-qc-ssd-rack.html). I have a recurring issue where a NIC loses connectivity until the firewall is rebooted. The interval is random, and it happens to all three of the NICs in use. I moved the config to the second unit and the issue persists, which makes a hardware defect in one particular unit unlikely.
The NICs are Intel PRO/1000. Searching Google, I came across this very old thread:
https://forum.pfsense.org/index.php?topic=96325.15
The most recent update there suggests that the issue persists in 2.3.2 (which I'm also running).
TCP segmentation offloading has been disabled without any change.
The system log doesn't show much of interest:
Jan 19 11:59:14 php-fpm /status_openvpn.php: Successful login for user 'admin' from: 10.100.100.3
Jan 19 12:02:44 sshd 1272 Accepted password for root from 10.255.255.27 port 56932 ssh2
Jan 19 12:05:44 check_reload_status Linkup starting em1
Jan 19 12:05:44 kernel em1: link state changed to DOWN
Jan 19 12:05:45 php-fpm 40503 /rc.linkup: Hotplug event detected for LAN(lan) static IP (192.168.240.254 )
Jan 19 12:05:45 check_reload_status Reloading filter
Jan 19 12:05:46 xinetd 11502 Starting reconfiguration
Jan 19 12:05:46 xinetd 11502 Swapping defaults
Jan 19 12:05:46 xinetd 11502 readjusting service 6969-udp
Jan 19 12:05:46 xinetd 11502 Reconfigured: new=0 old=1 dropped=0 (services)
Jan 19 12:05:48 check_reload_status Linkup starting em1
Jan 19 12:05:48 kernel em1: link state changed to UP
Jan 19 12:05:49 php-fpm /rc.linkup: Hotplug event detected for LAN(lan) static IP (192.168.240.254 )
Jan 19 12:05:49 check_reload_status rc.newwanip starting em1
Jan 19 12:05:49 check_reload_status Reloading filter
Jan 19 12:05:50 php-fpm /rc.newwanip: rc.newwanip: Info: starting on em1.
Jan 19 12:05:50 php-fpm /rc.newwanip: rc.newwanip: on (IP address: 192.168.240.254) (interface: LAN[lan]) (real interface: em1).
Jan 19 12:05:50 check_reload_status Reloading filter
Jan 19 12:05:50 xinetd 11502 Starting reconfiguration
Jan 19 12:05:50 xinetd 11502 Swapping defaults
Jan 19 12:05:50 xinetd 11502 readjusting service 6969-udp
Jan 19 12:05:50 xinetd 11502 Reconfigured: new=0 old=1 dropped=0 (services)
Jan 19 12:05:51 xinetd 11502 Starting reconfiguration
Jan 19 12:05:51 xinetd 11502 Swapping defaults
Jan 19 12:05:51 xinetd 11502 readjusting service 6969-udp
Jan 19 12:05:51 xinetd 11502 Reconfigured: new=0 old=1 dropped=0 (services)
Jan 19 12:08:23 fw00.internalcaw.local nginx: 2017/01/19 12:08:23 [crit] 20752#100132: *355 SSL_write() failed (SSL:) (13: Permission denied) while sending to client, client: 10.255.255.27, server: , request: "POST /diag_resetstate.php HTTP/1.1", upstream: "fastcgi://unix:/var/run/php-fpm.socket", host: "192.168.240.254", referrer: "https://192.168.240.254/diag_resetstate.php"
Jan 19 12:08:24 sshd 1272 fatal: Fssh_packet_write_poll: Connection from 10.255.255.27 port 56932: Permission denied
Jan 19 12:08:45 sshd 33637 fatal: Fssh_packet_write_poll: Connection from 10.255.255.27 port 52735: Permission denied
Jan 19 12:08:55 check_reload_status Syncing firewall
Jan 19 12:09:14 kernel em1: Watchdog timeout Queue[0]-- resetting
Jan 19 12:09:14 kernel Interface is RUNNING and ACTIVE
Jan 19 12:09:14 kernel em1: TX Queue 0 ------
Jan 19 12:09:14 kernel em1: hw tdh = 0, hw tdt = 984
Jan 19 12:09:14 kernel em1: Tx Queue Status = -2147483648
Jan 19 12:09:14 kernel em1: TX descriptors avail = 40
Jan 19 12:09:14 kernel em1: Tx Descriptors avail failure = 4142
Jan 19 12:09:14 kernel em1: RX Queue 0 ------
Jan 19 12:09:14 kernel em1: hw rdh = 0, hw rdt = 1023
Jan 19 12:09:14 kernel em1: RX discarded packets = 0
Jan 19 12:09:14 kernel em1: RX Next to Check = 0
Jan 19 12:09:14 kernel em1: RX Next to Refresh = 0
Jan 19 12:09:14 kernel em1: link state changed to DOWN
Jan 19 12:09:14 check_reload_status Linkup starting em1
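When collecting evidence for a FreeBSD report, it may help to pull the link flaps and watchdog resets out of a log like the one above instead of reading it by eye. The sketch below is a hypothetical helper (the name `summarize` and the regexes are my own, not part of pfSense); it assumes the "Mon DD HH:MM:SS program message" layout shown here and works whether the entries are one per line or run together. It also masks the signed "Tx Queue Status" value to 32 bits: -2147483648 is 0x80000000, i.e. only the top bit set. Reading that as a status flag rather than a count is an assumption that would need checking against the em(4) driver source.

```python
"""Summarize em(4) link flaps and watchdog resets from a pfSense system log.

Assumption: entries use the "Mon DD HH:MM:SS program message" layout seen in
this report; adjust the regexes if your syslog format differs.
"""
import re
from collections import defaultdict

# Kernel messages of interest, e.g. "kernel em1: link state changed to DOWN"
# and "kernel em1: Watchdog timeout Queue[0]-- resetting".
EVENT_RE = re.compile(
    r"(?P<ts>[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2}) kernel "
    r"(?P<iface>[a-z]+\d+): "
    r"(?:link state changed to (?P<state>UP|DOWN)|(?P<wd>Watchdog timeout))"
)

# Debug dump printed after a watchdog reset, e.g. "Tx Queue Status = -2147483648".
STATUS_RE = re.compile(r"(?P<iface>[a-z]+\d+): Tx Queue Status = (?P<val>-?\d+)")


def summarize(log_text):
    """Return {iface: {"UP": n, "DOWN": n, "watchdog": [timestamps], "tx_status": [hex]}}."""
    summary = defaultdict(lambda: {"UP": 0, "DOWN": 0, "watchdog": [], "tx_status": []})
    for m in EVENT_RE.finditer(log_text):
        entry = summary[m["iface"]]
        if m["state"]:
            # Count each link state transition per interface.
            entry[m["state"]] += 1
        else:
            # Record when each watchdog reset happened.
            entry["watchdog"].append(m["ts"])
    for m in STATUS_RE.finditer(log_text):
        # The value is logged as a signed 32-bit int; mask it to show the raw bits.
        raw = int(m["val"]) & 0xFFFFFFFF
        summary[m["iface"]]["tx_status"].append(f"0x{raw:08X}")
    return dict(summary)
```

For example, `summarize(open("system.log").read())` (the file name is only a placeholder for wherever the exported log text is saved) returns, for the excerpt above, two DOWN and one UP transition on em1, one watchdog reset at 12:09:14, and a Tx Queue Status of 0x80000000.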
Updated by Jim Pingle over 8 years ago
- Status changed from New to Rejected
On the contrary, this is almost certainly a hardware issue, or at least a driver issue:
"Jan 19 12:09:14 kernel em1: Watchdog timeout Queue[0]-- resetting"
Try a 2.4 snapshot. If it still breaks, try to replicate it on stock FreeBSD and report it upstream to FreeBSD.
Updated by Guy Van Sanden over 8 years ago
Jim Pingle wrote:
On the contrary, this is almost certainly a hardware issue, or at least a driver issue:
"Jan 19 12:09:14 kernel em1: Watchdog timeout Queue[0]-- resetting"
Try a 2.4 snapshot, if it still breaks, try to replicate it on stock FreeBSD and report it upstream to FreeBSD.
Sorry if I wasn't clear: it may be an issue with all Intel cards of this type, as the forum thread suggests. What I wanted to rule out was a hardware defect in one particular unit.
A driver issue seems very likely; as the thread also suggests, the issue was not present in pfSense 2.1 or earlier.
Updated by Jim Pingle over 8 years ago
Either way (driver or hardware, whether the chip or that unit), it's not something we can address. It needs to be replicated in FreeBSD and taken upstream.