Bug #7144

closed

Intel NIC losing connection until reboot

Added by Guy Van Sanden over 8 years ago. Updated over 8 years ago.

Status: Rejected
Priority: Normal
Assignee: -
Category: Operating System
Target version: -
Start date: 01/19/2017
Due date:
% Done: 0%
Estimated time:
Plus Target Version:
Release Notes:
Affected Version: 2.3.2
Affected Architecture:

Description

Hi

I have a pfSense appliance from applianceshop.eu (https://www.applianceshop.eu/security-appliances/19-rack-appliances/pfsense-based-42/sense-dual-a10-qc-ssd-rack.html). I have a recurring issue where a NIC loses connectivity until the firewall is rebooted. The interval is random, and it happens to all three of the NICs in use. I moved the config to the second unit and the issue persists, which makes a hardware issue unlikely.

The NICs are Intel Pro 1000. Searching Google, I came across this very old thread:
https://forum.pfsense.org/index.php?topic=96325.15

The most recent update suggests that the issue persists in 2.3.2 (which I'm running also).

Disabling TCP segmentation offloading made no difference.

The system log shows nothing of much interest:

Jan 19 11:59:14     php-fpm         /status_openvpn.php: Successful login for user 'admin' from: 10.100.100.3
Jan 19 12:02:44     sshd     1272     Accepted password for root from 10.255.255.27 port 56932 ssh2
Jan 19 12:05:44     check_reload_status         Linkup starting em1
Jan 19 12:05:44     kernel         em1: link state changed to DOWN
Jan 19 12:05:45     php-fpm     40503     /rc.linkup: Hotplug event detected for LAN(lan) static IP (192.168.240.254 )
Jan 19 12:05:45     check_reload_status         Reloading filter
Jan 19 12:05:46     xinetd     11502     Starting reconfiguration
Jan 19 12:05:46     xinetd     11502     Swapping defaults
Jan 19 12:05:46     xinetd     11502     readjusting service 6969-udp
Jan 19 12:05:46     xinetd     11502     Reconfigured: new=0 old=1 dropped=0 (services)
Jan 19 12:05:48     check_reload_status         Linkup starting em1
Jan 19 12:05:48     kernel         em1: link state changed to UP
Jan 19 12:05:49     php-fpm         /rc.linkup: Hotplug event detected for LAN(lan) static IP (192.168.240.254 )
Jan 19 12:05:49     check_reload_status         rc.newwanip starting em1
Jan 19 12:05:49     check_reload_status         Reloading filter
Jan 19 12:05:50     php-fpm         /rc.newwanip: rc.newwanip: Info: starting on em1.
Jan 19 12:05:50     php-fpm         /rc.newwanip: rc.newwanip: on (IP address: 192.168.240.254) (interface: LAN[lan]) (real interface: em1).
Jan 19 12:05:50     check_reload_status         Reloading filter
Jan 19 12:05:50     xinetd     11502     Starting reconfiguration
Jan 19 12:05:50     xinetd     11502     Swapping defaults
Jan 19 12:05:50     xinetd     11502     readjusting service 6969-udp
Jan 19 12:05:50     xinetd     11502     Reconfigured: new=0 old=1 dropped=0 (services)
Jan 19 12:05:51     xinetd     11502     Starting reconfiguration
Jan 19 12:05:51     xinetd     11502     Swapping defaults
Jan 19 12:05:51     xinetd     11502     readjusting service 6969-udp
Jan 19 12:05:51     xinetd     11502     Reconfigured: new=0 old=1 dropped=0 (services)
Jan 19 12:08:23     fw00.internalcaw.local         nginx: 2017/01/19 12:08:23 [crit] 20752#100132: *355 SSL_write() failed (SSL:) (13: Permission denied) while sending to client, client: 10.255.255.27, server: , request: "POST /diag_resetstate.php HTTP/1.1", upstream: "fastcgi://unix:/var/run/php-fpm.socket", host: "192.168.240.254", referrer: "https://192.168.240.254/diag_resetstate.php" 
Jan 19 12:08:24     sshd     1272     fatal: Fssh_packet_write_poll: Connection from 10.255.255.27 port 56932: Permission denied
Jan 19 12:08:45     sshd     33637     fatal: Fssh_packet_write_poll: Connection from 10.255.255.27 port 52735: Permission denied
Jan 19 12:08:55     check_reload_status         Syncing firewall
Jan 19 12:09:14     kernel         em1: Watchdog timeout Queue[0]-- resetting
Jan 19 12:09:14     kernel         Interface is RUNNING and ACTIVE
Jan 19 12:09:14     kernel         em1: TX Queue 0 ------
Jan 19 12:09:14     kernel         em1: hw tdh = 0, hw tdt = 984
Jan 19 12:09:14     kernel         em1: Tx Queue Status = -2147483648
Jan 19 12:09:14     kernel         em1: TX descriptors avail = 40
Jan 19 12:09:14     kernel         em1: Tx Descriptors avail failure = 4142
Jan 19 12:09:14     kernel         em1: RX Queue 0 ------
Jan 19 12:09:14     kernel         em1: hw rdh = 0, hw rdt = 1023
Jan 19 12:09:14     kernel         em1: RX discarded packets = 0
Jan 19 12:09:14     kernel         em1: RX Next to Check = 0
Jan 19 12:09:14     kernel         em1: RX Next to Refresh = 0
Jan 19 12:09:14     kernel         em1: link state changed to DOWN
Jan 19 12:09:14     check_reload_status         Linkup starting em1
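
Because the drops occur at random intervals, it can help to pull only the kernel NIC events out of a longer system log and look at the timing between them. The following is a minimal sketch, assuming the log lines keep the layout shown above; the function name `nic_events`, the regular expression, and the hard-coded year are illustrative assumptions, not part of pfSense.

```python
import re
from datetime import datetime

# Matches kernel lines such as:
#   Jan 19 12:05:44     kernel         em1: link state changed to DOWN
#   Jan 19 12:09:14     kernel         em1: Watchdog timeout Queue[0]-- resetting
EVENT_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+kernel\s+"
    r"(?P<nic>[a-z]+\d+):\s+"
    r"(?P<event>link state changed to (?:UP|DOWN)|Watchdog timeout)"
)


def nic_events(lines, year=2017):
    """Return (timestamp, nic, event) tuples for kernel NIC events.

    The log format carries no year, so one is assumed via `year`.
    """
    events = []
    for line in lines:
        m = EVENT_RE.match(line.strip())
        if not m:
            continue  # not a link-state or watchdog line
        ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
        events.append((ts, m["nic"], m["event"]))
    return events


# A few lines copied from the log above.
sample = """\
Jan 19 12:05:44     kernel         em1: link state changed to DOWN
Jan 19 12:05:48     kernel         em1: link state changed to UP
Jan 19 12:09:14     kernel         em1: Watchdog timeout Queue[0]-- resetting
Jan 19 12:09:14     kernel         em1: link state changed to DOWN
""".splitlines()

events = nic_events(sample)
for ts, nic, event in events:
    print(ts.isoformat(sep=" "), nic, event)
```

Run against a full log export, the same function would list every link flap and watchdog reset per interface, which makes it easier to see whether all three NICs fail in the same pattern.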
Actions #1

Updated by Jim Pingle over 8 years ago

  • Status changed from New to Rejected

On the contrary, this is almost certainly a hardware issue, or at least a driver issue:
"Jan 19 12:09:14 kernel em1: Watchdog timeout Queue[0]-- resetting"

Try a 2.4 snapshot. If it still breaks, try to replicate it on stock FreeBSD and report it upstream to FreeBSD.

Actions #2

Updated by Guy Van Sanden over 8 years ago

Jim Pingle wrote:

On the contrary, this is almost certainly a hardware issue, or at least a driver issue:
"Jan 19 12:09:14 kernel em1: Watchdog timeout Queue[0]-- resetting"

Try a 2.4 snapshot. If it still breaks, try to replicate it on stock FreeBSD and report it upstream to FreeBSD.

Sorry if I wasn't clear. It may be an issue with all Intel cards of this type, as the forum thread suggests. What I wanted to rule out was a hardware defect in one particular unit.

A driver issue seems very likely. As the thread also suggests, the issue was not present in pfSense 2.1 or earlier.

Actions #3

Updated by Jim Pingle over 8 years ago

Either way -- driver or hardware (chip or that unit) -- it's not something we can address. It needs to be replicated in FreeBSD and taken upstream.
