Bug #8506
closedConstant link cycling on some DHCP interfaces causes connectivity problems and other issues
0%
Description
Since the switch to 11-stable on 2.4.4 snapshots, it appears that in some cases a DHCP WAN interface will constantly cycle link up/down for no apparent reason. The link cycling leads to other services restarting so often as to render the system mostly unusable.
I have only seen it happen on physical igb interfaces so far. I can't replicate it in a VM or on ARM (SG-1000 and SG-3100 are OK)
It happens with and without a spoofed MAC address on the interface.
<opt1> <descr><![CDATA[Cable]]></descr> <if>igb2</if> <enable></enable> <alias-address></alias-address> <alias-subnet>32</alias-subnet> <spoofmac>00:xx:xx:xx:xx:xx</spoofmac> <monitorip>x.x.x.1</monitorip> <ipaddr>dhcp</ipaddr> <dhcphostname></dhcphostname> <dhcprejectfrom>192.168.100.1</dhcprejectfrom> <adv_dhcp_pt_timeout></adv_dhcp_pt_timeout> <adv_dhcp_pt_retry></adv_dhcp_pt_retry> <adv_dhcp_pt_select_timeout></adv_dhcp_pt_select_timeout> <adv_dhcp_pt_reboot></adv_dhcp_pt_reboot> <adv_dhcp_pt_backoff_cutoff></adv_dhcp_pt_backoff_cutoff> <adv_dhcp_pt_initial_interval></adv_dhcp_pt_initial_interval> <adv_dhcp_pt_values>SavedCfg</adv_dhcp_pt_values> <adv_dhcp_send_options></adv_dhcp_send_options> <adv_dhcp_request_options></adv_dhcp_request_options> <adv_dhcp_required_options></adv_dhcp_required_options> <adv_dhcp_option_modifiers></adv_dhcp_option_modifiers> <adv_dhcp_config_advanced></adv_dhcp_config_advanced> <adv_dhcp_config_file_override></adv_dhcp_config_file_override> <adv_dhcp_config_file_override_path></adv_dhcp_config_file_override_path> </opt1>
May 10 20:00:02 blooper kernel: igb2: link state changed to DOWN May 10 20:00:03 blooper check_reload_status: rc.newwanip starting igb2 May 10 20:00:04 blooper php-fpm[36293]: /rc.linkup: dpinger: timeout while retrieving status for gateway HeNet_GW May 10 20:00:04 blooper php-fpm[36293]: /rc.linkup: Gateways status could not be determined, considering all as up/active. (Group: PreferCableV6) May 10 20:00:04 blooper php-fpm[36293]: /rc.linkup: Gateways status could not be determined, considering all as up/active. (Group: PreferDSLV6) May 10 20:00:04 blooper php-fpm[36293]: /rc.linkup: Gateways status could not be determined, considering all as up/active. (Group: LoadBalanceV6) May 10 20:00:04 blooper kernel: igb2: link state changed to UP May 10 20:00:04 blooper check_reload_status: Linkup starting igb2 May 10 20:00:04 blooper php-fpm[41445]: /rc.newwanip: dpinger: timeout while retrieving status for gateway HeNet_DSLGW May 10 20:00:05 blooper php-fpm[45686]: /rc.newwanip: dpinger: timeout while retrieving status for gateway HeNet_DSLGW May 10 20:00:07 blooper php-fpm[36293]: /rc.linkup: dpinger: timeout while retrieving status for gateway HeNet_DSLGW May 10 20:00:07 blooper php-fpm[41445]: /rc.newwanip: dpinger: timeout while retrieving status for gateway HeNet_GW May 10 20:00:07 blooper php-fpm[41445]: /rc.newwanip: Gateways status could not be determined, considering all as up/active. (Group: PreferCableV6) May 10 20:00:07 blooper php-fpm[41445]: /rc.newwanip: Gateways status could not be determined, considering all as up/active. (Group: PreferDSLV6) May 10 20:00:07 blooper php-fpm[41445]: /rc.newwanip: Gateways status could not be determined, considering all as up/active. (Group: LoadBalanceV6) May 10 20:00:07 blooper php-fpm[41445]: /rc.newwanip: phpDynDNS (vpnpeer.pingle.org): No change in my IP address and/or 25 days has not passed. Not updating dynamic DNS entry. May 10 20:00:08 blooper php-fpm[45686]: /rc.newwanip: dpinger: timeout while retrieving status for gateway HeNet_GW May 10 20:00:08 blooper php-fpm[45686]: /rc.newwanip: Gateways status could not be determined, considering all as up/active. (Group: PreferCableV6) May 10 20:00:08 blooper php-fpm[45686]: /rc.newwanip: Gateways status could not be determined, considering all as up/active. (Group: PreferDSLV6) May 10 20:00:08 blooper php-fpm[45686]: /rc.newwanip: Gateways status could not be determined, considering all as up/active. (Group: LoadBalanceV6) May 10 20:00:10 blooper check_reload_status: Linkup starting igb2 May 10 20:00:10 blooper kernel: igb2: link state changed to DOWN May 10 20:00:10 blooper check_reload_status: rc.newwanip starting igb2 May 10 20:00:10 blooper php-fpm[36293]: /rc.linkup: dpinger: timeout while retrieving status for gateway HeNet_GW May 10 20:00:10 blooper php-fpm[45686]: /rc.newwanip: The command '/usr/local/sbin/unbound -c /var/unbound/unbound.conf' returned exit code '1', the output was '[1525996810] unbound[15130:0] error: bind: address already in use [1525996810] unbound[15130:0] fatal error: could not open ports' May 10 20:00:10 blooper dhcpleases: kqueue error: unkown May 10 20:00:10 blooper php-fpm[36293]: /rc.linkup: Gateways status could not be determined, considering all as up/active. (Group: PreferCableV6) May 10 20:00:10 blooper php-fpm[36293]: /rc.linkup: Gateways status could not be determined, considering all as up/active. (Group: PreferDSLV6) May 10 20:00:10 blooper php-fpm[36293]: /rc.linkup: Gateways status could not be determined, considering all as up/active. (Group: LoadBalanceV6) May 10 20:00:10 blooper check_reload_status: updating dyndns opt1 May 10 20:00:10 blooper check_reload_status: Reloading filter May 10 20:00:10 blooper php-fpm[85682]: /rc.linkup: DEVD Ethernet detached event for opt1 May 10 20:00:11 blooper php-fpm[326]: /rc.newwanip: dpinger: timeout while retrieving status for gateway HeNet_DSLGW May 10 20:00:11 blooper check_reload_status: Reloading filter May 10 20:00:11 blooper kernel: arpresolve: can't allocate llinfo for X.X.X.1 on igb2 May 10 20:00:12 blooper php-fpm[30692]: /rc.linkup: DEVD Ethernet attached event for opt1 May 10 20:00:12 blooper php-fpm[30692]: /rc.linkup: HOTPLUG: Configuring interface opt1 May 10 20:00:12 blooper kernel: arpresolve: can't allocate llinfo for X.X.X.1 on igb2 May 10 20:00:12 blooper kernel: arpresolve: can't allocate llinfo for X.X.X.1 on igb2 May 10 20:00:12 blooper kernel: arpresolve: can't allocate llinfo for X.X.X.1 on igb2 May 10 20:00:12 blooper kernel: arpresolve: can't allocate llinfo for X.X.X.1 on igb2 May 10 20:00:12 blooper kernel: arpresolve: can't allocate llinfo for X.X.X.1 on igb2 May 10 20:00:12 blooper php-fpm[41445]: /rc.newwanip: dpinger: timeout while retrieving status for gateway HeNet_DSLGW May 10 20:00:12 blooper kernel: arpresolve: can't allocate llinfo for X.X.X.1 on igb2 May 10 20:00:12 blooper kernel: arpresolve: can't allocate llinfo for X.X.X.1 on igb2 May 10 20:00:12 blooper kernel: arpresolve: can't allocate llinfo for X.X.X.1 on igb2 May 10 20:00:12 blooper kernel: arpresolve: can't allocate llinfo for X.X.X.1 on igb2 May 10 20:00:12 blooper kernel: arpresolve: can't allocate llinfo for X.X.X.1 on igb2 May 10 20:00:12 blooper kernel: arpresolve: can't allocate llinfo for X.X.X.1 on igb2 May 10 20:00:12 blooper kernel: arpresolve: can't allocate llinfo for X.X.X.1 on igb2 May 10 20:00:12 blooper kernel: arpresolve: can't allocate llinfo for X.X.X.1 on igb2 May 10 20:00:12 blooper kernel: arpresolve: can't allocate llinfo for X.X.X.1 on igb2 May 10 20:00:12 blooper kernel: arpresolve: can't allocate llinfo for X.X.X.1 on igb2 May 10 20:00:13 blooper kernel: arpresolve: can't allocate llinfo for X.X.X.1 on igb2 May 10 20:00:13 blooper kernel: arpresolve: can't allocate llinfo for X.X.X.1 on igb2 May 10 20:00:13 blooper kernel: arpresolve: can't allocate llinfo for X.X.X.1 on igb2 May 10 20:00:13 blooper kernel: arpresolve: can't allocate llinfo for X.X.X.1 on igb2 May 10 20:00:13 blooper kernel: arpresolve: can't allocate llinfo for X.X.X.1 on igb2 May 10 20:00:13 blooper kernel: arpresolve: can't allocate llinfo for X.X.X.1 on igb2 May 10 20:00:13 blooper kernel: arpresolve: can't allocate llinfo for X.X.X.1 on igb2 May 10 20:00:13 blooper kernel: arpresolve: can't allocate llinfo for X.X.X.1 on igb2 May 10 20:00:13 blooper kernel: arpresolve: can't allocate llinfo for X.X.X.1 on igb2 May 10 20:00:13 blooper kernel: arpresolve: can't allocate llinfo for X.X.X.1 on igb2 May 10 20:00:13 blooper kernel: arpresolve: can't allocate llinfo for X.X.X.1 on igb2 May 10 20:00:13 blooper kernel: arpresolve: can't allocate llinfo for X.X.X.1 on igb2 May 10 20:00:13 blooper kernel: arpresolve: can't allocate llinfo for X.X.X.1 on igb2 May 10 20:00:13 blooper kernel: arpresolve: can't allocate llinfo for X.X.X.1 on igb2 May 10 20:00:13 blooper kernel: arpresolve: can't allocate llinfo for X.X.X.1 on igb2 May 10 20:00:13 blooper check_reload_status: Linkup starting igb2 May 10 20:00:13 blooper kernel: igb2: link state changed to UP May 10 20:00:14 blooper kernel: arpresolve: can't allocate llinfo for X.X.X.1 on igb2 May 10 20:00:14 blooper kernel: arpresolve: can't allocate llinfo for X.X.X.1 on igb2 May 10 20:00:14 blooper kernel: arpresolve: can't allocate llinfo for X.X.X.1 on igb2 May 10 20:00:14 blooper kernel: arpresolve: can't allocate llinfo for X.X.X.1 on igb2 May 10 20:00:14 blooper kernel: arpresolve: can't allocate llinfo for X.X.X.1 on igb2 May 10 20:00:14 blooper kernel: arpresolve: can't allocate llinfo for X.X.X.1 on igb2 May 10 20:00:14 blooper kernel: arpresolve: can't allocate llinfo for X.X.X.1 on igb2 May 10 20:00:14 blooper php-fpm[326]: /rc.newwanip: dpinger: timeout while retrieving status for gateway HeNet_GW May 10 20:00:14 blooper kernel: arpresolve: can't allocate llinfo for X.X.X.1 on igb2 May 10 20:00:14 blooper kernel: arpresolve: can't allocate llinfo for X.X.X.1 on igb2 May 10 20:00:14 blooper php-fpm[326]: /rc.newwanip: Gateways status could not be determined, considering all as up/active. (Group: PreferCableV6) May 10 20:00:14 blooper kernel: arpresolve: can't allocate llinfo for X.X.X.1 on igb2 May 10 20:00:14 blooper kernel: arpresolve: can't allocate llinfo for X.X.X.1 on igb2 May 10 20:00:14 blooper kernel: arpresolve: can't allocate llinfo for X.X.X.1 on igb2 May 10 20:00:14 blooper kernel: arpresolve: can't allocate llinfo for X.X.X.1 on igb2 May 10 20:00:14 blooper kernel: arpresolve: can't allocate llinfo for X.X.X.1 on igb2 May 10 20:00:14 blooper php-fpm[326]: /rc.newwanip: Gateways status could not be determined, considering all as up/active. (Group: PreferDSLV6) May 10 20:00:14 blooper php-fpm[326]: /rc.newwanip: Gateways status could not be determined, considering all as up/active. (Group: LoadBalanceV6) May 10 20:00:14 blooper check_reload_status: Linkup starting igb2 May 10 20:00:14 blooper kernel: igb2: link state changed to DOWN May 10 20:00:14 blooper kernel: nd6_setmtu0: new link MTU on igb2 (576) is too small for IPv6 May 10 20:00:14 blooper check_reload_status: rc.newwanip starting igb2 May 10 20:00:14 blooper snmpd[15821]: SIOCGIFDESCR (gif0): Device not configured May 10 20:00:14 blooper kernel: gif0: link state changed to DOWN May 10 20:00:14 blooper snmpd[15821]: SIOCGIFDESCR (gif0): Device not configured May 10 20:00:14 blooper kernel: gif0: link state changed to UP May 10 20:00:15 blooper php-fpm[41445]: /rc.newwanip: dpinger: timeout while retrieving status for gateway HeNet_GW May 10 20:00:16 blooper php-fpm[41445]: /rc.newwanip: Gateways status could not be determined, considering all as up/active. (Group: PreferCableV6) May 10 20:00:16 blooper php-fpm[41445]: /rc.newwanip: Gateways status could not be determined, considering all as up/active. (Group: PreferDSLV6) May 10 20:00:16 blooper php-fpm[41445]: /rc.newwanip: Gateways status could not be determined, considering all as up/active. (Group: LoadBalanceV6) May 10 20:00:16 blooper kernel: igb2: link state changed to UP May 10 20:00:16 blooper check_reload_status: Linkup starting igb2 May 10 20:00:16 blooper php-fpm[45686]: /rc.newwanip: dpinger: timeout while retrieving status for gateway HeNet_DSLGW May 10 20:00:16 blooper check_reload_status: Linkup starting igb2 May 10 20:00:16 blooper kernel: igb2: link state changed to DOWN May 10 20:00:17 blooper check_reload_status: rc.newwanip starting igb2 May 10 20:00:18 blooper php-fpm[326]: /rc.newwanip: dpinger: timeout while retrieving status for gateway HeNet_DSLGW May 10 20:00:18 blooper php-fpm[30692]: /rc.linkup: dpinger: timeout while retrieving status for gateway HeNet_DSLGW May 10 20:00:18 blooper kernel: igb2: link state changed to UP May 10 20:00:18 blooper check_reload_status: Linkup starting igb2
Updated by Jim Pingle over 6 years ago
So far I haven't found any relevant common elements between the two systems that can replicate the problem.
That NIC shows a large number of link interrupts as well:
interrupt total rate irq262: igb2:que 0 41266 35 irq263: igb2:que 1 9468 8 irq264: igb2:link 449 0
That matches up with the number of link transitions logged:
$ grep -c 'igb2: link state changed' OS-Message\ Buffer.txt 447
Other NICs on that box had <=3 events
There are other systems with igb NICs running DHCP WANs that do not appear to hit this issue so it's still unclear what is triggering the problem.
Updated by Jim Pingle over 6 years ago
May or may not be related but even on the non-igb hardware I can set this into a link cycle/wan reconfigure loop by doing an edit/save/apply on the WAN, and that didn't happen on previous releases.
Updated by Jim Pingle over 6 years ago
This may end up being the same root cause as #8507, dhclient in FreeBSD gained support for MTU, but setting MTU on e1000 causes the link to cycle. dhclient was altered to not exit on link drop, but that doesn't help our other link processing scripts, which fall into a never-ending loop. See #8507#note-2
Updated by Jim Pingle over 6 years ago
- Status changed from Confirmed to Feedback
Updated by Jim Pingle over 6 years ago
- Status changed from Feedback to Duplicate