Project

General

Profile

Bug #9484

With proper timing on boot dhclient won't be started for WAN without manual intervention

Added by Tomasz K. over 1 year ago. Updated over 1 year ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Category:
Unknown
Target version:
-
Start date:
04/25/2019
Due date:
% Done:

0%

Estimated time:
Affected Version:
2.4.4_2
Affected Architecture:
All

Description

My setup
  • Pfsense WAN (igb0) connected directly to ISP modem (configured as bridge)
  • Pfesnse LAN (igb1 - with a few VLANs) connected to managed switch - not important for this problem
  • WAN is configured to get IP from DHCP
  • Modem is Compal ch7465lg from UPC
Short description of the problem
  • if I power off both pfSense and modem and after that start them at the same time dhclient won't start properly (because there is no link?)
  • on dashboard WAN will get n/a instead of IP and gateway will get Unknown status.
  • after that if I pull out the cable from igb0 and plug it back (or just click renew in "Interfaces") it will get IP immediately from DHCP
Probably important details
  • pfSense starts really fast (about 20s)
  • on the other hand modem is not so fast
    • after turning it on it takes 40s to get light next to ethernet port (link up?)... so dhclient for WAN is run before link up
    • after that I have to wait next 50s to see blinking light next to ethernet port - indicating any communication
    • traffic mentioned above is only arp who has (if I remember correctly)... nothing from pfSense WAN
Logs indicating problem
  • Apr 18 17:41:05 pfSense check_reload_status: Linkup starting igb1
    Apr 18 17:41:05 pfSense kernel: igb1: link state changed to UP
    Apr 18 17:41:05 pfSense kernel: igb1.50: link state changed to UP
    Apr 18 17:41:05 pfSense kernel: igb1.200: link state changed to UP
    Apr 18 17:41:05 pfSense kernel: igb1.10: link state changed to UP
    Apr 18 17:41:05 pfSense check_reload_status: Linkup starting igb1.50
    Apr 18 17:41:05 pfSense check_reload_status: Linkup starting igb1.200
    Apr 18 17:41:05 pfSense check_reload_status: Linkup starting igb1.10
    Apr 18 17:41:13 pfSense php-cgi: rc.bootup: The command '/sbin/dhclient -c /var/etc/dhclient_wan.conf igb0 > /tmp/igb0_output 2> /tmp/igb0_error_output' returned exit code '1', the output was '' 
    Apr 18 17:41:13 pfSense php-cgi: rc.bootup: Resyncing OpenVPN instances.
    Apr 18 17:41:13 pfSense kernel: pflog0: promiscuous mode enabled
    Apr 18 17:41:13 pfSense php-cgi: rc.bootup: Gateway, none 'available' for inet, use the first one configured. 'WAN_DHCP'
    Apr 18 17:41:13 pfSense kernel: DUMMYNET 0 with IPv6 initialized (100409)
    Apr 18 17:41:13 pfSense kernel: load_dn_sched dn_sched FIFO loaded
    Apr 18 17:41:13 pfSense kernel: load_dn_sched dn_sched QFQ loaded
    Apr 18 17:41:13 pfSense kernel: load_dn_sched dn_sched RR loaded
    Apr 18 17:41:13 pfSense kernel: load_dn_sched dn_sched WF2Q+ loaded
    Apr 18 17:41:13 pfSense kernel: load_dn_sched dn_sched PRIO loaded
    Apr 18 17:41:13 pfSense kernel: load_dn_sched dn_sched FQ_CODEL loaded
    Apr 18 17:41:13 pfSense kernel: load_dn_sched dn_sched FQ_PIE loaded
    Apr 18 17:41:13 pfSense kernel: load_dn_aqm dn_aqm CODEL loaded
    Apr 18 17:41:13 pfSense kernel: load_dn_aqm dn_aqm PIE loaded
    Apr 18 17:41:13 pfSense php-cgi: rc.bootup: Gateway, none 'available' for inet6, use the first one configured. ''
    Apr 18 17:41:14 pfSense php-cgi: rc.bootup: sync unbound done.
    Apr 18 17:41:15 pfSense php-cgi: rc.bootup: NTPD is starting up.
    Apr 18 17:41:15 pfSense check_reload_status: Updating all dyndns
    Apr 18 17:41:15 pfSense check_reload_status: Linkup starting igb0
    Apr 18 17:41:15 pfSense kernel: igb0: link state changed to UP
    
  • and in /tmp/igb0_output I get this
    igb0: no link .............. giving up
    

My workaround
Put switch between pfSense and modem so there is link when dhclient is started on boot. With this in logs I get (proper last line instead of failing dhclient)

Apr 18 17:48:29 pfSense check_reload_status: Linkup starting igb1
Apr 18 17:48:29 pfSense kernel: igb1: link state changed to UP
Apr 18 17:48:29 pfSense kernel: igb1.50: link state changed to UP
Apr 18 17:48:29 pfSense kernel: igb1.200: link state changed to UP
Apr 18 17:48:29 pfSense kernel: igb1.10: link state changed to UP
Apr 18 17:48:29 pfSense check_reload_status: Linkup starting igb1.50
Apr 18 17:48:29 pfSense check_reload_status: Linkup starting igb1.200
Apr 18 17:48:29 pfSense check_reload_status: Linkup starting igb1.10
Apr 18 17:48:29 pfSense check_reload_status: Linkup starting igb0

and /tmp/igb0_output starts with

igb0: no link ...... got link
dhclient: PREINIT
DHCPREQUEST on igb0 to 255.255.255.255 port 67

Logs
Logs mentioned above are from files that I attach to this issue
  • Logs showing problem when pfsense is connected directly to the modem
    • dhcp_log_start_without_net
    • sytem_log_start_without_net
  • Logs gathered when pfsense is connected through switch (situation when all is working properly)
    • dhcp_log_start_through_switch
    • sytem_log_start_through_switch

Forum post
https://forum.netgate.com/topic/142695/dhclient-doesn-t-work-after-boot-up-for-wan-conected-to-modem - forum post related with this problem. Mostly the same as description above. In forum post I also mention some code analysis of dhclient.c but probably it won't be helpful.

My guess
As I mention in https://forum.netgate.com/topic/142695/dhclient-doesn-t-work-after-boot-up-for-wan-conected-to-modem/6 maybe there was some timing problem with handling of link up by check_reload_status or maybe devd haven't used information from pfSense-devd.conf. It is first time I look into pfsense code, so those are just guesses.

dhcp_log_start_through_switch (5.74 KB) dhcp_log_start_through_switch Tomasz K., 04/25/2019 01:02 PM
dhcp_log_start_without_net (4.8 KB) dhcp_log_start_without_net Tomasz K., 04/25/2019 01:02 PM
sytem_log_start_without_net (20 KB) sytem_log_start_without_net Tomasz K., 04/25/2019 01:02 PM
sytem_log_start_through_switch (20.9 KB) sytem_log_start_through_switch Tomasz K., 04/25/2019 01:02 PM

History

#1 Updated by Tomasz K. over 1 year ago

It looks that with version 2.4.4-RELEASE-p3 problem no longer exists.

#2 Updated by Jim Pingle over 1 year ago

  • Status changed from New to Closed

#3 Updated by Tomasz K. over 1 year ago

Short note for the future reference. It still works, but I found out that it won't work for default protocol timings in advanced options of WAN interface. I have always had 900 for timeout.

Also available in: Atom PDF