Bug #4094
closed
Gateway Status can report Online when gateway is waiting for DHCP
100%
Description
Example system: 2 WANs (WAN and OPT1), both DHCP, that uplink to 2 different ISPs. In the screenshots they appear as gateway WAN_DHCP on interface WANGENERAL and gateway OPT1_DHCP on interface OPT1SUBISU.
WAN has monitor IP 8.8.4.4
OPT1 has monitor IP 8.8.8.8
The cable for OPT1 goes to a switch that then has a cable up to a rooftop ISP device that does the uplink. Cable to rooftop is unplugged (simulating a fault). pfSense OPT1 has a physical connection to the switch. Boot like this and OPT1 is waiting/trying to get DHCP.
WAN comes up fine, getting DHCP. A route to WAN monitor IP 8.8.4.4 is added through WAN gateway (10.172.1.1 learned from DHCP) - good.
There is no specific route to OPT1 monitor IP 8.8.8.8 because there is no gateway for OPT1 known yet.
"Enable default gateway switching" is off and OPT1 is the default gateway, but somehow there is a default route through WAN gateway 10.172.1.1 - that's handy but I didn't expect it to happen.
So OPT1 monitor IP 8.8.8.8 can be reached happily out WAN. apinger is happily monitoring it and getting responses, so it considers OPT1 to be Online. Thus the misleading screenshots of gateway status that show both gateways online, even though the OPT1 gateway IP still says "dynamic".
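The status I would expect here can be sketched as follows. This is illustrative Python, not pfSense or apinger code: the `Gateway` class, the `reported_status` function and the "Pending" label are all hypothetical names made up for the example. The point is only that a successful probe should not count while the gateway address is still unresolved ("dynamic").

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Gateway:
    name: str
    address: Optional[str]  # None while DHCP has not yet supplied a gateway ("dynamic")
    monitor_ip: str
    probe_ok: bool          # monitor IP answered, by whatever route the packet took


def reported_status(gw: Gateway) -> str:
    """Return the status to display for a gateway (hypothetical labels)."""
    # Without a gateway address there is no host route for the monitor IP,
    # so a reply says nothing about this uplink.
    if gw.address is None:
        return "Pending"
    return "Online" if gw.probe_ok else "Offline"


wan = Gateway("WAN_DHCP", "10.172.1.1", "8.8.4.4", probe_ok=True)
opt1 = Gateway("OPT1_DHCP", None, "8.8.8.8", probe_ok=True)  # reply arrived via WAN
print(reported_status(wan), reported_status(opt1))
```

With the values from this report, WAN is shown as Online and OPT1 as Pending, even though 8.8.8.8 answers through the WAN default route.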
My failover rules that prioritize some traffic out WAN and some out OPT1 are doing something weird - for example I am coming from a client on OPT2WIFI. Here are the rules generated for that:
--------
pass in quick on $OPT2WIFI inet from 10.49.212.250/22 to $INF_subnets tracker 1397570125 keep state label "USER_RULE: Allow packets to INF subnets"
# returning at dst == "/" label "USER_RULE: Allow packets to Subisu WAN local subnet"
pass in quick on $OPT2WIFI inet from 10.49.212.250/22 to 10.172.1.0/24 tracker 1418222712 keep state label "USER_RULE: Allow packets to WAN local subnet"
# rule Subisu Internal always to Subisu WAN disabled because gateway OPT1_DHCP is down label "USER_RULE: Subisu Internal always to Subisu WAN"
pass in quick on $OPT2WIFI $GWWAN1 inet from 10.49.212.250/22 to $INFemail tracker 1397451985 keep state label "USER_RULE: INF email special"
# rule Allow all on WiFi disabled because gateway InetGeneral is down label "USER_RULE: Allow all on WiFi"
--------
and the gateway groups the system has decided it will define:
--------
# Gateways
GWWAN_DHCP = " route-to ( vr0_vlan70 10.172.1.1 ) "
GWVPNclients = " route-to { ( vr0_vlan70 10.172.1.1 ) } "
GWWAN1 = " route-to { ( vr0_vlan70 10.172.1.1 ) } "
--------
Somehow it has decided that the InetGeneral Gateway Group is down - but that has OPT1 tier 1 (which is down) and WAN tier 2, which is up - so why is InetGeneral considered down?
As a result, the last rule quoted above "rule Allow all on WiFi disabled because gateway InetGeneral is down" has been disabled, and so the leftover traffic that was being directed to InetGeneral is going nowhere - most internet access does not work.
But looking at all the green on the dashboard, a network admin could easily miss the fact that OPT1 is down.
2 problems I see here:
1) The gateway group that contains OPT1 (waiting for DHCP) and WAN (got DHCP already) is being considered down, and rules using it are being disabled.
2) The Gateways Status and dashboard Gateways Widget are showing green Online for OPT1 when it does not even have an IP address yet.
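For problem 1, the behaviour I would expect from a tiered gateway group can be sketched like this. It is illustrative Python under stated assumptions, not the actual pfSense implementation: `active_members` is a hypothetical helper, the group is modelled as a plain mapping from gateway name to tier, and the tier numbers are the ones described above for InetGeneral.

```python
from typing import Dict, List


def active_members(group: Dict[str, int], is_up: Dict[str, bool]) -> List[str]:
    """Return the usable gateways of the lowest-numbered tier that has any.

    An empty list means every member is down, which is the only case in
    which rules using the group should be disabled.
    """
    for tier in sorted(set(group.values())):
        members = sorted(
            name for name, t in group.items()
            if t == tier and is_up.get(name, False)
        )
        if members:
            return members
    return []


# InetGeneral as described in this report: OPT1 tier 1, WAN tier 2.
inet_general = {"OPT1_DHCP": 1, "WAN_DHCP": 2}
status = {"OPT1_DHCP": False, "WAN_DHCP": True}  # OPT1 still waiting for DHCP
print(active_members(inet_general, status))
```

Under these assumptions the group falls through to tier 2 and stays usable via WAN_DHCP, so the "Allow all on WiFi" rule would not be disabled.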
I suspect that I could generate a scenario like this on 2.1.n also - I have never done this level of testing before.
As soon as I plug in the cable to the rooftop device on OPT1 and it gets an IP address (even if the ISP behind it is down), the system starts correctly monitoring OPT1 monitor IP 8.8.8.8, OPT1 shows as offline, all the gateway groups do their thing and internet comes back for all sites, failing everything over to WAN. So the problems all occur while OPT1 has not got its DHCP address yet.