Bug #2257
closedMultiple related bugs, apparent causes and supporting/disagnostic data (please change bug report title as needed)
0%
Description
Around 00.50 today I modified my MTU and saved config, making no other changes. Multiple strange issues occurred which suggest both an interface bug and untrapped error conditions in the connection logic. Hopefully easy enough to verify and fix.
- In some circumstances defining an MTU for a WAN link can cause other unrelated WAN config such as bindings to be silently "lost"/deleted on saving. (crossref issue #2067)
- A knock-on effect is the relevant interface immediately dies, and attempts to bring it up fail abnormally. The system log shows that interfaces_bring_up() fails due to "no variable defined". Pfsense doesn't seem to trap the failure to create an interface and just continues trying the rest of the connection sequence.
- This shows in two ways.
- PPP log shows numerous connection attempts, all failing due to "Skipping link XXX with undefined interface". The user isn't notified at this stage and the ticker doesn't report the failure to define the interface (which might suggest an interface config problem or system issue).
- Instead of stopping with an error when expected interfaces can't be created, pfsense continues with presenting pfctl with filter rules to load. Pfctl interprets a config that sets rules on a non-existant interface as a syntax issue.
- The user is explicitly - but misleadingly - notified that filtering failed to start "due to a syntax error in the rules", but the "incorrect" line numbers point to innocuous system generated rules that worked before.
Data for checking and fixing, including timestamps, follows. Config available if it's not enough.
I set an MTU at 00:50:02. The visible symptom was that on saving an MTU the gateway immediately went down and the alert ticker quickly reported pfctl had identified a syntax error in the filter rules and was unable to load them. I fixed the config at 01:18:08.
Filter reload status display (ticker showed the same):
There were error(s) loading the rules: /tmp/rules.debug:127: syntax error/tmp/rules.debug:128: syntax error pfctl: Syntax error in config file: pf rules not loaded - The line in question reads [127]: block in log quick on { pppoe0 } proto { tcp udp } from !/ to any port $Always_blocked_ports label "USER_RULE: Incoming to blocked ports" ...
I had taken a backup of /tmp/rules.debugf less than an hour earlier while everything was good, which made clear the relevant line was innocuous, it read:
block in log quick on { pppoe0 } reply-to ( pppoe0 ***<ISP's gateway IP>***) proto { tcp udp } from !***<RouterPublicIP>***/32 to any port $Always_blocked_ports
Relevant system log at time of rule change (irrelevant lines such as rrd stripped out):
00:52:34 php: : The command '/sbin/pfctl -o basic -f /tmp/rules.debug' returned exit code '1', the output was '/tmp/rules.debug:127: syntax error /tmp/rules.debug:128: syntax error pfctl: Syntax error in config file: pf rules not loaded' 00:51:24 apinger: No usable targets found, exiting 00:51:24 apinger: Starting Alarm Pinger, apinger(***) 00:51:24 check_reload_status: Reloading filter 00:51:23 apinger: Exiting on signal 15. 00:51:22 php: : Backtrace: Array 00:51:22 php: : interfaces_bring_up() was called but no variable defined. 00:51:21 check_reload_status: updating dyndns wan 00:51:21 dhcpd: Copyright 2004-2011 Internet Systems Consortium. [etc] 00:51:21 dhcpd: Internet Systems Consortium DHCP Server *** 00:51:17 check_reload_status: Rewriting resolv.conf 00:51:17 php: /interfaces.php: Backtrace: Array 00:51:17 php: /interfaces.php: interfaces_bring_up() was called but no variable defined. 00:51:17 check_reload_status: Configuring interface wan 00:50:03 check_reload_status: Syncing firewall 00:31:35 check_reload_status: Syncing firewall 00:30:20 php: /index.php: Successful webConfigurator login for user '***' from ***** 00:30:20 php: /index.php: Successful webConfigurator login for user '***' from ***** 23:15:13 php: /diag_system_activity.php: Session timed out for user '***' from: ***** 21:42:08 sshlockout[****]: **** 21:42:08 php: /diag_logs.php: Successful webConfigurator login for user '***' from ***** 21:42:08 php: /diag_logs.php: Successful webConfigurator login for user '***' from ***** 21:15:00 check_reload_status: Reloading filter 21:14:50 apinger: alarm canceled: WAN(*****) *** WANdelay *** 21:14:31 check_reload_status: Reloading filter 21:14:21 apinger: ALARM: WAN(*****) *** WANdelay *** 20:00:06 check_reload_status: Reloading filter 20:00:06 check_reload_status: Syncing firewall 20:00:05 check_reload_status: Syncing firewall 19:38:41 check_reload_status: Reloading filter 19:38:39 php: : IPSEC: One or more IPsec tunnel endpoints has changed its IP. Refreshing. 19:38:22 check_reload_status: Reloading filter 19:38:22 check_reload_status: Reloading filter 19:38:20 php: : Starting ***<other package>*** 19:38:15 check_reload_status: Syncing firewall [reloaded/resynced several times, possibly due to user browser action] 19:38:14 check_reload_status: Reloading filter 19:37:16 check_reload_status: Syncing firewall 19:37:06 check_reload_status: Reloading filter 19:37:05 check_reload_status: Syncing firewall 19:37:04 check_reload_status: Syncing firewall 19:35:58 check_reload_status: Reloading filter 19:35:58 check_reload_status: Syncing firewall 19:34:52 check_reload_status: Reloading filter 19:34:52 check_reload_status: Syncing firewall 19:33:26 php: : Restarting/Starting all packages. 19:33:08 check_reload_status: Restarting ipsec tunnels 19:31:24 dhcpd: For info, please visit https://www.isc.org/software/dhcp/ 19:31:24 dhcpd: All rights reserved. 19:31:24 dhcpd: Copyright 2004-2011 Internet Systems Consortium.
PPP log contained numerous copies of:
01:06:45 ppp: [wan_link0] Link: DOWN event 01:06:45 ppp: [wan_link0] PPPoE node for link is not initialized 01:06:45 ppp: [wan_link0] PPPoE: Skipping link "wan_link0" with undefined interface 01:06:45 ppp: [wan_link0] Link: reconnection attempt 375 01:06:41 ppp: [wan_link0] Link: reconnection attempt 375 in 4 seconds 01:06:41 ppp: [wan_link0] LCP: Down event 01:06:41 ppp: [wan_link0] Link: DOWN event 01:06:41 ppp: [wan_link0] PPPoE node for link is not initialized 01:06:41 ppp: [wan_link0] PPPoE: Skipping link "wan_link0" with undefined interface 01:06:41 ppp: [wan_link0] Link: reconnection attempt 374 01:06:37 ppp: [wan_link0] Link: reconnection attempt 374 in 4 seconds
Diff for the config change at 00:50 (some stuff blanked):
Configuration diff from *** 00:31:34 to *** 00:50:02 --- /conf/backup/config-*****.xml *** 00:50:02.000000000 +0000 +++ /conf/config.xml *** 00:50:02.000000000 +0000 @@ *** @@ <descr><![CDATA[WAN]]></descr> - <ipaddr>pppoe</ipaddr> + <ipaddr>pppoe</ipaddr> + <mtu>***</mtu> <<--------------- intended @@ *** @@ <type>pppoe</type> <if>pppoe0</if> - <ports>****</ports> <<--------------- not intended/ + <ports/> <<--------------- silent <username>****</username> <password>****</password> - <provider/> <ondemand/> + <provider/> </ppp>
Although I only changed MTU (leaving "ondemand" unchanged), the config had also made an unrequested change from <ports>****</ports> to <ports/>. When I changed this back and manually clicked "reconnect" it immediately all worked again, although the interface had an unusual "ng" label I'm not sure if that's normal or not. PPP log:
01:19:58 ppp: [wan_link0] *** 01:19:58 ppp: [wan_link0] PROTOCOMP 01:19:58 ppp: [wan_link0] LCP: SendConfigReq #1 01:19:58 ppp: [wan_link0] LCP: state change Starting --> Req-Sent 01:19:58 ppp: [wan_link0] LCP: Up event 01:19:58 ppp: [wan_link0] Link: UP event 01:19:58 ppp: [wan_link0] PPPoE: connection successful 01:19:58 ppp: PPPoE: rec'd ACNAME **** 01:19:58 ppp: [wan_link0] PPPoE: Connecting to '' 01:19:58 ppp: [wan_link0] LCP: LayerStart 01:19:58 ppp: [wan_link0] LCP: state change Initial --> Starting 01:19:58 ppp: [wan_link0] LCP: Open event 01:19:58 ppp: [wan_link0] Link: OPEN event 01:19:58 ppp: [wan] Bundle: Interface ng* created 01:19:58 ppp: web: web is not running 01:19:58 ppp: last message repeated 2 times 01:19:57 ppp: process **** terminated 01:19:57 ppp: [wan_link0] Link: Shutdown 01:19:57 ppp: [wan] Bundle: Shutdown 01:19:55 ppp: [wan] IPCP: Close event 01:19:55 ppp: [wan] IFACE: Close event 01:19:55 ppp: caught fatal signal term 01:19:55 ppp: waiting for process **** to die... 01:19:55 ppp: process **** started, version **** 01:19:55 ppp: 01:19:55 ppp: Multi-link PPP daemon for FreeBSD 01:19:54 ppp: [wan_link0] Link: reconnection attempt 698 in 4 seconds 01:19:54 ppp: [wan_link0] LCP: Down event 01:19:54 ppp: [wan_link0] Link: DOWN event 01:19:54 ppp: [wan_link0] PPPoE node for link is not initialized 01:19:54 ppp: [wan_link0] PPPoE: Skipping link "wan_link0" with undefined interface
System log showed once the interface was created, attempts to reload the rules didn't give rise to a "syntax error":
01:20:05 apinger: Starting Alarm Pinger, apinger(***) 01:20:05 check_reload_status: Reloading filter 01:20:04 php: : Removing static route for monitor *** and adding a new route through *** 01:20:04 php: : ROUTING: setting default route to *** 01:18:09 check_reload_status: Syncing firewall 00:52:34 php: : The command '/sbin/pfctl -o basic -f /tmp/rules.debug' returned exit code '1', the output was '/tmp/rules.debug:127: syntax error /tmp/rules.debug:128: syntax error pfctl: Syntax error in config file: pf rules not loaded'
Suspected underlying issues:
- The interface has a bug that in some cases changing MTU can cause interface <ports/> data to be silently deleted (almost certainly the same issue as #2067);
- Connection logic doesn't trap or test for error conditions:
- when the attempt to create/define an interface fails;
- when the kind of error returned by a PPP attempt suggests a system or config abnormality rather than usual connection issues (ie "try again 500 times" probably isn't the best response);
- when prerequisite setup/definitions for the filters don't exist in the system at the point pfctl is called to load rules which refer to them.
I'm basing this on the system log line "no variable defined" in relation to the pppoe interface it was trying to bring up, combined with a config diff that shows key data needed to bring up the interface got silently deleted.
Hopefully just needs verifying and fixing now
Updated by Stilez y almost 13 years ago
Same happens during system startup, with a ticker report of "rules not loaded" due to "syntax error". Relevant log (see timestamp 17:34:58):
[system up, no further issues] Mar 5 17:39:37 check_reload_status: Reloading filter Mar 5 17:39:35 php: : IPSEC: One or more IPsec tunnel endpoints has changed its IP. Refreshing. Mar 5 17:39:22 check_reload_status: Syncing firewall Mar 5 17:39:20 check_reload_status: Reloading filter Mar 5 17:39:20 check_reload_status: Syncing firewall Mar 5 17:39:13 php: : [pfblocker] pfblocker_xmlrpc_sync.php is starting. Mar 5 17:39:13 check_reload_status: Reloading filter Mar 5 17:39:13 check_reload_status: Syncing firewall Mar 5 17:39:07 php: : [pfblocker] pfblocker_xmlrpc_sync.php is starting. Mar 5 17:39:07 check_reload_status: Reloading filter Mar 5 17:39:07 check_reload_status: Syncing firewall Mar 5 17:39:00 php: : [pfblocker] pfblocker_xmlrpc_sync.php is starting. Mar 5 17:39:00 check_reload_status: Reloading filter Mar 5 17:39:00 check_reload_status: Syncing firewall Mar 5 17:38:54 php: : [pfblocker] pfblocker_xmlrpc_sync.php is starting. Mar 5 17:38:54 check_reload_status: Reloading filter Mar 5 17:38:54 check_reload_status: Syncing firewall Mar 5 17:38:28 php: : Restarting/Starting all packages. ... Mar 5 17:35:21 dhcpd: Internet Systems Consortium DHCP Server *** Mar 5 17:35:18 php: : ROUTING: setting default route to ***<ISP IP>*** Mar 5 17:35:13 php: : Removing static route for monitor *** and adding a new route through ***<ISP IP>*** Mar 5 17:34:58 php: : The command '/sbin/pfctl -o basic -f /tmp/rules.debug' returned exit code '1', the output was '/tmp/rules.debug:127: syntax error /tmp/rules.debug:128: syntax error pfctl: Syntax error in config file: pf rules not loaded' Mar 5 17:34:24 php: : Resyncing OpenVPN instances for interface WAN. Mar 5 17:34:18 apinger: Starting Alarm Pinger, apinger(***) Mar 5 17:34:18 check_reload_status: Reloading filter Mar 5 17:34:16 php: : Removing static route for monitor *** and adding a new route through ***<ISP other IP>*** Mar 5 17:34:16 php: : ROUTING: setting default route to ***<ISP IP>*** Mar 5 17:34:02 kernel: pflog0: promiscuous mode enabled Mar 5 17:34:02 kernel: Trying to mount root from ufs:/dev/ad0s1a ... Mar 5 17:34:02 kernel: ***@FreeBSD_8.0_pfSense_2.0-snaps.pfsense.org:/usr/obj./usr/pfSensesrc /src/sys/pfSense_SMP.8 i386
Updated by Warren Baker almost 13 years ago
The pf errors is probably related to the pfblocker package. Uninstall that (if you can) and see what the results are.
Updated by Stilez y almost 13 years ago
It looks like 'a bit of both'. Pfblock could be improved to avoid issues, but largely they are issues that it reveals rather than issues it creates. So far this is where it's at (subject to change and in no special order):
1. When pfsense restores packages it restores the config and related rules, then marks the package for download if needed. But until package files have actually been downloaded the router is left in an indeterminate state: the package xml, data, menus etc are used, but downloadable files of the package may not exist in the router. (This isn't specific to pfblock, it's generic).
RESOLUTION - pfsense should not rely on any config of a package - including tabs, menus, <config> sections etc, until all files needed to run the package are in place. After restore/reinstall they possibly won't be.
2. Restore config can easily create dns dependencies it can't handle. For example a user has disabled dns forwarder and uses tiny dns. Later they need to clean-install or replicate the router. On reboot, the new install expects to now download and reinstall named packages/updates, but to do so it's dependent on dns being in place (since packages/updates are currently accessed via url not ip: *.pfsense.org), and it's just restored a config that tells it to use an alternative dns handler which it can only get by dns lookup + download. Pfsense includes alternative dns handlers so it's to be expected they will be used and restored to.
RESOLUTION - pfsense should contain and use IPs for the main *.pfsense.org domains so that if a router dns service is inaccessible for any reason following a restore, it's not hung up trying to get updates/packages that the config requires to complete restoration. Moreso since it can lead to circularity (the needed files may include the dns resolver of the new-restored config).
OTHER - this is made worse by issue 3: System -> Packages would also be unable to get its update list and usual .php/.inc files used for package interface for 'active' packages in the config don't yet exist, so the user has no access to any means to modify, disable, update, and in some cases reconfigure packages.
3. 'System -> Packages' relies on the initial list it obtains from updates.pfsense.org. When restoring a config, if this url or dns isn't available package manager is unable to be used for uninstall purposes.
4. There is an issue around unresolvable rules that pfblock reveals but doesn't create. Specifically a restore of a config that included pfblock will have rules that reference tables - but when the config is restored those tables aren't automatically being recreated. This is inconsistent, a list of networks in a list would be restored, so should a list in a table. (Generic, not just pfblock, any alias or rule can potentially reference a table).
RESOLUTION: Downloaded config.xml should provide an option to include active table data, similar to the option for RRD data, so that upon restore the router knows what tables to create and (provisionally at least) what initial data is best to put in them even before packages are loaded.
Possibly more robust solution: downloaded config.xml should provide an option to include the current running pf.conf and its tables (as gz->encode64?), which has the massive advantage of having no dependencies at all on any other data. If the config is later restored, even to bare metal, the exact state of pfctl at the point of backup is also restored (which is presumably guaranteed valid, complete, reliable, and gets the router working as intended whatever the package/update state).
Crudely disabling any rule that references a table on restoring a config (filter.inc: "unresolvable alias") has too much potential for problems such as misrouting, non-connectivity, or insecurity. The router may need some tables recreated and rules working as a precondition for wan access, or to download those package files which will resolve the circularity.
5. I suspect pfsense isn't trapping some errors and dependencies closely enough at clean reinstall, config restore or rules reload, but nothing specific yet on this. Gut sense only, errors in system log that look like the consequences of earlier issues that hadn't been trapped. The robustness of config restore, package restore, and rules reload is possibly worth a look.
6. Also tentatively to check - is there possibly an issue in pfblock that it creates a url rule pointing to 127.0.0.1:443/pfblock.php?data but for some reason there will be times that url can't be reached or no data is held when it is reached, or when the alias or table aren't yet created or data is not yet consistent. Still hoping to narrow this down. Would a url rule (by any package) sourced from https://<loopback>/file.php be okay within pfctl's boot-time rules or is that a circularity?
I'm still working on this (it's a great introduction to FreeBSD!) and will try to find specifics, can you keep this bug open a while?
Updated by Chris Buechler almost 9 years ago
- Status changed from New to Closed
I see multiple issues here that have since been fixed. the rest isn't clear and likely would be completely different at this point. Stilez if you're still seeing any issues here, let's start over with a clean report on 2.3.