Bug #1493

pf blocks all traffic following filter reload.

Added by Aaron Roberts about 8 years ago. Updated over 4 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: -
Target version: -
Start date: 05/04/2011
Due date:
% Done: 0%
Estimated time:
Affected Version: 2.0
Affected Architecture:
Description

Version: 2.0-RC1 (i386) built on Tue Apr 19 23:03:17 EDT 2011

Hardware: /usr/libexec/qemu-kvm -S -M rhel5.4.0 -cpu qemu32 -m 128 -smp 1 -name bd32a054-71ae-11e0-b962-1cc1def3fdd0 -uuid bd32a054-71ae-11e0-b962-1cc1def3fdd0 -no-kvm-pit-reinjection -monitor pty -pidfile /var/run/libvirt/qemu//bd32a054-71ae-11e0-b962-1cc1def3fdd0.pid -boot cd -drive file=/var/lib/xen/images/bd32a054-71ae-11e0-b962-1cc1def3fdd0/d0.qcow,if=ide,index=0,boot=on -net nic,macaddr=00:16:18:69:6b:50,vlan=0 -net tap,fd=33,script=,vlan=0,ifname=vnet9 -net nic,macaddr=00:16:f7:ca:d7:bb,vlan=1 -net tap,fd=34,script=,vlan=1,ifname=vnet12 -serial pty -parallel none -usb -usbdevice tablet -vnc 0.0.0.0:8,password -k en-gb

Occasionally, after making changes to the NAT or Firewall rules, and clicking the "Apply Changes" button on the web GUI, the filter reload causes pf to block all traffic. Either rebooting from the console, or running /etc/rc.filter_configure from shell resolves the issue.

We have seen this problem (or very similar ones) since at least 2.0BETA4 (we did not test earlier versions). The problem appears to occur far less frequently in RC1, but it still occurs.

Attachment: broken_state.tar.gz (191 KB), Aaron Roberts, 05/07/2011 07:51 AM

History

#1 Updated by Chris Buechler about 8 years ago

  • Status changed from New to Feedback
  • Affected Version set to 2.0

not enough info to do anything with this. Definitely not a universal issue, maybe something specific to KVM or something else you're doing. Need more info.

#2 Updated by Aaron Roberts about 8 years ago

Hi, can you let me know what information would be useful?

At present, the only system logs immediately preceding failure are:

May 5 17:15:22 pfsense check_reload_status: syncing firewall
May 5 17:15:27 pfsense check_reload_status: reloading filter

When pfSense has hit this issue, I connect to the console and press 9 for "pfTop". There I see all my traffic being blocked. Pressing 8 for "Shell" and executing /etc/rc.filter_configure resolves the problem instantly.

The issue only pops up occasionally (more often for customers, somehow), so I am writing scripts to repeatedly attempt various tasks (just reload, add NAT then reload, add rule then reload, etc.) to try and narrow it down a bit.

Are there debug flags I can set to get more information from pfSense, during this process?

Thanks,
Aaron
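The repeated-trigger scripts described in this comment were not posted. A minimal sketch of such a harness might look like the following. The function name `repeat_until_failure` and its arguments are hypothetical: the caller supplies both the action to repeat and the check that decides whether traffic still passes, and nothing here is specific to pfSense.

```sh
#!/bin/sh
# Hypothetical harness: run ACTION up to MAX times and run CHECK after each run.
# Prints the iteration number at which CHECK first failed and returns 1,
# or prints nothing and returns 0 if every iteration passed.
repeat_until_failure() {
    max=$1      # maximum number of iterations
    action=$2   # command string to repeat (assumption: supplied by the caller)
    check=$3    # command string that succeeds while the system is healthy
    i=1
    while [ "$i" -le "$max" ]; do
        sh -c "$action" >/dev/null 2>&1
        if ! sh -c "$check" >/dev/null 2>&1; then
            echo "$i"
            return 1
        fi
        i=$((i + 1))
    done
    return 0
}
```

The action could be whatever step is suspected of provoking the fault, and the check could be any command that fails once traffic is blocked. Both are left to the caller because the thread does not say which commands were used.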

#3 Updated by Chris Buechler about 8 years ago

pftop shows only traffic being passed, not blocked. Check /tmp/rules.debug, the loaded rulesets, and the other info in status.php.

#4 Updated by Aaron Roberts about 8 years ago

Oops, I lied.. I see the blocked packets in "Filter Logs", sorry.

Please find attached a tarball of /tmp from a pfSense box in the "broken" state. The tarball also includes /var/log/filter.log and /var/log/system.log and /conf/config.xml along with output from some pfctl commands. I have deleted lines from config.xml containing passwords and the private key for the web configurator.

The failure occurred immediately after clicking the "Apply Changes" button at 11:02am 6th May. I was repeatedly adding and removing a NAT rule to forward TCP port 8888 on the WAN IP address to 192.168.254, in order to provoke the problem.

To my eyes, it looks like the filter is loaded with a partial ruleset, which blocks everything except traffic originating from the pfSense machine itself. As I said, just reloading the filter resolves the issue.

Thanks for your time and a great product. Please let me know how best to help you solve the issue - I'm pretty new to this.

Aaron

#5 Updated by Chris Buechler about 8 years ago

Somehow it's skipping the entire user generated rules section. The only way that entire section is skipped is if

if(isset($config['filter']['rule'])) 

doesn't evaluate to true, i.e. there are no <filter><rule> entries in your config. Which you clearly have. And your config is about as simple as one can possibly be. If this were a general issue with that config we'd have 2000 people screaming about it, and widely see it ourselves, and this is the first I've ever heard of it. Do you have any source customizations, or are you doing anything to any part of the system other than modifying it normally in the web interface, anything so atypical that no one else would see it?

Try the same config on a clean install with no packages.
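One way to check whether a generated ruleset file is missing its user rules is to count the non-comment lines that follow a marker comment. The sketch below does only that. The function name `count_rules_after_marker` is hypothetical, and the marker text is an assumption supplied by the caller: the thread does not say how the user rules section is delimited in /tmp/rules.debug.

```sh
#!/bin/sh
# Hypothetical helper: print the number of non-blank, non-comment lines that
# appear after the first line starting with MARKER in FILE.
# Prints 0 if the marker is absent or nothing follows it.
count_rules_after_marker() {
    file=$1     # ruleset file to inspect
    marker=$2   # marker comment (assumption: chosen by the caller)
    awk -v marker="$marker" '
        !seen && index($0, marker) == 1 { seen = 1; next }
        seen && $0 !~ /^[ \t]*(#|$)/    { n++ }
        END { print n + 0 }
    ' "$file"
}
```

A call such as `count_rules_after_marker /tmp/rules.debug '# User-defined rules follow'` would report how many rule lines follow that comment, assuming the file uses that marker string. A count that drops sharply between a working and a broken reload would support the partial-ruleset theory, but the count is not a verdict on its own.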

#6 Updated by Aaron Roberts about 8 years ago

This system has 1 package listed in "Installed Packages":

Package Name: RRD Summary
Package Version: 1.1

I'll post again, once I have thoroughly tested an install with no packages.

Thanks,
Aaron

#7 Updated by Aaron Roberts about 8 years ago

Hi,
I have tested with a vanilla install of pfSense.

I consistently encounter this issue. I have tried i386 pfSense, amd64 pfSense and 64 bit and 32 bit KVM virtual hardware (including 32 bit pfSense on 64 bit hardware).

I have also toggled APIC, ACPI and PAE emulation on the virtual hardware.
I have tried with 1 and 2 CPUs.
I have tried enabling and disabling ACPI within pfSense.
I have tried installing the virtio drivers and enabling virtio on the virtual hardware.
I have tested using kern.hz values of 10, 100, 1000

The only changes made to this pfSense install were to add the line:
$nocsrf = "true";
to a couple of the web scripts, to make it easier to use a script to trigger the fault.

Any assistance you can give would be greatly appreciated.

Aaron

#8 Updated by Markus Schlager about 8 years ago

Similar problem here:

Hardware: Fujitsu Primergy; VMWare VSphere

pfSense 2.0-RC3 (amd64)
built on Mon Jul 4 16:49:48 EDT 2011

'Reload Filter' breaks any ssh-connection. This happens any time I click the button at status_filter_reload.php and every 15 minutes due to

0,15,30,45 * * * * root /etc/rc.filter_configure_sync

in /etc/crontab.

The only information I can capture is the log entries of an OpenVPN client (pfSense is listening on port 1194):

Jul 18 09:30:02 MY_CLIENT ovpn-client[29701]: TCP/UDP: Incoming packet rejected from PFSENSE.IP.ADD.RESS:18080[2], expected peer address: PFSENSE.IP.ADD.RESS:1194 (allow this incoming source address/port by removing --remote or adding --float)

No idea why the port changes.
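To see how often the source port changes, the rejected-packet entries can be reduced to the observed and expected ports. The following is a sketch only: the function name `rejected_ports` is hypothetical, and the pattern assumes every entry has the same shape as the single log line quoted above.

```sh
#!/bin/sh
# Hypothetical helper: read OpenVPN client log lines on stdin and, for each
# "Incoming packet rejected" entry, print "<observed port> <expected port>".
# Lines that do not match the quoted format are ignored.
rejected_ports() {
    sed -n 's/.*Incoming packet rejected from [^ ]*:\([0-9][0-9]*\)\[[0-9]*\], expected peer address: [^ ]*:\([0-9][0-9]*\) .*/\1 \2/p'
}
```

Fed the log line above, this prints the two ports side by side, so a run of entries can be scanned for the range of rewritten source ports.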

#9 Updated by Markus Schlager about 8 years ago

I found the solution for my problem with broken states on filter reload:

I had to activate 'States' under System -> Advanced -> Miscellaneous -> Gateway Monitoring.

This link was helpful: http://forum.pfsense.org/index.php/topic,34905.0.html

#10 Updated by Bill McGonigle over 7 years ago

The same solution worked for me. What I was seeing was broken ssh connections on every filter reload, any in-flight webconfigurator requests would stop loading, etc.

The default behavior seems wrong. Maybe the text should advise to always turn this on if you're using multiple gateways? Or perhaps just assume this behavior for only a filter reload but allow the feature to work in the event of a real gateway failure?

#11 Updated by Jim Pingle about 7 years ago

  • Status changed from Feedback to Resolved

#12 Updated by Daniel Milazar over 4 years ago

I have that same bug in the following pfSense version: 2.2-RELEASE (i386)
built on Thu Jan 22 14:04:25 CST 2015
FreeBSD 10.1-RELEASE-p4

After changing and applying a port forwarding rule, the firewall is blocking all traffic. If I call /etc/rc.filter_configure from the command line, everything is fine again. Is there a fix or a workaround for that bug?

#13 Updated by andres g over 4 years ago

I can confirm that I am experiencing the same with 2.2-Release (AMD64) version.

Any updates on this?

#14 Updated by Chris Buechler over 4 years ago

nothing you're encountering today has any relation to this issue. I suspect any such issues on 2.2 have the same root cause as #4445. It appeared to be Hyper-V specific, but the same root issue could apply elsewhere, it's potentially more hardware/hypervisor-specific than specific to just Hyper-V (actually seemed to be more its block driver).

#15 Updated by andres g over 4 years ago

Chris Buechler wrote:

nothing you're encountering today has any relation to this issue. I suspect any such issues on 2.2 have the same root cause as #4445. It appeared to be Hyper-V specific, but the same root issue could apply elsewhere, it's potentially more hardware/hypervisor-specific than specific to just Hyper-V (actually seemed to be more its block driver).

Hi Chris, reading #4445 it seems that you are right, because I am also using Hyper-V. I will follow that bug now. Many thanks.
