Bug #5476

Does not appear possible to use policy routing for traffic originating from the firewall (self)

Added by Luke Hamburg over 3 years ago. Updated 6 months ago.

Status:
Needs Patch
Priority:
Normal
Assignee:
-
Category:
Routing
Target version:
-
Start date:
11/17/2015
Due date:
% Done:

0%

Estimated time:
Affected Version:
All
Affected Architecture:

Description

Summary of the issue:

- although https://doc.pfsense.org/index.php/What_are_Floating_Rules states that "Floating Rules can: Filter traffic from the firewall itself", policy routing of that traffic does not seem to work in real-world testing
- I have replicated this behavior on multiple test units, both hardware and virtual machines, on 2.2.4 and 2.2.5 (I have not tested 2.3)

Repro:

- set up a pfsense 2.2.5 with 2 WAN uplinks
- set up a gateway group called "Failover" containing these in a Tier1->Tier2 failover arrangement
- create a floating rule with the following settings:
action: pass
interface: (none selected, or ALL selected)
direction: out
TCP/IP version: IPv4
protocol: any
source: This Firewall (self)
destination: any
under Advanced->Gateway choose the "Failover" routing group
- edit the "default allow" rule (last rule on the LAN interface) to make it also use the Failover routing group (Advanced->Gateway)
- ssh or console to the firewall
- ping 8.8.8.8
- from a computer attached to the pfsense LAN, set up a 2nd ping to 8.8.8.8
- pull the WAN1 connection
- pings on the pfsense console will time out and NOT recover
- computer attached to the LAN will recover and traffic will fail over to the 2nd Tier gateway

Some reasons this is important are:

- when these multi-WAN failures occur, pfSense is unable to do DNS lookups or send SMTP emails due to the lack of a working gateway
- I can't use the "allow default gateway switching" option because sometimes a gateway is for internal routing, not a true "WAN", and does not have internet access
- I believe others on the forums have noted that this bug also affects Squid, because the proxy's traffic originates from the firewall itself and is therefore subject to the same routing problem

More details and discussion at https://forum.pfsense.org/index.php?topic=102053.0

History

#1 Updated by Luke Hamburg over 3 years ago

Would it be worth testing the 2.3 snapshots? Has anything changed in those that might affect this behavior?

#2 Updated by Chris Buechler over 3 years ago

  • Affected Version changed from 2.2.5 to All
  • Affected Architecture deleted (amd64)

It does work as described, but there are potential complications with policy routing.

Luke Hamburg wrote:

Would it be worth testing the 2.3 snapshots? Has anything changed in those that might affect this behavior?

Yes, the base OS is newer, which may have an impact.

#3 Updated by Luke Hamburg over 3 years ago

Thanks Chris. I have detailed my settings and method of testing above. I am not sure how I should be doing things, but as it stands, if PBR is not working for floating rules, it doesn't seem possible to use multi-WAN on a system that has non-internet-facing gateways defined and still have a fully functioning, self-contained system, including alerts, Squid, etc. I will test with 2.3 to see if there are differences there. What document should I refer to for better information on how to configure this? I have also referenced the forum thread(s) where others are experiencing the same issue.

#4 Updated by Luke Hamburg over 3 years ago

I just installed 2.3-BETA (amd64) built on Fri Jan 15 13:30:46 CST 2016 from scratch and set up a test scenario with 1 LAN and 2 WAN connections (WAN+OPT1). Set up a policy on Floating Rules to route outbound HTTP traffic originating from The Firewall Itself via OPT1. When I tested this by logging in from the console and executing "fetch -qo - http://ipecho.net/plain" to check what IP got returned, I instead got a timeout. Am I doing something drastically wrong here? I have still never seen any examples anywhere on the forum or elsewhere of a config on either 2.2.x or 2.3 where someone has PBR for Firewall(self) working in a multiwan scenario. Help!

#5 Updated by Jim Pingle over 3 years ago

Be sure to account for the necessary outbound NAT in that scenario. The system selects the outbound address according to the OS routing table, if it's forced out another way in pf after that step then it'll need NAT out WAN2 for traffic with a source of the WAN1 IP address, for example. Some traffic captures should confirm if that's the case.
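
As a rough illustration of the kind of outbound NAT rule described here, the following pf.conf fragment translates traffic that leaves WAN2 while still carrying the WAN1 address as its source. The interface name and the address are placeholders taken from documentation ranges, and this is a hand-written sketch in stock pf syntax, not the rule pfSense itself would generate.

```
# Assumed values for illustration only
wan2_if = "em1"          # hypothetical WAN2 interface
wan1_ip = "192.0.2.2"    # hypothetical WAN1 address of the firewall

# Rewrite the source of packets leaving WAN2 that were originally
# sourced from the WAN1 address, using WAN2's current address.
nat on $wan2_if inet from $wan1_ip to any -> ($wan2_if)
```

A capture on the WAN2 side should then show whether the source address is actually rewritten.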

#6 Updated by Luke Hamburg over 3 years ago

Alright, I'll keep banging away at it. Some screenshots of the outbound NAT setup you're talking about would be immensely helpful! For the outbound NAT, should the "source" network be /32 of the WAN1 interface e.g. 1.2.3.4/32?

#7 Updated by Всеволод Старовойтов over 3 years ago

Jim Pingle wrote:

Be sure to account for the necessary outbound NAT in that scenario. The system selects the outbound address according to the OS routing table, if it's forced out another way in pf after that step then it'll need NAT out WAN2 for traffic with a source of the WAN1 IP address, for example. Some traffic captures should confirm if that's the case.

Somehow such Outbound NAT rules do not work in that scenario. I have a NAT rule on top that translates the WAN1 IP (192.0.2.2) to the WAN2 IP (198.51.100.2) on the WAN2 interface. While outgoing packets are correctly rerouted to the WAN2 gateway by the floating rule, their source IP is still 192.0.2.2 (the WAN1 IP). Here is part of the packets captured on 198.51.100.1 (the WAN2 uplink router):

15:03:45.041663 00:0c:29:96:2d:39 > 00:0c:29:e6:cc:a4, ethertype IPv4 (0x0800), length 87: (tos 0x0, ttl 64, id 32847, offset 0, flags [none], proto UDP (17), length 73)
192.0.2.2.34776 > 198.41.0.4.53: [udp sum ok] 30929% [1au] A? beta.pfsense.org. ar: . OPT UDPsize=4096 OK (45)
15:03:45.041690 00:0c:29:96:2d:39 > 00:0c:29:e6:cc:a4, ethertype IPv4 (0x0800), length 87: (tos 0x0, ttl 64, id 42563, offset 0, flags [none], proto UDP (17), length 73)
192.0.2.2.19800 > 192.5.5.241.53: [udp sum ok] 23665% [1au] AAAA? beta.pfsense.org. ar: . OPT UDPsize=4096 OK (45)
15:03:45.384369 00:0c:29:96:2d:39 > 00:0c:29:e6:cc:a4, ethertype IPv4 (0x0800), length 87: (tos 0x0, ttl 64, id 63330, offset 0, flags [none], proto UDP (17), length 73)
198.51.100.2.30591 > 198.51.100.1.53: [udp sum ok] 24070+ SRV? _http._tcp.beta.pfsense.org. (45)
15:03:45.384479 00:0c:29:e6:cc:a4 > 00:0c:29:96:2d:39, ethertype IPv4 (0x0800), length 133: (tos 0x0, ttl 64, id 7298, offset 0, flags [none], proto UDP (17), length 119)
198.51.100.1.53 > 198.51.100.2.30591: [bad udp cksum 0x54e1 -> 0x2e4e!] 24070 NXDomain q: SRV? _http._tcp.beta.pfsense.org. 0/1/0 ns: pfsense.org. SOA pfsense.org. webmaster.pfsense.org. 2015081008 3600 7200 1209600 3600 (91)
15:03:46.284237 00:0c:29:96:2d:39 > 00:0c:29:e6:cc:a4, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 53312, offset 0, flags [none], proto UDP (17), length 84)
192.0.2.2.14721 > 192.33.4.12.53: [udp sum ok] 42618% [1au] SRV? _http._tcp.beta.pfsense.org. ar: . OPT UDPsize=1472 OK (56)
15:03:46.504096 00:0c:29:96:2d:39 > 00:0c:29:e6:cc:a4, ethertype IPv4 (0x0800), length 74: (tos 0x0, ttl 64, id 26319, offset 0, flags [DF], proto TCP (6), length 60)
192.0.2.2.14349 > 208.123.73.80.80: Flags [S], cksum 0x145e (correct), seq 1785540060, win 65228, options [mss 1460,nop,wscale 7,sackOK,TS val 952402 ecr 0], length 0
15:03:46.794028 00:0c:29:96:2d:39 > 00:0c:29:e6:cc:a4, ethertype IPv4 (0x0800), length 74: (tos 0x0, ttl 64, id 38078, offset 0, flags [DF], proto TCP (6), length 60)
192.0.2.2.14446 > 208.123.73.80.80: Flags [S], cksum 0xc7c4 (correct), seq 1791654037, win 65228, options [mss 1460,nop,wscale 7,sackOK,TS val 952692 ecr 0], length 0

I suspect that the problem is related to pf state processing. The old, working 2.0.x pf.c code does something like this for outgoing policy routing:

1. Undo NAT
2. Unlink state
3. Reroute to the right interface

Somehow step 2 is missing from the modern code.

#8 Updated by Luke Hamburg over 3 years ago

I've just re-tested with the latest 2.3 snapshot as of Feb 8 2016. Still can't even come close to making this work. I'd really love to either see screenshots of a working setup, or see the documentation updated to reflect the reality that this doesn't seem possible.

https://doc.pfsense.org/index.php/What_are_Floating_Rules - "Floating Rules can: Filter traffic from the firewall itself"
https://doc.pfsense.org/index.php/Multi-WAN#Local_Services - "It is possible to load balance/failover services from the local firewall such as OpenVPN, DNS requests, and Squid. That is out of the scope of this document, but may be covered elsewhere."

#9 Updated by Jim Pingle over 3 years ago

  • Status changed from New to Needs Patch

I ran some tests here and it does not appear to function any longer. This is something that needs to be replicated with pf on FreeBSD and reported upstream for a solution.

I'll update the docs in the meantime. The first link is correct even if the behavior is as you state -- you can filter traffic (block, pass, match) from the firewall itself. The stated behavior is policy routing, not filtering. The second link does need to be amended, however.

#10 Updated by Luke Hamburg over 3 years ago

OK, thanks Jim, at least I know I'm not crazy! I'm not familiar with reporting bugs upstream to FreeBSD. Would it be as simple as setting up the rules and then sending them the output of

 pfctl -sr

and linking them to this ticket?

#11 Updated by Jim Pingle over 3 years ago

Wouldn't even need to link to this ticket. You would have to replicate it on a stock FreeBSD install though, preferably 11-CURRENT.

You'd need to configure it for multiple WANs and use a pf ruleset similar to the one you have now that exhibits the problem. Preferably one that simplifies the rules to only expose the problem case.
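
A simplified ruleset of the sort described might look like the sketch below. It assumes a stock FreeBSD host with two uplinks. The interface names and gateway address are hypothetical, and the rules have not been verified to reproduce the bug. They only show the shape of a minimal test case built around a single route-to rule.

```
# Assumed values for illustration only
wan1_if = "em0"            # hypothetical WAN1 interface (default route)
wan2_if = "em1"            # hypothetical WAN2 interface
wan2_gw = "198.51.100.1"   # hypothetical WAN2 gateway

set skip on lo0

# Baseline: pass everything so that only route-to is under test.
pass in all
pass out all

# Problem case: traffic originating on the host itself, which the
# routing table would send out WAN1, is policy-routed to WAN2.
# pf uses the last matching rule, so this one wins for that traffic.
pass out on $wan1_if route-to ($wan2_if $wan2_gw) inet from ($wan1_if) to any keep state
```

Running `pfctl -nf` against the file parses it without loading the rules, which is a quick way to catch syntax errors before testing. Since the packets keep the WAN1 source address after being rerouted, a NAT rule on the second WAN would have to be placed before the filter rules to test that part as well.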

For more advice on that, post to the forum where a more in-depth conversation is possible.

#12 Updated by M B 6 months ago

Anything new on this or is this still an upstream issue?
