Regression #11805

closed

Port forward rules only function through the default gateway interface, ``reply-to`` does not work for Multi-WAN (CE Only)

Added by Jim Pingle 4 months ago. Updated about 2 months ago.

Status: Resolved
Priority: Urgent
Category: Rules / NAT
Target version:
Start date: 04/14/2021
Due date:
% Done: 0%
Estimated time:
Plus Target Version:
Release Notes: Default
Affected Version: 2.5.1
Affected Architecture: amd64

Description

Port forwards coming into the firewall from a non-default WAN are not working properly on CE version 2.5.1. This is similar to #11436 but now happening on CE only, not Plus 21.02.2.

Unlike before, there is no firewall log entry for the packet attempting to leave via the wrong path.

Packet capture on WAN2 shows the SYN arriving, but no response.

State table shows:

vmx3 tcp 127.0.0.1:22 (203.0.113.3:222) <- 172.21.32.79:60472       CLOSED:SYN_SENT
   [0 + 64240]  [2247652855 + 1]
   age 00:00:04, expires in 00:00:29, 3:5 pkts, 180:300 bytes, rule 158
   id: 0100000060774d99 creatorid: e2ca2a66
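
For anyone trying to spot the same symptom on their own system, here is a minimal sketch that pulls the key fields out of a state entry like the one above. The regular expression and the helper names (`parse_state`, `looks_unanswered`) are hypothetical and were written only against this one sample, so treat it as an illustration rather than a general parser for the state table output.

```python
import re

# Sample copied from the state table entry in this report.
STATE_TEXT = """\
vmx3 tcp 127.0.0.1:22 (203.0.113.3:222) <- 172.21.32.79:60472       CLOSED:SYN_SENT
   [0 + 64240]  [2247652855 + 1]
   age 00:00:04, expires in 00:00:29, 3:5 pkts, 180:300 bytes, rule 158
"""

# Hypothetical pattern, fitted to the sample above only.
HEADER_RE = re.compile(
    r"^(?P<iface>\S+)\s+(?P<proto>\S+)\s+(?P<dst>\S+)\s+\((?P<orig_dst>\S+)\)\s+"
    r"(?P<dir><-|->)\s+(?P<peer>\S+)\s+(?P<state>\S+:\S+)\s*$"
)
RULE_RE = re.compile(r"rule (\d+)")


def parse_state(text):
    """Return a dict with the interface, addresses, state pair and rule number."""
    lines = text.splitlines()
    match = HEADER_RE.match(lines[0])
    if match is None:
        raise ValueError("unrecognized state header line")
    info = match.groupdict()
    rule = RULE_RE.search(text)
    info["rule"] = int(rule.group(1)) if rule else None
    return info


def looks_unanswered(info):
    """Heuristic: matches only the exact TCP state pair shown in this report."""
    return info["proto"] == "tcp" and info["state"] == "CLOSED:SYN_SENT"


info = parse_state(STATE_TEXT)
print(info["iface"], info["orig_dst"], info["peer"], info["rule"], looks_unanswered(info))
```

The rule number extracted this way is what links the state back to the rule shown next.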

Rule 158 created the state, and it is:

@158(1617127544) pass in quick on vmx3 reply-to (vmx3 203.0.113.1) inet proto tcp from any to 127.0.0.1 port = ssh flags S/SA keep state label "USER_RULE: NAT Reply-to test WAN2" 
  [ Evaluations: 23443     Packets: 59        Bytes: 3540        States: 0     ]
  [ Inserted: pid 72469 State Creations: 4     ]
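
To check quickly whether a given rule carries a ``reply-to`` target at all, a small sketch like the following can extract it from the rule text. The regular expression and the function name `reply_to_target` are assumptions made for illustration, based only on the rule line quoted above.

```python
import re

# Rule text copied from this report (counters omitted).
RULE_TEXT = (
    '@158(1617127544) pass in quick on vmx3 reply-to (vmx3 203.0.113.1) '
    'inet proto tcp from any to 127.0.0.1 port = ssh flags S/SA keep state '
    'label "USER_RULE: NAT Reply-to test WAN2"'
)

# Hypothetical pattern: "on <iface> reply-to (<iface> <gateway>)".
REPLY_TO_RE = re.compile(
    r"\bon (?P<iface>\S+) reply-to \((?P<rt_iface>\S+) (?P<gateway>[^)\s]+)\)"
)


def reply_to_target(rule):
    """Return (rule interface, reply-to interface, gateway), or None if absent."""
    match = REPLY_TO_RE.search(rule)
    if match is None:
        return None
    return match.group("iface"), match.group("rt_iface"), match.group("gateway")


print(reply_to_target(RULE_TEXT))
```

In this report the rule does carry the expected ``reply-to`` target, which is consistent with the later finding that the ruleset was fine and the fault was in the kernel.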

Contacting a service directly on WAN2, not via port forwarding, works.

Actions #1

Updated by Kristof Provost 4 months ago

I can't seem to reproduce this on my system, running 'pfSense 2.5.1-RELEASE (amd64) on pfSense'. Can you share your rules file (and perhaps the configuration file)?

Actions #2

Updated by Kristof Provost 4 months ago

Correction, I was testing it wrong, I can reproduce. I'd again forgotten to ensure my requests came from outside the WAN/WAN2 subnets.
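
A quick way to avoid the same testing mistake is to confirm that the test client's address falls outside every WAN subnet before drawing conclusions. The sketch below assumes /24 prefixes and example addresses that are not stated anywhere in this report; `is_valid_test_client` is a hypothetical helper name.

```python
import ipaddress


def is_valid_test_client(client, wan_subnets):
    """True when the client address is outside all of the given WAN subnets."""
    ip = ipaddress.ip_address(client)
    return all(ip not in ipaddress.ip_network(subnet) for subnet in wan_subnets)


# Assumed subnets for WAN and WAN2 (prefix lengths are a guess, not from the report).
WAN_SUBNETS = ["198.51.100.0/24", "203.0.113.0/24"]

print(is_valid_test_client("192.0.2.10", WAN_SUBNETS))    # off-subnet client
print(is_valid_test_client("203.0.113.50", WAN_SUBNETS))  # client inside the WAN2 subnet
```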

Actions #3

Updated by Kristof Provost 4 months ago

I'm confident I have a fix ready. It's being reviewed & validated internally.

Actions #4

Updated by Kristof Provost 4 months ago

Patrick Clara: I cannot tell from that post if this is the same problem or not. It could plausibly be.

2.6.0 working matches what I'd expect from what I know the issue to be.

Actions #5

Updated by Rajil Saraswat 4 months ago

@Kristof, will there be a point release to fix this, or can a patch be applied to 2.5.1?

I guess a point release would take some time to roll out.

Actions #6

Updated by Jim Pingle 4 months ago

We have more than enough confirmation that it's a problem at this point, so please refrain from commenting to that effect. I'll be cleaning up some of the older comments here that aren't adding anything substantial to the development process, since they make it difficult to find relevant information on the issue.

EDIT: 25 comments removed.

Actions #7

Updated by Luca De Andreis 4 months ago

I would just like to add that on a multi-gateway firewall (in my case, WAN and MPLS), the connection is lost after approximately 30 seconds when the connection request does not arrive through the default gateway. For example, with WAN as the default gateway, this happens when I reach an internal network segment behind the firewall over a connection coming from the MPLS gateway. The loss of connection results in an RDP reconnection or a frozen SSH session. The same configuration works perfectly on version 2.4.5-p1.

Actions #8

Updated by Reinaldo Alves Feitosa 4 months ago

I also have the same problem!

Actions #9

Updated by Kristof Provost 4 months ago

Adam Kuklycz wrote:

Now, with Jim removing a handful of comments saying they too have the issue, it gives the perception that this issue is a lot less and maybe not even related to the problem others are having, so while it does clean up the issue it makes the end user less certain that it affects them or not. It also makes me feel uncomfortable, thinking that this problem may not get addressed anytime in the near future. The issue hasn't been assigned a priority...nothing...

Jim removed those because "Me too!" does not contribute anything to the bug report, but obscures relevant information.
The fix has been committed and will be included in the next release. I do not know when that will be.

Actions #10

Updated by Jim Pingle 3 months ago

  • Status changed from New to Feedback
  • Priority changed from Normal to Urgent
  • Target version changed from CE-Next to 2.6.0

I cleaned up the comments again. Please do not comment unless you have substantial new information. Otherwise, keep the discussion on the forum. We are well aware of the impact the issue has, and if you look at my previous comment I've noted how many comments were removed. Thus keeping a lot of "me too" posts here only serves to obscure meaningful development discussion.

This is not a configuration or PHP code issue, but an issue in the kernel, so it is not possible to patch it in-place and an upgrade is required. If you want to test it, try a 2.6.0 snapshot.

Actions #11

Updated by Emanuel Birkmann 3 months ago

I don't know if this is substantial new information, especially if a fix is already under development. But as far as I can tell, one thing has not been reported anywhere so far: in a packet capture I could see the outgoing responses to packets that came in on the non-default WAN interface, but they were being sent to the internet with the private RFC1918 IP address of the responding server as their source. So inbound, the port forward works as expected and the server answers correctly, but outbound, NAT is not applied. At least the responses went out on the correct non-default WAN interface; my first assumption was that they might be going out on the default WAN interface instead.
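
To look for the same symptom in a capture taken on the non-default WAN, one could flag outgoing packets whose source address is still in an RFC1918 range. The sketch below is only an illustration: the packet tuples are made-up sample data, and `is_rfc1918` and `unnatted_sources` are hypothetical helper names. The three networks are listed explicitly because Python's `is_private` attribute covers more ranges than RFC1918 alone.

```python
import ipaddress

# The three private IPv4 ranges defined by RFC1918.
RFC1918_NETWORKS = tuple(
    ipaddress.ip_network(net)
    for net in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
)


def is_rfc1918(addr):
    """True when addr lies inside one of the RFC1918 ranges."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in RFC1918_NETWORKS)


def unnatted_sources(packets):
    """Return the (src, dst) pairs that left the WAN with a private source."""
    return [(src, dst) for src, dst in packets if is_rfc1918(src)]


# Made-up (src, dst) pairs standing in for packets seen leaving WAN2.
SAMPLE_PACKETS = [
    ("192.168.1.10", "198.51.100.7"),  # reply that was not translated
    ("203.0.113.3", "198.51.100.7"),   # reply translated to the WAN2 address
]

print(unnatted_sources(SAMPLE_PACKETS))
```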

Actions #12

Updated by Jim Pingle 3 months ago

2.6.0 snapshots are currently working correctly, and the fix was checked into RELENG_2_5_0. Whatever release happens next will behave correctly either way (e.g. a 2.6.0 release or a 2.5.x point or patch release).

Actions #13

Updated by Jens Groh 3 months ago

Jim Pingle wrote:

2.6.0 snapshots are currently working correctly, and the fix was checked into RELENG_2_5_0. Whatever release happens next will behave correctly either way (e.g. a 2.6.0 release or a 2.5.x point or patch release).

If you don't mind: if the fix was checked into RELENG_2_5_0, could you post the fix/patch ID so one could cherry pick it via system patches and test it? There are many users and clients waiting for their multiWAN to run cleanly again and feedback/helping catch flaws would surely be better if more could already test the proposed fix!

Thanks,
Jens

Actions #14

Updated by Jim Pingle 3 months ago

Jens Groh wrote:

If you don't mind: if the fix was checked into RELENG_2_5_0, could you post the fix/patch ID so one could cherry pick it via system patches and test it? There are many users and clients waiting for their multiWAN to run cleanly again and feedback/helping catch flaws would surely be better if more could already test the proposed fix!

It is a kernel-level fix, not something that can be applied as a patch using that package.

Actions #15

Updated by Rafael Possamai 3 months ago

It is a kernel-level fix, not something that can be applied as a patch using that package.

Jim, thanks for the updates. Do you have a link for the upstream bug report, or was this introduced by a Netgate patch? Thanks.

Actions #16

Updated by Jim Pingle 3 months ago

  • Plus Target Version set to 21.05

Actions #17

Updated by Jim Pingle 3 months ago

  • Plus Target Version deleted (21.05)

Actually this was fixed in the previous Plus release so not relevant to Plus. Taking back off.

Actions #18

Updated by Tom Davis 2 months ago

Hi, just want to report it's working fine now for me using the latest dev CE version 2.6.0.a.20210524.0100
More details: Running in Hyper-V, Gateway group load balancing with 3 Tier 1 OpenVPN gateways.
For me, 2.5.0-dev broke the Gateway Group. 2.5.1 broke Port forward and fixed Gateway Groups, 2.6.0.a fixed them both.
Regards,
Thanks for all the great work!
-TD

Actions #19

Updated by Vikash Jhagroe 2 months ago

Tom Davis wrote:

Hi, just want to report it's working fine now for me using the latest dev CE version 2.6.0.a.20210524.0100
More details: Running in Hyper-V, Gateway group load balancing with 3 Tier 1 OpenVPN gateways.
For me, 2.5.0-dev broke the Gateway Group. 2.5.1 broke Port forward and fixed Gateway Groups, 2.6.0.a fixed them both.
Regards,
Thanks for all the great work!
-TD

Thanks for your feedback, but we already know this bug is fixed in the upcoming 2.6. I seriously hope you are not running a DEVELOPMENT RELEASE in a production environment. Running a DEVELOPMENT release is not the fix for this bug in a PRODUCTION environment.

Actions #20

Updated by Jim Pingle 2 months ago

  • Target version changed from 2.6.0 to 2.5.2

Actions #21

Updated by Jim Pingle 2 months ago

Testing on 2.5.2-BETA snapshot build 2.5.2.b.20210601.0300 confirms it is fixed there on a system which could reproduce the problem on 2.5.1.

Will hold open for now to wait for additional feedback, but can be closed if none is received before release.

Actions #22

Updated by Jim Pingle 2 months ago

  • Subject changed from Port forward works only on interface with default gateway, does not work for alternative wans (CE Only) to Port forward rules only function through the default gateway interface, ``reply-to`` does not work for Multi-WAN (CE Only)

Updating subject for release notes.

Actions #23

Updated by Adam Kuklycz about 2 months ago

Question: does this affect virtual IPs that are set up on the same interface as the default gateway IP, or does the IP address have to be on a physically different interface/WAN for it to become an issue?

Actions #24

Updated by Jim Pingle about 2 months ago

Adam Kuklycz wrote:

Question: does this affect virtual IPs that are set up on the same interface as the default gateway IP, or does the IP address have to be on a physically different interface/WAN for it to become an issue?

As far as I'm aware, it should not affect that scenario, since the VIP is on the same interface as the default route. The problem scenario is when the return traffic must take a different path back to the origin of the request. As long as the traffic comes in and goes out on the default-route WAN, it is OK.
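
The condition described here can be written down as a one-line check. This is only a simplified model of the scenario, not how pf itself decides anything; the function name `return_path_affected` and the interface name `vmx0` for the default-route WAN are assumptions for illustration (`vmx3` is the WAN2 interface from the report).

```python
def return_path_affected(ingress_iface, default_route_iface):
    """True when replies must leave through an interface other than the default route.

    That is the case that depends on reply-to, and so the case hit by this bug.
    """
    return ingress_iface != default_route_iface


# A VIP on the default-route WAN: traffic enters and leaves on the same interface.
print(return_path_affected("vmx0", "vmx0"))

# A port forward on WAN2 while the default route points at another interface.
print(return_path_affected("vmx3", "vmx0"))
```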

Actions #25

Updated by Bouke Henstra about 2 months ago

Jim Pingle wrote:

Adam Kuklycz wrote:

Question: does this affect virtual IPs that are set up on the same interface as the default gateway IP, or does the IP address have to be on a physically different interface/WAN for it to become an issue?

As far as I'm aware, it should not affect that scenario, since the VIP is on the same interface as the default route. The problem scenario is when the return traffic must take a different path back to the origin of the request. As long as the traffic comes in and goes out on the default-route WAN, it is OK.

I did notice issues with a routed subnet via GRE. This traffic flows through the same WAN interface too. But that's technically something other than a virtual IP.

Actions #26

Updated by Renato Botelho about 2 months ago

  • Status changed from Feedback to Resolved

Bouke Henstra wrote:

Jim Pingle wrote:

Adam Kuklycz wrote:

Question: does this affect virtual IPs that are set up on the same interface as the default gateway IP, or does the IP address have to be on a physically different interface/WAN for it to become an issue?

As far as I'm aware, it should not affect that scenario, since the VIP is on the same interface as the default route. The problem scenario is when the return traffic must take a different path back to the origin of the request. As long as the traffic comes in and goes out on the default-route WAN, it is OK.

I did notice issues with a routed subnet via GRE. This traffic flows through the same WAN interface too. But that's technically something other than a virtual IP.

Please open a new bug with details about how to reproduce.

This specific issue is fixed.
