Bug #8056

Bridge + CARP crashes/freezes pfSense

Added by Anonymous about 2 years ago. Updated over 1 year ago.

Status: Resolved
Priority: High
Assignee:
Category: CARP
Target version:
Start date: 11/05/2017
Due date:
% Done: 100%
Estimated time:
Affected Version: 2.4.x
Affected Architecture: All

Description

Same behavior as the older issue linked below: running CARP on a bridge interface and sending any non-trivial amount of traffic to the CARP IP freezes pfSense.

Older issue: https://redmine.pfsense.org/issues/4607

On a VirtualBox VM the whole VM freezes. On real hardware the system did not freeze completely (e.g. the serial console remained somewhat usable), but various processes ended up in a locked state, matching the symptoms described in https://forum.pfsense.org/index.php?topic=139030.0

The problem is mitigated when traffic is sent to the pfSense interface IP instead of the CARP IP.

History

#1 Updated by Anonymous about 2 years ago

More context: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=200319

This configuration works well on 2.3.3+ (earlier releases were not tested), but fails on 2.4.1.

#2 Updated by Anonymous about 2 years ago

Re-tested a few days ago on 2.4.2 and observed the same crash.

Can anyone move this report to status Confirmed, since several people have reported the same issue in the linked forum thread?

#3 Updated by Harry Coin about 2 years ago

Confirmed. For details, see:
https://redmine.pfsense.org/issues/8145

#4 Updated by Harry Coin about 2 years ago

PF deadlocks once every 3 hours or so. A process appears to be holding a lock (CARP lock? bridge lock?), and I think it then fires off an ifconfig which, among other things, tries to display CARP status, and there it sits. Interestingly, VGA/keyboard operations are locked up, but it is still possible to run any non-network-related thread via the serial connection. There is a lot of detail in the report referenced above.

#5 Updated by Harry Coin almost 2 years ago

This is observed with pfSense running as a guest on a QEMU/KVM host running Ubuntu "artful".

#6 Updated by Harry Coin almost 2 years ago

Happens with both the e1000 and virtio drivers.

#7 Updated by James Freeman almost 2 years ago

Confirmed - I can also replicate this easily. CARP on a bridged interface, tested on 2.4.2 and 2.4.2_1 with no change. pfSense running on VMware ESXi 6.5 and VMware Workstation 14 with emulated e1000 NICs, fully repeatable on both platforms. Happy to help with any testing required on this.

#8 Updated by Adam Boyhan almost 2 years ago

Confirmed - We have 2 Netgate 8860 1U appliances set up with CARP + Bridge, and when upgrading from 2.3.4 to 2.4.2_1 we hit this bug on both firewalls. Sometimes we would be OK for 10-15 minutes, other times we would make it past an hour of uptime. We ended up having to go back to 2.3.4, which is simply rock solid.

#9 Updated by Jim Pingle almost 2 years ago

  • Category set to CARP
  • Priority changed from High to Normal
  • Affected Version changed from 2.4.1 to 2.4.x
  • Affected Architecture set to All

The underlying FreeBSD bug is still open:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=200319

The previous patch that was on 2.2.x and 2.3.x had some issues and was not accepted by FreeBSD:
https://reviews.freebsd.org/D3133

#10 Updated by Luiz Souza almost 2 years ago

  • Status changed from New to Confirmed
  • Assignee set to Luiz Souza

#11 Updated by Anonymous almost 2 years ago

The previous patch works well on 2.3.x. Is it possible to apply the same patch to 2.4.x while the FreeBSD folks decide what to do next?

#12 Updated by Luiz Souza almost 2 years ago

  • Priority changed from Normal to High
  • Affected Version changed from 2.4.x to 2.4.3

#13 Updated by Luiz Souza almost 2 years ago

  • Target version set to 2.4.3
  • Affected Version changed from 2.4.3 to 2.4.x

Set target.

#14 Updated by Scott Maxwell almost 2 years ago

I have exactly the same issue with my pfSense setup on a Netgate physical appliance. Is there an ETA for when this will be resolved?

#15 Updated by Andreas Kaindl almost 2 years ago

I also have exactly the same issue on Netgate 8860 appliances. I first thought it was a hardware problem and migrated the config to a pair of SG4860s. After 10 minutes the same problem occurred again.

#16 Updated by Simon Kristensen almost 2 years ago

I just upgraded my pfSense from 2.3.4-p1 to 2.4.2-Release-p1.
Now I also have the same issue.

Any news on this, Luiz :-) ?

Thanks

#17 Updated by Simon Kristensen almost 2 years ago

Simon Kristensen wrote:

I just upgraded my pfSense from 2.3.4-p1 to 2.4.2-Release-p1.
Now I also have the same issue.

Any news on this, Luiz :-) ?

Thanks

I went back to 2.3.5-p1 and it works again.

#18 Updated by Luiz Souza almost 2 years ago

  • Status changed from Confirmed to Feedback
  • % Done changed from 0 to 100

This issue seems to be fixed (again) in my local tests.

Please check with tomorrow's snapshot.

#19 Updated by Steve Wheeler over 1 year ago

I have tested this. I could easily trigger it in 2.4.2_1 but could not in current snapshots. It looks to be solved.

Anyone who was hitting this and is able to, please test current 2.4.3 snapshots.

#20 Updated by Jim Pingle over 1 year ago

  • Status changed from Feedback to Resolved

Tested and resolved.
