Bug #6025

Load balancing fails when one gateway has a weight of 1 and another gateway has a weight >1

Added by Jim Pingle about 5 years ago. Updated 3 months ago.

Status:
Resolved
Priority:
Normal
Category:
Multi-WAN
Target version:
Start date:
03/24/2016
Due date:
% Done:

100%

Estimated time:
Affected Version:
2.3
Affected Architecture:
Release Notes:
Default

Description

Strange one here:

While testing load balancing, there appears to be an issue with weights. Given two WANs with one gateway each, if one gateway has a weight of 1 and the other has a weight higher than 1 (2, 3, etc.), only one of the gateways is used.

Example cases:
WAN1=1, WAN2=1: OK

pass in quick on vmx1 route-to { (vmx3 203.0.113.1), (vmx0 198.51.100.1) } round-robin inet from 10.3.0.0/24 to any flags S/SA keep state label "USER_RULE: Default allow LAN to any rule"

WAN1=1, WAN2=2: Only uses one gateway (203.0.113.1):

pass in quick on vmx1 route-to { (vmx3 203.0.113.1), (vmx3 203.0.113.1), (vmx0 198.51.100.1) } round-robin inet from 10.3.0.0/24 to any flags S/SA keep state label "USER_RULE: Default allow LAN to any rule"

WAN1=2, WAN2=1: Only uses one gateway (203.0.113.1):

pass in quick on vmx1 route-to { (vmx3 203.0.113.1), (vmx0 198.51.100.1), (vmx0 198.51.100.1) } round-robin inet from 10.3.0.0/24 to any flags S/SA keep state label "USER_RULE: Default allow LAN to any rule"

If both weights are > 1, then it is OK:
WAN1=2, WAN2=2: OK

pass in quick on vmx1 route-to { (vmx3 203.0.113.1), (vmx3 203.0.113.1), (vmx0 198.51.100.1), (vmx0 198.51.100.1) } round-robin inet from 10.3.0.0/24 to any flags S/SA keep state label "USER_RULE: Default allow LAN to any rule"

WAN1=3, WAN2=2: OK

pass in quick on vmx1 route-to { (vmx3 203.0.113.1), (vmx3 203.0.113.1), (vmx0 198.51.100.1), (vmx0 198.51.100.1), (vmx0 198.51.100.1) } round-robin inet from 10.3.0.0/24 to any flags S/SA keep state label "USER_RULE: Default allow LAN to any rule"

Also of note, when the weights differ, even though the gateways have a specific order with repetition in the rule, pf seems to still flip back and forth, though the general ratio of the weights is respected. For example with WAN1=3, WAN2=2:

$ for i in {1..10}; do curl http://iptest.example.com/ip.php; done
Your IP Address is: 198.51.100.3
Your IP Address is: 203.0.113.3
Your IP Address is: 198.51.100.3
Your IP Address is: 198.51.100.3
Your IP Address is: 203.0.113.3
Your IP Address is: 198.51.100.3
Your IP Address is: 203.0.113.3
Your IP Address is: 198.51.100.3
Your IP Address is: 198.51.100.3
Your IP Address is: 203.0.113.3
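
As a quick check of the ratio claim, the ten responses above can be tallied. This is only a sketch: the output is pasted in as a string, and the names `output`, `PREFIX`, and `counts` are illustrative.

```python
from collections import Counter

# The curl output from the test above, pasted verbatim.
output = """\
Your IP Address is: 198.51.100.3
Your IP Address is: 203.0.113.3
Your IP Address is: 198.51.100.3
Your IP Address is: 198.51.100.3
Your IP Address is: 203.0.113.3
Your IP Address is: 198.51.100.3
Your IP Address is: 203.0.113.3
Your IP Address is: 198.51.100.3
Your IP Address is: 198.51.100.3
Your IP Address is: 203.0.113.3
"""

PREFIX = "Your IP Address is: "

# Count how many requests left through each source address.
counts = Counter(
    line[len(PREFIX):]
    for line in output.splitlines()
    if line.startswith(PREFIX)
)
print(counts)  # Counter({'198.51.100.3': 6, '203.0.113.3': 4})
```

Six requests to four is 3:2, which matches the configured weights even though the order of the responses does not follow the order of the entries in the rule.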

If it turns out to be a deeper issue in pf, the easy fix would be to automatically double all weights if one is 1 and the other is >1.
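
The workaround suggested above could look roughly like the following. This is a minimal sketch of the idea only, not the code that was eventually committed; the function names `scale_weights` and `route_to_entries` are hypothetical, and the gateway strings are taken from the example rules.

```python
from typing import Dict, List


def scale_weights(weights: Dict[str, int]) -> Dict[str, int]:
    """Double every weight when a weight of 1 is mixed with a larger one."""
    values = list(weights.values())
    if 1 in values and any(w > 1 for w in values):
        return {gw: w * 2 for gw, w in weights.items()}
    # All weights equal to 1, or all greater than 1: leave them alone.
    return dict(weights)


def route_to_entries(weights: Dict[str, int]) -> List[str]:
    """Repeat each gateway once per unit of (scaled) weight."""
    entries: List[str] = []
    for gw, w in scale_weights(weights).items():
        entries.extend([gw] * w)
    return entries


# The failing WAN1=1, WAN2=2 case from the description.
entries = route_to_entries({"(vmx3 203.0.113.1)": 2, "(vmx0 198.51.100.1)": 1})
print("route-to { " + ", ".join(entries) + " } round-robin")
```

Doubling keeps the ratio between gateways unchanged (1:2 becomes 2:4) while avoiding the combination that pf handles badly at run time.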

Associated revisions

Revision 821be56a (diff)
Added by Viktor Gurov 6 months ago

Load balancing when one gateway has a weight of 1 and another gateway has a weight >1. Fixes #6025

History

#1 Updated by Jim Pingle about 5 years ago

  • Description updated (diff)

#2 Updated by Phillip Davis about 5 years ago

All the rules that you quote look OK, so are you saying that the code that generates the rules is OK, but somehow the implementation at run-time in pf is not happening effectively?

#3 Updated by Jim Pingle about 5 years ago

Phillip Davis wrote:

All the rules that you quote look OK, so are you saying that the code that generates the rules is OK, but somehow the implementation at run-time in pf is not happening effectively?

Correct, the rules look fine, but in practice it fails to work as expected at run time.

#4 Updated by Jim Pingle about 5 years ago

  • Target version changed from 2.3 to 2.3.1

Pushing this out a bit; it is not a huge concern for now. Anyone who hits it can easily work around it by adjusting weights in ways that were not possible before. Even the worst imbalance case on 2.2.x could only be 1:5; anyone who wants that now can set 2:10 and get the same effect.

#5 Updated by Jim Pingle about 5 years ago

  • Status changed from New to Assigned
  • Assignee set to Marc Dye

#6 Updated by Chris Buechler almost 5 years ago

  • Target version changed from 2.3.1 to 2.3.2

#7 Updated by Chris Buechler almost 5 years ago

  • Target version changed from 2.3.2 to 2.4.0

#8 Updated by Renato Botelho about 4 years ago

  • Assignee deleted (Marc Dye)

#9 Updated by Luiz Souza over 3 years ago

  • Target version changed from 2.4.0 to 2.4.1

#10 Updated by Jim Pingle over 3 years ago

  • Target version changed from 2.4.1 to 2.4.2

#11 Updated by Jim Pingle over 3 years ago

  • Target version changed from 2.4.2 to 2.4.3

#12 Updated by Jim Pingle about 3 years ago

  • Target version changed from 2.4.3 to 2.4.4

#13 Updated by Jim Pingle over 2 years ago

  • Target version changed from 2.4.4 to 48

#14 Updated by Jim Pingle about 2 years ago

  • Target version changed from 48 to 2.5.0

#15 Updated by Jim Pingle over 1 year ago

  • Category changed from Rules / NAT to Multi-WAN

#17 Updated by Jim Pingle 6 months ago

  • Status changed from Assigned to Pull Request Review

#18 Updated by Renato Botelho 6 months ago

  • Status changed from Pull Request Review to Feedback
  • Assignee set to Renato Botelho
  • % Done changed from 0 to 100

PR has been merged. Thanks!

#19 Updated by Fabian Schnelle 5 months ago

After this change, policy based routing no longer works.
The entry in the firewall rule is completely ignored and the default setting under "System / Routing / Gateways > Default gateway" is used.
It does not matter whether a gateway or a gateway group is used.
Current Base System: 2.5.0.a.20201111.1250

#20 Updated by Jim Pingle 5 months ago

Fabian Schnelle wrote:

After this change, policy based routing no longer works.
The entry in the firewall rule is completely ignored and the default setting under "System / Routing / Gateways > Default gateway" is used.
It does not matter whether a gateway or a gateway group is used.
Current Base System: 2.5.0.a.20201111.1250

I'm not seeing anything like that here. It's not likely to be related to this change, since this only changed multipliers, not which gateways are included in a group. There were other unrelated gateway changes around the same time, so you may be hitting something entirely different. Post on the forum with a lot more detail about your setup and discuss/diagnose it there.

#21 Updated by Chris Linstruth 3 months ago

Verified that weights of 1 and 2 resulted in 2 and 4 entries in the rule set:

GWLOADBALANCE = "  route-to { ( vtnet1 172.25.228.1 ) ( vtnet1 172.25.228.1 ) ( vtnet4 172.25.227.1 ) ( vtnet4 172.25.227.1 ) ( vtnet4 172.25.227.1 ) ( vtnet4 172.25.227.1 )  }  round-robin  " 

Also unscientifically verified that WAN used about half as much bandwidth as WAN2, using speed tests and the traffic graph widget.
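
The entry counts in the rule above can also be checked mechanically. A small sketch, assuming the macro value is pasted in as a string; the variable names and the regular expression are illustrative and only need to handle the `( interface address )` form shown here.

```python
import re
from collections import Counter

# Value of the GWLOADBALANCE macro quoted above.
macro = (
    "  route-to { ( vtnet1 172.25.228.1 ) ( vtnet1 172.25.228.1 ) "
    "( vtnet4 172.25.227.1 ) ( vtnet4 172.25.227.1 ) "
    "( vtnet4 172.25.227.1 ) ( vtnet4 172.25.227.1 )  }  round-robin  "
)

# Each entry has the form "( <interface> <gateway address> )".
entries = re.findall(r"\(\s*(\S+)\s+(\S+)\s*\)", macro)

# Count how often each (interface, gateway) pair is repeated.
counts = Counter(entries)
for (iface, gateway), n in counts.items():
    print(iface, gateway, n)
```

For this macro the tally is 2 entries for vtnet1 and 4 for vtnet4, so the 1:2 weighting is preserved after doubling.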

#22 Updated by Jim Pingle 3 months ago

  • Status changed from Feedback to Resolved
