Bug #6025

closed

Load balancing fails when one gateway has a weight of 1 and another gateway has a weight >1

Added by Jim Pingle over 5 years ago. Updated 9 months ago.

Status: Resolved
Priority: Normal
Category: Multi-WAN
Target version:
Start date: 03/24/2016
Due date:
% Done: 100%
Estimated time:
Plus Target Version:
Release Notes:
Affected Version: 2.3
Affected Architecture:

Description

Strange one here:

While testing load balancing, there appears to be an issue with weights. Given two WANs, each with one gateway: if one gateway has a weight of 1 and the other has a weight higher than 1 (2, 3, etc.), only one of the gateways is used.

Example cases:
WAN1=1, WAN2=1: OK

pass in quick on vmx1 route-to { (vmx3 203.0.113.1), (vmx0 198.51.100.1) } round-robin inet from 10.3.0.0/24 to any flags S/SA keep state label "USER_RULE: Default allow LAN to any rule"

WAN1=1, WAN2=2: Only uses one gateway (203.0.113.1):

pass in quick on vmx1 route-to { (vmx3 203.0.113.1), (vmx3 203.0.113.1), (vmx0 198.51.100.1) } round-robin inet from 10.3.0.0/24 to any flags S/SA keep state label "USER_RULE: Default allow LAN to any rule"

WAN1=2, WAN2=1: Only uses one gateway (203.0.113.1):

pass in quick on vmx1 route-to { (vmx3 203.0.113.1), (vmx0 198.51.100.1), (vmx0 198.51.100.1) } round-robin inet from 10.3.0.0/24 to any flags S/SA keep state label "USER_RULE: Default allow LAN to any rule"

If both weights are > 1, then it is OK:
WAN1=2, WAN2=2: OK

pass in quick on vmx1 route-to { (vmx3 203.0.113.1), (vmx3 203.0.113.1), (vmx0 198.51.100.1), (vmx0 198.51.100.1) } round-robin inet from 10.3.0.0/24 to any flags S/SA keep state label "USER_RULE: Default allow LAN to any rule"

WAN1=3, WAN2=2: OK

pass in quick on vmx1 route-to { (vmx3 203.0.113.1), (vmx3 203.0.113.1), (vmx0 198.51.100.1), (vmx0 198.51.100.1), (vmx0 198.51.100.1) } round-robin inet from 10.3.0.0/24 to any flags S/SA keep state label "USER_RULE: Default allow LAN to any rule"
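For reference, the way a weight turns into repeated entries in the route-to list can be sketched as below. This is only an illustration, not the actual pfSense rule generator: the function name `route_to_clause` and the tuple layout are made up for this example, and the interfaces and addresses are the ones from the rules above.

```python
def route_to_clause(gateways):
    """Build a pf route-to clause, repeating each gateway `weight` times.

    `gateways` is a list of (interface, gateway_ip, weight) tuples.
    Hypothetical helper for illustration only.
    """
    entries = []
    for iface, ip, weight in gateways:
        # A weight of N simply lists the same gateway N times.
        entries.extend([f"({iface} {ip})"] * weight)
    return "route-to { " + ", ".join(entries) + " } round-robin"


# WAN1=3 (vmx0), WAN2=2 (vmx3), matching the last example case above.
clause = route_to_clause([
    ("vmx3", "203.0.113.1", 2),
    ("vmx0", "198.51.100.1", 3),
])
print(clause)
```

The printed clause has the same five entries as the WAN1=3, WAN2=2 rule, which is why the generated rules look correct even in the failing cases.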

Also of note, when the weights differ, even though the gateways have a specific order with repetition in the rule, pf seems to still flip back and forth, though the general ratio of the weights is respected. For example with WAN1=3, WAN2=2:

$ for i in {1..10}; do curl http://iptest.example.com/ip.php; done
Your IP Address is: 198.51.100.3
Your IP Address is: 203.0.113.3
Your IP Address is: 198.51.100.3
Your IP Address is: 198.51.100.3
Your IP Address is: 203.0.113.3
Your IP Address is: 198.51.100.3
Your IP Address is: 203.0.113.3
Your IP Address is: 198.51.100.3
Your IP Address is: 198.51.100.3
Your IP Address is: 203.0.113.3
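To check the ratio without counting by eye, the output can be tallied with a few lines of Python. This is a rough sketch: the sample text is copied from the run above, and the helper name `tally_sources` is made up for this example.

```python
from collections import Counter

# Output copied from the curl loop above (WAN1=3, WAN2=2).
output = """\
Your IP Address is: 198.51.100.3
Your IP Address is: 203.0.113.3
Your IP Address is: 198.51.100.3
Your IP Address is: 198.51.100.3
Your IP Address is: 203.0.113.3
Your IP Address is: 198.51.100.3
Your IP Address is: 203.0.113.3
Your IP Address is: 198.51.100.3
Your IP Address is: 198.51.100.3
Your IP Address is: 203.0.113.3
"""


def tally_sources(text):
    """Count how often each source address appears in the test output."""
    counts = Counter()
    for line in text.splitlines():
        if ":" in line:
            # The address is whatever follows the last colon on the line.
            counts[line.rsplit(":", 1)[1].strip()] += 1
    return counts


counts = tally_sources(output)
print(counts)
```

For this sample the tally is 6 requests via 198.51.100.3 and 4 via 203.0.113.3, i.e. 3:2, in line with the configured weights.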

If it turns out to be a deeper issue in pf, the easy fix would be to automatically multiply all weights by 2 whenever one weight is 1 and another is >1.
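A minimal sketch of that workaround, assuming the weights are available as a plain list of integers. The function name `adjust_weights` and the `factor` parameter are hypothetical, and this is not necessarily how a real fix would be implemented.

```python
def adjust_weights(weights, factor=2):
    """Scale every weight when a weight of 1 is mixed with larger weights.

    The ratio between gateways is unchanged; only the number of repeated
    entries in the route-to list grows.
    """
    if 1 in weights and max(weights) > 1:
        return [w * factor for w in weights]
    # All weights equal to 1, or none equal to 1: leave them alone.
    return list(weights)


print(adjust_weights([1, 2]))  # problem case: scaled
print(adjust_weights([1, 1]))  # already OK: unchanged
print(adjust_weights([3, 2]))  # already OK: unchanged
```

Only the mixed cases are touched, so configurations that already work keep the same rule output.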

Actions #1

Updated by Jim Pingle over 5 years ago

  • Description updated (diff)
Actions #2

Updated by Phillip Davis over 5 years ago

All the rules that you quote look OK, so are you saying that the code that generates the rules is OK, but somehow the implementation at run-time in pf is not happening effectively?

Actions #3

Updated by Jim Pingle over 5 years ago

Phillip Davis wrote:

All the rules that you quote look OK, so are you saying that the code that generates the rules is OK, but somehow the implementation at run-time in pf is not happening effectively?

Correct, the rules look fine, but in practice it fails to work as expected at run time.

Actions #4

Updated by Jim Pingle over 5 years ago

  • Target version changed from 2.3 to 2.3.1

Pushing this out a bit -- not a huge concern for now. Anyone who hits it can easily work around it by adjusting weights in ways that were not possible before. Even the worst imbalance case on 2.2.x could only be 1:5; anyone who wants that now can set 2:10 and get the same effect.

Actions #5

Updated by Jim Pingle over 5 years ago

  • Status changed from New to Assigned
  • Assignee set to Marc Dye
Actions #6

Updated by Chris Buechler over 5 years ago

  • Target version changed from 2.3.1 to 2.3.2
Actions #7

Updated by Chris Buechler over 5 years ago

  • Target version changed from 2.3.2 to 2.4.0
Actions #8

Updated by Renato Botelho over 4 years ago

  • Assignee deleted (Marc Dye)
Actions #9

Updated by Luiz Souza about 4 years ago

  • Target version changed from 2.4.0 to 2.4.1
Actions #10

Updated by Jim Pingle about 4 years ago

  • Target version changed from 2.4.1 to 2.4.2
Actions #11

Updated by Jim Pingle almost 4 years ago

  • Target version changed from 2.4.2 to 2.4.3
Actions #12

Updated by Jim Pingle almost 4 years ago

  • Target version changed from 2.4.3 to 2.4.4
Actions #13

Updated by Jim Pingle about 3 years ago

  • Target version changed from 2.4.4 to 48
Actions #14

Updated by Jim Pingle over 2 years ago

  • Target version changed from 48 to 2.5.0
Actions #15

Updated by Jim Pingle about 2 years ago

  • Category changed from Rules / NAT to Multi-WAN
Actions #17

Updated by Jim Pingle about 1 year ago

  • Status changed from Assigned to Pull Request Review
Actions #18

Updated by Renato Botelho about 1 year ago

  • Status changed from Pull Request Review to Feedback
  • Assignee set to Renato Botelho
  • % Done changed from 0 to 100

PR has been merged. Thanks!

Actions #19

Updated by Fabian Schnelle 11 months ago

After this change, policy based routing no longer works.
The entry in the firewall rule is completely ignored and the default setting under "System / Routing / Gateways > Default gateway" is used.
It does not matter whether a gateway or a gateway group is used.
Current Base System: 2.5.0.a.20201111.1250

Actions #20

Updated by Jim Pingle 11 months ago

Fabian Schnelle wrote:

After this change, policy based routing no longer works.
The entry in the firewall rule is completely ignored and the default setting under "System / Routing / Gateways > Default gateway" is used.
It does not matter whether a gateway or a gateway group is used.
Current Base System: 2.5.0.a.20201111.1250

I'm not seeing anything like that here. It's not likely to be related to this change, since this only changed multipliers, not which gateways are included in a group. There were other unrelated gateway changes around the same time, so you may be hitting something entirely different. Post on the forum with a lot more detail about your setup and discuss/diagnose it there.

Actions #21

Updated by Chris Linstruth 9 months ago

Verified that weights of 1 and 2 resulted in 2 and 4 entries in the rule set:

GWLOADBALANCE = "  route-to { ( vtnet1 172.25.228.1 ) ( vtnet1 172.25.228.1 ) ( vtnet4 172.25.227.1 ) ( vtnet4 172.25.227.1 ) ( vtnet4 172.25.227.1 ) ( vtnet4 172.25.227.1 )  }  round-robin  " 

Also unscientifically verified that WAN used about half the bandwidth of WAN2, using speed tests and the traffic graph widget.
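The entry counts can also be checked mechanically rather than by reading the line. A small sketch, using the GWLOADBALANCE value quoted above as input; the helper name `count_entries` is made up for this example.

```python
import re
from collections import Counter

# Value copied from the generated rule set quoted above.
rule = (
    'GWLOADBALANCE = "  route-to { ( vtnet1 172.25.228.1 ) ( vtnet1 172.25.228.1 ) '
    "( vtnet4 172.25.227.1 ) ( vtnet4 172.25.227.1 ) "
    '( vtnet4 172.25.227.1 ) ( vtnet4 172.25.227.1 )  }  round-robin  "'
)


def count_entries(rule_text):
    """Count how many times each (interface, gateway) pair is listed."""
    pairs = re.findall(r"\(\s*([^\s()]+)\s+([^\s()]+)\s*\)", rule_text)
    return Counter(pairs)


entries = count_entries(rule)
for (iface, gateway), n in entries.items():
    print(iface, gateway, n)
```

For the line above this gives 2 entries for vtnet1 and 4 for vtnet4, matching the configured weights of 1 and 2 after doubling.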

Actions #22

Updated by Jim Pingle 9 months ago

  • Status changed from Feedback to Resolved