Bug #1235: pfsense 2.0 load balancing with a https monitor seems to default timeout 200ms causing constant timeouts - pfSense - pfSense bugtracker

Actions

Copy link

Bug #1235

closed

pfsense 2.0 load balancing with a https monitor seems to default timeout 200ms causing constant timeouts

Added by Gary Richards over 15 years ago. Updated about 15 years ago.

Status:

Resolved

Priority:

Normal

Assignee:

Category:

Load Balancer

Target version:

2.0

Start date:

01/27/2011

Due date:

% Done:

Estimated time:

Plus Target Version:

Release Notes:

Affected Version:

2.0

Affected Architecture:

Description

Hi,

I've been failing with load balancing between two https web servers on pfsense 2.0.

I have two https servers in a pool, port 443, with a custom monitor. The custom monitor is configured like so:
Type: HTTPS
Path: /some/url
Host: some.host.tld
HTTP Code: 200

Once I bring my webservers up, I see requests from the pfsense boxes and I see both my web servers responding with 200's... all good.

However, my load balancers are mostly marked as down, occassionally they will become active for a few seconds, but then they go down again.

A snippet from the logs suggest:

Jan 27 14:39:43     relayd[40896]: host 10.105.255.22, check http code use ssl (180ms), state down -> up, availability 0.43%
Jan 27 14:39:43     relayd[40618]: pfe_dispatch_imsg: state 1 for host 4 10.105.255.22
Jan 27 14:39:43     relayd[40896]: hce_notify_done: 10.105.255.21 (ssl read timeout)
Jan 27 14:39:43     relayd[40896]: host 10.105.255.21, check http code use ssl (210ms), state up -> down, availability 0.46%
Jan 27 14:39:43     relayd[40896]: hce_notify_done: 10.105.255.21 (ssl read timeout)
Jan 27 14:39:43     relayd[40896]: hce_notify_done: 10.105.255.22 (ssl read timeout)
Jan 27 14:39:43     relayd[40618]: pfe_dispatch_imsg: state -1 for host 3 10.105.255.21
Jan 27 14:39:53     relayd[40618]: table TIM_LB2_443: 1 added, 1 deleted, 0 changed, 0 killed
Jan 27 14:39:53     relayd[40896]: hce_notify_done: 10.105.255.21 (tcp connect ok)
Jan 27 14:39:53     relayd[40896]: hce_notify_done: 10.105.255.22 (tcp connect ok)
Jan 27 14:39:53     relayd[40896]: hce_notify_done: 10.105.255.21 (tcp connect ok)
Jan 27 14:39:53     relayd[40896]: hce_notify_done: 10.105.255.22 (tcp connect ok)
Jan 27 14:39:53     relayd[40896]: hce_notify_done: 10.105.255.22 (http code ok)
Jan 27 14:39:53     relayd[40896]: hce_notify_done: 10.105.255.21 (ssl read timeout)
Jan 27 14:39:53     relayd[40896]: hce_notify_done: 10.105.255.22 (ssl read timeout)
Jan 27 14:39:53     relayd[40896]: hce_notify_done: 10.105.255.21 (ssl read timeout)
Jan 27 14:40:03     relayd[40896]: hce_notify_done: 10.105.255.21 (tcp connect ok)
Jan 27 14:40:03     relayd[40896]: hce_notify_done: 10.105.255.22 (tcp connect ok)
Jan 27 14:40:03     relayd[40896]: hce_notify_done: 10.105.255.21 (tcp connect ok)
Jan 27 14:40:03     relayd[40896]: hce_notify_done: 10.105.255.22 (tcp connect ok)
Jan 27 14:40:03     relayd[40896]: hce_notify_done: 10.105.255.22 (http code ok)
Jan 27 14:40:03     relayd[40896]: hce_notify_done: 10.105.255.21 (http code ok)
Jan 27 14:40:03     relayd[40896]: host 10.105.255.21, check http code use ssl (186ms), state down -> up, availability 0.42%
Jan 27 14:40:03     relayd[40618]: pfe_dispatch_imsg: state 1 for host 1 10.105.255.21
Jan 27 14:40:03     relayd[40896]: hce_notify_done: 10.105.255.21 (ssl read timeout)
Jan 27 14:40:03     relayd[40896]: hce_notify_done: 10.105.255.22 (ssl read timeout)
Jan 27 14:40:13     relayd[40618]: table TIM_LB1_443: 1 added, 0 deleted, 0 changed, 0 killed
Jan 27 14:40:13     relayd[40896]: hce_notify_done: 10.105.255.21 (tcp connect ok)
Jan 27 14:40:13     relayd[40618]: pfe_sync: enabling ruleset
Jan 27 14:40:13     relayd[40618]: sync_ruleset: rule added to rdr-anchor "relayd/TIM_LB1_443" 
Jan 27 14:40:13     relayd[40896]: hce_notify_done: 10.105.255.22 (tcp connect ok)
Jan 27 14:40:13     relayd[40896]: hce_notify_done: 10.105.255.21 (tcp connect ok)
Jan 27 14:40:13     relayd[40896]: hce_notify_done: 10.105.255.22 (tcp connect ok)
Jan 27 14:40:13     relayd[40896]: hce_notify_done: 10.105.255.22 (ssl read timeout)
Jan 27 14:40:13     relayd[40896]: hce_notify_done: 10.105.255.21 (ssl read timeout)
Jan 27 14:40:13     relayd[40896]: host 10.105.255.21, check http code use ssl (205ms), state up -> down, availability 0.42%
Jan 27 14:40:13     relayd[40896]: hce_notify_done: 10.105.255.22 (ssl read timeout)
Jan 27 14:40:13     relayd[40896]: host 10.105.255.22, check http code use ssl (205ms), state up -> down, availability 0.44%
Jan 27 14:40:13     relayd[40896]: hce_notify_done: 10.105.255.21 (ssl read timeout)
Jan 27 14:40:13     relayd[40618]: pfe_dispatch_imsg: state -1 for host 1 10.105.255.21
Jan 27 14:40:13     relayd[40618]: pfe_dispatch_imsg: state -1 for host 4 10.105.255.22

Ok... this almost looks like less than 200ms response relayd takes the server up, greater than 200ms it takes it down again.

Further investigation suggests that relayd checks default to 200ms.... which is slightly cunning, because if I look at the /var/etc/relayd.conf file:

log updates 
table <WEB_443> { 10.105.255.21, 10.105.255.22 }
redirect "LB1_443" {
  listen on 10.105.255.23 port 443
  forward to <WEB_443> port 443 check https '/some/url' host some.host.tld code 200 timeout 1000
}

Which suggest that a timeout of 1000ms is being set... unfortunately it doesn't seem to have the desired effect?! At least, it dosen't seem to be having the desired effect as my servers keep dropping out of the pool after > 200ms response

For what it's worth a second load balancer that simply does tcp check on a port, that returns in less than 200ms every single time works perfectly.

Actions

Copy link

Updated by John Welter about 15 years ago

I have confirmed this issue. The default relayd timeout of 200ms is being used even with a timeout specified inside the redirect stanza. Adding timeout 1000 right after the 'log updates' (the global section of the config) works to increase the timeout for all load balance pools in the configuration.

Actions

Copy link

Updated by John Welter about 15 years ago

Did some quick tests: The timeout option is only valid as a global config or in the table stanza, not in a redirect or relay. If it is not in the global section or in a table stanza it is ignored, silently.

The following works:

log updates
table <WEB_443> { 10.105.255.21, 10.105.255.22 timeout 1000 }
redirect "LB1_443" {
listen on 10.105.255.23 port 443
forward to <WEB_443> port 443 check https '/some/url' host some.host.tld code 200 timeout 1000
}

The following does not:

log updates
table <WEB_443> { 10.105.255.21, 10.105.255.22 }
redirect "LB1_443" {
listen on 10.105.255.23 port 443
forward to <WEB_443> port 443 check https '/some/url' host some.host.tld code 200 timeout 1000
}

Hope this helps. When a patch is available I can easily test it.

Actions

Copy link