Project

General

Profile

Actions

Bug #1235

closed

pfsense 2.0 load balancing with a https monitor seems to default timeout 200ms causing constant timeouts

Added by Gary Richards almost 14 years ago. Updated over 13 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
-
Category:
Load Balancer
Target version:
Start date:
01/27/2011
Due date:
% Done:

0%

Estimated time:
Plus Target Version:
Release Notes:
Affected Version:
2.0
Affected Architecture:

Description

Hi,

I've been failing with load balancing between two https web servers on pfsense 2.0.

I have two https servers in a pool, port 443, with a custom monitor. The custom monitor is configured like so:
Type: HTTPS
Path: /some/url
Host: some.host.tld
HTTP Code: 200

Once I bring my webservers up, I see requests from the pfsense boxes and I see both my web servers responding with 200's... all good.

However, my load balancers are mostly marked as down, occassionally they will become active for a few seconds, but then they go down again.

A snippet from the logs suggest:

Jan 27 14:39:43     relayd[40896]: host 10.105.255.22, check http code use ssl (180ms), state down -> up, availability 0.43%
Jan 27 14:39:43     relayd[40618]: pfe_dispatch_imsg: state 1 for host 4 10.105.255.22
Jan 27 14:39:43     relayd[40896]: hce_notify_done: 10.105.255.21 (ssl read timeout)
Jan 27 14:39:43     relayd[40896]: host 10.105.255.21, check http code use ssl (210ms), state up -> down, availability 0.46%
Jan 27 14:39:43     relayd[40896]: hce_notify_done: 10.105.255.21 (ssl read timeout)
Jan 27 14:39:43     relayd[40896]: hce_notify_done: 10.105.255.22 (ssl read timeout)
Jan 27 14:39:43     relayd[40618]: pfe_dispatch_imsg: state -1 for host 3 10.105.255.21
Jan 27 14:39:53     relayd[40618]: table TIM_LB2_443: 1 added, 1 deleted, 0 changed, 0 killed
Jan 27 14:39:53     relayd[40896]: hce_notify_done: 10.105.255.21 (tcp connect ok)
Jan 27 14:39:53     relayd[40896]: hce_notify_done: 10.105.255.22 (tcp connect ok)
Jan 27 14:39:53     relayd[40896]: hce_notify_done: 10.105.255.21 (tcp connect ok)
Jan 27 14:39:53     relayd[40896]: hce_notify_done: 10.105.255.22 (tcp connect ok)
Jan 27 14:39:53     relayd[40896]: hce_notify_done: 10.105.255.22 (http code ok)
Jan 27 14:39:53     relayd[40896]: hce_notify_done: 10.105.255.21 (ssl read timeout)
Jan 27 14:39:53     relayd[40896]: hce_notify_done: 10.105.255.22 (ssl read timeout)
Jan 27 14:39:53     relayd[40896]: hce_notify_done: 10.105.255.21 (ssl read timeout)
Jan 27 14:40:03     relayd[40896]: hce_notify_done: 10.105.255.21 (tcp connect ok)
Jan 27 14:40:03     relayd[40896]: hce_notify_done: 10.105.255.22 (tcp connect ok)
Jan 27 14:40:03     relayd[40896]: hce_notify_done: 10.105.255.21 (tcp connect ok)
Jan 27 14:40:03     relayd[40896]: hce_notify_done: 10.105.255.22 (tcp connect ok)
Jan 27 14:40:03     relayd[40896]: hce_notify_done: 10.105.255.22 (http code ok)
Jan 27 14:40:03     relayd[40896]: hce_notify_done: 10.105.255.21 (http code ok)
Jan 27 14:40:03     relayd[40896]: host 10.105.255.21, check http code use ssl (186ms), state down -> up, availability 0.42%
Jan 27 14:40:03     relayd[40618]: pfe_dispatch_imsg: state 1 for host 1 10.105.255.21
Jan 27 14:40:03     relayd[40896]: hce_notify_done: 10.105.255.21 (ssl read timeout)
Jan 27 14:40:03     relayd[40896]: hce_notify_done: 10.105.255.22 (ssl read timeout)
Jan 27 14:40:13     relayd[40618]: table TIM_LB1_443: 1 added, 0 deleted, 0 changed, 0 killed
Jan 27 14:40:13     relayd[40896]: hce_notify_done: 10.105.255.21 (tcp connect ok)
Jan 27 14:40:13     relayd[40618]: pfe_sync: enabling ruleset
Jan 27 14:40:13     relayd[40618]: sync_ruleset: rule added to rdr-anchor "relayd/TIM_LB1_443" 
Jan 27 14:40:13     relayd[40896]: hce_notify_done: 10.105.255.22 (tcp connect ok)
Jan 27 14:40:13     relayd[40896]: hce_notify_done: 10.105.255.21 (tcp connect ok)
Jan 27 14:40:13     relayd[40896]: hce_notify_done: 10.105.255.22 (tcp connect ok)
Jan 27 14:40:13     relayd[40896]: hce_notify_done: 10.105.255.22 (ssl read timeout)
Jan 27 14:40:13     relayd[40896]: hce_notify_done: 10.105.255.21 (ssl read timeout)
Jan 27 14:40:13     relayd[40896]: host 10.105.255.21, check http code use ssl (205ms), state up -> down, availability 0.42%
Jan 27 14:40:13     relayd[40896]: hce_notify_done: 10.105.255.22 (ssl read timeout)
Jan 27 14:40:13     relayd[40896]: host 10.105.255.22, check http code use ssl (205ms), state up -> down, availability 0.44%
Jan 27 14:40:13     relayd[40896]: hce_notify_done: 10.105.255.21 (ssl read timeout)
Jan 27 14:40:13     relayd[40618]: pfe_dispatch_imsg: state -1 for host 1 10.105.255.21
Jan 27 14:40:13     relayd[40618]: pfe_dispatch_imsg: state -1 for host 4 10.105.255.22


Ok... this almost looks like less than 200ms response relayd takes the server up, greater than 200ms it takes it down again.

Further investigation suggests that relayd checks default to 200ms.... which is slightly cunning, because if I look at the /var/etc/relayd.conf file:

log updates 
table <WEB_443> { 10.105.255.21, 10.105.255.22 }
redirect "LB1_443" {
  listen on 10.105.255.23 port 443
  forward to <WEB_443> port 443 check https '/some/url' host some.host.tld code 200 timeout 1000
}

Which suggest that a timeout of 1000ms is being set... unfortunately it doesn't seem to have the desired effect?! At least, it dosen't seem to be having the desired effect as my servers keep dropping out of the pool after > 200ms response

For what it's worth a second load balancer that simply does tcp check on a port, that returns in less than 200ms every single time works perfectly.

Actions

Also available in: Atom PDF