Bug #6870

closed

Load balancer DNS (relayd) can't handle fragmented udp, breaks DNSSEC

Added by Harry Coin almost 5 years ago. Updated over 4 years ago.

Status: Closed
Priority: Normal
Assignee: -
Category: Load Balancer
Target version: -
Start date: 10/21/2016
Due date:
% Done: 0%
Estimated time:
Plus Target Version:
Release Notes:
Affected Version: All
Affected Architecture: All

Description

The built-in load balancer (relayd) has a protocol 'dns' that manages UDP DNS queries. The purpose is to load balance name server requests across server pools, and to keep pfsense out of the internet-facing resolving job.

When the UDP response fits within the MTU, this works well. However, bind / named DNSSEC replies typically have a UDP packet size that exceeds the MTU for one packet, so they are fragmented.

relayd does not know how to handle fragmented UDP packets, which breaks DNSSEC for any DNS query whose response is larger than 1472 bytes or so.
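The 1472-byte figure follows from the usual Ethernet numbers: a 1500-byte MTU minus a 20-byte IPv4 header (assuming no IP options) and an 8-byte UDP header. A minimal sketch of that arithmetic; the function names are made up for illustration:

```python
IPV4_HEADER = 20  # bytes, assuming an IPv4 header without options
UDP_HEADER = 8    # bytes


def max_unfragmented_udp_payload(mtu: int = 1500) -> int:
    """Largest UDP payload that still fits in one IPv4 packet at this MTU."""
    return mtu - IPV4_HEADER - UDP_HEADER


def needs_fragmentation(dns_response_len: int, mtu: int = 1500) -> bool:
    """True if a DNS response of this size cannot be sent as a single packet."""
    return dns_response_len > max_unfragmented_udp_payload(mtu)
```

With the defaults, a 1472-byte response fits in one packet and a 1473-byte response does not.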

Yes, it did take too long to isolate this problem. I verified proper operation by natting DNS directly to a named machine: dnssec worked perfectly. The same dig +dnssec query through the load balancer failed. The same dig query without dnssec worked normally through the load balancer.

Actions #1

Updated by Jim Pingle almost 5 years ago

It's unlikely we can do much, if anything, about this. It's probably an issue in relayd itself and not in the way we set it up. Your best bet is to reproduce it on a plain FreeBSD installation with relayd. If the same problem happens there, which is likely, it will need to be reported upstream to relayd.

Actions #2

Updated by Harry Coin almost 5 years ago

Update: dig and other DNS query engines set the DF ('do not fragment') bit, then go on to issue DNSSEC queries leading to responses larger than one 1500-byte MTU packet, meaning fragmentation.

Firewalls that drop fragmented packets with the 'do not fragment' bit set are following the protocol correctly, but will fail to reply to broken DNS engines, much as earlier versions of NFS did with the DF bit. So: bug claim withdrawn. Hope this helps someone. I wonder if it's possible for relayd to 'selectively' allow fragmented packets with the DF bit set?
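To make the DF discussion concrete, here is a sketch of where the relevant bits sit in an IPv4 header (the standard layout: a 16-bit field at byte offset 6 holding the flags and the fragment offset). The function name and the returned dictionary are hypothetical, and this only illustrates the check; it is not how pf or relayd implement it.

```python
import struct

DF = 0x4000           # "don't fragment" flag
MF = 0x2000           # "more fragments" flag
OFFSET_MASK = 0x1FFF  # fragment offset, in units of 8 bytes


def fragment_info(ipv4_header: bytes) -> dict:
    """Decode the DF/MF flags and fragment offset from a raw IPv4 header."""
    if len(ipv4_header) < 20:
        raise ValueError("IPv4 header must be at least 20 bytes")
    (flags_frag,) = struct.unpack_from("!H", ipv4_header, 6)
    df = bool(flags_frag & DF)
    mf = bool(flags_frag & MF)
    offset = (flags_frag & OFFSET_MASK) * 8
    is_fragment = mf or offset > 0
    return {
        "df": df,
        "is_fragment": is_fragment,
        "offset": offset,
        # The contradictory combination discussed above: DF set on a fragment.
        "df_on_fragment": df and is_fragment,
    }
```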

Actions #3

Updated by Harry Coin almost 5 years ago

To be clear:

The workaround for relayd / DNS protocol failing or being seemingly intermittent when load balancing DNS is:

Go to 'system / advanced / firewall & NAT'

Check the box 'clear invalid DF bits instead of dropping the packets'.

This allows DNS responses that are larger than the typical 1500-octet frame size, and therefore must be fragmented, to pass through pfsense anyway.

Actions #4

Updated by Harry Coin almost 5 years ago

Turns out causing pfsense to not drop fragmented 'do not fragment' packets creates more problems than it solves. For the benefit of others, I'll share the results of my research so far:

1) relayd is an OpenBSD product which was ported to pfSense two years ago and pretty much ignored since then, though the upstream relayd has been improved.

2) the inner workings of relayd's dns protocol delve into the guts of DNS, but don't comprehend EDNS. EDNS is the DNS extension mechanism (used by named/bind, though not specific to it) that lets a client advertise how large a UDP response it can accept, doing for DNS over UDP roughly what PMTU does for TCP. So relayd still fails as a dns balancer if DNSSEC is important.

3) there is a really interesting load balancer in the FreeBSD ports collection called 'pen'. It's very small, and has only one dependency. It installs easily in pfsense. It has one serious bug that prevents its use in my case. (Simplifying:) when providing load balancing services on WAN to the subnet used by the DNS servers, and pfsense has both an IP alias and a CARP virtual address on that subnet, pen on the backup machine will somehow transmit using the CARP address. The response then obviously goes to the primary machine, whose pen process has no client expecting that reply in its table, so it drops it.
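For readers unfamiliar with the EDNS mechanism mentioned in point 2, the sketch below builds a DNS query carrying an OPT pseudo-record: the record's CLASS field holds the advertised UDP payload size and the DO ("DNSSEC OK") flag sits in its TTL field. The function names, the 4096-byte default and the fixed query ID are assumptions chosen for illustration; this is not code from relayd or bind.

```python
import struct


def encode_name(name: str) -> bytes:
    """Encode a domain name as length-prefixed DNS labels."""
    out = b""
    for label in name.rstrip(".").split("."):
        if label:
            raw = label.encode("ascii")
            out += bytes([len(raw)]) + raw
    return out + b"\x00"


def build_query(name: str, qtype: int = 16, udp_size: int = 4096,
                dnssec_ok: bool = True, query_id: int = 0x1234) -> bytes:
    """Build a DNS query (default QTYPE 16 = TXT) with an EDNS OPT record."""
    # Header: ID, flags (RD set), QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=1
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 1)
    question = encode_name(name) + struct.pack("!HH", qtype, 1)  # class IN
    # OPT record: root name, TYPE 41, CLASS = UDP payload size,
    # TTL = extended RCODE / version / flags (DO bit is 0x8000), RDLENGTH 0
    ttl = 0x00008000 if dnssec_ok else 0
    opt = b"\x00" + struct.pack("!HHIH", 41, udp_size, ttl, 0)
    return header + question + opt
```

A balancer that inspects DNS payloads but does not account for this record and the larger responses it invites is the situation point 2 describes.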

So, as matters stand: I have found no way for pfsense to act as a load balancer for DNS when the DNS servers are not just resolvers but authoritative name servers for a zone, and DNSSEC is enabled.

The only way out of the box is to run dedicated load balancing services on another OS instance. As maintaining load balancers is a task all by itself, there is administrative pressure to move all load balancing activity then out of pfsense.

Actions #5

Updated by Jim Pingle almost 5 years ago

  • Status changed from New to Closed

relayd is a part of the FreeBSD ports tree. It's not a piece of software that pfSense has ported or maintained. You can see its history on FreeBSD here: http://www.freshports.org/net/relayd/

Given the symptoms and the behavior you describe, it sounds like a limitation of relayd in general, though if it has been fixed on OpenBSD, then perhaps FreeBSD can sync its version with the OpenBSD version and we'll pick it up naturally once that has happened.

If you want to pursue this further, you should attempt to reproduce the problem on FreeBSD and perhaps OpenBSD.

Actions #6

Updated by Roland Kletzing over 4 years ago

notes from my findings:

1. relayd cannot do udp layer7 relaying, apart from the "special case" of dns

2. despite other information on the net (claiming relayd would do udp + tcp when used as a dns relay), it does not work with tcp for me when using the dns config option

3. you can configure relayd to do layer7 relaying for port 53 for udp AND for tcp the following way:

@@
#https://de.slideshare.net/MenandMice/dns-highavailability-tools-opensource-load-balancing-solutions

table <dnsserver> { 192.168.1.80 }

dns protocol "dnsproto" {
    tcp { nodelay, sack, socket buffer 1024, backlog 1000 }
}

relay dnsproxy-tcp {
    listen on 192.168.1.52 port 53
    forward to <dnsserver> port 53 check tcp
}

relay dnsproxy-udp {
    listen on 192.168.1.52 port 53
    protocol "dnsproto"
    forward to <dnsserver> port 53 check tcp
}
@@

apparently, protocol "dnsproto" adds some "listen udp magic" here. i'm curious why we can't have "listen on .... proto udp|tcp port 53" here, as is possible with the "redirect" configuration

if i stop pfsense loadbalancer and start relayd manually (relayd -vv -d -f /root/relayd7.conf), i can verify THIS config actually works.

proof as follows:

udp query:
dig @192.168.1.52 rs.dns-oarc.net txt -> works for me. you see dig switching to tcp because of large response

tcp query:
dig +tcp @192.168.1.52 rs.dns-oarc.net txt -> works for me
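The switch to tcp that dig performs on a large response is driven by the TC (truncated) bit in the DNS header of the udp reply. A minimal sketch of that check, assuming a raw DNS message as input (the function name is hypothetical):

```python
import struct

TC = 0x0200  # "truncated" flag in the 16-bit DNS header flags field


def is_truncated(response: bytes) -> bool:
    """True if a raw DNS response has the TC bit set (retry over tcp)."""
    if len(response) < 12:
        raise ValueError("DNS message must include a 12-byte header")
    (flags,) = struct.unpack_from("!H", response, 2)
    return bool(flags & TC)
```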

one problem remains:

pfsense doesn't let you add a virtual service of type "relay" (layer 7), as it defaults to type "redirect", which is layer 3. that is not what you want, especially in this case, where mixing layer3 and layer7 loadbalancing for the same port makes absolutely no sense.

i can currently work around this issue with the following change in the pfsense installation (note that the diff runs from the modified vslb.inc to the original, so the workaround is the "relay" line):

@
diff -Naur vslb.inc vslb.inc.orig
--- vslb.inc 2017-03-08 13:05:36.778431000 +0100
+++ vslb.inc.orig 2017-03-08 13:05:23.523895000 +0100
@@ -357,7 +357,7 @@
 }
 $conf .= "}\n";
 } else {
- $conf .= "relay \"{$name}\" {\n";
+ $conf .= "redirect \"{$name}\" {\n";
 $conf .= " listen on {$ip} port {$src_port}\n";
 $conf .= " forward to <{$vs_a[$i]['poolname']}> port {$dest_port} {$check_a[$pools[$vs_a[$i]['poolname']]['monitor']]} \n";
@

i would suggest enhancing the pfsense load balancer configuration to provide more options, or to allow overriding the gui configuration with a manual configuration.

i would like to use pfsense+relayd because we have a pfsense carp/ha setup and i don't want to build another cluster just to make dns highly available
