Bug #6186

race conditions in service startup

Added by Chris Buechler over 1 year ago. Updated 4 months ago.

Status: Confirmed
Priority: Normal
Category: -
Target version:
Start date: 04/17/2016
Due date:
% Done: 0%
Affected version: All
Affected Architecture:

Description

There have always been a variety of possible race conditions in service startup, because multiple different callers can invoke the functions that start services. That's a larger architectural issue, and we're discussing options for properly addressing it in the future.

The more immediate issue is that after removing the "exit if booting" check from rc.newwanip(v6) in 2.3, which fixed a variety of edge-case bugs with interfaces that are slow to come online during boot, some systems end up running certain things twice at almost exactly the same time. See, for instance, #6160 and probably #6132.
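Running the same startup twice at almost exactly the same time is a check-then-act race: each caller checks whether the service is running, sees that it isn't, and starts it. The Python sketch below only models that pattern (pfSense itself is written in PHP); `start_service`, the `started` list, and the two caller labels are hypothetical. A `threading.Barrier` holds both callers just after their check, so the overlap that only happens occasionally during a real boot happens on every run here.

```python
import threading

started = []                    # records every (hypothetical) daemon launch
barrier = threading.Barrier(2)  # lines both callers up right after their check

def start_service(caller):
    already_running = len(started) > 0  # check
    barrier.wait()                      # both callers have now done the check
    if not already_running:
        started.append(caller)          # act: both callers launch the daemon

callers = [threading.Thread(target=start_service, args=(name,))
           for name in ("boot sequence", "rc.newwanip")]
for t in callers:
    t.start()
for t in callers:
    t.join()

print(len(started))  # 2: the service was started twice
```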

Adding locks in vpn_ipsec_configure was fine for strongSwan in #6160. That might be fine in other areas too, but adding locking like that risks breaking things that work now if some of those functions end up recursing.
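A rough model of both sides of that trade-off, assuming a cross-process lock built on `flock(2)` with one lock file per service. `LOCK_DIR`, `service_lock`, and `configure_ipsec` are hypothetical names, not pfSense's own locking helpers. The lock serializes concurrent callers, but `flock` locks belong to the open file description, so a locked function that recurses into itself opens the lock file a second time and conflicts with its own lock. With a blocking lock the recursive call would hang forever; the sketch uses a non-blocking attempt so the failure is visible.

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager

LOCK_DIR = tempfile.gettempdir()  # assumption: stand-in for the real lock directory

@contextmanager
def service_lock(name, blocking=True):
    """Exclusive lock shared by every process that uses the same name."""
    path = os.path.join(LOCK_DIR, f"{name}.lock")
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        flags = fcntl.LOCK_EX if blocking else fcntl.LOCK_EX | fcntl.LOCK_NB
        fcntl.flock(fd, flags)  # with LOCK_NB, raises BlockingIOError if already held
        try:
            yield
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)

calls = []

def configure_ipsec(depth=0):
    """Hypothetical stand-in for a configure function that recurses once."""
    with service_lock("ipsec", blocking=False):
        calls.append(depth)
        if depth == 0:
            configure_ipsec(depth + 1)  # re-entry opens the lock file again

try:
    configure_ipsec()
except BlockingIOError:
    print("recursive call could not take its own lock")
```

A re-entrant lock, or taking the lock only in the outermost caller, avoids the self-conflict, but every call path then has to be audited, which is exactly the risk of bolting locks onto existing functions.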

Associated revisions

Revision c4b5c8be
Added by Chris Buechler about 1 year ago

Setup gateway monitors and exit in rc.newwanip(v6) if system is booting. Ticket #6186

Revision d239edd1
Added by Chris Buechler about 1 year ago

Setup gateway monitors and exit in rc.newwanip(v6) if system is booting. Ticket #6186

Revision 6d4fd80b
Added by Chris Buechler about 1 year ago

Don't start unbound in track6 config if system is booting. Add dnsmasq here as well. Based on PR 2943. Ticket #6186

Revision b460c43b
Added by Chris Buechler about 1 year ago

Don't start unbound in track6 config if system is booting. Add dnsmasq here as well. Based on PR 2943. Ticket #6186

History

#1 Updated by Chris Buechler over 1 year ago

  • Description updated

#2 Updated by sebastian nielsen about 1 year ago

The same applies to services that start in the wrong order. If a VPN client interface is slow to come up and the Unbound DNS resolver uses that VPN client interface as its outgoing interface, Unbound will sometimes start before the interface has come up. Unbound then permanently returns "SERVFAIL", because it reports a configuration error for an interface that didn't exist when it started.

Same applies to routing.

I think the whole architecture needs recoding so that it always brings up the interfaces first, including services tied to interfaces such as VPN clients/servers (a blocking operation), then starts any services not tied to interfaces (also a blocking operation), and only then applies firewall rules, custom routes, the default gateway and NAT.
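One way to encode that ordering, sketched in Python 3.9+: give each item its prerequisites as a dependency graph and start things in topological order. This swaps fixed phases for the standard library's `graphlib.TopologicalSorter`; the node names (`wan`, `lan`, `ovpnc1`, `unbound`, and so on) are illustrative, not pfSense identifiers. The last few lines add Jim Pingle's hostname-peer case, where the VPN also needs DNS, which turns the graph into a cycle that no start order can satisfy.

```python
from graphlib import CycleError, TopologicalSorter

# Each key lists what must already be up before it starts (illustrative names).
depends_on = {
    # interfaces, including VPN client interfaces
    "wan": set(),
    "lan": set(),
    "ovpnc1": {"wan"},
    # services, started only once the interfaces they use exist
    "unbound": {"lan", "ovpnc1"},
    "dhcpd": {"lan"},
    # firewall rules, routes, default gateway and NAT come last
    "firewall_rules": {"unbound", "dhcpd"},
    "static_routes": {"firewall_rules"},
    "default_gateway": {"firewall_rules"},
    "nat": {"firewall_rules"},
}

order = list(TopologicalSorter(depends_on).static_order())
print("start order:", order)

# Hostname VPN peer: the VPN now needs DNS too, so ovpnc1 and unbound wait on each other.
cyclic = {**depends_on, "ovpnc1": {"wan", "unbound"}}
cycle = None
try:
    list(TopologicalSorter(cyclic).static_order())
except CycleError as err:
    cycle = err.args[1]  # the nodes forming the loop
print("cycle:", cycle)
```

The cycle is why a purely static order falls short here: something has to break it at runtime, for example by starting DNS, then the VPN, then restarting DNS.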

#3 Updated by Jim Pingle about 1 year ago

Ordering VPN and DNS is not that clear-cut. You have a chicken-and-egg scenario there. In plenty of cases you need working DNS before the VPN can be brought up, especially if you are using a hostname for the VPN peer. In that case you'd have to start DNS, then the VPN, then restart DNS if it doesn't (re)attach to the VPN interface.

#4 Updated by sebastian nielsen about 1 year ago

Yes, that would be a good idea: force a restart of Unbound after the VPN comes up. However, it may be worth delaying that restart by a few seconds after the VPN comes up, to ensure the interface has "settled" before the DNS server tries to attach to it.
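The sequence from the last two comments (start DNS, bring up the VPN, wait for its interface, let it settle, restart DNS) can be sketched as below. The callables passed in are hypothetical hooks standing in for real service control, and the 3-second default settle time is an arbitrary assumption, not a measured value.

```python
import time

def wait_for_interface(is_up, timeout, poll):
    """Poll is_up() until it returns True or `timeout` seconds have passed."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if is_up():
            return True
        time.sleep(poll)
    return is_up()  # one last check at the deadline

def bring_up_vpn_then_rebind_dns(start_dns, start_vpn, vpn_is_up, restart_dns,
                                 settle=3.0, timeout=30.0, poll=0.5):
    start_dns()                  # DNS first: the VPN peer may be a hostname
    start_vpn()
    if not wait_for_interface(vpn_is_up, timeout, poll):
        return False             # VPN never came up; leave DNS as it was
    time.sleep(settle)           # give the new interface time to "settle"
    restart_dns()                # now DNS can attach to the VPN interface
    return True

# Usage with fake hooks that only record what happened.
events = []
polls = {"count": 0}

def fake_vpn_is_up():
    polls["count"] += 1
    return polls["count"] >= 3   # pretend the tunnel appears on the third poll

ok = bring_up_vpn_then_rebind_dns(
    start_dns=lambda: events.append("start dns"),
    start_vpn=lambda: events.append("start vpn"),
    vpn_is_up=fake_vpn_is_up,
    restart_dns=lambda: events.append("restart dns"),
    settle=0.1, timeout=5.0, poll=0.01,
)
print(ok, events)
```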

#5 Updated by Chris Buechler about 1 year ago

  • Target version changed from 2.3.1 to 2.3.2

What I committed takes things back to the 2.2.x and earlier behavior, while retaining the fix for #5952. That's confirmed to fix or avoid the "pf wedged" issues and things like unbound and dhcpd starting twice at almost exactly the same time (those don't hurt anything, but they cause ugly log spam), among other things.

Virtually all the race conditions people have encountered come from that change to rc.newwanip during boot. So for 2.3.1 we're back to 2.2.6 behavior, and a bit better.

But there's a larger architectural issue to be addressed. This is a hack to avoid these kinds of issues.
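Reading the associated revisions ("Setup gateway monitors and exit in rc.newwanip(v6) if system is booting"), the restored behavior can be modeled roughly as follows. This is a Python illustration, not the actual rc.newwanip script; `SystemState`, `handle_newwanip`, and the post-boot actions are hypothetical placeholders.

```python
from dataclasses import dataclass, field

@dataclass
class SystemState:
    booting: bool = True
    actions: list = field(default_factory=list)

def handle_newwanip(state, interface):
    """Hypothetical model of a new-WAN-IP event handler with a boot guard."""
    state.actions.append(f"setup gateway monitors: {interface}")
    if state.booting:
        return  # the boot sequence starts the services itself; avoid a second start
    # hypothetical remainder of the handler, run only after boot
    state.actions.append("restart unbound")
    state.actions.append("reload filter")

state = SystemState()
handle_newwanip(state, "wan")  # WAN comes up late, while still booting
state.booting = False
handle_newwanip(state, "wan")  # a later address change, after boot
print(state.actions)
```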

#6 Updated by Chris Buechler about 1 year ago

  • Target version changed from 2.3.2 to 2.4.0

#7 Updated by Renato Botelho 6 months ago

  • Assignee deleted (Marc Dye)

#8 Updated by Jim Thompson 5 months ago

  • Assignee set to Renato Botelho

#9 Updated by John Cairns 4 months ago

I've run into this issue as well on my pfSense machines that have OpenVPN client interfaces set as the outgoing interfaces for Unbound. In my case, though, Unbound doesn't fail to start; instead, unbound.conf reverts to its default of using all interfaces as outgoing interfaces. I don't know whether this is helpful, but I started a thread on the forums that includes more detailed information:
https://forum.pfsense.org/index.php?topic=126925.0
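One plausible mechanism for that symptom, offered as an assumption rather than anything confirmed in this ticket: if the configuration generator emits an `outgoing-interface` line only for interfaces that currently have an address, a VPN client that is still down at boot produces no lines at all, and Unbound falls back to its documented default of using all interfaces. `outgoing_interface_lines` and the address `10.8.0.2` are hypothetical.

```python
def outgoing_interface_lines(configured, addresses):
    """Hypothetical generator: one outgoing-interface line per address of each
    configured interface; interfaces with no address yet are silently skipped."""
    lines = []
    for ifname in configured:
        for addr in addresses.get(ifname, []):
            lines.append(f"outgoing-interface: {addr}")
    return lines

at_boot = outgoing_interface_lines(["ovpnc1"], {})  # tunnel not up yet
later = outgoing_interface_lines(["ovpnc1"], {"ovpnc1": ["10.8.0.2"]})

# An empty list means no outgoing-interface lines in unbound.conf,
# so Unbound uses its default of all interfaces.
print(at_boot, later)
```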

Also, I understand the point about the DNS/VPN chicken-and-egg scenario, although I just use raw IPs for my VPN client connections. I feel that's a reasonable expectation, especially since most people using VPN client connections likely want all DNS traffic flowing through them as well. That said, I am by no means trying to evangelize here, just throwing in my two cents :)
