Bug #14556

open

Tailscale dropping routes from FIB

Added by Chris Linstruth 10 months ago. Updated about 2 months ago.

Status: New
Priority: Normal
Category: Tailscale
Target version: -
Start date:
Due date:
% Done: 0%
Estimated time:
Plus Target Version:
Affected Version:
Affected Plus Version:
Affected Architecture:

Description

The installation has several Tailscale nodes. The problematic node is a Netgate 6100; some of the other nodes are 2100s.

At some point in the past, Tailscale on one of the nodes began malfunctioning whenever specific types of changes are made:

  • Adding or removing a node with routed subnets causes all routes to drop. Nodes without routes can be added or removed successfully. These changes are made in the Tailscale machine configuration.
  • Simply marking a route as active or inactive (Tailscale "Edit route settings") also triggers it.

It also occurs occasionally without any changes being made.
The routes simply drop from the kernel FIB, and only on that one node.
Restarting the tailscale process on that 6100 brings them back.

There is essentially nothing logged (DEBUG logging level) regarding the actions of the tailscale routing protocol. Nor is there anything of troubleshooting value on the tailscale cloud site.

All IPv4 tailscale routes drop including host routes. It is probably noteworthy that the IPv6 /48 is still in the table and tailscaled is still running.
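A minimal sketch of how this state could be recognized from a saved routing-table dump, in case it helps anyone trying to catch the failure as it happens. It assumes whitespace-separated rows whose first field is the destination and whose last field is the interface name. The function names and the sample rows are illustrative only and were not captured from the affected 6100.

```python
import ipaddress


def tailscale_route_families(table_text, ifname="tailscale0"):
    """Count IPv4 and IPv6 routes bound to ifname in a routing-table dump."""
    counts = {4: 0, 6: 0}
    for line in table_text.splitlines():
        fields = line.split()
        if len(fields) < 2 or fields[-1] != ifname:
            continue
        try:
            net = ipaddress.ip_network(fields[0], strict=False)
        except ValueError:
            continue  # header line or a non-prefix destination such as "default"
        counts[net.version] += 1
    return counts


def looks_like_dropped_ipv4(table_text, ifname="tailscale0"):
    """True when the interface still has IPv6 routes but no IPv4 routes."""
    counts = tailscale_route_families(table_text, ifname)
    return counts[4] == 0 and counts[6] > 0


# Illustrative rows only (hypothetical addresses, not real output).
healthy = """\
100.64.0.2/32 link#11 UHS 0 1280 tailscale0
192.168.6.0/24 link#11 US 18 1280 tailscale0
fd7a:115c:a1e0::/48 link#11 US 0 1280 tailscale0
"""
broken = """\
fd7a:115c:a1e0::/48 link#11 US 0 1280 tailscale0
"""

print(looks_like_dropped_ipv4(healthy))  # False
print(looks_like_dropped_ipv4(broken))   # True
```

The check only classifies a dump that has already been collected; how the dump is gathered and what to do when the condition is true are left open.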

Another possibly interesting note: the routes advertised by the affected 6100 remain advertised into the tailnet and remain present on the other nodes.

The nodes are still showing as “idle” so tailscale is still “up.”

Attempted to duplicate this by adding a tailnet to 4 pfSense nodes with routes and two devices without routes. It could not be made to misbehave.


Files

pf1_auto_ping_pf2_lan.PNG (51.7 KB) Matt Keys, 03/12/2024 10:26 PM
pf1_lan_ping_pf2_lan.PNG (43.7 KB) Matt Keys, 03/12/2024 10:26 PM
pf2_lan_ping_pf1_lan.PNG (43.2 KB) Matt Keys, 03/12/2024 10:28 PM
pf2_auto_ping_pf1_lan.PNG (51.2 KB) Matt Keys, 03/12/2024 10:28 PM
pf1_traceroute_any_pf2_lan.PNG (55 KB) Matt Keys, 03/12/2024 11:07 PM
pf1_traceroute_lan_pf2_lan.PNG (56.9 KB) Matt Keys, 03/12/2024 11:07 PM
pf2_traceroute_any_pf1_lan.PNG (54.5 KB) Matt Keys, 03/12/2024 11:11 PM
pf2_traceroute_lan_pf1_lan.PNG (56.9 KB) Matt Keys, 03/12/2024 11:11 PM
Actions #1

Updated by Chris Linstruth 3 months ago

Another user has a very similar issue.

Actions #4

Updated by Matt Keys about 2 months ago

Chris Linstruth wrote:

Attempted to duplicate this by adding a tailnet to 4 pfSense nodes with routes and two devices without routes. It could not be made to misbehave.

I noticed something peculiar this week regarding Tailscale routes. The scenario is two pfSense CE 2.7.2 routers in a site-to-site configuration.

pf1 has lan (192.168.1.0/24), opt1 (192.168.2.0/24), opt2 (192.168.3.0/24), opt3 (192.168.4.0/24), opt5 (192.168.5.0/24). Tailscale advertised routes 192.168.[1-4].0/24.

pf2 has lan (192.168.6.0/24), opt1 (192.168.7.0/24), opt2 (192.168.8.0/24). Tailscale advertised routes 192.168.[6-8].0/24.
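As a quick sanity check on the addressing above, the sketch below expands the `192.168.[1-4].0/24` style shorthand with Python's standard `ipaddress` module, lists any local subnet that is not advertised, and tests whether the two sites' advertised ranges overlap. The helper names are made up for illustration; the subnets are the ones listed above.

```python
import ipaddress


def expand_third_octet(first, last, template="192.168.{}.0/24"):
    """Expand shorthand such as 192.168.[1-4].0/24 into network objects."""
    return [ipaddress.ip_network(template.format(n)) for n in range(first, last + 1)]


def unadvertised(local, advertised):
    """Local subnets that are not in the advertised list."""
    return [net for net in local if net not in advertised]


def overlapping(a_nets, b_nets):
    """Pairs of subnets, one from each site, that share address space."""
    return [(a, b) for a in a_nets for b in b_nets if a.overlaps(b)]


pf1_local = expand_third_octet(1, 5)       # lan plus four opt interfaces
pf1_advertised = expand_third_octet(1, 4)  # 192.168.[1-4].0/24
pf2_local = expand_third_octet(6, 8)       # lan, opt1, opt2
pf2_advertised = expand_third_octet(6, 8)  # 192.168.[6-8].0/24

print(unadvertised(pf1_local, pf1_advertised))      # [IPv4Network('192.168.5.0/24')]
print(overlapping(pf1_advertised, pf2_advertised))  # []
```

With these inputs the only unadvertised subnet is 192.168.5.0/24 on pf1, and no advertised range on one side overlaps a range on the other.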

If you ping pf2 at 192.168.6.1 from pf1 diagnostics - ping, you get a successful reply. However, if you ping from a host within pf1's lan, opt1, opt2, or opt3, you get no reply (connection timeout). The same happens if you select the lan interface as the source within pf1 diagnostics - ping.

If you run a tailscale ping {pf2} from a host within pf1 lan running the tailscale client there is a successful pong response.

.. other side ..

If you ping pf1 at 192.168.[1-4].1 from pf2 diagnostics - ping, you get a successful reply.

If you ping from a host within pf2 lan, opt1, opt2, or opt3, you get no reply (connection timeout). This is also true if you select the lan interface within pf2 diagnostics - ping.

If you run a traceroute or mtr from a host within lan (either pf1 or pf2) to 192.168.6.1 or 192.168.1.1 respectively, the response stops after the first hop of pf1 or pf2. This is also true if that host is running the tailscale client.

pf1 diagnostics - routes lists ..

Destination     Gateway  Flags  Uses  MTU   Interface
192.168.6.0/24  link#11  US     18    1280  tailscale0
192.168.7.0/24  link#11  US     18    1280  tailscale0
192.168.8.0/24  link#11  US     18    1280  tailscale0

pf2 diagnostics - routes lists ..

Destination     Gateway  Flags  Uses  MTU   Interface
192.168.1.0/24  link#9   US     14    1280  tailscale0
192.168.2.0/24  link#9   US     14    1280  tailscale0
192.168.3.0/24  link#9   US     14    1280  tailscale0
192.168.4.0/24  link#9   US     14    1280  tailscale0
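To make explicit what these tables imply, here is a rough sketch of a longest-prefix-match lookup over the rows listed above. It assumes the six fields are destination, gateway, flags, uses, MTU, and interface, in that order; the helper names are hypothetical. It only shows which table entry would be selected for a destination and says nothing about whether forwarded packets actually make it through the tunnel.

```python
import ipaddress

PF1_ROUTES = """\
192.168.6.0/24 link#11 US 18 1280 tailscale0
192.168.7.0/24 link#11 US 18 1280 tailscale0
192.168.8.0/24 link#11 US 18 1280 tailscale0
"""

PF2_ROUTES = """\
192.168.1.0/24 link#9 US 14 1280 tailscale0
192.168.2.0/24 link#9 US 14 1280 tailscale0
192.168.3.0/24 link#9 US 14 1280 tailscale0
192.168.4.0/24 link#9 US 14 1280 tailscale0
"""


def parse_routes(text):
    """Parse rows into (network, gateway, interface) tuples."""
    routes = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 6:
            continue  # skip blank lines or rows with an unexpected layout
        dest, gateway, _flags, _uses, _mtu, ifname = fields  # assumed column order
        routes.append((ipaddress.ip_network(dest), gateway, ifname))
    return routes


def lookup(routes, address):
    """Return the most specific route containing address, or None."""
    addr = ipaddress.ip_address(address)
    matches = [route for route in routes if addr in route[0]]
    return max(matches, key=lambda route: route[0].prefixlen, default=None)


pf1_routes = parse_routes(PF1_ROUTES)
pf2_routes = parse_routes(PF2_ROUTES)

print(lookup(pf1_routes, "192.168.6.1"))  # selects 192.168.6.0/24 via tailscale0
print(lookup(pf2_routes, "192.168.1.1"))  # selects 192.168.1.0/24 via tailscale0
print(lookup(pf1_routes, "192.168.5.1"))  # None: no matching row in this excerpt
```

Both test destinations resolve to a tailscale0 entry on the opposite router, so the tables as listed contain a route in each direction.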

https://tailscale.com/kb/1146/pfsense suggests I would need to enable UPnP or create a static NAT mapping, neither of which I want to do. Perhaps this would work correctly if the networks were in different classes (172.16) on one side or the other, or perhaps this is a bug in the 1.54.0 client (1.6.x is now stable)?

I should mention that before pf2 existed, that side had a different router in place with a different private address range. He was running the client directly on his desktop and was able to access everything on the pf1 lan, opt1, opt2, and opt3 private addresses without issue. The problem was discovered after pf2 was installed with Tailscale routes on his side to replace that setup.
