Bug #14556

open

Tailscale dropping routes from FIB

Added by Chris Linstruth over 1 year ago. Updated 4 months ago.

Status: New
Priority: Normal
Category: Tailscale
Target version: -
Start date:
Due date:
% Done: 0%
Estimated time:
Plus Target Version:
Affected Version:
Affected Plus Version:
Affected Architecture:

Description

The installation has several Tailscale nodes. The problematic node is a 6100; some of the other nodes are 2100s.

At some point, that node started malfunctioning whenever specific types of changes are made:

  • Adding or removing a node with routed subnets makes all routes drop. Nodes without routes can be added and removed without problems. This is done in the Tailscale machine configuration.
  • Simply marking a route as active or inactive (Tailscale "Edit route settings") also triggers it.

It also occurs occasionally without any changes being made. Restarting the tailscale process on the 6100 brings the routes back. The routes simply drop from the kernel FIB, and only on that one node.

Essentially nothing is logged (even at DEBUG level) about what the Tailscale routing is doing, and there is nothing of troubleshooting value on the Tailscale cloud site either.

All IPv4 Tailscale routes drop, including host routes. It is probably noteworthy that the IPv6 /48 stays in the table and tailscaled keeps running.
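Because the symptom is that IPv4 routes silently vanish from the kernel FIB while tailscaled keeps running, one way to catch it is to compare the node's IPv4 routing table against the subnets the other nodes advertise. The following is a minimal Python sketch, assuming FreeBSD-style netstat -rn -f inet output with CIDR destinations and the interface name tailscale0; the helper names and the sample table are hypothetical.

```python
import ipaddress


def tailscale_ipv4_routes(netstat_output, netif="tailscale0"):
    """Collect IPv4 destinations on route lines that name `netif` (hypothetical helper)."""
    routes = set()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip blank lines, headers, and routes on other interfaces.
        if not fields or netif not in fields:
            continue
        try:
            # A bare address (host route) parses as a /32.
            net = ipaddress.ip_network(fields[0], strict=False)
        except ValueError:
            continue  # e.g. "default" or an unparsable destination
        if net.version == 4:
            routes.add(net)
    return routes


def missing_routes(netstat_output, expected, netif="tailscale0"):
    """Return the expected IPv4 prefixes absent from the table, as sorted strings."""
    present = tailscale_ipv4_routes(netstat_output, netif)
    wanted = {ipaddress.ip_network(prefix) for prefix in expected}
    return sorted(str(net) for net in wanted - present)


# Made-up sample output for illustration only.
SAMPLE = """\
Internet:
Destination        Gateway            Flags     Netif Expire
default            203.0.113.1        UGS        igb0
192.168.6.0/24     link#11            US    tailscale0
192.168.7.0/24     link#11            US    tailscale0
"""

print(missing_routes(SAMPLE, ["192.168.6.0/24", "192.168.7.0/24", "192.168.8.0/24"]))
# prints ['192.168.8.0/24']
```

On the affected node you would pass in the real netstat output and the subnets advertised by the other nodes; a non-empty result would match the behavior described here.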

Another possibly interesting detail: the routes advertised by the affected 6100 remain advertised into the tailnet and present on the other nodes.

The nodes still show as “idle”, so Tailscale is still “up.”

I attempted to reproduce this by adding a tailnet to 4 pfSense nodes with routes and two devices without routes, but could not make it misbehave.


Files

pf1_auto_ping_pf2_lan.PNG (51.7 KB), Matt Keys, 03/12/2024 10:26 PM
pf1_lan_ping_pf2_lan.PNG (43.7 KB), Matt Keys, 03/12/2024 10:26 PM
pf2_lan_ping_pf1_lan.PNG (43.2 KB), Matt Keys, 03/12/2024 10:28 PM
pf2_auto_ping_pf1_lan.PNG (51.2 KB), Matt Keys, 03/12/2024 10:28 PM
pf1_traceroute_any_pf2_lan.PNG (55 KB), Matt Keys, 03/12/2024 11:07 PM
pf1_traceroute_lan_pf2_lan.PNG (56.9 KB), Matt Keys, 03/12/2024 11:07 PM
pf2_traceroute_any_pf1_lan.PNG (54.5 KB), Matt Keys, 03/12/2024 11:11 PM
pf2_traceroute_lan_pf1_lan.PNG (56.9 KB), Matt Keys, 03/12/2024 11:11 PM
#1

Updated by Chris Linstruth 11 months ago

Another user has a very similar issue.

#4

Updated by Matt Keys 10 months ago

Chris Linstruth wrote:

Attempted to duplicate this by adding a tailnet to 4 pfSense nodes with routes and two devices without routes. It could not be made to misbehave.

I noticed something peculiar this week regarding Tailscale routes. The scenario is two pfSense CE 2.7.2 routers in a site-to-site configuration.

pf1 has lan (192.168.1.0/24), opt1 (192.168.2.0/24), opt2 (192.168.3.0/24), opt3 (192.168.4.0/24), and opt5 (192.168.5.0/24). Tailscale advertises routes 192.168.[1-4].0/24.

pf2 has lan (192.168.6.0/24), opt1 (192.168.7.0/24), and opt2 (192.168.8.0/24). Tailscale advertises routes 192.168.[6-8].0/24.
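Overlapping subnets between sites are a common cause of broken site-to-site routing, so it is worth confirming that the two sets of advertised routes are disjoint. The following is a small Python sketch using the standard ipaddress module; the SITES dictionary simply restates the routes listed above, and the function name is hypothetical.

```python
import ipaddress
from itertools import combinations

# Advertised routes per site, restated from the description above.
# pf1's opt5 (192.168.5.0/24) is not advertised, so it is left out.
SITES = {
    "pf1": ["192.168.1.0/24", "192.168.2.0/24", "192.168.3.0/24", "192.168.4.0/24"],
    "pf2": ["192.168.6.0/24", "192.168.7.0/24", "192.168.8.0/24"],
}


def overlapping_routes(sites):
    """Return (site_a, net_a, site_b, net_b) for every overlap between two sites."""
    parsed = {name: [ipaddress.ip_network(n) for n in nets] for name, nets in sites.items()}
    clashes = []
    for a, b in combinations(sorted(parsed), 2):
        for net_a in parsed[a]:
            for net_b in parsed[b]:
                if net_a.overlaps(net_b):
                    clashes.append((a, str(net_a), b, str(net_b)))
    return clashes


print(overlapping_routes(SITES))  # prints []
```

An empty result means the two sites' advertised subnets do not collide, so address overlap is not a factor in this particular setup.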

If you ping pf2 at 192.168.6.1 from pf1's Diagnostics - Ping, you get a successful reply. However, if you ping from a host on pf1's lan, opt1, opt2, or opt3, you get no reply (connection timeout). The same happens if you select the lan interface as the source in pf1's Diagnostics - Ping.

If you run tailscale ping {pf2} from a host on pf1's lan that is running the Tailscale client, you get a successful pong response.

.. other side ..

If you ping pf1 at 192.168.[1-4].1 from pf2's Diagnostics - Ping, you get a successful reply.

If you ping from a host on pf2's lan, opt1, or opt2, you get no reply (connection timeout). The same happens if you select the lan interface as the source in pf2's Diagnostics - Ping.

If you run traceroute or mtr from a host on either lan (pf1 or pf2) to 192.168.6.1 or 192.168.1.1 respectively, the trace stops after the first hop (pf1 or pf2). This is also true if that host is running the Tailscale client.

pf1 Diagnostics - Routes lists:

Destination      Gateway   Flags  Uses  MTU   Netif
192.168.6.0/24   link#11   US     18    1280  tailscale0
192.168.7.0/24   link#11   US     18    1280  tailscale0
192.168.8.0/24   link#11   US     18    1280  tailscale0

pf2 Diagnostics - Routes lists:

Destination      Gateway   Flags  Uses  MTU   Netif
192.168.1.0/24   link#9    US     14    1280  tailscale0
192.168.2.0/24   link#9    US     14    1280  tailscale0
192.168.3.0/24   link#9    US     14    1280  tailscale0
192.168.4.0/24   link#9    US     14    1280  tailscale0

https://tailscale.com/kb/1146/pfsense suggests I would need to enable UPnP or create a static NAT mapping, neither of which I want to do. Perhaps this would work correctly if the networks on one side or the other were in a different class (172.16), or perhaps it is a bug in the 1.54.0 client (1.6.x is now stable)?

I should mention that before pf2 existed, that side had a different router in place with a different private address range. He was running the Tailscale client directly on his desktop and was able to access everything on pf1's lan, opt1, opt2, and opt3 private addresses without issue. This problem was discovered after pf2, with Tailscale routes on his side, was installed to replace that setup.

#8

Updated by Matt Keys 7 months ago

Update 5/26: regarding the ping from pf1 to pf2 (or vice versa), I notice it only gets a successful reply when the source is the IPv6 loopback interface (or auto). Every IPv4 interface returns a connection timeout. Additionally, in Diagnostics - Routes, Tailscale seems to bind IPv6 routes even though only IPv4 is in use. All pfSense interfaces have their IPv6 configuration set to 'none', and the filter is configured to drop all IPv6.

Also, if you create a firewall rule on any interface with tailscale as the source address, pfSense saves the rule but errors when applying it. The error says the tailscale group is unknown.

Actions #9

Updated by Matt Keys 4 months ago

Please close this ticket. The fix for the site-to-site subnet routing issue on CE 2.7.x is below. It is described towards the end of Christian McDonald's YouTube video (https://www.youtube.com/watch?v=Fg_jIPVcioY). You'd miss it if you only watched the subnet routing configuration portion, and around 18:26 he implies that is all you really need for communication between sites. I recommend updating the documentation with these steps:

On the 192.168.1.0/24 side ..

1. In Status - Tailscale, copy your local Tailscale IP to the clipboard.
2. In Firewall - Aliases, create an alias for that Tailscale IP called 'tailscalevip'.
3. In Firewall - NAT - Outbound, create a new outbound rule (the Outbound NAT mode must be Hybrid or Manual for the rule to take effect):

source: any
protocol: any
address family: ipv4
destination: 192.168.6.0/24
translation address: network/alias 'tailscalevip'

On the 192.168.6.0/24 side ..

1. In Status - Tailscale, copy your local Tailscale IP to the clipboard.
2. In Firewall - Aliases, create an alias for that Tailscale IP called 'tailscalevip'.
3. In Firewall - NAT - Outbound, create a new outbound rule (the Outbound NAT mode must be Hybrid or Manual for the rule to take effect):

source: any
protocol: any
address family: ipv4
destination: 192.168.1.0/24
translation address: network/alias 'tailscalevip'
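Conceptually, each of these outbound NAT rules is a source-address rewrite: traffic from any local IPv4 source headed for the remote site's subnet leaves the firewall with the firewall's own Tailscale address as its source, so the far side sees it as coming from a tailnet node rather than a LAN host. The following Python sketch models only that matching and rewriting. The class name and the 100.64.0.x Tailscale addresses are hypothetical, and real pfSense NAT also involves interfaces, ports, and state tracking, which are not modeled here.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class OutboundNatRule:
    """Simplified model of the rule above (source: any, protocol: any,
    address family: ipv4). Not pfSense's actual implementation."""
    destination: str   # remote site's subnet
    translation: str   # this firewall's Tailscale IP (the 'tailscalevip' alias)

    def apply(self, src, dst):
        """Return (src, dst) after the rule: IPv4 traffic to `destination` is
        rewritten to come from `translation`; anything else is left unchanged."""
        s, d = ipaddress.ip_address(src), ipaddress.ip_address(dst)
        if s.version == 4 and d in ipaddress.ip_network(self.destination):
            return self.translation, dst
        return src, dst


# Hypothetical Tailscale IPs; use the values shown under Status - Tailscale.
PF1_RULE = OutboundNatRule("192.168.6.0/24", "100.64.0.1")  # on the 192.168.1.0/24 side
PF2_RULE = OutboundNatRule("192.168.1.0/24", "100.64.0.2")  # on the 192.168.6.0/24 side

print(PF1_RULE.apply("192.168.1.50", "192.168.6.10"))  # ('100.64.0.1', '192.168.6.10')
print(PF1_RULE.apply("192.168.1.50", "192.168.7.10"))  # ('192.168.1.50', '192.168.7.10')
```

The second call shows that each rule, as written, matches only its one destination subnet; reaching pf2's opt1 and opt2 (192.168.7.0/24 and 192.168.8.0/24) would presumably need additional rules or a broader destination.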

Then test ..

From a host in 192.168.6.0/24 that is not running the Tailscale client, ping a host on the 192.168.1.0/24 side. You should get a successful reply.

From a host in 192.168.1.0/24 that is not running the Tailscale client, ping a host on the 192.168.6.0/24 side. You should get a successful reply.
