Bug #3069

traceroute6 fails to timeout and hangs the webconfigurator GUI

Added by Doktor Notor about 7 years ago. Updated about 4 years ago.

Status:
Resolved
Priority:
Normal
Category:
Operating System
Target version:
Start date:
07/03/2013
Due date:
% Done:

0%

Estimated time:
Affected Version:
All
Affected Architecture:

Description

As simple as trying to run IPv6 traceroute to www.google.com from the GUI:

2 gige-g2-20.core1.prg1.he.net 3.317 ms 11.719 ms 2.712 ms
3 nixcz-v6.net.google.com 45.940 ms 10.852 ms 10.970 ms
4 2001:4860::1:0:4ca2 45.398 ms 11.265 ms 11.076 ms
5 2001:4860::8:0:5039 11.545 ms 11.095 ms 10.917 ms
6 2001:4860::8:0:3097 25.850 ms
2001:4860::8:0:3098 26.843 ms 26.517 ms
7 2001:4860::2:0:612 24.934 ms
2001:4860::2:0:6e0 25.945 ms 30.676 ms

Same thing from shell:

2 gige-g2-20.core1.prg1.he.net 2.725 ms 2.681 ms 11.847 ms
3 nixcz-v6.net.google.com 10.874 ms 21.850 ms 11.070 ms
4 2001:4860::1:0:4ca2 11.037 ms 10.983 ms 10.820 ms
5 2001:4860::8:0:5039 10.994 ms 18.642 ms 28.842 ms
6 2001:4860::8:0:3098 24.807 ms 24.396 ms 24.448 ms
7 2001:4860::2:0:612 26.017 ms
2001:4860::2:0:6e0 31.798 ms 25.013 ms

Hangs indefinitely.

Compare to traceroute run from Windows box:

3 4 ms 2 ms 2 ms gige-g2-20.core1.prg1.he.net [2001:470:0:221::1]
4 11 ms 10 ms 11 ms nixcz-v6.net.google.com [2001:7f8:14::1d:1]
5 11 ms 11 ms 10 ms 2001:4860::1:0:4ca2
6 11 ms 11 ms 11 ms 2001:4860::8:0:5038
7 26 ms 26 ms 26 ms 2001:4860::8:0:3098
8 30 ms 29 ms 28 ms 2001:4860::2:0:6e0
9 * * * Request timed out.
10 26 ms 26 ms 26 ms bk-in-x93.1e100.net [2a00:1450:4008:c01::93]

This renders the web GUI completely unresponsive and unusable until you kill the traceroute6 process via console.

History

#1 Updated by Jim Pingle about 7 years ago

  • Status changed from New to Feedback

I can't reproduce this on current 2.1 code.

In the GUI we pass "-w 2" which waits a max of two seconds for a reply from the target server for each trace attempt. For me, from the GUI to www.google.com, it hits hop 8 and times out three times then hits the last hop, so there is a ~6 second pause but it does proceed.

From the shell if you don't pass it -w X, then it does hang indefinitely waiting for a reply.

Edit the source of /usr/local/www/diag_traceroute.php, uncomment the line that echoes the traceroute command when executed, then try it again and paste the output here. Also try it from the shell with -w 1 or -w 2.
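For anyone scripting the same test, here is a minimal Python sketch of the command line described above, wrapped in an overall deadline so that a stuck traceroute6 cannot block the caller. It is not the code in diag_traceroute.php. The function names and the deadline_s parameter are hypothetical; only the binary path and the -w and -m values are taken from this thread.

```python
import subprocess


def build_traceroute6_argv(host, wait_s=2, max_hops=18):
    """Build the argument list seen in this thread: traceroute6 -w W -m M host."""
    return ["/usr/sbin/traceroute6", "-w", str(wait_s), "-m", str(max_hops), host]


def run_with_deadline(argv, deadline_s):
    """Run argv and kill it if it runs longer than deadline_s seconds in total.

    Returns (finished, output); finished is False when the deadline fired.
    deadline_s is an assumption of this sketch, not a traceroute6 option.
    """
    try:
        done = subprocess.run(argv, capture_output=True, text=True, timeout=deadline_s)
        return True, done.stdout
    except subprocess.TimeoutExpired as exc:
        # Partial output may be missing or still raw bytes at this point.
        out = exc.stdout or ""
        if isinstance(out, bytes):
            out = out.decode(errors="replace")
        return False, out
```

A caller could run run_with_deadline(build_traceroute6_argv("www.google.com"), 60) and treat finished == False as "the trace hung". The per-probe -w wait and the overall deadline are independent of each other, so the outer limit still applies if the inner one never fires.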

#2 Updated by Doktor Notor about 7 years ago

Not really required to uncomment anything there. It's endlessly visible in the process listing from console, till you kill it manually.


62161 ?? S 0:00.03 /usr/sbin/traceroute6 -w 2 -m 18 www.google.com

And of course it's the same story with -w X from the command line: the timeout never happens.
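Since the only recovery so far is to find the process in the listing and kill it by hand, here is a small Python sketch that extracts the PIDs of leftover traceroute6 processes from ps output. It assumes the five-column layout of the line pasted above (PID, TT, STAT, TIME, COMMAND). The function name find_pids is hypothetical.

```python
def find_pids(ps_output, command="/usr/sbin/traceroute6"):
    """Return PIDs of ps lines whose command column starts with `command`.

    Assumes the layout shown in this thread: PID TT STAT TIME COMMAND.
    """
    pids = []
    for line in ps_output.splitlines():
        fields = line.split(None, 4)  # keep the full command line in one field
        if len(fields) == 5 and fields[0].isdigit() and fields[4].split()[0] == command:
            pids.append(int(fields[0]))
    return pids


sample = "62161 ?? S 0:00.03 /usr/sbin/traceroute6 -w 2 -m 18 www.google.com"
print(find_pids(sample))  # [62161]
```

Each returned PID could then be passed to os.kill(pid, signal.SIGTERM), which does the same thing as killing it manually from the console.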

#3 Updated by Doktor Notor about 7 years ago

BTW, installed mtr-nox11, no such issue:

HOST: gw.example.com Loss% Snt Last Avg Best Wrst StDev
...
2.|-- gige-g2-20.core1.prg1.he. 0.0% 10 3.1 6.3 2.8 11.3 3.8
3.|-- nixcz-v6.net.google.com 0.0% 10 11.7 12.0 11.1 17.8 2.1
4.|-- 2001:4860::1:0:4ca2 0.0% 10 10.9 12.8 10.9 20.6 3.7
5.|-- 2001:4860::8:0:5039 0.0% 10 11.0 12.4 11.0 21.8 3.3
6.|-- 2001:4860::8:0:3097 0.0% 10 26.0 26.4 25.9 28.3 0.8
7.|-- 2001:4860::2:0:6e0 0.0% 10 25.9 26.5 25.9 28.6 0.9
8.|-- ??? 100.0 10 0.0 0.0 0.0 0.0 0.0
9.|-- bk-in-x69.1e100.net 0.0% 10 25.9 26.0 25.7 26.4 0.2

#4 Updated by Jim Pingle about 7 years ago

  • Status changed from Feedback to New

I was finally able to reproduce it. I tried it on a few different pfSense boxes and FreeBSD systems, and I could only reproduce it on i386, on both FreeBSD and pfSense.

Amd64 didn't seem to have any problems in the same situations. My other test systems either never had a timeout/skipped entry, or they timed out as expected (* * *) and kept moving.

Not sure what we can effectively do to counter that.

#5 Updated by Doktor Notor about 7 years ago

Hmmm well, not sure either, beyond either a shiny red warning (think about remotely managed boxes, cutting yourself off the GUI kinda sucks :-P) or maybe the mtr utility would be a good replacement (it'd need some polishing, the current mtr GUI does not offer IPv6 dropdown).

#6 Updated by Jim Pingle about 7 years ago

MTR is an entirely different type of test. Useful, but probably not one we'd include by default. And yes its GUI does need a bit of polish.

It is likely that it's a bug in FreeBSD's traceroute6. I didn't have any i386 systems on FreeBSD 9 or 10 that also had a path which included a timeout. If the bug is gone in a current traceroute6, we may be able to patch in a fix.

#7 Updated by Doktor Notor about 7 years ago

Looks like the code was last touched (beyond irrelevant cosmetics) almost 4 years ago. Unlikely to have any fix.

#8 Updated by Doktor Notor about 7 years ago

FWIW, tried with truss /usr/sbin/traceroute6 -w 2 -m 18 www.google.com - it looks like it does actually make it thru to the last hop, however, there it gets stuck completely in a stupid loop.

sendto(4,"\^V\b\0\0Q\M-W\^V\M-_\0\r9\M-j",12,0x0,{ AF_INET6 [2a00:1450:4008:c01::69]:33456 },0x1c) = 12 (0xc)
poll({3/POLLIN},1,2000)                          = 1 (0x1)
recvmsg(0x3,0x804ebfc,0x0,0xdf16d751,0xea390d00,0x51d716df) = 24 (0x18)
gettimeofday({1373050592.741271 },0x0)           = 0 (0x0)
poll({3/POLLIN},1,2000)                          = 1 (0x1)
recvmsg(0x3,0x804ebfc,0x0,0xdf16d751,0xea390d00,0x51d716df) = 136 (0x88)
gettimeofday({1373050592.858897 },0x0)           = 0 (0x0)
poll({3/POLLIN},1,2000)                          = 1 (0x1)
recvmsg(0x3,0x804ebfc,0x0,0xdf16d751,0xea390d00,0x51d716df) = 24 (0x18)
gettimeofday({1373050593.752360 },0x0)           = 0 (0x0)
poll({3/POLLIN},1,2000)                          = 1 (0x1)
recvmsg(0x3,0x804ebfc,0x0,0xdf16d751,0xea390d00,0x51d716df) = 24 (0x18)
gettimeofday({1373050594.761495 },0x0)           = 0 (0x0)
poll({3/POLLIN},1,2000)                          = 1 (0x1)
recvmsg(0x3,0x804ebfc,0x0,0xdf16d751,0xea390d00,0x51d716df) = 24 (0x18)
gettimeofday({1373050595.771728 },0x0)           = 0 (0x0)
poll({3/POLLIN},1,2000)                          = 1 (0x1)
recvmsg(0x3,0x804ebfc,0x0,0xdf16d751,0xea390d00,0x51d716df) = 24 (0x18)
gettimeofday({1373050596.781864 },0x0)           = 0 (0x0)
poll({3/POLLIN},1,2000)                          = 1 (0x1)
recvmsg(0x3,0x804ebfc,0x0,0xdf16d751,0xea390d00,0x51d716df) = 24 (0x18)
gettimeofday({1373050597.791833 },0x0)           = 0 (0x0)
poll({3/POLLIN},1,2000)                          = 1 (0x1)
recvmsg(0x3,0x804ebfc,0x0,0xdf16d751,0xea390d00,0x51d716df) = 24 (0x18)
gettimeofday({1373050598.802189 },0x0)           = 0 (0x0)
poll({3/POLLIN},1,2000)                          = 1 (0x1)
recvmsg(0x3,0x804ebfc,0x0,0xdf16d751,0xea390d00,0x51d716df) = 24 (0x18)
gettimeofday({1373050599.812327 },0x0)           = 0 (0x0)
poll({3/POLLIN},1,2000)                          = 1 (0x1)
recvmsg(0x3,0x804ebfc,0x0,0xdf16d751,0xea390d00,0x51d716df) = 24 (0x18)
gettimeofday({1373050600.822777 },0x0)           = 0 (0x0)
poll({3/POLLIN},1,2000)                          = 1 (0x1)
recvmsg(0x3,0x804ebfc,0x0,0xdf16d751,0xea390d00,0x51d716df) = 24 (0x18)
gettimeofday({1373050601.832704 },0x0)           = 0 (0x0)
poll({3/POLLIN},1,2000)                          = 1 (0x1)
recvmsg(0x3,0x804ebfc,0x0,0xdf16d751,0xea390d00,0x51d716df) = 24 (0x18)
gettimeofday({1373050602.843511 },0x0)           = 0 (0x0)
poll({3/POLLIN},1,2000)                          = 1 (0x1)
recvmsg(0x3,0x804ebfc,0x0,0xdf16d751,0xea390d00,0x51d716df) = 136 (0x88)
gettimeofday({1373050603.140628 },0x0)           = 0 (0x0)
poll({3/POLLIN},1,2000)                          = 1 (0x1)
recvmsg(0x3,0x804ebfc,0x0,0xdf16d751,0xea390d00,0x51d716df) = 24 (0x18)
gettimeofday({1373050603.853568 },0x0)           = 0 (0x0)
poll({3/POLLIN},1,2000)                          = 1 (0x1)
recvmsg(0x3,0x804ebfc,0x0,0xdf16d751,0xea390d00,0x51d716df) = 24 (0x18)
gettimeofday({1373050604.863216 },0x0)           = 0 (0x0)
poll({3/POLLIN},1,2000)                          = 1 (0x1)
recvmsg(0x3,0x804ebfc,0x0,0xdf16d751,0xea390d00,0x51d716df) = 24 (0x18)
gettimeofday({1373050605.873673 },0x0)           = 0 (0x0)
poll({3/POLLIN},1,2000)                          = 1 (0x1)
recvmsg(0x3,0x804ebfc,0x0,0xdf16d751,0xea390d00,0x51d716df) = 24 (0x18)
gettimeofday({1373050606.883456 },0x0)           = 0 (0x0)
poll({3/POLLIN},1,2000)                          = 1 (0x1)
recvmsg(0x3,0x804ebfc,0x0,0xdf16d751,0xea390d00,0x51d716df) = 24 (0x18)
gettimeofday({1373050607.843754 },0x0)           = 0 (0x0)
poll({3/POLLIN},1,2000)                          = 1 (0x1)
recvmsg(0x3,0x804ebfc,0x0,0xdf16d751,0xea390d00,0x51d716df) = 24 (0x18)
gettimeofday({1373050607.893793 },0x0)           = 0 (0x0)
poll({3/POLLIN},1,2000)                          = 1 (0x1)
recvmsg(0x3,0x804ebfc,0x0,0xdf16d751,0xea390d00,0x51d716df) = 24 (0x18)
gettimeofday({1373050608.904070 },0x0)           = 0 (0x0)
poll({3/POLLIN},1,2000)                          = 1 (0x1)
recvmsg(0x3,0x804ebfc,0x0,0xdf16d751,0xea390d00,0x51d716df) = 24 (0x18)
gettimeofday({1373050609.914040 },0x0)           = 0 (0x0)
poll({3/POLLIN},1,2000)                          = 1 (0x1)
recvmsg(0x3,0x804ebfc,0x0,0xdf16d751,0xea390d00,0x51d716df) = 136 (0x88)
gettimeofday({1373050610.481887 },0x0)           = 0 (0x0)
poll({3/POLLIN},1,2000)                          = 1 (0x1)
recvmsg(0x3,0x804ebfc,0x0,0xdf16d751,0xea390d00,0x51d716df) = 24 (0x18)
gettimeofday({1373050610.924084 },0x0)           = 0 (0x0)
poll({3/POLLIN},1,2000)                          = 1 (0x1)
recvmsg(0x3,0x804ebfc,0x0,0xdf16d751,0xea390d00,0x51d716df) = 136 (0x88)
gettimeofday({1373050611.912224 },0x0)           = 0 (0x0)
poll({3/POLLIN},1,2000)                          = 1 (0x1)
recvmsg(0x3,0x804ebfc,0x0,0xdf16d751,0xea390d00,0x51d716df) = 24 (0x18)
gettimeofday({1373050611.934322 },0x0)           = 0 (0x0)
poll({3/POLLIN},1,2000)                          = 1 (0x1)
recvmsg(0x3,0x804ebfc,0x0,0xdf16d751,0xea390d00,0x51d716df) = 24 (0x18)
gettimeofday({1373050612.944385 },0x0)           = 0 (0x0)
poll({3/POLLIN},1,2000)                          = 1 (0x1)
recvmsg(0x3,0x804ebfc,0x0,0xdf16d751,0xea390d00,0x51d716df) = 24 (0x18)
gettimeofday({1373050613.955584 },0x0)           = 0 (0x0)
poll({3/POLLIN},1,2000)                          = 1 (0x1)
recvmsg(0x3,0x804ebfc,0x0,0xdf16d751,0xea390d00,0x51d716df) = 24 (0x18)
gettimeofday({1373050614.965010 },0x0)           = 0 (0x0)
poll({3/POLLIN},1,2000)                          = 1 (0x1)
recvmsg(0x3,0x804ebfc,0x0,0xdf16d751,0xea390d00,0x51d716df) = 24 (0x18)
gettimeofday({1373050615.975144 },0x0)           = 0 (0x0)
poll({3/POLLIN},1,2000)                          = 1 (0x1)
recvmsg(0x3,0x804ebfc,0x0,0xdf16d751,0xea390d00,0x51d716df) = 24 (0x18)
...

until aborted via CTRL+C. I'm afraid I can't help any further with this.
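To make the pattern in the trace easier to see, here is a small Python sketch that pulls the gettimeofday timestamps and the poll timeout and return values out of a truss excerpt. The sample lines are copied from the output above. The function name summarize and the regular expressions are assumptions of this sketch, written only against the line format shown here.

```python
import re

# Patterns written against the truss line format pasted in this comment.
TS_RE = re.compile(r"gettimeofday\(\{\s*(\d+\.\d+)\s*\}")
POLL_RE = re.compile(r"poll\(\{[^}]*\},\d+,(\d+)\)\s*=\s*(\d+)")


def summarize(truss_text):
    """Return (max_gap_s, poll_timeout_s, polls_timed_out) for a truss excerpt."""
    stamps = [float(m.group(1)) for m in TS_RE.finditer(truss_text)]
    gaps = [b - a for a, b in zip(stamps, stamps[1:])]
    polls = [(int(m.group(1)), int(m.group(2))) for m in POLL_RE.finditer(truss_text)]
    timeout_s = polls[0][0] / 1000 if polls else None
    timed_out = sum(1 for _, ret in polls if ret == 0)  # poll() returns 0 on timeout
    return (max(gaps) if gaps else None), timeout_s, timed_out


sample = """\
poll({3/POLLIN},1,2000)                          = 1 (0x1)
recvmsg(0x3,0x804ebfc,0x0,0xdf16d751,0xea390d00,0x51d716df) = 24 (0x18)
gettimeofday({1373050592.741271 },0x0)           = 0 (0x0)
poll({3/POLLIN},1,2000)                          = 1 (0x1)
recvmsg(0x3,0x804ebfc,0x0,0xdf16d751,0xea390d00,0x51d716df) = 136 (0x88)
gettimeofday({1373050592.858897 },0x0)           = 0 (0x0)
poll({3/POLLIN},1,2000)                          = 1 (0x1)
recvmsg(0x3,0x804ebfc,0x0,0xdf16d751,0xea390d00,0x51d716df) = 24 (0x18)
gettimeofday({1373050593.752360 },0x0)           = 0 (0x0)
"""
max_gap, timeout_s, timed_out = summarize(sample)
```

In the excerpt above, every poll returns 1 (data ready) and consecutive timestamps are about one second apart or less, which is always below the 2000 ms poll timeout. So no individual poll call ever reports a timeout. What those packets are, and why the loop keeps waiting, cannot be read from the trace alone.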

#9 Updated by Chris Buechler almost 6 years ago

  • Category set to Operating System
  • Status changed from New to Confirmed

It's pf that makes this hang somehow. Disable pf, and traceroute6 finishes without a problem. No blocked traffic is being logged.

#10 Updated by Chris Buechler over 5 years ago

  • Status changed from Confirmed to Feedback

This doesn't seem to be an issue in 2.2.x.

#11 Updated by Kill Bill over 5 years ago

Well, it still hangs here exactly the same as ever. I tried pfctl -d before running this and it did not help in any way either.

#12 Updated by Chris Buechler over 4 years ago

  • Status changed from Feedback to Confirmed
  • Target version set to 2.3.2
  • Affected Version changed from 2.1-IPv6 to All

Denny Page tracked down the source of this issue and opened this FreeBSD PR with a patch.

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=210286
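The patch itself is not quoted in this thread, so the following is not its content and makes no claim about the actual root cause. It is only a generic Python illustration, over simulated time, of how a receive loop like the one in the truss output in comment #8 can fail to time out. If the wait is restarted after every received packet, a steady stream of packets that do not answer the probe keeps the loop alive. A deadline fixed when the probe is sent does not have that problem. All names here are hypothetical.

```python
def wait_for_reply(arrivals, wait_s, fixed_deadline):
    """Simulate one probe's receive loop over virtual time.

    arrivals: time-sorted list of (time_s, matches_probe) packets on the socket.
    fixed_deadline=False restarts the wait after every packet (per-poll timeout);
    fixed_deadline=True gives up wait_s seconds after the probe was sent.
    Returns ("reply", t) or ("timeout", t).
    """
    now = 0.0  # the probe is sent at t = 0
    for t, matches in arrivals:
        limit = wait_s if fixed_deadline else now + wait_s
        if t > limit:
            return ("timeout", limit)  # nothing arrived before the limit
        now = t
        if matches:
            return ("reply", now)
        # Unrelated packet: loop again and keep waiting.
    return ("timeout", wait_s if fixed_deadline else now + wait_s)


# One unrelated packet per second for a minute, and no real reply.
stray = [(float(i), False) for i in range(1, 61)]
print(wait_for_reply(stray, 2.0, fixed_deadline=False))  # ('timeout', 62.0)
print(wait_for_reply(stray, 2.0, fixed_deadline=True))   # ('timeout', 2.0)
```

With the restarted wait, the simulated loop gives up only two seconds after the last stray packet. With an endless stream it would never give up, which would match the symptom reported here.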

#13 Updated by Renato Botelho about 4 years ago

  • Assignee set to Renato Botelho

I'll run some tests and import the patch into our tree.

#14 Updated by Renato Botelho about 4 years ago

  • Status changed from Confirmed to Feedback

Imported the traceroute6 patch into the FreeBSD-src repo. It will be available in the next round of snapshots.

#15 Updated by Chris Buechler about 4 years ago

  • Status changed from Feedback to Resolved

works
