Bug #6658

DHCP Relay not working on 2.3.2

Added by Kill Bill 7 months ago. Updated 3 months ago.

Status: Resolved
Priority: Normal
Category: Wireless
Target version:
Start date: 07/29/2016
Due date:
% Done: 100%
Affected version: 2.3.2
Affected Architecture: amd64

Description

The DHCP Relay service cannot be started on 2.3.2 with ath0, so clients do not receive an IP address.

dhcrelay -i ath0_wlan0 -i re2 -i re0 -a -m replace 192.168.1.161
Internet Systems Consortium DHCP Relay Agent 4.3.4
Copyright 2004-2016 Internet Systems Consortium.
All rights reserved.
For info, please visit https://www.isc.org/software/dhcp/
Can't attach interface ath0 to bpf device /dev/bpf0: Device not configured

If you think you have received this message due to a bug rather
than a configuration issue please read the section on submitting
bugs on either our web page at www.isc.org or in the README file
before submitting a bug.  These pages explain the proper
process and the information we find helpful for debugging..

As soon as I disable that and set up a DHCP server on a 2.3.2+ box, the wifi starts working again.

[EDIT: Lots of possibly unrelated wifi chatter in ticket updates, updated description to reflect actual underlying problem.]

History

#1 Updated by martin wüthrich 6 months ago

I'm in the same situation as described, except I have an "APU1" and my clients stay connected (they even authenticate with RADIUS), but because the DHCP Relay service can't be started, they do not receive an IP address.
Besides the error already logged by "Kill Bill", I receive the following error from the DHCP Relay (which might be closely related to the WiFi card issue):

dhcrelay -i ath0_wlan0 -i re2 -i re0 -a -m replace 192.168.1.161
Internet Systems Consortium DHCP Relay Agent 4.3.4
Copyright 2004-2016 Internet Systems Consortium.
All rights reserved.
For info, please visit https://www.isc.org/software/dhcp/
Can't attach interface ath0 to bpf device /dev/bpf0: Device not configured

If you think you have received this message due to a bug rather
than a configuration issue please read the section on submitting
bugs on either our web page at www.isc.org or in the README file
before submitting a bug.  These pages explain the proper
process and the information we find helpful for debugging..

#2 Updated by Jim Thompson 6 months ago

Could one or both of you try this on 2.4?

#3 Updated by Jim Thompson 6 months ago

  • Assignee set to Jim Thompson
  • Target version set to 2.4.0

#4 Updated by martin wüthrich 6 months ago

Hi Jim,

I have installed
https://snapshots.pfsense.org/amd64/pfSense_master/installer/pfSense-CE-memstick-serial-2.4.0-DEVELOPMENT-amd64-latest.img.gz (Date/Time 03-Sep-2016 01:10)

But unfortunately everything got worse :(
The wireless card was not found, even though the boot process showed:


pcib4: <ACPI PCI-PCI bridge> irq 19 at device 7.0 on pci0
pci4: <ACPI PCI bus> on pcib4
ath0: <Atheros 9280> mem 0xf7e00000-0xf7e0ffff irq 19 at device 0.0 on pci4
[ath] enabling AN_TOP2_FIXUP
ath0: [HT] enabling HT modes
ath0: [HT] 1 stream STBC receive enabled
ath0: [HT] 1 stream STBC transmit enabled
ath0: [HT] 2 RX streams; 2 TX streams
ath0: AR9280 mac 128.2 RF5133 phy 13.0
ath0: 2GHz radio: 0x0000; 5GHz radio: 0x00c0


I had to fall back to a previous version, because the routing had a big issue which I could not solve.
I have now installed a pre-release of 2.3.2 and everything is working fine with the wireless.

#5 Updated by Kill Bill 6 months ago

I temporarily installed 2.4 alpha on a test box, and the wireless is completely broken there, the entire interface gone AWOL. So, that pretty much matches what Martin found. :-(

I'd appreciate a 2.3.x snapshot with whatever Atheros-related changes went into the 2.3.2 release reverted to their pre-2.3.2 state.

#6 Updated by Kill Bill 6 months ago

And FWIW - this does not appear to be limited to AR9280. I managed to rescue some oldie 802.11a/b/g mini-PCIe card with AR5424 chipset from a laptop, and it's the same story. HW info:

# dmesg | grep -i ath
ath0: <Atheros 5424/2424> mem 0xfe800000-0xfe80ffff at device 0.0 on pci4
ath0: AR2425 mac 14.2 RF5424 phy 7.0
ath0: 2GHz radio: 0x0000; 5GHz radio: 0x00a2
wlan0: changing name to 'ath0_wlan0'
wlan1: changing name to 'ath0_wlan1'
ath0: ath_reset_grablock: didn't finish after 10 iterations
ath0: ath_reset_grablock: warning, recursive reset path!
ath0: ath_chan_set: concurrent reset! Danger!
# ifconfig -v ath0_wlan0
ath0_wlan0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
        ether 00:22:5f:5d:85:7b
        inet6 fe80::222:5fff:fe5d:857b%ath0_wlan0 prefixlen 64 scopeid 0x9
        inet 10.20.30.1 netmask 0xffffff00 broadcast 10.20.30.255
        inet6 2001:470:dead:beef::1 prefixlen 64
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        media: IEEE 802.11 Wireless Ethernet autoselect mode 11g <hostap>
        status: running
        ssid MY_SSID channel 6 (2437 MHz 11g) bssid 00:22:5f:5d:85:7b
        regdomain ETSI country HU indoor ecm authmode WPA2/802.11i -wps -tsn
        privacy MIXED deftxkey 2
        AES-CCM 2:128-bit
        AES-CCM 3:128-bit powersavemode OFF powersavesleep 100 txpower 30
        txpowmax 50.0 -dotd rtsthreshold 2346 fragthreshold 2346 bmiss 7
        11a     ucast NONE    mgmt  6 Mb/s mcast  6 Mb/s maxretry 6
        11b     ucast NONE    mgmt  1 Mb/s mcast  1 Mb/s maxretry 6
        11g     ucast NONE    mgmt  1 Mb/s mcast  1 Mb/s maxretry 6
        turboA  ucast NONE    mgmt  6 Mb/s mcast  6 Mb/s maxretry 6
        turboG  ucast NONE    mgmt  1 Mb/s mcast  1 Mb/s maxretry 6
        sturbo  ucast NONE    mgmt  6 Mb/s mcast  6 Mb/s maxretry 6
        11na    ucast NONE    mgmt 12 MCS  mcast 12 MCS  maxretry 6
        11ng    ucast NONE    mgmt  2 MCS  mcast  2 MCS  maxretry 6
        half    ucast NONE    mgmt  3 Mb/s mcast  3 Mb/s maxretry 6
        quarter ucast NONE    mgmt  1 Mb/s mcast  1 Mb/s maxretry 6
        scanvalid 60 -bgscan bgscanintvl 300 bgscanidle 250
        roam:11a     rssi    7dBm rate 12 Mb/s
        roam:11b     rssi    7dBm rate  1 Mb/s
        roam:11g     rssi    7dBm rate  5 Mb/s
        roam:turboA  rssi    7dBm rate 12 Mb/s
        roam:turboG  rssi    7dBm rate 12 Mb/s
        roam:sturbo  rssi    7dBm rate 12 Mb/s
        roam:11na    rssi    7dBm  MCS  1
        roam:11ng    rssi    7dBm  MCS  1
        roam:half    rssi    7dBm rate  6 Mb/s
        roam:quarter rssi    7dBm rate  3 Mb/s
        pureg protmode OFF -ht -htcompat -ampdu ampdulimit 64k ampdudensity 8
        -amsdu -shortgi htprotmode RTSCTS -puren smps -rifs wme burst -dwds
        -hidessid apbridge dtimperiod 1 doth -dfs inact bintval 100
        AC_BE cwmin  4 cwmax  6 aifs  3 txopLimit   0 -acm ack
              cwmin  4 cwmax 10 aifs  3 txopLimit   0 -acm
        AC_BK cwmin  4 cwmax 10 aifs  7 txopLimit   0 -acm ack
              cwmin  4 cwmax 10 aifs  7 txopLimit   0 -acm
        AC_VI cwmin  3 cwmax  4 aifs  1 txopLimit  94 -acm ack
              cwmin  3 cwmax  4 aifs  2 txopLimit  94 -acm
        AC_VO cwmin  2 cwmax  3 aifs  1 txopLimit  47 -acm ack
              cwmin  2 cwmax  3 aifs  2 txopLimit  47 -acm
        groups: wlan

OTOH, miniPCI ath cards appear to be working on antique HW like Alix 2D13; miniPCIe -> hopeless.

#7 Updated by Jim Pingle 6 months ago

Looks like on FreeBSD 11 (the base for 2.4) you have to clone the interface. The wireless device (e.g. ath0) won't show in ifconfig.

Somehow we'll have to detect wireless devices like ath0 and offer them for creation/cloning as before.

You can make it show up temporarily by running:

ifconfig wlan0 create wlandev ath0

You can then assign that interface and use it, but unless you have an earlyshellcmd to bring it back it'll fail on the next boot.

The list of current wireless devices is in the net.wlan.devices sysctl OID so fetching them from there is easy, but we lose some of the extra info we had before pre-assignment, such as the MAC address of ath0.
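
The manual workaround above could be scripted roughly as follows. This is only a sketch: the function name wlan_clone_cmds is made up for illustration, numbering the clones from wlan0 upward is an assumption, and the script prints the commands rather than running them so it can be checked safely first.

```shell
#!/bin/sh
# Print (not run) one "ifconfig ... create" command per wireless parent device.
# Argument: a space-separated device list, such as the value of the
# net.wlan.devices sysctl OID mentioned above.
# Assumption: clones are simply numbered wlan0, wlan1, ... in list order.
wlan_clone_cmds() {
    n=0
    for dev in $1; do
        echo "ifconfig wlan${n} create wlandev ${dev}"
        n=$((n + 1))
    done
}

# Dry run against a sample list. On a live system the list could instead
# come from: wlan_clone_cmds "$(sysctl -n net.wlan.devices)"
wlan_clone_cmds "ath0"
```

Piping the output to sh would actually create the interfaces. To survive a reboot, the same commands would still have to be run from an earlyshellcmd, as noted above.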

#8 Updated by Jim Pingle 6 months ago

I'll make a fresh ticket for 2.4 with the above on it so it doesn't get lost here.

#9 Updated by Kill Bill 6 months ago

OK. After a lot of further testing and messing with various stuff, here is some mixed news:
- as for 2.3.x, the DHCP relay got completely screwed before 2.3.2 release. As soon as I disable that and set up a DHCP server on a 2.3.2+ box, the wifi starts working again. (Those HW/kernel related logs above are apparently misleading). Thanks to Martin for providing the hints.
- as for 2.4, apparently we have #6770 for that now.
- as for the ath driver issues with FreeBSD 10.3, no clue what's up. Sounds like the generic "FreeBSD sucks with wireless" issue.

#10 Updated by Kill Bill 6 months ago

And finally - the DHCP relay issues are so bad that reconfiguring the service actually crashes pfSense. I submitted a crash log earlier today; merely removing the ath0 interface from the service configuration forced a crash and reboot of the box. :-( The logs are identical to what's mentioned in comment #1 here.

#11 Updated by Jim Pingle 6 months ago

OK, so the real issue of this ticket is actually DHCP Relay breaking. Given the info in the description and such, I'm thinking it might be better to close this out and start a fresh one specifically for DHCP relay, carrying over only the details and log entries relevant there. There were some changes for 2.3.2 (see #6355); I'm not sure how easy it would be to test backing out just those, since they included a patch for the DHCP relay daemon itself.

#12 Updated by Kill Bill 6 months ago

@jimp: Well if you can link a pre-6355 binary for download, I can test that for sure with multiple boxes. I might have some 2.3.2 prerelease images available but not exactly keen on digging into which one might still be working.

#13 Updated by Jim Pingle 6 months ago

I don't think we have any left, unless you count 2.3.1 which isn't so helpful in that area. If you do still have a 2.3.1 box around you could grab /usr/local/sbin/dhcrelay from it and copy it over to 2.3.2 as a test.
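
A cautious way to do that swap is to keep a copy of the 2.3.2 binary so the test can be undone. The helper below is a minimal sketch: swap_binary and the /tmp/dhcrelay-2.3.1 path in the usage note are hypothetical names, and it assumes the old binary has already been copied onto the 2.3.2 box.

```shell
#!/bin/sh
# swap_binary NEW TARGET
# Save TARGET as TARGET.orig (first run only), then install NEW over TARGET.
swap_binary() {
    new=$1
    target=$2
    # Refuse to continue if the replacement file is not there.
    [ -f "$new" ] || { echo "missing: $new" >&2; return 1; }
    # Keep only the first backup, so re-running never overwrites the original.
    if [ -f "$target" ] && [ ! -f "$target.orig" ]; then
        cp -p "$target" "$target.orig" || return 1
    fi
    cp -p "$new" "$target"
}
```

For example, swap_binary /tmp/dhcrelay-2.3.1 /usr/local/sbin/dhcrelay would install the old daemon, and copying /usr/local/sbin/dhcrelay.orig back restores the 2.3.2 one. The relay service would need a restart after either step.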

#14 Updated by Jim Pingle 6 months ago

  • Subject changed from ath (AR9280) wifi no longer usable in 2.3.2 to DHCP Relay not working on 2.3.2
  • Description updated (diff)
  • Assignee changed from Jim Thompson to Renato Botelho

#15 Updated by Jim Pingle 6 months ago

Rather than reinvent the wheel I updated the description on this ticket instead.

#16 Updated by Kill Bill 6 months ago

Thanks; managed to find the related crash dump I submitted today? (Should be either from 188.75.x.x or 2001:470:6e:xxxx::xxxx)

#17 Updated by Jim Pingle 6 months ago

Just found it (it was from the IPv6 address):

ath0: ath_reset_grablock: didn't finish after 10 iterations
ath0: ath_reset_grablock: warning, recursive reset path!
ath0: ath_chan_set: concurrent reset! Danger!
ath0: device timeout
ath0: stuck beacon; resetting (bmiss count 4)
<7>cannot forward src fe80:9::xxxx:xxxx:xxxx:xxxx, dst 2001:470:xxxx:xxxx:xxxx::1, nxt 17, rcvif ath0_wlan0, outif igb0
ath0: stuck beacon; resetting (bmiss count 4)
<7>cannot forward src fe80:9::xxxx:xxxx:xxxx:xxxx, dst 2001:470:xxxx:xxxx:xxxx::1, nxt 17, rcvif ath0_wlan0, outif igb0

Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 02
fault virtual address    = 0x27
fault code        = supervisor read data, page not present
instruction pointer    = 0x20:0xffffffff80d2c47b
stack pointer            = 0x28:0xfffffe012118b300
frame pointer            = 0x28:0xfffffe012118b370
code segment        = base 0x0, limit 0xfffff, type 0x1b
            = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags    = interrupt enabled, resume, IOPL = 0
current process        = 40074 (dhcrelay)

And the backtrace:

db:0:kdb.enter.default>  show pcpu
cpuid        = 2
dynamic pcpu = 0xfffffe0174c51500
curthread    = 0xfffff800872594b0: pid 40074 "dhcrelay" 
curpcb       = 0xfffffe012118bb80
fpcurthread  = none
idlethread   = 0xfffff800039804b0: tid 100005 "idle: cpu2" 
curpmap      = 0xfffff80003989838
tssp         = 0xffffffff82113560
commontssp   = 0xffffffff82113560
rsp0         = 0xfffffe012118bb80
gs32p        = 0xffffffff82114fb8
ldt          = 0xffffffff82114ff8
tss          = 0xffffffff82114fe8
db:0:kdb.enter.default>  bt
Tracing pid 40074 tid 100152 td 0xfffff800872594b0
mld_change_state() at mld_change_state+0x5b/frame 0xfffffe012118b370
in6_mc_leave() at in6_mc_leave+0x83/frame 0xfffffe012118b3b0
ip6_freemoptions() at ip6_freemoptions+0x10d/frame 0xfffffe012118b410
in_pcbfree() at in_pcbfree+0x18a/frame 0xfffffe012118b450
udp6_detach() at udp6_detach+0xe1/frame 0xfffffe012118b490
sofree() at sofree+0x171/frame 0xfffffe012118b4c0
soclose() at soclose+0x34f/frame 0xfffffe012118b500
_fdrop() at _fdrop+0x29/frame 0xfffffe012118b520
closef() at closef+0x21e/frame 0xfffffe012118b5b0
fdescfree() at fdescfree+0x4f9/frame 0xfffffe012118b660
exit1() at exit1+0x576/frame 0xfffffe012118b6f0
sigexit() at sigexit+0x925/frame 0xfffffe012118b9b0
postsig() at postsig+0x286/frame 0xfffffe012118ba70
ast() at ast+0x417/frame 0xfffffe012118bab0
doreti_ast() at doreti_ast+0x1f/frame 0x7fffffffea50

The crash appears to be in IPv6 processing.

Does that particular configuration involve a bridge?

The "cannot forward" message reminds me of #5428 but the other symptoms don't line up.

#18 Updated by Kill Bill 6 months ago

Jim Pingle wrote:

Does that particular configuration involve a bridge?
The "cannot forward" message reminds me of #5428 but the other symptoms don't line up.

No bridges there at all. The box never ever crashed until I touched the dhcrelay stuff.

#19 Updated by Kill Bill 5 months ago

Target version: 2.4.0? Not exactly sure people are keen on waiting for a year to get something that was working to work again. How about reverting the thing to pre-2.3.2 state without that #6355 "fix" that broke everything and fixed nothing (at least according to https://forum.pfsense.org/index.php?topic=110901.0).

#20 Updated by Jim Pingle 5 months ago

Reverting that patch certainly does seem like a good idea given the responses.

Out of curiosity, have you tried this on a recent 2.3.3 snapshot? Or on 2.4?

#21 Updated by Jim Pingle 5 months ago

Also: Target for 2.4 is only a couple months out, not a year.

#23 Updated by Kill Bill 5 months ago

Jim Pingle wrote:

Out of curiosity, have you tried this on a recent 2.3.3 snapshot? Or on 2.4?

Yeah all the 2.3.3 snapshots are still broken. 2.4 is a complete no-go with wifi due to Bug #6770 so I really don't have any good place to test this.

#24 Updated by Kill Bill 4 months ago

Can this pretty please finally get the disastrous patch reverted? Not only did it not fix what it was supposed to fix (beyond the already mentioned https://forum.pfsense.org/index.php?topic=110901.0, there's another report https://forum.pfsense.org/index.php?topic=119798), but it also broke the thing completely. I cannot see a single good thing about the patch. This is a complete no-go in environments where you have lots of VLANs and all DHCP/DNS needs to be maintained under Active Directory.

#25 Updated by Renato Botelho 3 months ago

  • Status changed from New to Feedback
  • % Done changed from 0 to 100

Patch removed and package updated to 4.3.5 on pfSense 2.3.3 and 2.4.0

#26 Updated by Kill Bill 3 months ago

Yay!!! Will only be able to test after this weekend; going to post feedback here. Thanks.

#27 Updated by Kill Bill 3 months ago

Kill Bill wrote:

Yay!!! Will only be able to test after this weekend; going to post feedback here. Thanks.

Working again!!!

#28 Updated by Jim Pingle 3 months ago

  • Status changed from Feedback to Resolved
