Bug #6658
closed
DHCP Relay not working on 2.3.2
100%
Description
The DHCP Relay service cannot be started on 2.3.2 with ath0, so clients do not receive an IP address.
dhcrelay -i ath0_wlan0 -i re2 -i re0 -a -m replace 192.168.1.161
Internet Systems Consortium DHCP Relay Agent 4.3.4
Copyright 2004-2016 Internet Systems Consortium.
All rights reserved.
For info, please visit https://www.isc.org/software/dhcp/
Can't attach interface ath0 to bpf device /dev/bpf0: Device not configured
If you think you have received this message due to a bug rather
than a configuration issue please read the section on submitting
bugs on either our web page at www.isc.org or in the README file
before submitting a bug. These pages explain the proper
process and the information we find helpful for debugging..
As soon as I disable the DHCP relay and set up a DHCP server on a 2.3.2+ box, the wifi starts working again.
[EDIT: Lots of possibly unrelated wifi chatter in ticket updates, updated description to reflect actual underlying problem.]
Updated by martin wüthrich about 8 years ago
I'm in the same situation as described, except that I have an "APU1" and my clients stay connected (they even authenticate with RADIUS). However, because the DHCP Relay service can't be started, they do not receive an IP address.
Besides the error already logged by "Kill Bill", I receive the following error from the DHCP Relay (which might be closely related to the WiFi card issue):
dhcrelay -i ath0_wlan0 -i re2 -i re0 -a -m replace 192.168.1.161
Internet Systems Consortium DHCP Relay Agent 4.3.4
Copyright 2004-2016 Internet Systems Consortium.
All rights reserved.
For info, please visit https://www.isc.org/software/dhcp/
Can't attach interface ath0 to bpf device /dev/bpf0: Device not configured
If you think you have received this message due to a bug rather
than a configuration issue please read the section on submitting
bugs on either our web page at www.isc.org or in the README file
before submitting a bug. These pages explain the proper
process and the information we find helpful for debugging..
Updated by Jim Thompson about 8 years ago
Could one or both of you try this on 2.4?
Updated by Jim Thompson about 8 years ago
- Assignee set to Jim Thompson
- Target version set to 2.4.0
Updated by martin wüthrich about 8 years ago
Hi Jim,
I have installed
https://snapshots.pfsense.org/amd64/pfSense_master/installer/pfSense-CE-memstick-serial-2.4.0-DEVELOPMENT-amd64-latest.img.gz (Date/Time 03-Sep-2016 01:10)
But unfortunately everything got worse :(
The wireless card was not found, even though the boot process showed:
pcib4: <ACPI PCI-PCI bridge> irq 19 at device 7.0 on pci0
pci4: <ACPI PCI bus> on pcib4
ath0: <Atheros 9280> mem 0xf7e00000-0xf7e0ffff irq 19 at device 0.0 on pci4
[ath] enabling AN_TOP2_FIXUP
ath0: [HT] enabling HT modes
ath0: [HT] 1 stream STBC receive enabled
ath0: [HT] 1 stream STBC transmit enabled
ath0: [HT] 2 RX streams; 2 TX streams
ath0: AR9280 mac 128.2 RF5133 phy 13.0
ath0: 2GHz radio: 0x0000; 5GHz radio: 0x00c0
I had to fall back to a previous version, because routing had a major issue that I could not solve.
I have now installed a pre-release of 2.3.2 and everything is working fine with the wireless.
Updated by Kill Bill about 8 years ago
I temporarily installed 2.4 alpha on a test box, and the wireless is completely broken there, the entire interface gone AWOL. So, that pretty much matches what Martin found. :-(
I'd appreciate a 2.3.x snapshot with whatever Atheros-related changes went into the 2.3.2 release reverted to their pre-2.3.2 state.
Updated by Kill Bill about 8 years ago
And FWIW - this does not appear to be limited to AR9280. I managed to rescue some oldie 802.11a/b/g mini-PCIe card with AR5424 chipset from a laptop, and it's the same story. HW info:
# dmesg | grep -i ath
ath0: <Atheros 5424/2424> mem 0xfe800000-0xfe80ffff at device 0.0 on pci4
ath0: AR2425 mac 14.2 RF5424 phy 7.0
ath0: 2GHz radio: 0x0000; 5GHz radio: 0x00a2
wlan0: changing name to 'ath0_wlan0'
wlan1: changing name to 'ath0_wlan1'
ath0: ath_reset_grablock: didn't finish after 10 iterations
ath0: ath_reset_grablock: warning, recursive reset path!
ath0: ath_chan_set: concurrent reset! Danger!
# ifconfig -v ath0_wlan0
ath0_wlan0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
    ether 00:22:5f:5d:85:7b
    inet6 fe80::222:5fff:fe5d:857b%ath0_wlan0 prefixlen 64 scopeid 0x9
    inet 10.20.30.1 netmask 0xffffff00 broadcast 10.20.30.255
    inet6 2001:470:dead:beef::1 prefixlen 64
    nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
    media: IEEE 802.11 Wireless Ethernet autoselect mode 11g <hostap>
    status: running
    ssid MY_SSID channel 6 (2437 MHz 11g) bssid 00:22:5f:5d:85:7b
    regdomain ETSI country HU indoor ecm authmode WPA2/802.11i -wps -tsn
    privacy MIXED deftxkey 2 AES-CCM 2:128-bit AES-CCM 3:128-bit
    powersavemode OFF powersavesleep 100 txpower 30 txpowmax 50.0 -dotd
    rtsthreshold 2346 fragthreshold 2346 bmiss 7
    11a ucast NONE mgmt 6 Mb/s mcast 6 Mb/s maxretry 6
    11b ucast NONE mgmt 1 Mb/s mcast 1 Mb/s maxretry 6
    11g ucast NONE mgmt 1 Mb/s mcast 1 Mb/s maxretry 6
    turboA ucast NONE mgmt 6 Mb/s mcast 6 Mb/s maxretry 6
    turboG ucast NONE mgmt 1 Mb/s mcast 1 Mb/s maxretry 6
    sturbo ucast NONE mgmt 6 Mb/s mcast 6 Mb/s maxretry 6
    11na ucast NONE mgmt 12 MCS mcast 12 MCS maxretry 6
    11ng ucast NONE mgmt 2 MCS mcast 2 MCS maxretry 6
    half ucast NONE mgmt 3 Mb/s mcast 3 Mb/s maxretry 6
    quarter ucast NONE mgmt 1 Mb/s mcast 1 Mb/s maxretry 6
    scanvalid 60 -bgscan bgscanintvl 300 bgscanidle 250
    roam:11a rssi 7dBm rate 12 Mb/s
    roam:11b rssi 7dBm rate 1 Mb/s
    roam:11g rssi 7dBm rate 5 Mb/s
    roam:turboA rssi 7dBm rate 12 Mb/s
    roam:turboG rssi 7dBm rate 12 Mb/s
    roam:sturbo rssi 7dBm rate 12 Mb/s
    roam:11na rssi 7dBm MCS 1
    roam:11ng rssi 7dBm MCS 1
    roam:half rssi 7dBm rate 6 Mb/s
    roam:quarter rssi 7dBm rate 3 Mb/s
    pureg protmode OFF -ht -htcompat -ampdu ampdulimit 64k ampdudensity 8
    -amsdu -shortgi htprotmode RTSCTS -puren smps -rifs wme burst -dwds
    -hidessid apbridge dtimperiod 1 doth -dfs inact bintval 100
    AC_BE cwmin 4 cwmax 6 aifs 3 txopLimit 0 -acm ack
          cwmin 4 cwmax 10 aifs 3 txopLimit 0 -acm
    AC_BK cwmin 4 cwmax 10 aifs 7 txopLimit 0 -acm ack
          cwmin 4 cwmax 10 aifs 7 txopLimit 0 -acm
    AC_VI cwmin 3 cwmax 4 aifs 1 txopLimit 94 -acm ack
          cwmin 3 cwmax 4 aifs 2 txopLimit 94 -acm
    AC_VO cwmin 2 cwmax 3 aifs 1 txopLimit 47 -acm ack
          cwmin 2 cwmax 3 aifs 2 txopLimit 47 -acm
    groups: wlan
OTOH, miniPCI ath cards appear to be working on antique HW like Alix 2D13; miniPCIe -> hopeless.
Updated by Jim Pingle about 8 years ago
Looks like on 11 you have to clone the interface. The wireless device (e.g. ath0) won't show in ifconfig.
Somehow we'll have to detect wireless devices like ath0 and offer them for creation/cloning as before.
You can make it show up temporarily by running:
ifconfig wlan0 create wlandev ath0
You can then assign that interface and use it, but unless you have an earlyshellcmd to bring it back it'll fail on the next boot.
The list of current wireless devices is in the net.wlan.devices sysctl OID so fetching them from there is easy, but we lose some of the extra info we had before pre-assignment, such as the MAC address of ath0.
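As a rough illustration of the detection idea above, here is a minimal sketch that turns a device list (such as the output of `sysctl -n net.wlan.devices`) into the matching clone commands. The function name `wlan_clone_cmds` is hypothetical, the list is assumed to be whitespace-separated, and the function only prints the commands rather than running them.

```shell
#!/bin/sh
# Hypothetical helper (not part of pfSense): print one "ifconfig ... create"
# command per wireless device name passed in as a whitespace-separated list.
wlan_clone_cmds() {
    devices="$1"
    n=0
    # Intentionally unquoted so the list is split on whitespace.
    for dev in $devices; do
        echo "ifconfig wlan${n} create wlandev ${dev}"
        n=$((n + 1))
    done
}

# Possible use on a FreeBSD 11 based system (assumption, prints only):
#   wlan_clone_cmds "$(sysctl -n net.wlan.devices)"
```

Printing the commands first makes it easy to review them before putting anything into an earlyshellcmd.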
Updated by Jim Pingle about 8 years ago
I'll make a fresh ticket for 2.4 with the above on it so it doesn't get lost here.
Updated by Kill Bill about 8 years ago
OK. After a lot of further testing and messing with various stuff, here is some mixed news:
- as for 2.3.x, the DHCP relay got completely screwed before the 2.3.2 release. As soon as I disable the relay and set up a DHCP server on a 2.3.2+ box, the wifi starts working again. (Those HW/kernel related logs above are apparently misleading.) Thanks to Martin for providing the hints.
- as for 2.4, apparently we have #6770 for that now.
- as for the ath driver issues with FreeBSD 10.3, no clue what's up. Sounds like the generic "FreeBSD sucks with wireless" issue.
Updated by Kill Bill about 8 years ago
And finally - the DHCP relay issues are so bad that it actually crashes pfSense when reconfiguring the service. I submitted a crash log earlier today; merely removing the ath0 interface from the service configuration forced a crash and reboot of the box. :-( The logs are identical to what's mentioned in comment #1 here.
Updated by Jim Pingle about 8 years ago
OK, so the real issue of this ticket is actually DHCP Relay breaking. Given the info in the description and such, I'm thinking it might be better to close this out and start a fresh one specifically for DHCP relay, carrying over only the details and log entries relevant there. There were some changes for 2.3.2 (see #6355); I'm not sure how easy it would be to test backing just those out, since they included a patch for the DHCP relay daemon itself.
Updated by Kill Bill about 8 years ago
Jim Pingle: Well if you can link a pre-6355 binary for download, I can test that for sure with multiple boxes. I might have some 2.3.2 prerelease images available but not exactly keen on digging into which one might still be working.
Updated by Jim Pingle about 8 years ago
I don't think we have any left, unless you count 2.3.1 which isn't so helpful in that area. If you do still have a 2.3.1 box around you could grab /usr/local/sbin/dhcrelay from it and copy it over to 2.3.2 as a test.
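A minimal sketch of that binary swap, assuming the 2.3.1 copy of dhcrelay has already been transferred to the 2.3.2 box. The function name `swap_binary` and the candidate path in the usage note are hypothetical; only /usr/local/sbin/dhcrelay comes from the comment above.

```shell
#!/bin/sh
# Hypothetical helper: replace a binary with a candidate copy, keeping a
# one-time backup of the original as "<target>.orig".
swap_binary() {
    candidate="$1"   # e.g. dhcrelay copied over from a 2.3.1 box
    target="$2"      # e.g. /usr/local/sbin/dhcrelay on the 2.3.2 box
    if [ ! -f "$candidate" ]; then
        echo "missing candidate: $candidate" >&2
        return 1
    fi
    # Back up only once, so repeated runs never overwrite the original copy.
    if [ -f "$target" ] && [ ! -f "$target.orig" ]; then
        cp -p "$target" "$target.orig" || return 1
    fi
    cp -p "$candidate" "$target"
}
```

For example, `swap_binary /root/dhcrelay-2.3.1 /usr/local/sbin/dhcrelay` (the first path is made up), with the relay service stopped beforehand; copying `dhcrelay.orig` back undoes the test.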
Updated by Jim Pingle about 8 years ago
- Subject changed from ath (AR9280) wifi no longer usable in 2.3.2 to DHCP Relay not working on 2.3.2
- Description updated (diff)
- Assignee changed from Jim Thompson to Renato Botelho
Updated by Jim Pingle about 8 years ago
Rather than reinvent the wheel I updated the description on this ticket instead.
Updated by Kill Bill about 8 years ago
Thanks; managed to find the related crash dump I submitted today? (Should be either from 188.75.x.x or 2001:470:6e:xxxx::xxxx)
Updated by Jim Pingle about 8 years ago
Just found it (it was from the IPv6 address):
ath0: ath_reset_grablock: didn't finish after 10 iterations
ath0: ath_reset_grablock: warning, recursive reset path!
ath0: ath_chan_set: concurrent reset! Danger!
ath0: device timeout
ath0: stuck beacon; resetting (bmiss count 4)
<7>cannot forward src fe80:9::xxxx:xxxx:xxxx:xxxx, dst 2001:470:xxxx:xxxx:xxxx::1, nxt 17, rcvif ath0_wlan0, outif igb0
ath0: stuck beacon; resetting (bmiss count 4)
<7>cannot forward src fe80:9::xxxx:xxxx:xxxx:xxxx, dst 2001:470:xxxx:xxxx:xxxx::1, nxt 17, rcvif ath0_wlan0, outif igb0
Fatal trap 12: page fault while in kernel mode
cpuid = 2; apic id = 02
fault virtual address = 0x27
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff80d2c47b
stack pointer = 0x28:0xfffffe012118b300
frame pointer = 0x28:0xfffffe012118b370
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 40074 (dhcrelay)
And the backtrace:
db:0:kdb.enter.default> show pcpu
cpuid = 2
dynamic pcpu = 0xfffffe0174c51500
curthread = 0xfffff800872594b0: pid 40074 "dhcrelay"
curpcb = 0xfffffe012118bb80
fpcurthread = none
idlethread = 0xfffff800039804b0: tid 100005 "idle: cpu2"
curpmap = 0xfffff80003989838
tssp = 0xffffffff82113560
commontssp = 0xffffffff82113560
rsp0 = 0xfffffe012118bb80
gs32p = 0xffffffff82114fb8
ldt = 0xffffffff82114ff8
tss = 0xffffffff82114fe8
db:0:kdb.enter.default> bt
Tracing pid 40074 tid 100152 td 0xfffff800872594b0
mld_change_state() at mld_change_state+0x5b/frame 0xfffffe012118b370
in6_mc_leave() at in6_mc_leave+0x83/frame 0xfffffe012118b3b0
ip6_freemoptions() at ip6_freemoptions+0x10d/frame 0xfffffe012118b410
in_pcbfree() at in_pcbfree+0x18a/frame 0xfffffe012118b450
udp6_detach() at udp6_detach+0xe1/frame 0xfffffe012118b490
sofree() at sofree+0x171/frame 0xfffffe012118b4c0
soclose() at soclose+0x34f/frame 0xfffffe012118b500
_fdrop() at _fdrop+0x29/frame 0xfffffe012118b520
closef() at closef+0x21e/frame 0xfffffe012118b5b0
fdescfree() at fdescfree+0x4f9/frame 0xfffffe012118b660
exit1() at exit1+0x576/frame 0xfffffe012118b6f0
sigexit() at sigexit+0x925/frame 0xfffffe012118b9b0
postsig() at postsig+0x286/frame 0xfffffe012118ba70
ast() at ast+0x417/frame 0xfffffe012118bab0
doreti_ast() at doreti_ast+0x1f/frame 0x7fffffffea50
The crash appears to be in IPv6 processing.
Does that particular configuration involve a bridge?
The "cannot forward" message reminds me of #5428 but the other symptoms don't line up.
Updated by Kill Bill about 8 years ago
Jim Pingle wrote:
Does that particular configuration involve a bridge?
The "cannot forward" message reminds me of #5428 but the other symptoms don't line up.
No bridges there at all. The box never ever crashed until I touched the dhcrelay stuff.
Updated by Kill Bill about 8 years ago
Target version: 2.4.0? Not exactly sure people are keen on waiting for a year to get something that was working to work again. How about reverting the thing to its pre-2.3.2 state, without that #6355 "fix" that broke everything and fixed nothing (at least according to https://forum.pfsense.org/index.php?topic=110901.0)?
Updated by Jim Pingle about 8 years ago
Reverting that patch certainly does seem like a good idea given the responses.
Out of curiosity, have you tried this on a recent 2.3.3 snapshot? Or on 2.4?
Updated by Jim Pingle about 8 years ago
Also: Target for 2.4 is only a couple months out, not a year.
Updated by martin wüthrich about 8 years ago
Updated by Kill Bill about 8 years ago
Jim Pingle wrote:
Out of curiosity, have you tried this on a recent 2.3.3 snapshot? Or on 2.4?
Yeah all the 2.3.3 snapshots are still broken. 2.4 is a complete no-go with wifi due to Bug #6770 so I really don't have any good place to test this.
Updated by Kill Bill about 8 years ago
Can this pretty please finally get the disastrous patch reverted? Not only did it not fix what it was supposed to fix (beyond the already mentioned https://forum.pfsense.org/index.php?topic=110901.0, there's another report at https://forum.pfsense.org/index.php?topic=119798), but it broke the thing completely. I cannot see a single good thing about the patch. This is a complete no-go in environments where you have lots of VLANs and all DHCP/DNS needs to be maintained under Active Directory.
Updated by Renato Botelho about 8 years ago
- Status changed from New to Feedback
- % Done changed from 0 to 100
Patch removed and package updated to 4.3.5 on pfSense 2.3.3 and 2.4.0
Updated by Kill Bill about 8 years ago
Yay!!! Will only be able to test after this weekend; going to post feedback here. Thanks.
Updated by Kill Bill about 8 years ago
Kill Bill wrote:
Yay!!! Will only be able to test after this weekend; going to post feedback here. Thanks.
Working again!!!
Updated by Jim Pingle almost 8 years ago
- Status changed from Feedback to Resolved