Bug #6423
WAN doesn't reconnect on dropped PPPoE session
Status: closed
Description
Hi,
I've been troubleshooting this issue solidly for about a month now, and after swapping out everything else I'm certain it's a bug. The scenario is:
Dell PowerEdge T320 with onboard dual Broadcom 5720 NICs and dual Intel i350 NICs on a PCI card, running VMware ESXi 6.0U2.
pfSense is the only VM on this server, and is given 2 GB RAM, a 10 GB disk and 1 vCPU. It has 1 x vNIC associated to "LAN" and 1 x vNIC associated to "WAN".
LAN is a vSwitch with nothing else on it local to the ESXi box, just a few devices on the LAN (VoIP handsets, Ruckus AP, Wyse terminal, etc.). It has a private IP address range of 192.168.31.0/24.
WAN is a vSwitch with a single drop cable into a Draytek Vigor 130 ADSL/VDSL modem
ISP is ICUK, a medium-sized British ISP, and the line is an ADSL line at 5 Mbps/1 Mbps provided by BT.
There is also an IPsec site-to-site link configured against another pfSense instance (2.2.6) at head office.
pfSense started out as version 2.3, which exhibited the issue. It was then updated to version 2.3_1, which also exhibited the issue, and finally the whole VM was rebuilt with version 2.3.1, which ALSO exhibits the issue. I'm not using any packages other than OpenVMTools (which I only started using with 2.3.1, and it hasn't affected the issue at all).
Essentially, the problem is that when started, pfSense successfully negotiates a PPPoE connection with the ISP's RADIUS servers. This connection works fine for a while, but within a day of the connection being live it will be brought down and will not renegotiate/redial to become live again.
Have tried:
- Changing modem
- Changing NICs from Broadcom to Intel
- Rebuilding new VM
- Changing cabling
- Adding a periodic reset (this was successful possibly 1 or 2 times in total but wholly unsuccessful otherwise)
- Changing the WAN to dial-on-demand mode
Almost every change I make to the WAN config brings the line down, and it won't come back up. The only ways I've found to get the WAN up again are:
- Restart pfSense
- Unplug the ethernet cable from the modem, wait 10 seconds, plug the ethernet cable back into the modem
- Restart the modem (this only appears to work maybe 25% of the time)
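As a stopgap, the manual recovery steps above could in principle be automated by a small watchdog. This is only a minimal Python sketch of that idea, not a fix for the underlying bug: `watch_wan`, `probe` and `recover` are hypothetical names, `probe` is assumed to return True while the WAN is reachable, and what `recover` should actually run on pfSense is not something established in this report.

```python
import time
from typing import Callable, Optional


def watch_wan(probe: Callable[[], bool],
              recover: Callable[[], None],
              max_failures: int = 3,
              interval_s: float = 30.0,
              max_checks: Optional[int] = None,
              sleep: Callable[[float], None] = time.sleep) -> int:
    """Call recover() after max_failures consecutive failed probes.

    probe and recover are hypothetical callables supplied by the caller.
    Returns the number of times recover() was triggered.
    """
    failures = 0
    recoveries = 0
    checks = 0
    while max_checks is None or checks < max_checks:
        checks += 1
        if probe():
            # Link looks healthy, so forget earlier failures.
            failures = 0
        else:
            failures += 1
            if failures >= max_failures:
                # Enough consecutive failures: attempt recovery once,
                # then start counting again from zero.
                recover()
                recoveries += 1
                failures = 0
        sleep(interval_s)
    return recoveries
```

Requiring several consecutive failures before acting is a deliberate choice here, so that a single lost probe does not bounce a connection that is otherwise fine.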
As far as the logs go, I had a Reddit post here about the issue which includes the logs: https://www.reddit.com/r/PFSENSE/comments/4kn981/dpinger_behaviour/
Also, the ISP have confirmed that when the PPPoE is down the modem is in sync with the line, and can be for several days without pfSense redialling (or at least the redials getting through to the modem/ISP).