Bug #1943

closed

PPPoE won't reconnect after link loss when using vr(4) NICs on certain ISPs only

Added by David Burgess over 12 years ago. Updated over 4 years ago.

Status: Closed
Priority: Normal
Assignee: -
Category: PPP Interfaces
Target version: -
Start date: 10/10/2011
Due date:
% Done: 0%
Estimated time:
Plus Target Version:
Release Notes:
Affected Version: 2.0
Affected Architecture:

Description

(Copied from http://forum.pfsense.org/index.php/topic,40671.msg209656.html#msg209656)

Attached are four files. Here's how I got them:

1. Boot system fresh. PPPoE connected.
2. clog /var/log/system.log | egrep '(mpd|ppp)' > ppp_working.txt
   cp /var/etc/mpd_wan.conf mpd_wan_working.conf.txt
3. Pull plug on WAN NIC. Wait 2 minutes. Re-plug. (WAN not connecting)
4. clog /var/log/system.log | egrep '(mpd|ppp)' > ppp_not_working.txt
   cp /var/etc/mpd_wan.conf mpd_wan_not_working.conf.txt
5. Reboot to working WAN.

Null Service Name is checked.


Files

ppp_working.txt (33.6 KB), David Burgess, 10/10/2011 06:26 AM
ppp_not_working.txt (97.8 KB), David Burgess, 10/10/2011 06:26 AM
mpd_wan_working.conf.txt (2.63 KB), David Burgess, 10/10/2011 06:26 AM
mpd_wan_not_working.conf.txt (2.63 KB), David Burgess, 10/10/2011 06:26 AM
reconnect_log.txt (65.4 KB), copied from ppp log in UI, newest on top, David Burgess, 01/29/2013 07:39 AM
ppp.log (2.83 KB), Dmitriy K, 11/09/2014 03:08 PM

Actions #1

Updated by David Burgess about 12 years ago

I commented out the line:

/usr/sbin/ngctl shutdown $1

from the file

/usr/local/sbin/ppp-linkdown

as recommended by jimp in the linked forum thread, and this fixed the problem. Updating pfSense to 2.0.1 overwrote my changes, however, and so the problem returned. I don't know what this line is supposed to do, but from my little centre of the universe it appears it should just be removed from future releases.
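
Since the change is overwritten by updates, reapplying it can be scripted. The following is a minimal sketch, assuming the line appears in the script exactly as quoted above. The helper name comment_out_ngctl is hypothetical and is not part of pfSense.

```shell
#!/bin/sh
# Sketch only: comment out the "ngctl shutdown" line in a ppp-linkdown script.
# comment_out_ngctl is a hypothetical helper name, not something pfSense ships.
comment_out_ngctl() {
    script="$1"
    # Keep a backup so the change can be reverted by copying it back.
    cp "$script" "$script.bak" || return 1
    # Prefix the matching line with '#'. Lines that are already commented do
    # not match, so running this twice is harmless. The output is written from
    # the backup rather than with "sed -i", whose syntax differs between BSD
    # and GNU sed.
    sed 's|^\([[:space:]]*\)\(/usr/sbin/ngctl shutdown \$1\)|\1#\2|' \
        "$script.bak" > "$script"
}
```

It would be called as: comment_out_ngctl /usr/local/sbin/ppp-linkdown. The backup is left next to the script as ppp-linkdown.bak.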

Actions #2

Updated by Anonymous over 11 years ago

The provided fix above works. Please implement this fix.

Actions #3

Updated by Chris Buechler over 11 years ago

  • Target version set to 2.1
Actions #4

Updated by Bipin Chandra over 11 years ago

+1

Works in link loss situations, and I think it should be implemented rather than having to apply the patch after every update.

Actions #5

Updated by Andrew Stuart over 11 years ago

I too can confirm this works for a family member running on 2.0.1. In their case, if a connection was never established (for example, because the modem wasn't powered on when pfSense booted), it will never attempt to connect.

Actions #6

Updated by Renato Botelho over 11 years ago

Is it still happening on recent snapshots? I couldn't reproduce it; every time I re-plugged the WAN cable, PPPoE reconnected fine.

Actions #7

Updated by Renato Botelho over 11 years ago

  • Assignee set to Renato Botelho
Actions #8

Updated by Renato Botelho over 11 years ago

  • Status changed from New to Feedback
Actions #9

Updated by Bipin Chandra over 11 years ago

This is a very old issue; there has been a long-running thread on it: http://forum.pfsense.org/index.php/topic,41061.0.html

Actions #10

Updated by Bipin Chandra over 11 years ago

The issue seems to be related to some driver and to Unisphere BRAS, which more and more ISPs are using and which applies some kind of flood protection or security. Many people are suffering from this, and the worst thing is that every other cheap router on the market reconnects fine except pfSense. I personally tried this in 2 different countries with different ISPs.

There are 3 situations:
- Unplug the WAN cable and reconnect it within the link loss trigger timeout: the connection continues.
- Unplug the WAN cable, wait until complete link loss (approx. 2 minutes), then reconnect the cable: a new connection is negotiated but it won't reconnect.
- Reset the ISP modem: when it's back up, pfSense needs to renegotiate and reconnect but fails, which results in a reconnect loop, and it never comes up until pfSense is rebooted.

Actions #11

Updated by Dim Hatz about 11 years ago

Bipin, if you've identified that Unisphere BRAS is used by all the ISPs you've tried and had problems with, then perhaps you should also discuss your finding with the developers of mpd at http://mpd.sourceforge.net/

Actions #12

Updated by Bipin Chandra about 11 years ago

I did post on the FreeBSD forum and also somewhere on the mpd side a long time back, but hardly anyone replied. A few suggestions were posted regarding mpd config changes, which don't help at all.

Actions #13

Updated by Anonymous about 11 years ago

I tested with the following build and the issue is not present for me any more:

2.1-BETA1 (amd64)
built on Thu Jan 24 07:40:32 EST 2013
FreeBSD 8.3-RELEASE-p5

Actions #14

Updated by David Burgess about 11 years ago

2.1-BETA1 (amd64)
built on Mon Jan 21 16:42:50 EST 2013
FreeBSD 8.3-RELEASE-p5

After unplugging then replugging the WAN interface (actually, this pfsense has only one physical NIC, with multiple LANs and WANs via vlans), the pppoe connection comes active again, but some problems remain:

  1. The most recent entries in the PPP log (webUI) appear to indicate that the link is still down, even though it is up (attached).
  2. My LAN interface has a rule that defines a failover gateway on a non-ppp interface. Even after the WAN pppoe interface is back up, pfsense continues to use the failover WAN for the LAN gateway. I disabled the failover interface and pfsense then routed the LAN traffic through the main WAN pppoe interface correctly, but as soon as I re-enabled the failover interface, pfsense again started routing LAN traffic through the failover interface. It is as if pfsense does not see the WAN as up, although traffic from other LANs appears to be passing through the WAN as normal.
  3. I received a notification from pfsense by email: "Gateways status could not be determined, considering all as up/active. (Group: [failover_group])"
  4. The Diagnostic>States page shows no existing state for the failover WAN, even while I am pinging or doing a traceroute through it from the LAN. Is this normal?

I don't know if these problems are related to this bug, but they are new problems to me and I only observe them when the WAN goes down.

Actions #15

Updated by Renato Botelho about 11 years ago

  • Assignee deleted (Renato Botelho)
Actions #16

Updated by Renato Botelho about 11 years ago

  • Status changed from Feedback to New
Actions #17

Updated by Bipin Chandra about 11 years ago

Here is the thread at the mpd forum; it doesn't seem like an mpd issue but rather a network driver issue.

http://sourceforge.net/projects/mpd/forums/forum/44692/topic/6720934

Actions #18

Updated by Andrea Soster about 11 years ago

Dears,
I have the same problem with one of my firewalls in China. I'm connected to a Huawei SmartAX MA5620-8: there's a fiber connected to it and then an Ethernet cable to pfSense. The provider is China Unicom. The box is an ALIX 2D13 running 2.0.2 i386.

After some time the PPP connection drops and it's unable to reconnect, the only solution is to reboot the machine and the DSL comes back.

Which kind of debug information should I provide the next time it happens?

I spoke to Jim Pingle on chat and he told me to save the PPP log, the mpd config files (where are they?), and the output of "ngctl list". Anything else?

Please note that I have already commented out the line:
/usr/sbin/ngctl shutdown $1
from the file
/usr/local/sbin/ppp-linkdown

but nothing changed
Let me know, thanks

Actions #19

Updated by Ermal Luçi almost 11 years ago

From reports this seems to be an issue of drivers.

Actions #20

Updated by Anonymous almost 11 years ago

This issue is starting to drive me a bit crazy.

I have multiwan set up with 2 x dsl PPPoE connections and a ppp 3g backup link. All links were working fine for 2 days then suddenly the second WAN link just starts going offline and nothing I do can get it back except by restarting the entire box (which is not favourable).

I have tried the above fix and commenting out the line in the ppp-linkdown file but it doesn't seem to have any effect.

Does anybody know if there is anything I can do to prevent this? I see some say it is a driver issue, but which drivers? If I replace the interface card with a different one in my router box is it possible this would solve my issue?

I'm just glad the failover works so well otherwise I would have a lot of angry users :)

Actions #21

Updated by Bipin Chandra almost 11 years ago

I'm seeing it on an ALIX, which has a VIA chipset. Maybe others can mention theirs so we can be sure it's the VIA drivers, because to my knowledge the older pfSense I ran when I first got the ALIX a few years ago didn't suffer from this.

Actions #22

Updated by Anonymous almost 11 years ago

Bipin Chandra wrote:

I'm seeing it on an ALIX, which has a VIA chipset. Maybe others can mention theirs so we can be sure it's the VIA drivers, because to my knowledge the older pfSense I ran when I first got the ALIX a few years ago didn't suffer from this.

My device:

[2.0.3-RELEASE][root@firewall.newco.local]/root(8): sysctl -a | grep re.0
dev.re.0.%desc: RealTek 8168/8111 B/C/CP/D/DP/E PCIe Gigabit Ethernet
dev.re.0.%driver: re
dev.re.0.%location: slot=0 function=0
dev.re.0.%pnpinfo: vendor=0x10ec device=0x8168 subvendor=0x7470 subdevice=0x3468 class=0x020000
dev.re.0.%parent: pci2

Actions #23

Updated by Bipin Chandra almost 11 years ago

So I believe it's not the drivers, because a driver problem wouldn't affect different brands. The next thing that comes to my mind is netgraph, as mentioned here: http://forum.pfsense.org/index.php/topic,41061.msg315644.html#msg315644

Actions #24

Updated by Anonymous almost 11 years ago

I think my issue may be modem related. I thought I had ruled this out but it seems both modems I tested with gave issues. I tested with a third modem and it seems to be quite stable, I will however still have to test it over a prolonged period. I will update this issue with my results.

Actions #25

Updated by Renato Botelho almost 11 years ago

I managed to reproduce it locally with an ALIX board and can confirm the issue is related to the vr driver. It's always reproducible when I use the vr1 interface, and the problem disappears when I use a USB Ethernet adapter (ue0).

Actions #26

Updated by Anonymous almost 11 years ago

Starting to look a lot more like mine was modem related. I replaced mine yesterday afternoon and it has been up since then.

The one thing that I have to note is that the DSL in my area is very bad and the lines often lose sync and disconnect. It appears that certain devices are just better at handling bad line quality like this.

Actions #27

Updated by Bipin Chandra almost 11 years ago

so is there any solution to this, like an updated driver or so?

Actions #28

Updated by Mel Handumon almost 11 years ago

I'm also looking forward to a solution for this. I have halted my 14-site installation until the bug is fixed.

Actions #29

Updated by Jim Pingle over 10 years ago

  • Subject changed from PPPoE won't reconnect after link loss to PPPoE won't reconnect after link loss when using vr(4) NICs on certain ISPs only

Clarifying description since this is much more limited in scope than the original subject implied.

Actions #30

Updated by Chris Buechler over 10 years ago

  • Target version changed from 2.1 to 2.2
Actions #31

Updated by Bipin Chandra about 10 years ago

this seems to be solved in the latest 2.1.1-PRERELEASE

Actions #32

Updated by Ermal Luçi about 10 years ago

  • Status changed from New to Feedback
Actions #33

Updated by Dmitriy K almost 10 years ago

I have the very same problem on 2.1.3 amd64: after a series of disconnects, the PPPoE daemon stops reconnecting.

ppp log: http://pastebin.com/AaFf0kbE

Actions #34

Updated by Bipin Chandra almost 10 years ago

I just tried it on 2.1.3 nanobsd on an ALIX and, since the last time this was patched, it works fine; it was able to reconnect without problems. No idea about the amd64 build.

Actions #35

Updated by Philippe P almost 10 years ago

After an upgrade from a rock-stable 2.0.3 to 2.1.3, I experience the same link loss problem with the PPP daemon.
My logs are here: https://forum.pfsense.org/index.php?topic=77480.msg422321#msg422321

If you need more logs, tests, packets monitoring or whatever, just please ask.

Actions #36

Updated by Dmitriy K almost 10 years ago

As a developer, I don't understand why the mpd config is being regenerated EACH time on connect. Is it so hard to add a check for configuration changes? Or to add some signalling to prevent mpd from starting too early?

From my log:
May 26 20:53:30 ppp: OpenConfFile: Can't open file '/var/etc/mpd_wan.conf': No such file or directory
May 26 20:53:30 ppp: OpenConfFile: Can't open file '/var/etc/mpd_wan.conf': No such file or directory
May 26 20:53:30 ppp: can't read configuration for "pppoeclient"

From Philippe's log:
May 26 16:23:34 ppp: OpenConfFile: Can't open file '/var/etc/mpd_wan.conf': No such file or directory
May 26 16:23:34 ppp: OpenConfFile: Can't open file '/var/etc/mpd_wan.conf': No such file or directory
May 26 16:23:34 ppp: can't read configuration for "pppoeclient"

The problem is obvious, isn't it? 6 days have passed and nothing has been addressed...
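
The kind of guard suggested above could look roughly like the following. This is a minimal sketch under stated assumptions, not how pfSense actually starts mpd: wait_for_file is a hypothetical helper, and the 10-second default timeout is an arbitrary choice. It only shows the idea of refusing to start the daemon until the generated config file is present and non-empty.

```shell
#!/bin/sh
# wait_for_file PATH [TIMEOUT_SECONDS]
# Hypothetical helper: poll once per second until PATH exists and is
# non-empty. Returns 0 once the file is there, 1 if the timeout expires.
wait_for_file() {
    path="$1"
    timeout="${2:-10}"   # assumed default, chosen only for illustration
    waited=0
    while [ ! -s "$path" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

A caller would check the result before launching the daemon, for example: wait_for_file /var/etc/mpd_wan.conf 10 || echo "mpd_wan.conf still missing, not starting mpd". Polling is the simplest option here; it does not address why the file is regenerated on every connect.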

Actions #37

Updated by Chris Largent almost 10 years ago

One of my sites that relies on radio-based Internet connectivity is experiencing this misbehavior. We are very, VERY fortunate that the radio link is extremely solid, nevertheless, interrupted PPPoE sessions still occur whenever a serious storm passes through the area. Every time the PPPoE session is disrupted, the pfSense device cannot re-establish a new PPPoE session, and the pfSense device must be completely rebooted. Similar to another author's previous post, when I test with a consumer-grade router, the PPPoE session is successfully re-established.

Attention bugtracker Administrator(s): Please remove the reference to vr(4) NICs in the title of this bug. At this point, that title is incorrect. I write this because my device has a bge (Broadcom) for its WAN interface. Also, please change the status of this bug from 'Feedback' to 'New' (or whatever is appropriate to surface it to the developers). The affected versions and architectures should also be updated...

In my case, this issue occurred with 2.1.3-RELEASE (i386). Two days ago, I upgraded the device to 2.1.4-RELEASE using the Auto Update functionality. The issue remains.

Unfortunately this is the very first bug to force me to remove pfSense from a production environment. (It's the only production environment where I have had to use PPPoE.)

Actions #38

Updated by Steve Ovens over 9 years ago

I just wanted to chime in that I am also experiencing this problem; however, my WAN port is: em0 90:e2:ba:06:ba:93 (up) Intel(R) PRO/1000 Legacy Network Connection 1.0.6

I have nothing more valuable to add other than I am running the latest 2.1.5-RELEASE amd64

Actions #39

Updated by Chris Buechler over 9 years ago

  • Target version deleted (2.2)

this may or may not still be an issue with 2.2, much has changed, and this is something we've never been able to replicate. Need feedback on 2.2 from someone who sees this issue.

Actions #40

Updated by Dmitriy K over 9 years ago

The bug is still here. Fresh log attached.

Actions #41

Updated by Dmitriy K over 9 years ago

Damn, today I had an 8h internet downtime because of this bug again, while I was sleeping! Latest snapshot.

Actions #42

Updated by Bipin Chandra over 9 years ago

this problem persists on 2.2 RC

Actions #43

Updated by Michael Jephcote over 8 years ago

I am also experiencing this issue, luckily enough our connection is usually stable but if it is interrupted it reconnects but apinger reports it as being offline. I have to go into both WAN connections on the interfaces page and save them both applying my changes to bring it back to a working state.

Running version 2.2.5

Actions #44

Updated by Chris Buechler over 8 years ago

those who have the original issue here, a report back on whether it still occurs with 2.3 would be appreciated.

Michael Jephcote wrote:

I am also experiencing this issue, luckily enough our connection is usually stable but if it is interrupted it reconnects but apinger reports it as being offline. I have to go into both WAN connections on the interfaces page and save them both applying my changes to bring it back to a working state.

That's definitely not the same issue. It sounds like one of a variety of apinger problems that's been resolved in 2.3 by its replacement.

Actions #45

Updated by John Wilkes about 7 years ago

I am experiencing this issue with pfSense 2.3.2, running i386 nanobsd on an ALIX board.

Every time the upstream router loses connectivity, pfSense will not reconnect. A reboot of pfSense is enough to make it reconnect. Perhaps a disable/enable of PPPoE would suffice, but I have not tested that.

A simple power cycle of the router is not enough to cause the issue, though. I need a real loss of connectivity for some time, that is, 20-30 minutes because of issues on the ISP side. If I just reboot the VDSL router, pfSense will reconnect (as far as I can tell).

The router is a Technicolor VDSL router, used as a bridge only. The PPPoE session is initiated by pfSense.

Here's the hardware:

FreeBSD 10.3-RELEASE-p5 #0 7307492(RELENG_2_3_2): Tue Jul 19 14:02:43 CDT 2016
root@ce23-i386-builder:/builder/pfsense-232/tmp/obj/builder/pfsense-232/tmp/FreeBSD-src/sys/pfSense_wrap i386
FreeBSD clang version 3.4.1 (tags/RELEASE_34/dot1-final 208032) 20140512
CPU: Geode(TM) Integrated Processor by AMD PCS (498.06-MHz 586-class CPU)
Origin="AuthenticAMD" Id=0x5a2 Family=0x5 Model=0xa Stepping=2
Features=0x88a93d<FPU,DE,PSE,TSC,MSR,CX8,SEP,PGE,CMOV,CLFLUSH,MMX>
AMD Features=0xc0400000<MMX+,3DNow!+,3DNow!>
real memory = 268435456 (256 MB)
avail memory = 226562048 (216 MB)

Side Note: I tried to update to 2.3.3_1 as this is current atm, but have not been successful via the web gui. I'll try, time permitting, in the next days and report here if there are any changes in behavior.

Actions #46

Updated by caleb reft over 6 years ago

I've run into this the other day... 2.3.4-RELEASE-p1 (amd64). I had to go into status -> interfaces and manually click on the 'connect' button and everything came up fine... No logs because the default is only 50 entries (wtf?) so I'll have to up that and wait for it to happen again.

Actions #47

Updated by Jim Pingle over 6 years ago

  • Status changed from Feedback to Closed
  • Affected Architecture added
  • Affected Architecture deleted (amd64)

This bug was specific to vr(4) and the only major platform using vr(4) NICs is 32-bit only, which is no longer supported. If this still happens to anyone on 2.4.x, post on the forum to gather info and figure out how to reliably reproduce it. I have several PPPoE connections here and they have no difficulty establishing a connection when dropped.

Actions #48

Updated by Max Power about 5 years ago

It seems the bug is still present in 2.4.4 (running on an SG-2220).
We had a WAN interruption (they cut the cable while doing road works), and after everything was reconnected the PPPoE interface didn't come up automatically (we waited many minutes).
Disabling and re-enabling the WAN interface didn't help; I restored the PPPoE connection with this command: /usr/local/sbin/pfSctl -c 'interface reload wan'

Thank you for your help
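
As a stopgap, the manual reload above could be wrapped in a small check. This is a rough sketch, not a tested fix: the function names are hypothetical, the probe address 192.0.2.1 is a documentation placeholder that must be replaced with a host that normally answers pings through the WAN, and the only pfSense-specific command used is the pfSctl call quoted above.

```shell
#!/bin/sh
# Assumed probe target; 192.0.2.1 is a placeholder and will not answer.
PROBE_HOST="${PROBE_HOST:-192.0.2.1}"

# Returns 0 if the probe host answers at least one of three pings.
wan_is_up() {
    ping -c 3 "$PROBE_HOST" >/dev/null 2>&1
}

# Runs the reload command reported above to restore the PPPoE connection.
reload_wan() {
    /usr/local/sbin/pfSctl -c 'interface reload wan'
}

# Reload the WAN interface only when the probe fails.
check_and_reload() {
    if wan_is_up; then
        return 0
    fi
    echo "WAN probe failed; reloading wan interface" >&2
    reload_wan
}
```

Calling check_and_reload periodically (for example from cron) would automate the manual step. Whether that is appropriate depends on the setup, since a failed ping can have causes other than this bug.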

Actions #49

Updated by Yuran Yastreb over 4 years ago

I am experiencing the same issue with version 2.4.4-p3 on x86 hardware (re network interfaces).
