Bug #1629

invalid state table entries after WAN IP change

Added by Eli Hunter almost 3 years ago. Updated 1 day ago.

Status: New
Start date: 06/29/2011
Priority: High
Due date:
Assignee: -
% Done: 50%
Category: Rules/NAT
Target version: 2.2
Affected version: All
Affected Architecture:

Description

We have an asterisk server behind pfsense 2.0-RC3 using a PPPoE DSL connection.
Whenever our WAN IP changes the asterisk server cannot register to the providers.
Flushing the state table allows asterisk to register again.

state_table.txt (25.2 kB) Eli Hunter, 07/05/2011 09:00 pm

log.txt (20.6 kB) Eli Hunter, 08/02/2011 08:32 pm

Associated revisions

Revision 8ed47897
Added by Ermal Luçi almost 3 years ago

Kill states from the previous ip the link had on all mpd consumers. Resolves #1629

Revision 0f2826c0
Added by Ermal Luçi almost 3 years ago

Kill states from the previous ip the link had on all mpd consumers. Resolves #1629

Revision 06498591
Added by Jim P over 1 year ago

Try to remove old states when a DHCP IP changes, might be related to ticket #1629 and also "unable to allocate llinfo" messages from states through an old gateway.

Revision 096f2962
Added by Ermal Luçi about 1 year ago

Ticket #1629 Another round of fixes related to state clearing

Revision c59dd719
Added by Ermal Luçi 8 months ago

Revert back the behaviour to cleanup all states for 2.1 Fixes #3181 and related to Ticket #1629. This commit is only for 2.1 since on master development will continue for better alternatives

Revision 5aa44e98
Added by Ermal Luçi 8 months ago

Revert "Revert back the behaviour to cleanup all states for 2.1 Fixes #3181 and related to Ticket #1629. This commit is only for 2.1 since on master development will continue for better alternatives"

A bit too excessive need to get right.

This reverts commit c59dd719e0a6d9ee8deecaa7bff0d6ee8c76e4ca.

History

#1 Updated by Evgeny Yurchenko almost 3 years ago

Please provide state-table dump before IP change and after.

#2 Updated by Eli Hunter almost 3 years ago

Hopefully this is what you wanted.

My IP before the address changed was 76.254.18.100 and the new assigned address is 99.179.45.73

Bad entry in state table before resetting states
udp 10.0.4.3:5060 -> 76.254.18.100:13819 -> 67.215.241.250:5060 SINGLE:NO_TRAFFIC

I've attached a txt document with the state table info before and after flushing it.

#3 Updated by Chris Buechler almost 3 years ago

  • Category set to PPP
  • Priority changed from Normal to High
  • Target version set to 2.0
  • Affected version set to 2.0

PPPoE is supposed to clear all states on that interface when an IP changes, that's not happening correctly.

#4 Updated by Ermal Luçi almost 3 years ago

  • Status changed from New to Feedback
  • % Done changed from 0 to 100

#6 Updated by Fábio Pinto Coelho almost 3 years ago

I'm still on 1.2.3 and the same problem happens on it.

As a workaround, you may check http://forum.pfsense.org/index.php?topic=18053.5

I use two VoIP providers, so I have modified the script to clear the states for both of them. If you need my updated script, or any help on implementing it, please let me know.

I can see Ermal Luçi has already released a fix, but I thought I'd just let you know...

#7 Updated by Chris Buechler almost 3 years ago

That's expected to happen in 1.2.3 (it has no provisions for dealing with that scenario, only 2.0 does).

#8 Updated by Ermal Luçi almost 3 years ago

Have you tested this on 2.0?

#9 Updated by Eli Hunter almost 3 years ago

I got the update installed last week but haven't had the IP change on me yet (surprisingly). I'll update this once the IP's changed.

#10 Updated by Matt Corallo almost 3 years ago

I have the same problem (after the fixes) with IPv6 tunneling, so this is not resolved.

#11 Updated by Chris Buechler almost 3 years ago

IPv6 is a completely different version: that's 2.1, not 2.0. Post info to the IPv6 board on the forum.

#12 Updated by Matt Corallo almost 3 years ago

No, no, I'm not talking about IPv6 in pfSense, I'm talking about IPv6 NAT passthrough in the "System: Advanced: Networking" menu in 2.0, not 2.1. It's the same bug for the same reason.

#13 Updated by Eli Hunter almost 3 years ago

I had it reset again this weekend which took the asterisk server down again. Unfortunately I wasn't near a computer and had to get things up and running for them quickly so I used my phone to reset the state table. This got their phones working again but I wasn't able to get a copy of any logs. I'm going to assume this isn't fixed yet but it's probably good to wait until it happens again so I can make sure it's still a problem.

#14 Updated by Eli Hunter over 2 years ago

It's still happening.
Again here's a relevant section. Our Asterisk server at 10.0.4.3 is still trying to use the old gateway address of 99.58.29.27. Our server at 10.0.4.100 is using the correct gateway at 99.169.80.219

I can attach the full state table again if it helps.

udp 10.0.4.3:5060 -> 99.58.29.27:37488 -> 209.62.1.2:5060 MULTIPLE:MULTIPLE
tcp 10.0.4.100:55507 -> 99.169.80.219:46579 -> 216.52.233.157:443 ESTABLISHED:ESTABLISHED
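
The comparison made in this comment can be sketched as a small filter. This is only an illustration and not anything taken from pfSense itself: it assumes a single WAN, IPv4 addresses, and state lines in the `pfctl -ss` text format quoted in this ticket. The function name `stale_nat_states` is made up for the example. It prints every NAT state whose translated (middle) address differs from the current WAN IP.

```sh
#!/bin/sh
# Hypothetical helper: print NAT states whose translated address is NOT the
# current WAN IP. Reads "pfctl -ss"-style lines on stdin.
stale_nat_states() {
    current_ip="$1"
    awk -v ip="$current_ip" '
        {
            # A NAT state looks like: [if] proto SRC -> NAT -> DST STATE
            # so the NAT field is the one with "->" on both sides.
            for (i = 3; i < NF; i++)
                if ($(i-1) == "->" && $(i+1) == "->") {
                    split($i, nat, ":")      # nat[1] = address, nat[2] = port
                    if (nat[1] != ip) print
                    next
                }
        }'
}

# Demo with the two states quoted above; the new WAN IP is 99.169.80.219.
stale_nat_states 99.169.80.219 <<'EOF'
udp 10.0.4.3:5060 -> 99.58.29.27:37488 -> 209.62.1.2:5060 MULTIPLE:MULTIPLE
tcp 10.0.4.100:55507 -> 99.169.80.219:46579 -> 216.52.233.157:443 ESTABLISHED:ESTABLISHED
EOF
```

With this input only the udp state for 10.0.4.3 is printed. On a live box the input would presumably come from `pfctl -ss`; on a multi-WAN setup the single-IP comparison would need to be extended.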

#15 Updated by Ermal Luçi over 2 years ago

Can you post system log with state table as well?

#16 Updated by Eli Hunter over 2 years ago

I just copied the system log page and state table in attached document.

Would collecting the data with a syslog server help? I can get it setup if it helps track this down.

#17 Updated by Ermal Luçi over 2 years ago

From the attached:
- What is the old gateway?
- What is the new gateway?
- What is the wrong entry?

#18 Updated by Chris Buechler over 2 years ago

  • Target version changed from 2.0 to 2.0.1

#19 Updated by Luke Hamburg over 2 years ago

Hi- I've just experienced the exact same issue. pfSense 2.0(REL) running nanobsd-2g on a Netgate Hamakua. My WAN DHCP lease expired and when it renewed the IP had changed. My internal Asterisk server lost all trunk registrations and I had to manually reset the states to fix it.

Has there been any update on this problem - or is there a workaround that doesn't require manual intervention?

#20 Updated by Andrea Cutelle' over 2 years ago

Hi, I see the same error in my installation: pfSense 2.0 release running on a Jetway NC9C-550LF. I have a static public IP. When the connection goes down and then comes back up, my Asterisk server loses its connection to the trunk and stays in the "Request Sent" state. After resetting the states it works well again.

sorry for my english..

#21 Updated by Chris Buechler over 2 years ago

  • Target version deleted (2.0.1)

#22 Updated by Pho Bia over 2 years ago

I also experience this with my SIP device (PAP2T). I thought my provider was to blame as changing the remote server usually got my phones back online.

I have a multiple WAN (3) setup.

Are you still looking for logs on this, or is the fix already known?

#23 Updated by Pho Bia over 2 years ago

This is what my states look like for my affected device in Diagnostics --> States when my VoIP adapter shows offline (filtered for my PAP2T IP only):

udp 64.120.22.242:5060 <- 192.168.0.100:5061 NO_TRAFFIC:SINGLE
udp 64.120.22.242:5060 <- 192.168.0.100:5060 NO_TRAFFIC:SINGLE
udp 192.168.0.100:5060 -> 64.231.21.117:52434 -> 64.120.22.242:5060 SINGLE:NO_TRAFFIC
udp 192.168.0.100:5061 -> 64.231.21.117:57887 -> 64.120.22.242:5060 SINGLE:NO_TRAFFIC

This is what it looks like after I reset states and my service goes back online :

udp 64.120.22.242:5060 <- 192.168.0.100:5061 MULTIPLE:MULTIPLE
udp 192.168.0.100:5061 -> 64.231.160.191:46020 -> 64.120.22.242:5060 MULTIPLE:MULTIPLE
udp 64.120.22.242:5060 <- 192.168.0.100:5060 MULTIPLE:MULTIPLE
udp 192.168.0.100:5060 -> 64.231.160.191:56285 -> 64.120.22.242:5060 MULTIPLE:MULTIPLE

#24 Updated by Christian Schwarz over 2 years ago

Bug still present with 2.0.1.
This is not happening every time the IP changes (the provider disconnects once a day), but after a few days our SIP registration is down. The state table shows an entry to the SIP provider with the old WAN IP...
Flushing the states manually will bring the trunk up again. (Using Telekom Germany with Panasonic PBX)

#25 Updated by Akom Benevolent almost 2 years ago

Same deal, 2.0.1-RELEASE and this happens every so often, but not on every IP change. I can delete the 2 state entries for the old WAN IP, and then asterisk registers fine.

#26 Updated by Grant Emsley almost 2 years ago

I'm seeing the exact same behavior using a PPPOE internet connection, 2.0.1-RELEASE i386, and asterisk.

When my connection goes down, the states remain in the state table and the asterisk server is unable to register.

#27 Updated by Nicklas Blidmo almost 2 years ago

Same issue with WAN DHCP, 2.0.1-RELEASE (i386) (nanobsd)

#28 Updated by Chris Buechler almost 2 years ago

  • Category changed from PPP to Rules/NAT
  • Status changed from Feedback to New
  • Target version set to 2.1
  • Affected version changed from 2.0 to All

#29 Updated by Grant Emsley almost 2 years ago

I don't know if it will help, but I noticed it happens much more frequently when my internet connection is flaky. It happened almost every time when my PPPOE connection was dropping every 2-5 minutes due to a loose cable.

#30 Updated by fos4X fos4X over 1 year ago

I can confirm that this problem still exists in 2.0.1-RELEASE (amd64) built on Mon Dec 12 18:16:13 EST 2011 using PPPoE with static IP (but 24h disconnects)

#31 Updated by Jim P over 1 year ago

  • Status changed from New to Feedback

Some fixes for this have gone into 2.1 over the past few months. Try a 2.1-BETA snapshot and see if it's repeatable there.

#32 Updated by fos4X fos4X over 1 year ago

Confirmed to still be an issue in 2.1-BETA0 (amd64) built on Wed Nov 28 15:23:39 EST 2012

A reconnect of the PPPoE WAN (manual or 24h) leads to asterisk being unable to register; it hangs at "Request Sent".

Could be related to #2700 but if older revisions are any indication, removing the /32 from $3/32 will not help either.

I would be willing to test any suggested (hot)fixes.

I read elsewhere that a pfctl -b <gw> in ppp_linkup (UP!!!!) helps, have not tried it yet though (every trial means disconnecting my colleagues from the web for a few minutes)

#33 Updated by pierre mayer over 1 year ago

Still not working with 2.1-BETA0 (i386), built from pfSense-memstick-2.1-BETA0-i386-20121128-1058.img

I need to reset the state table to get FreePBX working.

#34 Updated by Ermal Luçi about 1 year ago

  • Target version changed from 2.1 to 2.2

The only real solution to this is to switch to if-bound states for many reasons.
That is a somewhat more involved change for 2.1.

#35 Updated by Chris Buechler about 1 year ago

  • Status changed from Feedback to New
  • Target version changed from 2.2 to 2.1

We at least need the option to wipe the entire state table upon IP change.

#36 Updated by Ermal Luçi about 1 year ago

  • Status changed from New to Feedback

OK, I went and did another implementation of the fix for this.
Can you please try with later 2.1 snapshots and see if it behaves correctly?

#37 Updated by Tobias Wigand about 1 year ago

Does not work for me.
Correct me if I'm mistaken here, but can
pfctl -i
work without binding states to interfaces?
One of my external interfaces is em1, but
pfctl -i em1 -ss
does not show anything, although I have a working VoIP state in the table going out on that interface.

#38 Updated by Ermal Luçi about 1 year ago

Check with an upcoming snapshot; there was a problem with the patch that has since been corrected.

#39 Updated by Tobias Wigand about 1 year ago

Does not work, sorry. Only the "Out" states are flushed. The "In" states persist and seem to remember their gateway. After some time the "Out" states come back with the non-existent / down gateway. You can see this in the long pftop view. The debug.rules are correct, but they are never used because of the persisting "In" states. Or do I need to use floating rules to use this? Will they behave differently?

#40 Updated by Matthias Dilbert about 1 year ago

This problem also affects me. I’ve upgraded to Snapshot "built on Sat Feb 9 23:46:16 EST 2013". I will look for the problem to reoccur.

#41 Updated by Matthias Dilbert about 1 year ago

Today the problem occurred again. So it was not fixed yet.

#42 Updated by Ermal Luçi about 1 year ago

I just pushed another change to reset states with certain gateways set.
It should behave even better than previously, since it will send a RST for tcp states getting killed belonging to a certain gateway.

UPDATE for later: Probably a more thorough way of handling chained dependencies between states needs to be implemented with the -Fs option.
That is, if you kill a state belonging to an interface, try to find the correlated state on any other interface if present, especially for non-pfSense originated traffic.
That is a bit more involved, and more careful checks to avoid corrupting the table need to be done, but for now this should work correctly.

#43 Updated by Tobias Wigand about 1 year ago

Thanks, it works with my VoIP device now. The states get killed correctly.

#44 Updated by Dim Hatz about 1 year ago

Ermal, testing this feature on a pfsense box with a WAN interface that gets via DHCP an IP in a /24 subnet (i.e. it's not PPPoE), it won't kill pf states that originate from the LAN to any host in that /24 subnet (which includes the gwip). Connections beyond the WAN subnet seem to get killed, but the LAN states don't.

E.g. I establish an ssh connection from 192.168.100.12 to aa.bb.40.155, then manually flushed states using:
pfctl -i em0 -Fs -G gwip

After doing it 3 times, I get:
pfctl -ss | fgrep 40.155
em1 tcp aa.bb.40.155:22 <- 192.168.100.12:3131 ESTABLISHED:ESTABLISHED
em1 tcp aa.bb.40.155:22 <- 192.168.100.12:3590 ESTABLISHED:ESTABLISHED
em1 tcp aa.bb.40.155:22 <- 192.168.100.12:3595 ESTABLISHED:ESTABLISHED
em1 tcp aa.bb.40.155:22 <- 192.168.100.12:3597 ESTABLISHED:ESTABLISHED
em0 tcp 192.168.100.12:3597 -> xxx.yyy.1.201:65161 -> aa.bb.40.155:22 ESTABLISHED:ESTABLISHED

em0 WAN - xxx.yyy.1.201
em1 LAN - 192.168.100.1
remote ssh server - aa.bb.40.155

PS: Running latest 2.1-BETA snapshot (12-Feb 08:58)
MD5 (/sbin/pfctl) = af1a7f62f1ae26958ba050f6c6f418a6

#45 Updated by Dim Hatz about 1 year ago

To follow up on my previous post, I have verified that the WAN (em0) states are indeed flushed; however, their corresponding LAN (em1) states linger on.
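
The lingering LAN-side states described here can be spotted mechanically. The following is a rough sketch, not part of any pfSense fix: it assumes IPv4 and the interface-prefixed `pfctl -ss` format shown in the previous comment, and the function name `orphan_lan_states` is invented for this example. It prints each `<-` state that has no `->` NAT state with the same source and destination endpoints.

```sh
#!/bin/sh
# Hypothetical helper: list inbound-side ("<-") states that have no matching
# outbound NAT ("->") state. Reads interface-prefixed state lines on stdin.
orphan_lan_states() {
    awk '
        # Outbound NAT state: IF PROTO SRC -> NAT -> DST STATE
        NF == 8 && $4 == "->" && $6 == "->" { nat[$3 " " $7] = 1; next }
        # Inbound-side state:  IF PROTO DST <- SRC STATE
        NF == 6 && $4 == "<-" { lan[++n] = $0; key[n] = $5 " " $3 }
        END {
            for (i = 1; i <= n; i++)
                if (!(key[i] in nat)) print lan[i]
        }'
}

# Demo with the states reported in comment #44.
orphan_lan_states <<'EOF'
em1 tcp aa.bb.40.155:22 <- 192.168.100.12:3131 ESTABLISHED:ESTABLISHED
em1 tcp aa.bb.40.155:22 <- 192.168.100.12:3590 ESTABLISHED:ESTABLISHED
em1 tcp aa.bb.40.155:22 <- 192.168.100.12:3595 ESTABLISHED:ESTABLISHED
em1 tcp aa.bb.40.155:22 <- 192.168.100.12:3597 ESTABLISHED:ESTABLISHED
em0 tcp 192.168.100.12:3597 -> xxx.yyy.1.201:65161 -> aa.bb.40.155:22 ESTABLISHED:ESTABLISHED
EOF
```

For this input the three em1 states for source ports 3131, 3590 and 3595 are printed; the one for port 3597 still has its em0 counterpart and is skipped.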

#46 Updated by Renato Botelho about 1 year ago

  • Status changed from Feedback to New
  • % Done changed from 100 to 50

#47 Updated by Matthias Dilbert about 1 year ago

I’ve upgraded to the latest beta, but the problem still persists. Even when the modem is restarted and I don’t get a new IP, the states go wrong.

#48 Updated by Sebastian Chrostek about 1 year ago

Same Problem here with 2.1 Beta (built on Fri Mar 1 21:17:31 EST 2013)

It seems that states without the old IP in them also cause problems with SIP.

In my case these two states:

udp 212.227.18.199:5060 <- 172.17.0.1:5060 MULTIPLE:MULTIPLE
udp 172.17.0.1:5060 -> 212.227.18.199:5060 SINGLE:NO_TRAFFIC

for my VoIP connection to "1und1" prevent asterisk from getting a connection.
If I clear only these two states, asterisk gets a connection just a few seconds later.

I use a PPPoE WAN.

#49 Updated by Tom De Coninck 11 months ago

I have the same problem and am following this issue. Maybe I can provide some extra information; sorry if it duplicates what is already here.

I have been using pfSense since version 2.0, and am now running the latest 2.0.3 on an ALIX board.

I was using a PPTP VDSL connection, and i am now using a cable WAN connection. There was no difference for asterisk. When the WAN ip changed, the UDP state with old wan ip address stayed alive. With the different WAN connections the bad UDP state stayed alive.

When the wan IP changed into a new address, the state stayed alive with the old WAN IP ADDRESS

udp LOCALASTERISKIP:5060 -> WANIPOLD:17205 -> SIPPROVIDER:5060 MULTIPLE:MULTIPLE
instead of
udp LOCALASTERISKIP:5060 -> WANIPNEW:17205 -> SIPPROVIDER:5060 MULTIPLE:MULTIPLE

I was able to kill the state using
pfctl -k LOCALASTERISKIP -k SIPPROVIDER

So to fix my issue I had to run this command every time the WAN IP address changes.

I created this script with info I found on the internet.

Create /usr/local/bin/reset_states.sh

#!/bin/sh
# Kill the UDP SIP states after the WAN gets a new IP.
# ASTERISKIP and SIPPROVIDER are placeholders for the real addresses.
echo "Killing states from ASTERISKIP to SIPPROVIDER" | logger
/sbin/pfctl -k ASTERISKIP -k SIPPROVIDER

Change file permissions

chmod 755 /usr/local/bin/reset_states.sh

Edit config file /conf/config.xml

<system>
...
<afterfilterchangeshellcmd>/usr/local/bin/reset_states.sh</afterfilterchangeshellcmd>
</system>

Asterisk configuration

#pfctl -st

udp.first                    60s
udp.single                   30s
udp.multiple                 60s

Running the command shows me that the states die after 60s of inactivity.

To keep the state alive, keep the qualify under 60s, in my case 30s (30000)

; SIPPRODER_SIPPHONENUMBER
[SIP-PROVIDER-13764962994f4fdde1430ba]
qualify=30000
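
The timing rule above (keepalive interval shorter than the UDP state timeout) can be checked with a few lines of shell. This is a sketch under assumptions: the helper name `keepalive_fits` is hypothetical, the interval is given in whole seconds, and `udp.multiple` is taken as the relevant timeout because the healthy states in this ticket are shown as MULTIPLE:MULTIPLE.

```sh
#!/bin/sh
# Hypothetical helper: succeed if a keepalive interval (in seconds) is
# shorter than the udp.multiple timeout found in "pfctl -st"-style output
# read from stdin.
keepalive_fits() {
    interval="$1"
    # Take the second column of the udp.multiple line and strip the "s".
    timeout=$(awk '$1 == "udp.multiple" { sub(/s$/, "", $2); print $2; exit }')
    [ -n "$timeout" ] && [ "$interval" -lt "$timeout" ]
}

# Demo with the timeouts quoted in this comment and a 30 s keepalive.
if keepalive_fits 30 <<'EOF'
udp.first                    60s
udp.single                   30s
udp.multiple                 60s
EOF
then
    echo "30s keepalive is below udp.multiple"
else
    echo "30s keepalive is not below udp.multiple"
fi
```

With the values shown, the first message is printed. On a firewall the input could be piped in from `pfctl -st` instead of the here-document.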

This works for me; I hope it helps someone. Looking forward to a permanent fix.

My compliments to the pfSense programmers. I am a very happy user, and I will promote it.

#50 Updated by Tom De Coninck 11 months ago

When we have a state like this :

udp LOCALASTERISKIP:5060 -> WANIPOLD:17205 -> SIPPROVIDER:5060 MULTIPLE:MULTIPLE

Is it possible to kill states based on WANIPOLD with pfctl?
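
One possible workaround, which relies only on the `pfctl -k SRC -k DST` form already used in this thread, is to look up the stale states by their translated address and derive the kill commands from them. The sketch below is an assumption-laden illustration: it handles IPv4 only, the function name `kill_cmds_for_old_ip` is made up, and it only prints the commands so they can be reviewed before being run.

```sh
#!/bin/sh
# Hypothetical helper: for every NAT state translated to the old WAN IP,
# print (do not run) one "pfctl -k SRC -k DST" command per distinct pair.
# Reads "pfctl -ss"-style lines on stdin.
kill_cmds_for_old_ip() {
    old_ip="$1"
    awk -v ip="$old_ip" '
        {
            # NAT field = the field with "->" on both sides.
            for (i = 3; i < NF; i++) {
                split($i, nat, ":")
                if (nat[1] == ip && $(i-1) == "->" && $(i+1) == "->") {
                    split($(i-2), src, ":")   # internal host
                    split($(i+2), dst, ":")   # remote host
                    cmd = "pfctl -k " src[1] " -k " dst[1]
                    if (!(cmd in seen)) { seen[cmd] = 1; print cmd }
                    next
                }
            }
        }'
}

# Demo with the placeholder state from this comment.
kill_cmds_for_old_ip WANIPOLD <<'EOF'
udp LOCALASTERISKIP:5060 -> WANIPOLD:17205 -> SIPPROVIDER:5060 MULTIPLE:MULTIPLE
EOF
```

For the placeholder line this prints `pfctl -k LOCALASTERISKIP -k SIPPROVIDER`, the same command as in the manual workaround above.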

#51 Updated by Martin Oosterheert 10 months ago

I am also affected by this bug in 2.0.3.
In my case it is not a changed IP address on my WAN, but a dual-WAN setup with failover in which I simulate a failed WAN link.
pfSense registers within a few seconds that the failed WAN is down, and web browsing resumes after a few seconds on the other WAN, but VoIP can take 5 - 10 minutes with my TCP SIP account, or 20 or so minutes with the standard (and preferred) UDP SIP account.

Each time the states table shows entries like:

udp  INTERNALASTERISKIP:5060  <-  PUBLICASTERISKIP:5060  <-  SIPPROVIDERIP:5060  MULTIPLE:MULTIPLE
udp  SIPPROVIDERIP:5060  ->  PUBLICASTERISKIP:5060  MULTIPLE:MULTIPLE
(I have a 1:1 NAT between my local asterisk IP (192.168.1.23) and a public IP 109.x.x.x.)

Resetting the states resolves the problem at once.
Since this is a case of an interface failing, a pfctl -i fxp0 (or whichever interface name) seems appropriate and does solve my problem.
However, that's not very practical in a business setup...

I hope this info helps !

#52 Updated by Hannes Meer 9 months ago

I'm facing the same problem with the latest 2.1-RC. Anything we can do to get a solution?

#53 Updated by Renato Botelho 8 months ago

  • Target version changed from 2.1 to 2.2

#3181 is a band-aid for 2.1; this will need to wait for 2.2.

#54 Updated by Eric Jacksch 3 months ago

Still a significant issue - causing random VoIP outages. Would be great to get this fixed.

#55 Updated by Dim Hatz 3 months ago

It seems that in recent weeks there have been several related commits in 10-STABLE, e.g.

http://lists.freebsd.org/pipermail/svn-src-all/2014-January/079820.html
http://lists.freebsd.org/pipermail/svn-src-all/2014-January/079821.html

as well as several bugfixes, which apparently didn't make it into 10.0 RELEASE ...

#56 Updated by Andy Lawson 16 days ago

I'm still experiencing this issue with pfsense 2.1 on an ALIX platform and a Cisco SPA112 ATA.
pfsense is configured with a single ADSL/PPPoE WAN, but does not clear the state entry for this device on WAN IP change.
This issue doesn't get mentioned in the release notes for pfsense 2.1.1 (out today) so I assume it's not resolved there.

#57 Updated by Tom De Coninck 14 days ago

Friends, Developers
I did some extensive testing on this issue yesterday evening.. yes I know... get a life! :)

My WAN connection is DHCP, but I always get the same IP address. Before, I had a DSL connection with DHCP and a new IP address every 3 days. Even with this fixed IP, the issue with the states unfortunately remains.

I have a theory about this, maybe even facts; please correct me where I'm wrong.

From what I gathered, these asterisk parameters are causing the states:

qualifyfreq = 30
When the provider is reachable, asterisk sends a '102 OPTIONS' message every 30 seconds.
-> My advice: keep this UNDER 60, which is the lifetime of a UDP state in pfSense. This keeps the state alive, so we don't have to open ports in the firewall.

qualify = 5000
When the provider is unreachable, asterisk sends a '102 OPTIONS' message every 1 second for 5 seconds (5000 ms), then waits 10 seconds before starting over (in default compilations of asterisk).
-> I cannot give good advice about this one. This parameter in combination with pfSense gives us trouble :)

defaultexpiry=180
Re-registers with the provider every 180 seconds.
-> My advice: keep this above 60, which is the lifetime of a UDP state in pfSense.

registertimeout=120
When the provider is unreachable, asterisk tries to register every 120 seconds.
-> My advice: keep this above 60, which is the lifetime of a UDP state in pfSense.

When the WAN connection is down, asterisk will start the next qualify after at most 30 seconds and will detect that the provider is unreachable. Then the qualify starts to send the 'OPTIONS' for 5 seconds.

Everybody thinks that the states aren't killed, but I think pfSense did kill them. I think that after the WAN failure, it created new states.

After the WAN failure I noticed 2 states in pfSense 2.1.1:

#pfctl -ss | grep 85.119.188.3
vr0_vlan10 udp 85.119.188.3:5060 <- 192.168.150.80:5060       NO_TRAFFIC:SINGLE
lo0 udp 192.168.150.80:5060 -> 85.119.188.3:5060       SINGLE:NO_TRAFFIC

When the WAN interface comes back, the SIP provider stays unreachable. Asterisk keeps sending qualify messages, but I think they stay on the loopback interface.

Is this normal behaviour?

thanks,
Tom

#58 Updated by Tom De Coninck 6 days ago

This week I have done some more testing on this issue, #1629.

Everybody in this issue says that the states do not get killed. I have been testing this manually: even if the states get killed, the issue remains.

I did it manually:

1. Kill all states when WAN is down
-> every state is killed
[2.1.2-RELEASE][]/root(3): pfctl -k 192.168.150.80 -k 85.119.188.3
killed 2 states from 1 sources and 1 destinations
[2.1.2-RELEASE][]/root(4): pfctl -ss | grep 85.119.188.3
[2.1.2-RELEASE][]/root(5):

2. After a while, I notice that asterisk creates new states
#pfctl -ss | grep 85.119
vr0_vlan10 udp 85.119.188.3:5060 <- 192.168.150.80:5060 NO_TRAFFIC:SINGLE
lo0 udp 192.168.150.80:5060 -> 85.119.188.3:5060 SINGLE:NO_TRAFFIC

3. When WAN comes up again, asterisk cannot connect

Please notice that there is a state created on the loopback interface. When I kill that state, asterisk reconnects to the provider.

I'm not sure: is it possible to not create states on a loopback interface? Maybe that could be the fix?

Hope you can do something with this info.

Thanks for the great pfSense software!

#59 Updated by Tom De Coninck 1 day ago

Sorry for spamming... yet another update.

Today there were troubles with the WAN provider. In pfSense the gateway went down and I received the WAN IP address 192.168.100.10. I have seen this behaviour before behind these DOCSIS cable modems.

But when WAN came back up, these states remained:

vr0_vlan10 udp 85.119.188.3:5060 <- 192.168.150.80:5060       NO_TRAFFIC:SINGLE
vr1 udp 192.168.150.80:5060 -> 192.168.100.10:23920 -> 85.119.188.3:5060       SINGLE:NO_TRAFFIC

I think this proves that pfSense needs to kill states not only on 'WAN DOWN', but also on 'WAN UP'. I can't see how it could work otherwise.

cu
