Bug #5520

IPsec status seems to hang preventing access to the webgui.

Added by Steve Wheeler over 3 years ago. Updated almost 3 years ago.

Status: Resolved
Priority: High
Assignee: Marc Dye
Category: IPsec
Target version:
Start date: 11/23/2015
Due date:
% Done: 0%
Estimated time:
Affected Version: 2.2.5
Affected Architecture: All

Description

In some circumstances the IPSec status page 'hangs' and causes access issues to the webgui.
If the IPSec widget is on the dashboard that also prevents accessing the dashboard page which makes it appear as though the entire webgui has stopped as it's the first page after logging in. In many cases though it's possible to access another page by entering the url directly.
From the CLI 'ipsec statusall' does not return anything, also appearing to 'hang'.

Associated revisions

Revision c5d8cbe0 (diff)
Added by Chris Buechler almost 3 years ago

Background all the ipsec commands run from status_ipsec.php to make sure they don't hang up the entire GUI. Ticket #5520

History

#1 Updated by Steve Beaver over 3 years ago

  • Assignee set to Steve Beaver

The IPSec status widget is being converted to Ajax which should prevent it from hanging as you describe. I'll update this when complete.

#2 Updated by Steve Wheeler over 3 years ago

Perhaps I've put this in the wrong place but this is a current issue affecting 2.2.5 and it also affects the CLI tool 'ipsec statusall'. Fixing the widget seems unlikely to help here.
See: ZYZ-44279 and ADI-13616

#3 Updated by Steve Beaver over 3 years ago

Understood. But since the widget work was already underway, I mentioned it here. We can reassign the issue for the CLI work if required.

#4 Updated by Steve Beaver over 3 years ago

  • Assignee deleted (Steve Beaver)

IPSec widget has been modified to update the widget via Ajax. This should at least prevent the GUI from hanging.

#5 Updated by Jim Thompson over 3 years ago

  • Assignee set to Chris Buechler

assigned to cmb for evaluation.

#6 Updated by Chris Buechler over 3 years ago

  • Status changed from New to Confirmed
  • Priority changed from Normal to High

When this happens, any command that tries to use charon.ctl hangs indefinitely. Even 'ipsec stop' doesn't work; charon has to be stopped with kill -9. It then works fine after being started again. Not yet sure how to replicate.
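
A script that needs IPsec status on an affected system could guard against this kind of hang by bounding how long it waits for the command. The following is a minimal Python sketch, not something taken from this ticket: `probe` is a hypothetical helper name, the timeout values are arbitrary, and a short-lived Python child process stands in for 'ipsec statusall' so that the example runs anywhere.

```python
import subprocess
import sys


def probe(cmd, timeout_s):
    """Run cmd and return its stdout, or None if it did not finish in time.

    Hypothetical helper: on timeout, subprocess.run kills the child process
    and raises TimeoutExpired, which is mapped to None here.
    """
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        return None
    return result.stdout


if __name__ == "__main__":
    # Stand-in for a command that never returns (assumption: on a real
    # system this would be something like ["ipsec", "statusall"]).
    hung = [sys.executable, "-c", "import time; time.sleep(5)"]
    if probe(hung, 0.5) is None:
        print("status command did not return in time")
```

A None result only says that the command did not return within the limit. It does not by itself prove that charon is deadlocked.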

#7 Updated by Chris Buechler over 3 years ago

It appears this deadlock might be the source of this issue:
https://wiki.strongswan.org/issues/1185

#8 Updated by Renato Botelho over 3 years ago

  • Status changed from Confirmed to Feedback

Patch imported from strongswan and available for 2.2.6 and 2.3 snapshots

#9 Updated by Chris Buechler over 3 years ago

  • Subject changed from IPSec status seems to hang preventing access to the webgui. to IPsec status seems to hang preventing access to the webgui.
  • Status changed from Feedback to Confirmed

That patch didn't resolve the issue on one problem system. I followed up on that strongswan bug ticket to see if we can get further details, and also built the strongswan pkg with -g and without strip for testing.

#10 Updated by Brice Figureau about 3 years ago

I upgraded to 2.2.6 this morning, but I still see the same issue: IPsec tunnels hang relatively frequently (several times per day), and killing charon with kill -9 clears the situation. While charon is frozen, 'ipsec status' cannot be issued either; it also freezes. At first I thought it was related to some users using the road-warrior tunnel, but I'm not sure.

What can I do to help troubleshoot the issue?

#11 Updated by Jim Thompson about 3 years ago

Please investigate https://wiki.strongswan.org/issues/1269

May not apply if original support customer isn't using IKEv1 (just checked, at least one of them is.)

#12 Updated by Brice Figureau about 3 years ago

Jim Thompson wrote:

Please investigate https://wiki.strongswan.org/issues/1269

May not apply if original support customer isn't using IKEv1 (just checked, at least one of them is.)

I'm using only IKEv1.

#13 Updated by Chris Buechler about 3 years ago

Brice: there is a patch for strongswan issue 1269 in 2.2.7 (https://snapshots.pfsense.org/FreeBSD_releng/10.1/amd64/pfSense_RELENG_2_2/updates/) and 2.3 snapshots. I upgraded one customer system that was hitting the issue at least every few days and sometimes more often. It hasn't happened since, though it hasn't been long enough to know for sure whether that had any impact. Since you can replicate it much more easily, if you can try one of those and report back it'd be appreciated.

#14 Updated by bjoern karges about 3 years ago

Steve Wheeler wrote:

In some circumstances the IPSec status page 'hangs' and causes access issues to the webgui.
If the IPSec widget is on the dashboard that also prevents accessing the dashboard page which makes it appear as though the entire webgui has stopped as it's the first page after logging in. In many cases though it's possible to access another page by entering the url directly.
From the CLI 'ipsec statusall' does not return anything, also appearing to 'hang'.

I have more or less the same issue here.

We are running 2.2.6-RELEASE (amd64) on an SG-4860. If I connect with Shrew on a mobile client and access the IPsec status page on the firewall, there is a chance that the whole web interface hangs. All other functions keep working fine, but the web interface and the second site-to-site VPN crash.
My workaround is to go to the server room, connect via USB to the console port, restart the PHP interface, and then restart IPsec; after that everything works again. Users who don't need the site-to-site or mobile client VPN can keep working without noticing the problem, since everything else keeps working in the background.
By the way, my IPsec mobile client VPN also works fine again after a reconnect.

I hope I explained and described everything well enough. If you have further questions, please contact me. I thought my information might help to find the issue.

Thx4all

Bjoern

#15 Updated by Chris Buechler about 3 years ago

The change from strongswan issue 1269 has no impact on this issue.

#16 Updated by Jim Thompson about 3 years ago

  • Assignee changed from Chris Buechler to Marc Dye

#17 Updated by Chris Buechler about 3 years ago

The change from strongswan's 1185-vici-action-unlock branch was brought in this week in hopes of addressing this.

One system that usually hits the problem within 24-48 hours is now at 52 hours uptime with this change, and it hasn't occurred (yet, at least). Good sign, but will take another ~3 days to have a high degree of confidence it's been resolved.

#18 Updated by Jim Thompson about 3 years ago

We're now at 45 hours since that post, so a total of 97 hours of run time, assuming that it hasn't hung yet.

Last time we tried, it hung at 4 days (~96 hours).

So sometime Tuesday morning, if it's still not hung, I'm going to call it "fixed".

#19 Updated by Jim Thompson about 3 years ago

Jim Thompson wrote:

We're now at 45 hours since that post, so a total of 97 hours of run time, assuming that it hasn't hung yet.

Last time we tried, it hung at 4 days (~96 hours).

So sometime Tuesday morning (when it's daylight), if it's still not hung, I'm going to call it "fixed".

#20 Updated by Greg M about 3 years ago

Hey Jim!

You can call it fixed, it didn't happen on my 2 systems either.

#21 Updated by Brice Figureau about 3 years ago

Chris Buechler wrote:

The change from strongswan's 1185-vici-action-unlock branch was brought in this week in hopes of addressing this.

One system that usually hits the problem within 24-48 hours is now at 52 hours uptime with this change, and it hasn't occurred (yet, at least). Good sign, but will take another ~3 days to have a high degree of confidence it's been resolved.

Is this fix available in a 2.2 snapshot or only in a 2.3 snapshot?
Is there an ETA for this fix to land in a stable release (2.2.7 or 2.3)?

#22 Updated by Renato Botelho about 3 years ago

Brice Figureau wrote:

Chris Buechler wrote:

The change from strongswan's 1185-vici-action-unlock branch was brought in this week in hopes of addressing this.

One system that usually hits the problem within 24-48 hours is now at 52 hours uptime with this change, and it hasn't occurred (yet, at least). Good sign, but will take another ~3 days to have a high degree of confidence it's been resolved.

Is this fix available in a 2.2 snapshot or only in a 2.3 snapshot?
Is there an ETA for this fix to land in a stable release (2.2.7 or 2.3)?

You can find 2.2.7 snapshots with this fix at:

https://snapshots.pfsense.org/FreeBSD_releng/10.1/amd64/pfSense_RELENG_2_2/
https://snapshots.pfsense.org/FreeBSD_releng/10.1/i386/pfSense_RELENG_2_2/

And it's also applied in 2.3 snapshots

#23 Updated by Greg M about 3 years ago

Hello!

On my 2.3 latest snap it happened again today.
Had to kill php and restart webconfigurator to gain access to the gui.

#24 Updated by Brad Benton about 3 years ago

I'm on 2.2.6-RELEASE (amd64) and experiencing this issue. Is there a patch I can apply? I tried the ones listed here but they would not test -- I suspect due to version.

Thanks!

#25 Updated by Renato Botelho about 3 years ago

  • Status changed from Confirmed to Feedback

#26 Updated by Chris Buechler about 3 years ago

The deadlock in strongswan is confirmed fixed with 2.3 on one production system that was hitting it frequently. There were other status hangs that didn't deadlock strongswan, but we have no replicable case for any of them.

If you're seeing issues here, now's the time to upgrade to 2.3. If you can still replicate problems on latest 2.3 snapshot, I want to work with you ASAP to track down the cause.

#27 Updated by Greg M about 3 years ago

Hi!

It still happens for me, though I can't always replicate it. Restarting php brings the webgui back.
I have 2 pfSense boxes.

What I did:
1. Upgraded pfSense A to the latest snap.
2. On pfSense B, went to Status->IPsec and disconnected the VPN; after I confirmed all the dialogs, the system froze.
3. Restarted php on pfSense B and refreshed the IPsec status, which was displaying DELETING status for P1.
4. Clicked on connect VPN and was back in business.

I don't know whether this can be replicated this way; I can't try it because these are prod systems.

Firewall B is:
2.3-BETA (amd64)
built on Sun Mar 20 00:15:33 CDT 2016
FreeBSD 10.3-RC3

BR,
Greg

#28 Updated by Greg M about 3 years ago

I can replicate this every time I do steps 1 and 2 from the above post.
So this is definitely not fixed.

#29 Updated by Greg M about 3 years ago

New way to replicate:
Establish IPsec.
Reboot box A and wait 15 seconds (it needs to be 100% down).
Now on box B disconnect IPsec via the GUI = system frozen.

IMHO this happens every time one of the connected parties is down, and ipsec just waits and waits and never times out.

#30 Updated by Chris Buechler almost 3 years ago

  • Status changed from Feedback to Resolved

Deadlock confirmed fixed on another system this week by upgrading to 2.3. That's the issue this ticket started with, and it's definitely fixed.

Greg: you're hitting something different, unrelated to everything else discussed here. In your case, it sounds like the 'ipsec down' command takes a long time to return. I figured it was maybe trying to disconnect a connection that isn't actually there anymore: in the reboot scenario you mention, a DELETE will have been sent, so the connection is already down and gone when you hit Disconnect, and the Disconnect button is only still available because the page hasn't refreshed. I can't replicate that being a problem, though, and I also never see it in "deleting" status. I changed that page to background those commands so it won't sit there for a long time waiting for the return. That will likely be fixed now. If not, please start a new ticket with specifics and we can track down what's happening there.
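
The idea of backgrounding a command so the caller does not wait for it can be illustrated roughly as follows. This is a Python sketch only, not the code from the status_ipsec.php commit (which is PHP): `run_in_background` is a hypothetical name, and a sleeping Python child stands in for a slow 'ipsec down'.

```python
import subprocess
import sys
import time


def run_in_background(cmd):
    """Start cmd and return immediately, without waiting for it to exit.

    Hypothetical helper: output is discarded, and on POSIX the child is put
    in its own session so it is detached from the caller's terminal.
    """
    return subprocess.Popen(
        cmd,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        start_new_session=True,
    )


if __name__ == "__main__":
    # Stand-in for a command that is slow to return (assumption).
    slow = [sys.executable, "-c", "import time; time.sleep(3)"]
    started = time.monotonic()
    child = run_in_background(slow)
    print(f"returned after {time.monotonic() - started:.2f}s, "
          f"child still running: {child.poll() is None}")
    child.wait()  # only so the demo exits cleanly
```

The trade-off is that the caller no longer learns whether the command succeeded, so a page using this approach has to pick up the resulting state on a later refresh.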

#31 Updated by Greg M almost 3 years ago

Hi Chris!
You are correct. Now with your additional fix to background the commands everything is really smooth, thanks!

#32 Updated by Phillip Davis almost 3 years ago

I guess that commit https://github.com/pfsense/pfsense/commit/c5d8cbe07c9646f34afebd2610ac34bed090ced0 to master should be cherry-picked to RELENG_2_3 branch, since it is "a good thing".

#33 Updated by Chris Buechler almost 3 years ago

Thanks for the follow up, Greg.

Phil: yes, RELENG_2_3 switch over isn't done yet, but when we switch to that we'll make sure master's synced up with it. Thanks for the catch.

#34 Updated by Mitch Claborn almost 3 years ago

I'm also experiencing the GUI hang on the IPSec status page on 2.2.6-RELEASE.

If I read the above correctly, the 2.3 snapshots have a fix for this problem. Is that accurate?
What is the schedule for a 2.3 release version?

#36 Updated by Luke Hamburg almost 3 years ago

Oh my. These are exciting times.
