Bug #5520

closed

IPsec status seems to hang preventing access to the webgui.

Added by Steve Wheeler almost 9 years ago. Updated over 8 years ago.

Status: Resolved
Priority: High
Assignee: Marc Dye
Category: IPsec
Target version:
Start date: 11/23/2015
Due date:
% Done: 0%
Estimated time:
Plus Target Version:
Release Notes:
Affected Version: 2.2.5
Affected Architecture: All

Description

In some circumstances the IPSec status page 'hangs' and causes access issues to the webgui.
If the IPSec widget is on the dashboard that also prevents accessing the dashboard page which makes it appear as though the entire webgui has stopped as it's the first page after logging in. In many cases though it's possible to access another page by entering the url directly.
From the CLI 'ipsec statusall' does not return anything, also appearing to 'hang'.

Actions #1

Updated by Anonymous almost 9 years ago

  • Assignee set to Anonymous

The IPSec status widget is being converted to Ajax which should prevent it from hanging as you describe. I'll update this when complete.

Actions #2

Updated by Steve Wheeler almost 9 years ago

Perhaps I've put this in the wrong place but this is a current issue affecting 2.2.5 and it also affects the CLI tool 'ipsec statusall'. Fixing the widget seems unlikely to help here.
See: ZYZ-44279 and ADI-13616

Actions #3

Updated by Anonymous almost 9 years ago

Understood. But since the widget work was already underway, I mentioned it here. We can reassign the issue for the CLI work if required.

Actions #4

Updated by Anonymous almost 9 years ago

  • Assignee deleted (Anonymous)

IPSec widget has been modified to update the widget via Ajax. This should at least prevent the GUI from hanging.

Actions #5

Updated by Jim Thompson almost 9 years ago

  • Assignee set to Chris Buechler

assigned to cmb for evaluation.

Actions #6

Updated by Chris Buechler almost 9 years ago

  • Status changed from New to Confirmed
  • Priority changed from Normal to High

when this happens, any command that tries to use charon.ctl hangs indefinitely. 'ipsec stop' doesn't even work, have to kill -9 charon for it to stop. Then works fine after starting it again. Not yet sure how to replicate.
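A hang like the one described above can be detected from a script by running the status command under a timeout instead of waiting on it indefinitely. The following is a minimal Python sketch, not part of pfSense or strongSwan: the helper name `probe` and the 10-second default are assumptions chosen for illustration, and the only real command involved is `ipsec statusall` from this ticket.

```python
import subprocess


def probe(cmd, timeout_s=10.0):
    """Run cmd and report whether it finished within timeout_s seconds.

    Returns (True, combined_output) if the command exited in time,
    or (False, "") if it had to be abandoned.
    Raises FileNotFoundError if the command is not on PATH.
    """
    try:
        done = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # fold stderr into the captured output
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising this.
        return False, ""
    return True, done.stdout
```

For example, `probe(["ipsec", "statusall"])` returning `(False, "")` would suggest that charon is not answering. The sketch only reports the hang; it does not attempt the `kill -9` recovery mentioned above.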

Actions #7

Updated by Chris Buechler almost 9 years ago

appears this deadlock might be the source of this issue.
https://wiki.strongswan.org/issues/1185

Actions #8

Updated by Renato Botelho almost 9 years ago

  • Status changed from Confirmed to Feedback

Patch imported from strongswan and available for 2.2.6 and 2.3 snapshots

Actions #9

Updated by Chris Buechler almost 9 years ago

  • Subject changed from IPSec status seems to hang preventing access to the webgui. to IPsec status seems to hang preventing access to the webgui.
  • Status changed from Feedback to Confirmed

that patch didn't resolve the issue on one problem system. followed up on that strongswan bug ticket to see if we can get further details, also built strongswan pkg with -g and without strip for testing

Actions #10

Updated by Brice Figureau almost 9 years ago

I upgraded to 2.2.6 this morning, but I still see the same issue: ipsec tunnels hanging relatively frequently (several times per day), then if you kill -9 charon it clears the situation. When charon is frozen, it's not possible to issue ipsec status, it also freezes. At first I thought it would have been related to some users using the road-warrior tunnel, but I'm not sure.

What can I do to help troubleshoot the issue?

Actions #11

Updated by Jim Thompson almost 9 years ago

Please investigate https://wiki.strongswan.org/issues/1269

May not apply if original support customer isn't using IKEv1 (just checked, at least one of them is.)

Actions #12

Updated by Brice Figureau almost 9 years ago

Jim Thompson wrote:

Please investigate https://wiki.strongswan.org/issues/1269

May not apply if original support customer isn't using IKEv1 (just checked, at least one of them is.)

I'm using only IKEv1.

Actions #13

Updated by Chris Buechler almost 9 years ago

Brice: there is a patch for strongswan issue 1269 in 2.2.7 (https://snapshots.pfsense.org/FreeBSD_releng/10.1/amd64/pfSense_RELENG_2_2/updates/) and 2.3 snapshots. I upgraded one customer system that was hitting the issue at least every few days and sometimes more often. It hasn't happened since, though it hasn't been long enough to know for sure whether that had any impact. Since you can replicate it much more easily, if you can try one of those and report back it'd be appreciated.

Actions #14

Updated by bjoern karges almost 9 years ago

Steve Wheeler wrote:

In some circumstances the IPSec status page 'hangs' and causes access issues to the webgui.
If the IPSec widget is on the dashboard that also prevents accessing the dashboard page which makes it appear as though the entire webgui has stopped as it's the first page after logging in. In many cases though it's possible to access another page by entering the url directly.
From the CLI 'ipsec statusall' does not return anything, also appearing to 'hang'.

I have more or less the same issue here.

We are running 2.2.6-RELEASE (amd64) on an SG-4860. If I connect with Shrew from a mobile client and open the IPsec status page on the firewall, the whole web interface sometimes hangs. All other functions keep working, but the web interface and the second site-to-site VPN go down.
My workaround is to go to the server room, connect via USB to the console port, restart the PHP interface, and then restart IPsec; after that everything works again. Users who don't need the site-to-site or mobile client VPNs can keep working without noticing the problem, since everything else keeps running in the background.
By the way, the IPsec mobile client VPN also works fine again after reconnecting.

I hope I explained and described everything well enough. If you have further questions, please contact me. I thought my information might help to find the issue.

Thx4all

Bjoern

Actions #15

Updated by Chris Buechler almost 9 years ago

The change from strongswan issue 1269 has no impact on this issue.

Actions #16

Updated by Jim Thompson almost 9 years ago

  • Assignee changed from Chris Buechler to Marc Dye

Actions #17

Updated by Chris Buechler over 8 years ago

The change from strongswan's 1185-vici-action-unlock branch was brought in this week in hopes of addressing this.

One system that usually hits the problem within 24-48 hours is now at 52 hours uptime with this change, and it hasn't occurred (yet, at least). Good sign, but will take another ~3 days to have a high degree of confidence it's been resolved.

Actions #18

Updated by Jim Thompson over 8 years ago

We're now at 45 hours since that post, so a total of 97 hours of run time, assuming that it hasn't hung yet.

Last time we tried, it hung at 4 days (~96 hours).

So sometime Tuesday morning, if it's still not hung, I'm going to call it "fixed".

Actions #19

Updated by Jim Thompson over 8 years ago

Jim Thompson wrote:

We're now at 45 hours since that post, so a total of 97 hours of run time, assuming that it hasn't hung yet.

Last time we tried, it hung at 4 days (~96 hours).

So sometime Tuesday morning (when it's daylight), if it's still not hung, I'm going to call it "fixed".

Actions #20

Updated by Greg M over 8 years ago

Hey Jim!

You can call it fixed; it didn't happen on my 2 systems either.

Actions #21

Updated by Brice Figureau over 8 years ago

Chris Buechler wrote:

The change from strongswan's 1185-vici-action-unlock branch was brought in this week in hopes of addressing this.

One system that usually hits the problem within 24-48 hours is now at 52 hours uptime with this change, and it hasn't occurred (yet, at least). Good sign, but will take another ~3 days to have a high degree of confidence it's been resolved.

Is this fix available in a 2.2 snapshot or only in a 2.3 snapshot?
Is there an ETA for this fix to land in a stable release (2.2.7 or 2.3)?

Actions #22

Updated by Renato Botelho over 8 years ago

Brice Figureau wrote:

Chris Buechler wrote:

The change from strongswan's 1185-vici-action-unlock branch was brought in this week in hopes of addressing this.

One system that usually hits the problem within 24-48 hours is now at 52 hours uptime with this change, and it hasn't occurred (yet, at least). Good sign, but will take another ~3 days to have a high degree of confidence it's been resolved.

Is this fix available in a 2.2 snapshot or only in a 2.3 snapshot?
Is there an ETA for this fix to land in a stable release (2.2.7 or 2.3)?

You can find 2.2.7 snapshots with this fix at:

https://snapshots.pfsense.org/FreeBSD_releng/10.1/amd64/pfSense_RELENG_2_2/
https://snapshots.pfsense.org/FreeBSD_releng/10.1/i386/pfSense_RELENG_2_2/

And it's also applied in 2.3 snapshots

Actions #23

Updated by Greg M over 8 years ago

Hello!

On my 2.3 latest snap it happened again today.
Had to kill php and restart webconfigurator to gain access to the gui.

Actions #24

Updated by Brad Benton over 8 years ago

I'm on 2.2.6-RELEASE (amd64) and experiencing this issue. Is there a patch I can apply? I tried the ones listed here but they would not test -- I suspect due to version.

Thanks!

Actions #25

Updated by Renato Botelho over 8 years ago

  • Status changed from Confirmed to Feedback

Actions #26

Updated by Chris Buechler over 8 years ago

The deadlock in strongswan is confirmed fixed with 2.3 on one production system that was hitting it frequently. There were other status hangs that didn't deadlock strongswan, but we don't have a replicable case for any of them.

If you're seeing issues here, now's the time to upgrade to 2.3. If you can still replicate problems on latest 2.3 snapshot, I want to work with you ASAP to track down the cause.

Actions #27

Updated by Greg M over 8 years ago

Hi!

It still happens for me, though I can't always replicate it. Restarting PHP brings the webgui back.
I have 2 pfSense boxes.

What I did:
1. Upgraded pfsense A to the latest snap.
2. On pfsense B, went to Status > IPsec and disconnected the VPN. After I confirmed all the dialogs, the system froze.
3. Restarted PHP on pfsense B and refreshed the IPsec status page, which showed P1 in DELETING status.
4. Clicked on connect VPN and back in business.

I don't know whether it can be reliably replicated this way; I can't try it because these are production systems.

Firewall B is:
2.3-BETA (amd64)
built on Sun Mar 20 00:15:33 CDT 2016
FreeBSD 10.3-RC3

BR,
Greg

Actions #28

Updated by Greg M over 8 years ago

I can replicate this every time I do steps 1 and 2 from the above post.
So this is definitely not fixed.

Actions #29

Updated by Greg M over 8 years ago

New way to replicate:
Establish IPsec.
Reboot box A and wait 15 seconds (it needs to be 100% down).
Now on box B disconnect IPsec via the GUI = system frozen.

IMHO this happens every time one of the connected parties is down, and ipsec just waits and waits and never times out.

Actions #30

Updated by Chris Buechler over 8 years ago

  • Status changed from Feedback to Resolved

Deadlock confirmed fixed on another system this week by upgrading to 2.3. That's the issue this ticket started with, and it's definitely fixed.

Greg: you're hitting something different, unrelated to everything else discussed here. In your case, it sounds like the 'ipsec down' command takes a long time to return. I figured it was maybe trying to disconnect a connection that isn't actually there anymore (in the reboot scenario you mentioned, a DELETE will have been sent, so the connection is already down and gone when you hit Disconnect; the only reason the Disconnect button is even available is that the page hasn't been refreshed). I can't replicate that being a problem, though, and I also never see it in "deleting" status. I changed that page to background those commands so it won't sit there for a long time waiting for them to return. That will likely fix it. If not, please start a new ticket with specifics and we can track down what's happening there.
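
The idea of backgrounding a slow command so the page returns immediately can be illustrated with a short sketch. This is Python rather than the PHP used by the pfSense GUI, and it is not the code from the actual change; `run_in_background` is a hypothetical helper name, and the connection name `con1` in the usage note is an assumed placeholder.

```python
import subprocess


def run_in_background(cmd):
    """Start cmd without waiting for it and return the process handle.

    The child's standard streams are detached so the caller never
    blocks on its output, even if the command stalls.
    """
    return subprocess.Popen(
        cmd,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        start_new_session=True,  # POSIX only: run the child in its own session
    )
```

A call such as `run_in_background(["ipsec", "down", "con1"])` would return at once even if the command itself stalls. The trade-off is that the caller no longer sees the command's exit status unless it polls the returned handle later.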

Actions #31

Updated by Greg M over 8 years ago

Hi Chris!
You are correct. Now with your additional fix to background the commands everything is really smooth, thanks!

Actions #32

Updated by Phillip Davis over 8 years ago

I guess that commit https://github.com/pfsense/pfsense/commit/c5d8cbe07c9646f34afebd2610ac34bed090ced0 to master should be cherry-picked to RELENG_2_3 branch, since it is "a good thing".

Actions #33

Updated by Chris Buechler over 8 years ago

Thanks for the follow up, Greg.

Phil: yes, RELENG_2_3 switch over isn't done yet, but when we switch to that we'll make sure master's synced up with it. Thanks for the catch.

Actions #34

Updated by Mitch Claborn over 8 years ago

I'm also experiencing the GUI hang on the IPSec status page on 2.2.6-RELEASE.

If I read the above correctly, the 2.3 snapshots have a fix for this problem. Is that accurate?
What is the schedule for a 2.3 release version?

Actions #36

Updated by → luckman212 over 8 years ago

Oh my. These are exciting times.
