Bug #5520
Closed
IPsec status seems to hang preventing access to the webgui.
Added by Steve Wheeler almost 9 years ago. Updated over 8 years ago.
Description
In some circumstances the IPSec status page 'hangs' and causes access issues to the webgui.
If the IPSec widget is on the dashboard, it also prevents access to the dashboard page. Since that is the first page shown after logging in, it looks as though the entire webgui has stopped. In many cases, though, another page can still be reached by entering its URL directly.
From the CLI, 'ipsec statusall' does not return anything either, and also appears to 'hang'.
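A quick way to tell a hung charon from a merely slow one is to run the status command with a deadline instead of waiting indefinitely. The Python sketch below is illustrative only and is not part of pfSense. It assumes strongSwan's `ipsec` wrapper is on `PATH` and treats a command that has not returned within `timeout_s` seconds as hung. The function name `run_status_probe` and the 10-second default are hypothetical choices.

```python
import subprocess
from typing import Optional, Sequence


def run_status_probe(cmd: Sequence[str], timeout_s: float = 10.0) -> Optional[str]:
    """Run a status command and return its stdout, or None if it did not finish in time.

    A None result corresponds to the symptom in this report: the command never returns.
    """
    try:
        result = subprocess.run(
            list(cmd),
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed the child before re-raising.
        return None
    return result.stdout
```

For example, `run_status_probe(["ipsec", "statusall"])` would return the status text normally and `None` when the command blocks as described above. A helper process started by the wrapper may outlive the kill; cleaning that up is out of scope for this sketch.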
Updated by Anonymous almost 9 years ago
- Assignee set to Anonymous
The IPSec status widget is being converted to Ajax which should prevent it from hanging as you describe. I'll update this when complete.
Updated by Steve Wheeler almost 9 years ago
Perhaps I've put this in the wrong place, but this is a current issue affecting 2.2.5, and it also affects the CLI tool 'ipsec statusall'. Fixing the widget seems unlikely to help here.
See: ZYZ-44279 and ADI-13616
Updated by Anonymous almost 9 years ago
Understood. But since the widget work was already underway, I mentioned it here. We can reassign the issue for the CLI work if required.
Updated by Anonymous almost 9 years ago
- Assignee deleted (Anonymous)
IPSec widget has been modified to update the widget via Ajax. This should at least prevent the GUI from hanging.
Updated by Jim Thompson almost 9 years ago
- Assignee set to Chris Buechler
assigned to cmb for evaluation.
Updated by Chris Buechler almost 9 years ago
- Status changed from New to Confirmed
- Priority changed from Normal to High
when this happens, any command that tries to use charon.ctl hangs indefinitely. 'ipsec stop' doesn't even work; you have to kill -9 charon for it to stop. It then works fine after starting it again. Not yet sure how to replicate.
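The manual recovery described above (check whether charon responds, and if not, SIGKILL it and start IPsec again) can be sketched as a small watchdog step. This is a hypothetical Python illustration, not pfSense code: `recover_hung_charon`, the `pkill -9 -x charon` selector and the `ipsec start` restart command are assumptions. On some systems the starter daemon may respawn charon by itself, making the second command unnecessary. `is_responsive` would typically wrap a timed status probe.

```python
import subprocess
from typing import Callable, List, Sequence

# Hypothetical command lines mirroring the manual steps in the comment above.
# "pkill -x" matches the exact process name; "ipsec start" as the restart step
# is an assumption (the starter daemon may already respawn charon on its own).
KILL_CHARON: List[str] = ["pkill", "-9", "-x", "charon"]
START_IPSEC: List[str] = ["ipsec", "start"]


def _run(cmd: Sequence[str]) -> None:
    """Default runner: execute the command and ignore its exit status."""
    subprocess.run(list(cmd), check=False)


def recover_hung_charon(
    is_responsive: Callable[[], bool],
    run: Callable[[Sequence[str]], object] = _run,
) -> List[List[str]]:
    """Kill and restart charon only when the probe says it is unresponsive.

    Returns the command lines that were issued so the action can be logged
    (and so the decision logic can be tested without touching a real system).
    """
    if is_responsive():
        # charon answered in time; leave it alone.
        return []
    issued: List[List[str]] = []
    for cmd in (KILL_CHARON, START_IPSEC):
        run(cmd)
        issued.append(list(cmd))
    return issued
```

Injecting `run` keeps the decision logic separate from the side effects, so the same function can be exercised in a dry run before being pointed at a production firewall.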
Updated by Chris Buechler almost 9 years ago
appears this deadlock might be the source of this issue.
https://wiki.strongswan.org/issues/1185
Updated by Renato Botelho almost 9 years ago
- Status changed from Confirmed to Feedback
Patch imported from strongswan and available for 2.2.6 and 2.3 snapshots
Updated by Chris Buechler almost 9 years ago
- Subject changed from IPSec status seems to hang preventing access to the webgui. to IPsec status seems to hang preventing access to the webgui.
- Status changed from Feedback to Confirmed
that patch didn't resolve the issue on one problem system. Followed up on that strongswan bug ticket to see if we can get further details. Also built the strongswan pkg with -g and without strip for testing.
Updated by Brice Figureau almost 9 years ago
I upgraded to 2.2.6 this morning, but I still see the same issue: IPsec tunnels hang relatively frequently (several times per day), and killing charon with kill -9 clears the situation. When charon is frozen, it's not possible to issue ipsec status; that also freezes. At first I thought it was related to some users using the road-warrior tunnel, but I'm not sure.
What can I do to help troubleshoot the issue?
Updated by Jim Thompson almost 9 years ago
Please investigate https://wiki.strongswan.org/issues/1269
May not apply if original support customer isn't using IKEv1 (just checked, at least one of them is.)
Updated by Brice Figureau almost 9 years ago
Jim Thompson wrote:
Please investigate https://wiki.strongswan.org/issues/1269
May not apply if original support customer isn't using IKEv1 (just checked, at least one of them is.)
I'm using only IKEv1.
Updated by Chris Buechler almost 9 years ago
Brice: there is a patch for strongswan issue 1269 in 2.2.7 (https://snapshots.pfsense.org/FreeBSD_releng/10.1/amd64/pfSense_RELENG_2_2/updates/) and 2.3 snapshots. I upgraded one customer system that was hitting the issue at least every few days and sometimes more often. It hasn't happened since, though it hasn't been long enough to know for sure whether that had any impact. Since you can replicate it much more easily, if you can try one of those and report back it'd be appreciated.
Updated by bjoern karges almost 9 years ago
I have more or less the same issue here.
We are running 2.2.6-RELEASE (amd64) on an SG-4860. If I connect with Shrew on a mobile client and access the IPSec status page on the firewall, there is a chance that the whole web interface hangs. All other functions keep working fine, but the web interface and the second site-to-site VPN crash.
The solution for me is to go to the server room, connect to the console port via USB, restart the PHP interface, and afterwards restart IPSec; then everything works fine again. My users who don't need the site-to-site or mobile client VPN can keep working without noticing this problem, since everything else keeps working fine in the background.
By the way, my IPSec mobile client VPN also works fine again after a reconnect.
I hope I explained and described everything well enough. If you have further questions, please contact me. I thought my information might help find the issue.
Thx4all
Bjoern
Updated by Chris Buechler almost 9 years ago
The change from strongswan issue 1269 has no impact on this issue.
Updated by Jim Thompson almost 9 years ago
- Assignee changed from Chris Buechler to Marc Dye
Updated by Chris Buechler almost 9 years ago
The change from strongswan's 1185-vici-action-unlock branch was brought in this week in hopes of addressing this.
One system that usually hits the problem within 24-48 hours is now at 52 hours uptime with this change, and it hasn't occurred (yet, at least). Good sign, but will take another ~3 days to have a high degree of confidence it's been resolved.
Updated by Jim Thompson over 8 years ago
We're now at 45 hours since that post, so a total of 97 hours of run time, assuming that it hasn't hung yet.
Last time we tried, it hung at 4 days (~96 hours).
So sometime Tuesday morning, if it's still not hung, I'm going to call it "fixed".
Updated by Jim Thompson over 8 years ago
So sometime Tuesday morning (when it's daylight), if it's still not hung, I'm going to call it "fixed".
Updated by Greg M over 8 years ago
Hey Jim!
You can call it fixed; it didn't happen on my 2 systems either.
Updated by Brice Figureau over 8 years ago
Chris Buechler wrote:
The change from strongswan's 1185-vici-action-unlock branch was brought in this week in hopes of addressing this.
One system that usually hits the problem within 24-48 hours is now at 52 hours uptime with this change, and it hasn't occurred (yet, at least). Good sign, but will take another ~3 days to have a high degree of confidence it's been resolved.
Is this fix available in a 2.2 snapshot or only in a 2.3 snapshot?
Is there an ETA for this fix to land in a stable release (2.2.7 or 2.3)?
Updated by Renato Botelho over 8 years ago
Brice Figureau wrote:
Is this fix available in a 2.2 snapshot or only in a 2.3 snapshot?
Is there an ETA for this fix to land in a stable release (2.2.7 or 2.3)?
You can find 2.2.7 snapshots with this fix at:
https://snapshots.pfsense.org/FreeBSD_releng/10.1/amd64/pfSense_RELENG_2_2/
https://snapshots.pfsense.org/FreeBSD_releng/10.1/i386/pfSense_RELENG_2_2/
And it's also applied in 2.3 snapshots
Updated by Greg M over 8 years ago
Hello!
On my latest 2.3 snapshot it happened again today.
Had to kill php and restart the webconfigurator to regain access to the GUI.
Updated by Brad Benton over 8 years ago
I'm on 2.2.6-RELEASE (amd64) and experiencing this issue. Is there a patch I can apply? I tried the ones listed here, but they would not test cleanly, I suspect due to the version.
Thanks!
Updated by Renato Botelho over 8 years ago
- Status changed from Confirmed to Feedback
Updated by Chris Buechler over 8 years ago
The deadlock in strongswan is confirmed fixed with 2.3 on one production system that was hitting it frequently. There were other status hangs that didn't deadlock strongswan, but we don't have a replicable case for any of them.
If you're seeing issues here, now's the time to upgrade to 2.3. If you can still replicate problems on latest 2.3 snapshot, I want to work with you ASAP to track down the cause.
Updated by Greg M over 8 years ago
Hi!
It still happens for me, though I can't always replicate it. Restarting php brings the webgui back.
I have 2 pfsenses.
What I did:
1. Upgrade pfsense A to latest snap.
2. On pfsense B, go to Status > IPsec and disconnect the VPN; after I confirmed all dialogues, the system froze.
3. Restarted php on pfsense B and refreshed the IPsec status page, which was displaying DELETING status for P1.
4. Clicked on connect VPN and back in business.
Now I don't know if this can be replicated this way, and I can't try it because these are production systems.
Firewall B is:
2.3-BETA (amd64)
built on Sun Mar 20 00:15:33 CDT 2016
FreeBSD 10.3-RC3
BR,
Greg
Updated by Greg M over 8 years ago
I can replicate this every time I do steps 1 and 2 from the post above.
So this is definitely not fixed.
Updated by Greg M over 8 years ago
New way to replicate:
Establish IPsec.
Reboot box A and wait 15 seconds (it needs to be 100% down).
Now on box B, disconnect IPsec via the GUI = system frozen.
IMHO this happens every time one of the connected parties is down and IPsec just waits and waits and never times out.
Updated by Chris Buechler over 8 years ago
- Status changed from Feedback to Resolved
Deadlock confirmed fixed on another system this week by upgrading to 2.3. That's the issue this ticket started with, and it's definitely fixed.
Greg: you're hitting something different, unrelated to everything else discussed here. In your case, it sounds like the 'ipsec down' command takes a long time to return. I figured it was maybe trying to disconnect a connection that isn't actually there anymore: in the reboot scenario you mention, a DELETE will have been sent, so the connection is already down and gone when you hit Disconnect, and the Disconnect button is only still available because the page hasn't been refreshed. Though I can't replicate that being a problem, I also never see it in "deleting" status. I changed that page to background those commands so it won't sit there for a long time waiting for them to return. That will likely be fixed now. If not, please start a new ticket with specifics and we can track down what's happening there.
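Backgrounding the commands means the page handler launches `ipsec down` (or a similar command) and returns immediately instead of waiting for it to exit. The actual change was made in pfSense's PHP status page; the Python sketch below only illustrates the idea, and `run_in_background` is a hypothetical helper.

```python
import subprocess
from typing import Sequence


def run_in_background(cmd: Sequence[str]) -> subprocess.Popen:
    """Start a command without waiting for it to finish.

    Output is discarded and the child gets its own session, so a slow or stuck
    command (for example, a disconnect for a peer that is already gone) cannot
    block the caller, such as a page request handler.
    """
    return subprocess.Popen(
        list(cmd),
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        start_new_session=True,
    )
```

For example, `run_in_background(["ipsec", "down", "con1"])` would return at once; `con1` is a placeholder connection name. The trade-off is that the caller no longer sees the command's result, so the page has to learn the new state from a later status refresh.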
Updated by Greg M over 8 years ago
Hi Chris!
You are correct. Now with your additional fix to background the commands everything is really smooth, thanks!
Updated by Phillip Davis over 8 years ago
I guess that commit https://github.com/pfsense/pfsense/commit/c5d8cbe07c9646f34afebd2610ac34bed090ced0 to master should be cherry-picked to RELENG_2_3 branch, since it is "a good thing".
Updated by Chris Buechler over 8 years ago
Thanks for the follow up, Greg.
Phil: yes, RELENG_2_3 switch over isn't done yet, but when we switch to that we'll make sure master's synced up with it. Thanks for the catch.
Updated by Mitch Claborn over 8 years ago
I'm also experiencing the GUI hang on the IPSec status page on 2.2.6-RELEASE.
If I read the above correctly, the 2.3 snapshots have a fix for this problem. Is that accurate?
What is the schedule for a 2.3 release version?