pfSense bugtracker: pfSense - Bug #13014: Deadlock in Charon VICI interface
Jim Pingle (2022-04-01T11:03:23Z) https://redmine.pfsense.org/issues/13014?journal_id=60151
<ul></ul><p>Might be the same root cause as <a class="issue tracker-1 status-1 priority-4 priority-default" title="Bug: ipsec status freezing (New)" href="https://redmine.pfsense.org/issues/7420">#7420</a>, though we don't have enough information about either one of these to say for certain. The symptoms are very similar.</p>
<p>I can't replicate this on demand; I've seen it happen maybe once or twice ever when working on IPsec code. There are a couple of reports on the forum and reddit as well.</p>
<p>As far as I know, though, nobody can induce it reliably.</p>

Jim Pingle (2022-04-01T11:04:45Z) https://redmine.pfsense.org/issues/13014?journal_id=60153
<ul><li><strong>Related to</strong> <i><a class="issue tracker-1 status-1 priority-4 priority-default" href="/issues/7420">Bug #7420</a>: ipsec status freezing</i> added</li></ul>

Pierre-Emmanuel DEGRYSE (2022-04-10T17:52:19Z) https://redmine.pfsense.org/issues/13014?journal_id=60305
<ul></ul><p>Hi. I get the same error.</p>
<p>See below the IPsec logs at the highest verbosity level:</p>
<pre>
Apr 11 09:48:49 charon 23603 07[JOB] watcher going to poll() 6 fds
Apr 11 09:48:49 charon 23603 07[JOB] watching 24 for reading
Apr 11 09:48:49 charon 23603 07[JOB] watching 23 for reading
Apr 11 09:48:49 charon 23603 07[JOB] watching 18 for reading
Apr 11 09:48:49 charon 23603 07[JOB] watching 13 for reading
Apr 11 09:48:49 charon 23603 07[JOB] watching 8 for reading
Apr 11 09:48:49 charon 23603 07[JOB] watcher got notification, rebuilding
Apr 11 09:48:49 charon 23603 03[CFG] vici client 35 disconnected
Apr 11 09:48:49 charon 23603 07[JOB] watcher going to poll() 6 fds
Apr 11 09:48:49 charon 23603 07[JOB] watching 24 for reading
Apr 11 09:48:49 charon 23603 07[JOB] watching 23 for reading
Apr 11 09:48:49 charon 23603 07[JOB] watching 18 for reading
Apr 11 09:48:49 charon 23603 07[JOB] watching 13 for reading
Apr 11 09:48:49 charon 23603 07[JOB] watching 8 for reading
</pre>
<p>Edit: removed redundant info.</p>

Kris Phillips (2022-04-15T20:29:25Z) https://redmine.pfsense.org/issues/13014?journal_id=60402
<ul></ul><p>For anyone with this issue, if you could please run the commands below. The <code>ps aux</code> output should look something like this, with the bold number (the charon PID) being the important part: root <strong>35176</strong> 0.0 0.2 68960 19560 - I Thu21 0:06.64 /usr/local/libexec/ipsec/charon --use-syslog</p>
<pre>
ps aux | grep charon
ktrace -p [the PID of the charon process from the above command]
kdump
</pre>
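The PID-extraction step can be scripted; a minimal sketch (the sample `ps` line is hard-coded here for illustration — on a live system you would pipe real `ps aux` output instead):

```shell
# Sketch: pull the PID (second whitespace-separated column) out of a
# "ps aux" line for charon. The sample line is embedded for illustration.
sample='root 35176  0.0  0.2 68960 19560  -  I    Thu21 0:06.64 /usr/local/libexec/ipsec/charon --use-syslog'
pid=$(printf '%s\n' "$sample" | awk '{print $2}')
echo "$pid"
# On a live system the equivalent would be something like:
#   pid=$(ps aux | awk '/libexec\/ipsec\/charon/ && !/awk/ {print $2; exit}')
#   ktrace -p "$pid"
#   kdump
```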
<p>Then please provide the output here in a redmine response. That will be most helpful in investigating this issue and resolving it.</p>

Tobias Ock (2022-05-04T03:29:47Z) https://redmine.pfsense.org/issues/13014?journal_id=60881
<ul><li><strong>File</strong> <a href="/attachments/4201">kdump.JPG</a> <a class="icon-only icon-download" title="Download" href="/attachments/download/4201/kdump.JPG">kdump.JPG</a> added</li></ul><p>Hi,</p>
<p>after updating to pfSense Plus 22.01 on an XG-7100 I get this issue too.<br />As a side note, we also changed the ports from 1Gb to 10Gb at the same time.</p>
<p>Unfortunately, kdump doesn't give me any output.</p>

Kris Phillips (2022-05-04T14:30:08Z) https://redmine.pfsense.org/issues/13014?journal_id=60898
<ul></ul><p>FYI, this seems to help: go to System --&gt; Advanced --&gt; System Tunables, change <code>kern.ipc.soacceptqueue</code> to at least 512, then reboot. It seems to abate whatever condition is causing this.</p>

Brad Davis (2022-05-24T16:50:11Z) https://redmine.pfsense.org/issues/13014?journal_id=61383
<ul><li><strong>Assignee</strong> set to <i>Mateusz Guzik</i></li><li><strong>Target version</strong> set to <i>2.7.0</i></li><li><strong>Plus Target Version</strong> set to <i>22.05</i></li></ul><p>We think this is fixed, but need additional testing to know for sure.</p>

Mateusz Guzik (2022-05-24T17:04:28Z) https://redmine.pfsense.org/issues/13014?journal_id=61384
<ul></ul><p>No, this is not fixed. However, chances are excellent this is an old &amp; known bug: a use-after-free in key-related state in ipsec. I have a patch to fix that bit, but it has not been committed yet because running with it runs into another bug.</p>
<p>The idea is to boot a kernel with custom debugging which will either confirm the suspicion is correct or, if not, give a starting point for investigation.</p>

Jim Pingle (2022-05-24T17:05:03Z) https://redmine.pfsense.org/issues/13014?journal_id=61385
<ul><li><strong>Plus Target Version</strong> changed from <i>22.05</i> to <i>22.09</i></li></ul>

Jim Pingle (2022-06-28T12:01:09Z) https://redmine.pfsense.org/issues/13014?journal_id=61977
<ul><li><strong>Plus Target Version</strong> changed from <i>22.09</i> to <i>22.11</i></li></ul>

Chris W (2022-07-07T13:44:57Z) https://redmine.pfsense.org/issues/13014?journal_id=62111
<ul></ul><p>We suggested this bug may be the cause of what the customer is seeing in 945855019. His experience is that the tunnels are down and not passing traffic, however, and the only thing which restores those connections is a reboot.</p>

Jesse Ortiz (jortiz@itmsolutions.us) (2022-07-11T07:48:42Z) https://redmine.pfsense.org/issues/13014?journal_id=62150
<ul></ul><p>Hello, I have been working with technical support on this issue and was told to upgrade to pfSense Plus 22.05, but the issue persisted.</p>
<p>There is a community post with more details than this bug tracker to help resolve the issue.<br /><a class="external" href="https://forum.netgate.com/topic/172075/my-ipsec-service-hangs/34">https://forum.netgate.com/topic/172075/my-ipsec-service-hangs/34</a></p>
<p>Hope the comments and details provided on this post help get it fixed.</p>

Gassy Antelope (2022-08-04T20:54:41Z) https://redmine.pfsense.org/issues/13014?journal_id=62460
<ul><li><strong>File</strong> <a href="/attachments/4375">charon_crash_ktrace.txt</a> <a class="icon-only icon-download" title="Download" href="/attachments/download/4375/charon_crash_ktrace.txt">charon_crash_ktrace.txt</a> added</li></ul><p>Here's a kernel trace that shows what occurs when it crashes. I know the previous dump someone posted didn't show anything, because it was done after it crashed. I had the trace going the whole time until it crashed, ending up with a 2GB file. The file I've included is just the last 100ish lines from it (IPs redacted). Hopefully we can get this fixed finally.</p>

Kris Phillips (2022-08-05T13:25:12Z) https://redmine.pfsense.org/issues/13014?journal_id=62463
<ul></ul><p>FYI I had a customer who had a box working fine for years, but it had some slow performance due to high CPU usage. Upon enabling AES-NI for IPsec 24 hours ago, the box had this issue crop up. It might be totally unrelated, as I had no way of verifying in the moment that AES-NI was the culprit, but I'd be curious if someone with this issue could test with AES-NI disabled to see if it magically eliminates their problem.</p>

Gassy Antelope (2022-08-05T14:14:42Z) https://redmine.pfsense.org/issues/13014?journal_id=62465
<ul></ul><p>Interesting, I'll go ahead and disable AES-NI and see what happens.</p>

Gassy Antelope (2022-08-08T16:29:45Z) https://redmine.pfsense.org/issues/13014?journal_id=62497
<ul></ul><p>It doesn't appear to be related to AES-NI. Had the issue happen a couple of times with AES-NI disabled.</p>

David Vazquez (2022-09-03T18:54:58Z) https://redmine.pfsense.org/issues/13014?journal_id=62773
<ul></ul><p>I've been having the same issue as everyone above, so I wrote a script to restart the necessary services when the problem occurs. I run it every minute via a cron job. It's quick and dirty, but it gets the job done. Hope this helps!<br /><pre><code class="shell syntaxhl">#!/bin/sh
# If the charon VICI socket's listen queue is backed up, charon is deadlocked:
# kill it and restart IPsec (twice, with a pause, to get a clean start).
queueLength=$(netstat -Lan | grep charon.vici | cut -c 7)
if [ $((queueLength)) -gt 0 ]; then
    /usr/bin/killall -9 charon
    /usr/local/sbin/pfSsh.php playback restartipsec; sleep 10; /usr/local/sbin/pfSsh.php playback restartipsec
fi
</code></pre></p>

Mikael Karlsson (2022-10-07T11:02:49Z) https://redmine.pfsense.org/issues/13014?journal_id=63083
<ul></ul><p>David Vazquez wrote in <a href="#note-17">#note-17</a>:</p>
<blockquote>
<p>I've been having the same issue as everyone above so I wrote a script to restart the necessary services when the problem occurs. I run this every minute via cron job. It's quick and dirty but it gets the job done. Hope this helps!<br />[...]</p>
</blockquote>
<p>Can confirm the same bug on a Netgate SG-4860 running 22.01; the workaround above works really well. Thank you!</p>

Jim Pingle (2022-10-11T14:40:56Z) https://redmine.pfsense.org/issues/13014?journal_id=63148
<ul><li><strong>Plus Target Version</strong> changed from <i>22.11</i> to <i>23.01</i></li></ul>

Gassy Antelope (2022-10-11T15:02:13Z) https://redmine.pfsense.org/issues/13014?journal_id=63259
<ul></ul><p>Is there any idea when this issue may get fixed? It keeps being endlessly pushed back to the next version. The forum thread has more and more people stating that they are experiencing the problem. I'd consider this a high-priority fix, since IPsec VPNs are a basic firewall feature and they break multiple times a day in pfSense. I get emails from Netgate about them providing "high-performance firewall, <strong>VPN</strong>, and routing solutions," the "most trusted firewall," and "best meets requirements," yet it seems to be the only firewall I've ever seen where IPsec VPNs don't work properly. It's kind of embarrassing to brag about that type of stuff when issues like this exist for 6+ months and are ignored.</p>

Jim Pingle (2022-10-11T15:10:23Z) https://redmine.pfsense.org/issues/13014?journal_id=63260
<ul></ul><p>It didn't get pushed back to the next version; there won't be a 22.11, as there is still a significant amount of work to be done and not enough time to get it all well tested before November.</p>
<p>The current hope is to fix this for the next release, and that next release will be 23.01.</p>

Kris Phillips (2022-11-16T11:54:41Z) https://redmine.pfsense.org/issues/13014?journal_id=63839
<ul></ul><p>EDIT:</p>
<p>Disregard this. Did not permanently resolve the issue, but only seemed to help slow it down.</p>
<p>ORIGINAL:</p>
<p>Another possible dead-end, but maybe not:</p>
<p>I had a customer today who ran into this issue. I discovered that compressing log files with bzip was causing charon.vici to hang up for some reason. Setting compression to None, resetting the log files, and then running "/usr/bin/killall -9 charon" plus an IPsec service restart fixed the issue. They were running into this within 2-3 minutes after rebooting and had the same symptoms. Afterwards they were fine for 10 minutes straight.</p>
<p>Again, this may entirely be a rabbit trail and could prove to be unrelated or irrelevant, but anytime I find a possible "root cause" I will try to continue documenting it here. It's also possible that the sources of this problem are varied and the charon process can get hung up for a multitude of reasons.</p>

Kristof Provost (2022-11-17T08:42:31Z) https://redmine.pfsense.org/issues/13014?journal_id=63857
<ul></ul><p>Based on available information the suspicion is that charon itself is deadlocking, which matches the described symptoms (no vici interaction, traffic keeps flowing) and the kdump.<br />Unfortunately there are a lot of locks in the charon code and it's not clear where this deadlock might be happening.</p>
<p>Ideally we'd need to be able to reproduce this. Until we figure out how, there are a couple of things we can try that may provide hints.</p>
<p>The first would be to run `ipsec statusall` on an affected but currently correctly working machine, as well as dumping the contents of /var/etc/ipsec/strongswan.conf and /var/etc/ipsec/swanctl.conf.<br />Then, on a machine in the bad state, run `procstat -t &lt;pid&gt;`, `procstat -f &lt;pid&gt;` and `procstat -k &lt;pid&gt;` on the "daemon: /usr/local/libexec/ipsec/charon" process.</p>
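The bad-state collection step above can be scripted; a sketch for a pfSense/FreeBSD shell (the output path is an arbitrary choice for this example, and `procstat` only exists on the firewall itself):

```shell
# Sketch: gather the requested procstat views (threads, files, kernel stacks)
# of the charon process. /tmp/charon-procstat.txt is an arbitrary example path.
pid=$(pgrep -f '/usr/local/libexec/ipsec/charon' | head -n 1)
if [ -n "$pid" ]; then
    for flag in t f k; do
        echo "== procstat -$flag $pid =="
        procstat -"$flag" "$pid"
    done > /tmp/charon-procstat.txt 2>&1
    echo "saved /tmp/charon-procstat.txt"
else
    echo "charon is not running"
fi
```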
<p>The ideal situation would be that we were able to reproduce this so that we could investigate an affected charon process with gdb, but until then we'll have to gather whatever small clues we can find.</p>

David Vazquez (2022-11-18T14:04:21Z) https://redmine.pfsense.org/issues/13014?journal_id=63922
<ul><li><strong>File</strong> <a href="/attachments/4526">procstat_on_failed_charon.txt</a> <a class="icon-only icon-download" title="Download" href="/attachments/download/4526/procstat_on_failed_charon.txt">procstat_on_failed_charon.txt</a> added</li><li><strong>File</strong> <a href="/attachments/4527">ipsec_status_all.txt</a> <a class="icon-only icon-download" title="Download" href="/attachments/download/4527/ipsec_status_all.txt">ipsec_status_all.txt</a> added</li><li><strong>File</strong> <a href="/attachments/4528">swanctl.conf</a> <a class="icon-only icon-download" title="Download" href="/attachments/download/4528/swanctl.conf">swanctl.conf</a> added</li><li><strong>File</strong> <a href="/attachments/4529">strongswan.conf</a> <a class="icon-only icon-download" title="Download" href="/attachments/download/4529/strongswan.conf">strongswan.conf</a> added</li></ul><p>Kristof Provost wrote in <a href="#note-23">#note-23</a>:</p>
<blockquote>
<p>Based on available information the suspicion is that charon itself is deadlocking, which matches the described symptoms (no vici interaction, traffic keeps flowing) and the kdump.<br />Unfortunately there are a lot of locks in the charon code and it's not clear where this deadlock might be happening.</p>
<p>Ideally we'd need to be able to reproduce this. Until it's figured out how there are a couple of things we can try that may provide hints.</p>
<p>The first would be to run `ipsec statusall` on an affected but currently correctly working machine, as well as dumping the contents of /var/etc/ipsec/strongswan.conf and /var/etc/ipsec/swanctl.conf<br />Then, on a machine in the bad state, `procstat -t &lt;pid&gt;`, `procstat -f &lt;pid&gt;` and `procstat -k &lt;pid&gt;` on the "daemon: /usr/local/libexec/ipsec/charon" process.</p>
<p>The ideal situation would be that we were able to reproduce this so that we could investigate an affected charon process with gdb, but until then we'll have to gather whatever small clues we can find.</p>
</blockquote>
<p>Here are the files you requested: 'ipsec statusall' from before any issues, and the rest of the commands captured during the issue. Let me know if I can be of more help. This issue usually happens multiple times per day, so I can re-run commands whenever you need.</p>

Kristof Provost (2022-11-21T04:12:51Z) https://redmine.pfsense.org/issues/13014?journal_id=63956
<ul></ul><p>Thanks for that.</p>
<p>There's nothing obviously suspect in the status or configuration files. I do see you have a fair number of ipsec connections set up. Can other affected people comment on their number of connections? It's entirely possible that a larger number of tunnels makes this more likely to happen, and that'd be good to know.</p>
<p>The procstat output also seems to confirm that we're looking at a deadlock in charon. I see threads waiting for a read lock on a read/write lock, other threads waiting for a write lock and others waiting for a mutex. Here too it would be interesting to have more samples.</p>
<p>Short version: more procstat output and reports of the number of ipsec connections on affected instances. There's no need for more ipsec statusall or configuration files.</p>

Kris Phillips (2022-11-26T19:27:42Z) https://redmine.pfsense.org/issues/13014?journal_id=64054
<ul></ul><p>Kristof Provost wrote in <a href="#note-25">#note-25</a>:</p>
<blockquote>
<p>Thanks for that.</p>
<p>There's nothing obviously suspect in the status or configuration files. I do see you have a fair number of ipsec connections set up. Can other affected people comment on their number of connections? It's entirely possible that a larger number of tunnels makes this more likely to happen, and that'd be good to know.</p>
<p>The procstat output also seems to confirm that we're looking at a deadlock in charon. I see threads waiting for a read lock on a read/write lock, other threads waiting for a write lock and others waiting for a mutex. Here too it would be interesting to have more samples.</p>
<p>Short version: more procstat output and reports of number of ipsec connections on affected instances. There's no need for more ipsec statusall or configuration files.</p>
</blockquote>
<p>Hello Kristof,</p>
<p>While not always the case, in about 90%+ of the cases I've seen with this issue there are at least 15-20 tunnels involved, if not 50+. I have seen it with as few as 4-5, but it's significantly more common with more tunnels.</p>

Mikael Karlsson (2022-12-06T06:19:09Z) https://redmine.pfsense.org/issues/13014?journal_id=64249
<ul></ul><p>Kris Phillips wrote in <a href="#note-26">#note-26</a>:</p>
<blockquote>
<p>Kristof Provost wrote in <a href="#note-25">#note-25</a>:</p>
<blockquote>
<p>Thanks for that.</p>
<p>There's nothing obviously suspect in the status or configuration files. I do see you have a fair number of ipsec connections set up. Can other affected people comment on their number of connections? It's entirely possible that a larger number of tunnels makes this more likely to happen, and that'd be good to know.</p>
<p>The procstat output also seems to confirm that we're looking at a deadlock in charon. I see threads waiting for a read lock on a read/write lock, other threads waiting for a write lock and others waiting for a mutex. Here too it would be interesting to have more samples.</p>
<p>Short version: more procstat output and reports of number of ipsec connections on affected instances. There's no need for more ipsec statusall or configuration files.</p>
</blockquote>
<p>Hello Kristof,</p>
<p>While not always the case, in about 90%+ of the cases I've seen with this issue there are at least 15-20 tunnels involved, if not 50+. I have seen it with as few as 4-5, but it's significantly more common with more tunnels.</p>
</blockquote>
<p>Regarding the number of tunnels, we have 14 phase 1 entries and a fairly large number of phase 2 entries (80+). Most of the tunnels are disconnected most of the time, but normally there are 1-5 phase 1 connections active, with a total of 15-20 phase 2 connections. The crash happens roughly once every 12-24h.</p>

Marcos M (2022-12-06T17:17:40Z) https://redmine.pfsense.org/issues/13014?journal_id=64273
<ul><li><strong>Description</strong> updated (<a title="View differences" href="/journals/64273/diff?detail_id=54518">diff</a>)</li></ul>

Dan Bailey (2022-12-07T23:27:59Z) https://redmine.pfsense.org/issues/13014?journal_id=64383
<ul></ul><p>We have tried everything based on:</p>
<p><a class="external" href="https://forum.netgate.com/topic/172075/my-ipsec-service-hangs/6">https://forum.netgate.com/topic/172075/my-ipsec-service-hangs/6</a></p>
<p>We now have over 50 IPsec tunnels (50 P1, ~150 P2) and through trial and error have some theories as to what causes the loss of all IPsec (and the management UI).</p>
<p>Often we have to spend days working with a customer to get a VPN working. If we cannot establish it due to a mismatch and leave the IPsec 'trying' to establish, that's when it brings everything crashing down. Even after the VPN is established, if we fail to spot a VPN down we can have the same issue. We have had loss of IPsec / management UI several times in a single day. Now if a VPN goes down or is incomplete we disable it until the customer side is ready, which does help somewhat.</p>
<p>When IPsec fails it usually takes the management UI with it, though NAT etc. is still functional.</p>
<p>Things we have tried:</p>
<ul>
<li>Reducing IPsec log chatter in log settings</li>
<li>Running a cron job to delete log archives so they never 'roll over' (probably a red herring)</li>
<li>A hotfix provided by tech support based on this bug (though this was only for an IPsec UI bug afaik)</li>
<li>Providing detailed logs to tech support</li>
</ul>
<p>Our latest attempt is to disable all disk-based logging on the FW and instead rely on an external syslogger.</p>
<p>Will post back results over time.</p>
<p>It is 100% not AES-NI, as we have had the same issues with it on or off.</p>
<p>The longest we have had a FW running without issues is 45 days, and even that is unacceptable.</p>
<p>My only advice for others for now is: don't leave IPsec VPNs 'trying' to connect.</p>

Kristof Provost (2022-12-08T07:24:52Z) https://redmine.pfsense.org/issues/13014?journal_id=64385
<ul></ul><p>I've tried running charon under valgrind's helgrind and drd tools. The idea was to identify any lock misuse or lock order reversals that could produce a deadlock, but unfortunately nothing like that turned up.<br />I've also had no luck reproducing it, which would have allowed attaching with gdb for inspection and might have yielded more clues.</p>

David Vazquez (2022-12-08T10:07:55Z) https://redmine.pfsense.org/issues/13014?journal_id=64391
<ul></ul><p>Kristof Provost wrote in <a href="#note-30">#note-30</a>:</p>
<blockquote>
<p>I've tried running charon under valgrind's helgrind and drd tools. The idea was to identify any lock misuse or lock order reversals that could produce a deadlock, but unfortunately nothing like that turned up.<br />I've also had no luck reproducing it, which would have allowed attaching with gdb for inspection and might have yielded more clues.</p>
</blockquote>
<p>Kristof, is there anything else I can provide for you that would help? The issue usually occurs at least once a day, so it wouldn't be long before I would have the system in a failed state.</p>

Kristof Provost (2022-12-08T10:23:29Z) https://redmine.pfsense.org/issues/13014?journal_id=64393
<ul></ul><p>A way to reproduce it reliably, but I appreciate that that's not easy (I've been trying to get one for two days, after all!).</p>
<p>Absent that, we could build a strongswan package with debug symbols included for you to run, so that we can attach gdb when it's in the bad state. Unfortunately I'm heading out on an extended vacation myself, so I won't be able to do that soon (or follow up on it).<br />I've left my notes here in part so that you'd know this is being worked on, and for my colleagues to know what I've tried so far.</p>

Kristof Provost (2022-12-09T08:57:34Z) https://redmine.pfsense.org/issues/13014?journal_id=64433
<ul></ul><p>I've built strongswan packages for 22.05 (should also work on 2.6.0) and 23.01:<br /><a class="external" href="https://people.freebsd.org/~kp/strongswan-5.9.5.pkg">https://people.freebsd.org/~kp/strongswan-5.9.5.pkg</a> (22.05)<br /><a class="external" href="https://people.freebsd.org/~kp/strongswan-5.9.8.pkg">https://people.freebsd.org/~kp/strongswan-5.9.8.pkg</a> (23.01)</p>
<p>To be clear: all these do is add debug information. They are not expected to fix the issue.</p>
<p>To test with them: </p>
<pre><code>- back up your configuration
- install gdb (pkg install gdb)
- copy the relevant package to the device
- pkg add -f strongswan-5.9.5.pkg
- reboot</code></pre>
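Once the problem recurs, the backtrace capture described below can also be run non-interactively; a sketch (assumes gdb is installed as per the steps above; the output path is an arbitrary choice for this example):

```shell
# Sketch: capture 'thread apply all bt' from a running charon without an
# interactive gdb session. /tmp/charon-backtraces.txt is an arbitrary path.
pid=$(pgrep -f '/usr/local/libexec/ipsec/charon' | head -n 1)
if [ -n "$pid" ]; then
    /usr/local/bin/gdb -p "$pid" -batch -ex 'thread apply all bt' > /tmp/charon-backtraces.txt 2>&1
    echo "backtraces written to /tmp/charon-backtraces.txt"
else
    echo "charon is not running"
fi
```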
<p>Wait for the problem to recur. When it does, log in again, find the pid of `/usr/local/libexec/ipsec/charon --use-syslog`, and run gdb: /usr/local/bin/gdb -p &lt;pid&gt;<br />In gdb the immediate thing to look at is the backtrace for all threads. Use `thread apply all bt`.<br />That'll produce a number (~17) of backtraces. The hope is that those will give us a hint as to how the deadlock happens.</p>

Rafał Kaźmierowski (2022-12-11T11:39:30Z) https://redmine.pfsense.org/issues/13014?journal_id=64493
<ul></ul><p>We have new developers for this topic. Hi Mateusz.<br />I have this same issue in my production configuration. One to three times a week my IPsec freezes on P2.</p>

David Vazquez (2022-12-11T11:43:18Z) https://redmine.pfsense.org/issues/13014?journal_id=64494
<ul><li><strong>File</strong> <a href="/attachments/4576">gdb_deadlocked_charon.txt</a> <a class="icon-only icon-download" title="Download" href="/attachments/download/4576/gdb_deadlocked_charon.txt">gdb_deadlocked_charon.txt</a> added</li></ul><p>I am running 2.7.0.a.20221202.0600 on my firewall at the current time, so I installed the strongswan package above for 23.01. Here's the output of the above commands. You probably only needed the end bit with the backtraces, but I figured I'd include it all for good measure.</p>

Roman Kazmierczak (2022-12-12T21:09:58Z) https://redmine.pfsense.org/issues/13014?journal_id=64547
<ul></ul><p>I have some 40+ spoke firewalls, with new ones deploying weekly. Each FW initiates 3 IPsec VPNs.<br />While the VPN is connected there has been no issue for over a year.<br />However, if the FW is up but unable to establish a connection due to external connectivity issues, the IPsec service will fail after only a few days.<br />With this information I've configured our hub FW so that all connections are set as responders only. Interestingly, the hub is still attempting to initiate the connections, and with a number of unreachable spokes it is failing and requires a regular reboot (every 10~24h).</p>

james greenhill (2022-12-14T03:53:12Z) https://redmine.pfsense.org/issues/13014?journal_id=64574
<ul></ul><p>Jim Pingle wrote in <a href="#note-21">#note-21</a>:</p>
<blockquote>
<p>It didn't get pushed back to the next version, there won't be a 22.11 as there is still a significant amount of work to be done and not enough time to get it all well tested before November.</p>
<p>The current hope is to fix this for the next release, and that next release will be 23.01.</p>
</blockquote>
<p>Will a fix for this problem be included in the 23.01 release in the end?</p>
<p>thanks</p>

Dan Bailey (2022-12-20T22:37:08Z) https://redmine.pfsense.org/issues/13014?journal_id=64751
<ul></ul><p>Regarding my previous experiment with turning off disk logging: after a few days we just had a total IPsec failure, due to just a few P2s out of 150+ not being up.</p>
<p>We were also unable to log into the management UI, as usual, so we just power-cycled it in AWS.</p>

David Vazquez (2022-12-23T13:33:08Z) https://redmine.pfsense.org/issues/13014?journal_id=64875
<ul></ul><p>After a couple of mentions of Phase 2 connections being down, I decided to do a test. On the affected firewall, I had a few tunnels that were down more often than they were up. That's mostly due to the primary internet connections being active 99.9% of the time and the backup internet connections not being used. I disabled or removed ALL tunnels that were not actively connected and in use. It's been over 48 hrs and the charon issue has not occurred. Previously, I'd have the issue about 2x a day. Clearly this isn't a solution, as I need to have my backup VPNs in place, but this might be useful information for the devs to look further into the issue.</p>

Jim Pingle (2022-12-23T13:38:58Z) https://redmine.pfsense.org/issues/13014?journal_id=64876
<ul></ul><p>David Vazquez wrote in <a href="#note-39">#note-39</a>:</p>
<blockquote>
<p>After a couple mentions of Phase 2 connections being down, I decided to do a test. On the affected firewall, I had a few tunnels that were down more often than they were up. That's mostly due to primary internet connections being active 99.9% of the time and the backup internet connections not being used. I disabled or removed ALL tunnels that were not actively connected and in use. It's been over 48 hrs and the charon issue has not occurred. Previously, I'd have the issue about 2x a day. Clearly this isn't a solution as I need to have my backup VPNs in place but this might be useful information for the devs to look further into the issue.</p>
</blockquote>
<p>I have a lot of connections that stay down in my lab for various reasons, but they can connect if needed (on demand or manually) -- and no problems here. What sort of tunnels are these? How exactly are they a "backup"? Are they Tunnel mode or VTI? Are they overlapping/duplicated in some way?</p>

David Vazquez (2022-12-27T12:01:19Z) https://redmine.pfsense.org/issues/13014?journal_id=64890
<ul></ul><p>Jim Pingle wrote in <a href="#note-40">#note-40</a>:</p>
<blockquote>
<p>I have a lot of connections that stay down in my lab for various reasons, but they can connect if needed (on demand or manually) -- and no problems here. What sort of tunnels are these? How exactly are they a "backup"? Are they Tunnel mode or VTI? Are they overlapping/duplicated in some way?</p>
</blockquote>
<p>I suppose I should have used the word "failover" rather than "backup". They are in tunnel mode. I have many sites that have dual internet connections, and my office does as well. In order to have the connection fail over to a secondary internet connection using the given IPs, I have to make two Phase 1 entries, which means the Phase 2 is duplicated between both Phase 1 entries. From my understanding, pfSense doesn't allow you to add multiple Gateway IPs in a single Phase 1 entry, so this is how I've accomplished what I'm trying to do.</p>

Jim Pingle (2023-01-03T07:47:26Z) https://redmine.pfsense.org/issues/13014?journal_id=64952
<ul></ul><p>David Vazquez wrote in <a href="#note-41">#note-41</a>:</p>
<blockquote>
<p>Jim Pingle wrote in <a href="#note-40">#note-40</a>:</p>
<blockquote>
<p>I have a lot of connections that stay down in my lab for various reasons, but they can connect if needed (on demand or manually) -- and no problems here. What sort of tunnels are these? How exactly are they a "backup"? Are they Tunnel mode or VTI? Are they overlapping/duplicated in some way?</p>
</blockquote>
<p>I suppose I should have used the word "Failover" vs "Backup". They are in tunnel mode. I have many sites that have dual internet connections. My office does as well. In order to have the connection failover to a secondary internet connection using the given IPs, I have to make (2) Phase 1 entries which means the Phase 2 is duplicated between both Phase 1 entries. From my understanding, pfSense doesn't allow you to add in multiple Gateway IPs in a single Phase 1 entry, so this is how I've accomplished what I'm trying to do.</p>
</blockquote>
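<p>For illustration, the dual-Phase-1 failover layout described in the quote above corresponds roughly to a strongSwan <code>swanctl.conf</code> along these lines. This is a hypothetical sketch, not the reporter's actual configuration: the connection names, documentation addresses, and networks are invented. The key point is that both connections carry identical child traffic selectors, which is what produces duplicate entries for the same src/dst in the SPD:</p>
<pre>
connections {
    site-a-primary {
        # primary WAN address of the remote site (example address)
        remote_addrs = 198.51.100.1
        children {
            net {
                local_ts  = 10.0.1.0/24
                remote_ts = 10.0.2.0/24
            }
        }
    }
    site-a-backup {
        # backup WAN address of the remote site (example address)
        remote_addrs = 203.0.113.1
        children {
            net {
                # same selectors as above -- this duplicated Phase 2
                # is the overlap being discussed
                local_ts  = 10.0.1.0/24
                remote_ts = 10.0.2.0/24
            }
        }
    }
}
</pre>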
<p>Having overlapping P2 networks isn't really supported either, and could be a source of problems. I'm not sure if it's relevant here, though. Failover is typically handled by DNS -- set the remote endpoint to an FQDN and then have the remote update its FQDN if its primary connection fails.</p>
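<p>The DNS-based approach described above would look roughly like this as a <code>swanctl.conf</code> fragment (a hedged sketch with a hypothetical FQDN and networks): the remote endpoint is a hostname rather than a fixed IP, so a single Phase 1 entry suffices and the traffic selectors are never duplicated:</p>
<pre>
connections {
    site-a {
        # Single Phase 1 entry; the FQDN is updated (e.g. via
        # dynamic DNS) when the remote site fails over to its
        # backup WAN, and is re-resolved on connection attempts.
        remote_addrs = vpn.site-a.example.com
        children {
            net {
                local_ts  = 10.0.1.0/24
                remote_ts = 10.0.2.0/24
            }
        }
    }
}
</pre>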
<p>Do you keep the overlapping connections up at all times or do you disable the "backup" entries until they are needed?</p> pfSense - Bug #13014: Deadlock in Charon VICI interfacehttps://redmine.pfsense.org/issues/13014?journal_id=649572023-01-03T08:02:22ZDavid Vazquez
<ul></ul><p>Jim Pingle wrote in <a href="#note-42">#note-42</a>:</p>
<blockquote>
<p>Having overlapping P2 networks isn't really supported either, and could be a source of problems. I'm not sure if it's relevant here, though. Failover is typically handled by DNS -- Set the remote endpoint to an FQDN and then have the remote update its FQDN if its primary connection fails.</p>
<p>Do you keep the overlapping connections up at all times or do you disable the "backup" entries until they are needed?</p>
</blockquote>
<p>Well, the limitation of using DNS for failover is that I have to wait however long it takes for the FQDN to be updated before the VPN comes back up. With hardcoded IPs, the cutover to the failover VPN is pretty much instantaneous.</p>
<p>As for the backup entries, they are usually enabled at all times. They are currently disabled, and charon has not had a hard lock since I disabled them.</p> pfSense - Bug #13014: Deadlock in Charon VICI interfacehttps://redmine.pfsense.org/issues/13014?journal_id=649612023-01-03T08:16:39ZJim Pingle
<ul></ul><p>That could be part of the problem, then, because if there are two P2 entries for the same src/dst in the SPD table it may be having issues keeping them straight. It's not valid to have more than one P2 for the same local/remote network. That it works at all is by luck/chance. Though we'll need to try including that in testing to see if it helps reproduce this particular issue.</p> pfSense - Bug #13014: Deadlock in Charon VICI interfacehttps://redmine.pfsense.org/issues/13014?journal_id=649622023-01-03T08:20:15ZRoman Kazmierczak
<ul></ul><p>After disabling keepalives on all responders, IPsec has been up for the past 8 days. Before that it would fail every 10-12 hours.<br />We don't have any traffic towards the remotes, which are the initiators.<br />It seems like the issue is triggered when the interesting traffic is unable to bring the tunnel up for some time.<br />A separate but related issue is that configuring the tunnel as a responder only doesn't prevent interesting traffic from initiating the tunnel; I have confirmed that with a packet capture.</p> pfSense - Bug #13014: Deadlock in Charon VICI interfacehttps://redmine.pfsense.org/issues/13014?journal_id=655432023-02-06T14:24:17ZJim Pingle
<ul><li><strong>Plus Target Version</strong> changed from <i>23.01</i> to <i>23.05</i></li></ul><p>We're still trying to reproduce this and gather data on it, but we are getting closer.</p> pfSense - Bug #13014: Deadlock in Charon VICI interfacehttps://redmine.pfsense.org/issues/13014?journal_id=658432023-02-24T01:57:24ZDan Bailey
<ul></ul><p>Still happening on pfSense Plus 23.01-RELEASE.</p>
<p>If we leave a single VPN trying to connect while the other side is not configured correctly, within hours IPsec completely fails for all tunnels and we cannot log into the UI.</p>
<p>We have disk logging off; logs are only sent to an external syslogger.<br />Most VPNs have split Phase 2 connections since they connect to older firewalls.<br />We cannot set responder-only because we usually generate the interesting traffic, not the remote side.</p>
<p>Around 60-70 IPsec VPNs are active, with hardware crypto enabled (hosted in AWS).</p>
<p>We don't really want to disable keepalives.</p> pfSense - Bug #13014: Deadlock in Charon VICI interfacehttps://redmine.pfsense.org/issues/13014?journal_id=667852023-04-13T11:25:36ZKristof Provost
<ul></ul><p>The nice people at the strongSwan project think they know what the problem is, and have posted an experimental patch.<br />Details in <a class="external" href="https://github.com/strongswan/strongswan/commit/f33cf9376e90f371c9eaa1571f37bd106cbf3ee4">https://github.com/strongswan/strongswan/commit/f33cf9376e90f371c9eaa1571f37bd106cbf3ee4</a></p>
<p>I've built that patch in a package for 23.01. Can someone who's been seeing this issue install this package and confirm (or deny) that it fixes the problem?</p>
<p><a class="external" href="https://people.freebsd.org/~kp/strongswan-5.9.8-test.pkg">https://people.freebsd.org/~kp/strongswan-5.9.8-test.pkg</a></p>
<p>(Don't forget to run <code>pkg add -f strongswan-5.9.8-test.pkg</code>.)</p>
<p>Should the problem not be fixed, they'll need backtraces and the configuration files.</p> pfSense - Bug #13014: Deadlock in Charon VICI interfacehttps://redmine.pfsense.org/issues/13014?journal_id=669492023-04-19T08:01:43ZDavid Vazquez
<ul></ul><p>Kristof Provost wrote in <a href="#note-48">#note-48</a>:</p>
<blockquote>
<p>I've built that patch in a package for 23.01. Can someone who's been seeing this issue install this package and confirm (or deny) that it fixes the problem?</p>
</blockquote>
<p>I implemented the potential fix yesterday morning and the issue has not occurred. I'm going to continue monitoring things for a few days and I'll update here next week.</p> pfSense - Bug #13014: Deadlock in Charon VICI interfacehttps://redmine.pfsense.org/issues/13014?journal_id=670522023-04-25T11:54:55ZKristof Provost
<ul></ul><p>Hi David, did you see the issue recur? It'd be very nice to have confirmation so we can land this (and upstream strongswan can too).</p> pfSense - Bug #13014: Deadlock in Charon VICI interfacehttps://redmine.pfsense.org/issues/13014?journal_id=670602023-04-25T14:37:52ZDavid Vazquez
<ul></ul><p>Hey Kristof, I have not had the issue recur at all. Seems like it's fixed to me! Thank you for keeping up with it.</p> pfSense - Bug #13014: Deadlock in Charon VICI interfacehttps://redmine.pfsense.org/issues/13014?journal_id=670662023-04-26T03:46:32ZKristof Provost
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>Feedback</i></li></ul><p>I've merged the fix to the relevant branches. It will be present in tomorrow's CE and plus snapshots.</p> pfSense - Bug #13014: Deadlock in Charon VICI interfacehttps://redmine.pfsense.org/issues/13014?journal_id=670702023-04-26T07:24:28ZJim Pingle
<ul><li><strong>Assignee</strong> changed from <i>Mateusz Guzik</i> to <i>Kristof Provost</i></li><li><strong>% Done</strong> changed from <i>0</i> to <i>100</i></li></ul> pfSense - Bug #13014: Deadlock in Charon VICI interfacehttps://redmine.pfsense.org/issues/13014?journal_id=671012023-04-28T15:10:38ZJim Pingle
<ul><li><strong>Subject</strong> changed from <i>Charon.vici can get in a bad state</i> to <i>Deadlock in Charon VICI interface</i></li></ul><p>Updating subject for release notes.</p> pfSense - Bug #13014: Deadlock in Charon VICI interfacehttps://redmine.pfsense.org/issues/13014?journal_id=672502023-05-03T14:29:48ZJim Pingle
<ul><li><strong>Status</strong> changed from <i>Feedback</i> to <i>Resolved</i></li></ul>