pfSense bugtracker (https://redmine.pfsense.org/)
pfSense - Bug #7166: During bandwidth test 4860 with 2.4 got Fatal trap 12: page fault while in kernel mode
Constantine Kormashev, 2017-01-27T09:14:33Z (https://redmine.pfsense.org/issues/7166?journal_id=30936)
<ul><li><strong>File</strong> <a href="/attachments/1961">citrix.pcap</a> added</li><li><strong>File</strong> <a href="/attachments/1959">dns.pcap</a> added</li><li><strong>File</strong> <a href="/attachments/1960">exchange.pcap</a> added</li><li><strong>File</strong> <a href="/attachments/1962">http_browsing.pcap</a> added</li><li><strong>File</strong> <a href="/attachments/1963">http_get.pcap</a> added</li><li><strong>File</strong> <a href="/attachments/1964">https.pcap</a> added</li><li><strong>File</strong> <a href="/attachments/1966">mail_pop.pcap</a> added</li><li><strong>File</strong> <a href="/attachments/1965">Oracle.pcap</a> added</li><li><strong>File</strong> <a href="/attachments/1968">rtp_160k.pcap</a> added</li><li><strong>File</strong> <a href="/attachments/1967">rtp_250k_rtp_only_1.pcap</a> added</li></ul>
<p>I can reproduce this bug.<br />It happens when I use a special traffic pattern for Cisco TRex that includes several pcaps of real traffic:</p>
<p>Oracle.pcap <br />Video_Calls.pcap <br />rtp_160k.pcap <br />rtp_250k_rtp_only_1.pcap <br />rtp_250k_rtp_only_2.pcap <br />smtp.pcap <br />Voice_calls_rtp_only.pcap <br />citrix.pcap <br />dns.pcap <br />exchange.pcap <br />http_browsing.pcap <br />http_get.pcap <br />http_post.pcap <br />https.pcap <br />mail_pop.pcap</p>
<p>I also noticed that plain UDP packets of varying sizes cannot reproduce this error, even in very large numbers.<br />With the traffic pattern above, the higher the traffic volume, the sooner the bug occurs.</p>
Constantine Kormashev, 2017-01-28T07:16:18Z (https://redmine.pfsense.org/issues/7166?journal_id=30945)
<ul></ul><p>Adding <code>hw.igb.num_queues=1</code> to /boot/local.conf helps resolve this issue:</p>
<pre><code>sysctl hw.igb.num_queues<br />hw.igb.num_queues: 1</code></pre>
Luiz Souza (luiz@netgate.com), 2017-01-30T00:21:32Z (https://redmine.pfsense.org/issues/7166?journal_id=30978)
<ul></ul><p>This seems to be a known bug in FreeBSD (or something close to it): <a class="external" href="https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=208409#c11">https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=208409#c11</a></p>
<p>This is also a duplicate of <a class="issue tracker-1 status-3 priority-5 priority-high4 closed" title="Bug: igb driver queue related crashes (Resolved)" href="https://redmine.pfsense.org/issues/7149">#7149</a> (I'll keep both open for now, as they contain different information about the bug).</p>
Luiz Souza (luiz@netgate.com), 2017-01-30T00:28:45Z (https://redmine.pfsense.org/issues/7166?journal_id=30979)
<ul></ul><p>The FreeBSD PR also suggests that disabling LEGACY_TX support (and ALTQ support along with it) would fix the crashes.</p>
Luiz Souza (luiz@netgate.com), 2017-01-30T20:30:02Z (https://redmine.pfsense.org/issues/7166?journal_id=31000)
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>Feedback</i></li></ul><p>This commit fixes a few obvious issues in igb: <a class="external" href="https://github.com/pfsense/FreeBSD-src/commit/215ddb035593bc4cee275b9dbbf8fc3a7579aee1">https://github.com/pfsense/FreeBSD-src/commit/215ddb035593bc4cee275b9dbbf8fc3a7579aee1</a></p>
<p>Please update and test for regressions.</p>
Vladimir Lind, 2017-01-31T03:46:22Z (https://redmine.pfsense.org/issues/7166?journal_id=31004)
<ul></ul><p>I repeated the tests as instructed by Constantine: the SG4860 did not crash with the 2.4 build from Mon Jan 30 22:08:41 CST 2017.</p>
Constantine Kormashev, 2017-02-02T00:55:38Z (https://redmine.pfsense.org/issues/7166?journal_id=31054)
<ul></ul><p>I noticed that with the new firmware the SG4860 uses <strong>25% more</strong> CPU than with the previous version.<br />Idle is now 185% CPU, whereas before it was 305%. Interrupts take 212% for a 1 Gbit/s flow.</p>
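<p>For context on these figures: <code>top -CSP</code> prints per-CPU percentages, and the idle and intr totals are sums across all cores, so on the four-core SG4860 the ceiling is 400%. The minimal Python sketch below (the helper name <code>sum_field</code> is hypothetical, not part of any tool) parses per-CPU lines in the format of the <code>top</code> output in this comment and sums one field. With those values it gives about 186% idle and 212% interrupt, close to the 186.44% idle and 211.69% intr in the process list, which leaves roughly 214% of the 400% busy.</p>

```python
import re

# Per-CPU lines copied from the top -CSP output in this comment.
CPU_LINES = [
    "CPU 0: 0.0% user, 0.0% nice, 0.0% system, 80.9% interrupt, 19.1% idle",
    "CPU 1: 0.0% user, 0.0% nice, 0.0% system, 82.7% interrupt, 17.3% idle",
    "CPU 2: 0.0% user, 0.0% nice, 0.6% system, 29.0% interrupt, 70.4% idle",
    "CPU 3: 0.0% user, 0.0% nice, 0.6% system, 19.8% interrupt, 79.6% idle",
]


def sum_field(lines, field):
    """Sum one percentage field (e.g. 'idle') across all per-CPU lines."""
    pattern = re.compile(r"([\d.]+)% " + re.escape(field) + r"\b")
    total = 0.0
    for line in lines:
        match = pattern.search(line)
        if match:
            total += float(match.group(1))
    return round(total, 1)


cores = len(CPU_LINES)                  # 4 cores -> 400% total capacity
idle = sum_field(CPU_LINES, "idle")
intr = sum_field(CPU_LINES, "interrupt")
busy = round(cores * 100 - idle, 1)     # everything that is not idle
print(f"idle={idle}% interrupt={intr}% busy={busy}% of {cores * 100}%")
```

<p>Summing per-CPU lines rather than reading the process list avoids mixing two slightly different sampling windows, which is why the results differ from the process totals by a fraction of a percent.</p>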
<p>Typical <code>top -CSP</code> output with the new firmware:</p>
<p>last pid: 13314; load averages: 2.12, 2.18, 1.85 up 1+23:27:59 06:55:00<br />70 processes: 2 running, 65 sleeping, 2 zombie, 1 waiting<br />CPU 0: 0.0% user, 0.0% nice, 0.0% system, 80.9% interrupt, 19.1% idle<br />CPU 1: 0.0% user, 0.0% nice, 0.0% system, 82.7% interrupt, 17.3% idle<br />CPU 2: 0.0% user, 0.0% nice, 0.6% system, 29.0% interrupt, 70.4% idle<br />CPU 3: 0.0% user, 0.0% nice, 0.6% system, 19.8% interrupt, 79.6% idle<br />Mem: 7600K Active, 103M Inact, 260M Wired, 26M Buf, 7511M Free<br />Swap: 1459M Total, 1459M Free</p>
<pre><code>PID USERNAME THR PRI NICE SIZE RES STATE C TIME CPU COMMAND<br /> 12 root 45 -64 - 0K 720K WAIT -1 333:05 <strong>211.69% intr</strong><br /> 11 root 4 155 ki31 0K 64K RUN 0 183.0H <strong>186.44% idle</strong><br /> 7 root 1 -16 - 0K 16K - 2 1:55 0.80% rand_harv<br /> 0 root 37 -16 - 0K 592K swapin 3 10:56 0.62% kernel<br />57695 root 1 20 0 20012K 3320K CPU3 3 0:02 0.18% top<br /> 6 root 1 -16 - 0K 16K pftm 2 1:05 0.15% pf purge<br /> 15 root 5 -68 - 0K 80K - 0 0:19 0.03% usb<br /> 4 root 2 -16 - 0K 32K - 2 0:05 0.01% cam<br />18020 root 1 20 0 37616K 8032K kqread 1 0:07 0.01% nginx<br /> 24 root 1 16 - 0K 16K syncer 3 0:07 0.01% syncer<br /> 22 root 2 -16 - 0K 32K psleep 2 0:03 0.01% bufdaemon<br />18743 root 2 20 0 29120K 12888K select 2 0:16 0.01% ntpd<br />18284 root 1 20 0 12468K 2348K nanslp 3 0:00 0.01% cron</code></pre>
Constantine Kormashev, 2017-02-02T01:16:02Z (https://redmine.pfsense.org/issues/7166?journal_id=31055)
<ul></ul><p>Constantine Kormashev wrote:</p>
<blockquote>
<p>I noticed that with the new firmware the SG4860 uses <strong>25% more</strong> CPU than with the previous version.<br />Idle is now 185% CPU, whereas before it was 305%. Interrupts take 212% for a 1 Gbit/s flow.</p>
</blockquote>
<p>The picture is the same for other traffic types: 290% idle instead of 360% idle for 1518-byte frames.</p>
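<p>To put the figures in this comment in proportion, here is a small Python sketch (the helper <code>pct_drop</code> is a hypothetical name, not part of any tool) that computes the relative throughput loss from the reported pps values: roughly 47% for 64-byte frames and 17% for random frame sizes. The idle figures are already sums over four cores (400% maximum), so they are compared in percentage points rather than as a ratio.</p>

```python
def pct_drop(before, after):
    """Relative decrease from `before` to `after`, in percent (1 decimal)."""
    return round((before - after) / before * 100, 1)


# Throughput reported in this comment, old firmware -> new firmware.
pps_64b = pct_drop(198000, 105000)      # 64-byte frames
pps_random = pct_drop(133000, 110000)   # random 64-1518 byte frames

# 1518-byte frames: idle fell from 360% to 290% of a 400% (4-core) total,
# so 70 percentage points of CPU capacity moved from idle to busy.
idle_points = 360 - 290

print(pps_64b, pps_random, idle_points)
```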
<p>There is also a huge performance drop for small frames (64 bytes): 105,000 pps instead of 198,000 pps, and for random frame sizes (64 to 1518 bytes): 110,000 pps instead of 133,000 pps.</p>
Luiz Souza (luiz@netgate.com), 2017-02-06T00:31:43Z (https://redmine.pfsense.org/issues/7166?journal_id=31132)
<ul></ul><p>This may be the trade-off of the fix: it doesn't actually disable the multiple queues, but only one of them is going to be used. Because of that, there are cases where you have to drop the lock on one CPU and acquire it on another CPU, and this has a cost.</p>
<p>I'll check the fix with the FreeBSD developers; maybe someone will come up with a better fix.</p>
Luiz Souza (luiz@netgate.com), 2017-02-13T10:28:20Z (https://redmine.pfsense.org/issues/7166?journal_id=31506)
<ul></ul><p>The next build has a different fix for this issue; it probably performs better, too.</p>
<p>Could you please check how much degradation, if any, this new fix causes?</p>
Constantine Kormashev, 2017-02-17T05:39:38Z (https://redmine.pfsense.org/issues/7166?journal_id=31572)
<ul></ul><p>I updated the 4860 to the latest firmware and ran the tests, with very good results.<br />There is no performance problem, and I could not reproduce the issue that led to the kernel panic.<br />I tested the device for several hours and did not notice any trouble.</p>
Luiz Souza (luiz@netgate.com), 2017-02-17T23:11:25Z (https://redmine.pfsense.org/issues/7166?journal_id=31601)
<ul></ul><p>Thank you again Constantine!</p>
<p>I'll upstream this fix.</p>
Luiz Souza (luiz@netgate.com), 2017-02-17T23:19:39Z (https://redmine.pfsense.org/issues/7166?journal_id=31605)
<ul><li><strong>Status</strong> changed from <i>Feedback</i> to <i>Resolved</i></li><li><strong>% Done</strong> changed from <i>0</i> to <i>100</i></li></ul><p>Fixed.</p>