Regression #14828

open

QAT is not being used by some daemons

Added by Rob A about 1 year ago. Updated 11 months ago.

Status:
Feedback
Priority:
Normal
Assignee:
-
Category:
Cryptographic Modules
Target version:
-
Start date:
Due date:
% Done:
0%

Estimated time:
Release Notes:
Default
Affected Plus Version:
23.09
Affected Architecture:
4100, 6100

Description

QAT is not working. The issue was identified on a Netgate 6100 and subsequently confirmed on a 4100 unit. It is confined to 23.09 development builds, including the latest at the time of writing, 23.09.a.20231002.0600.

QAT selection on GUI is as normal.

sysctl appears correct:

[23.09-DEVELOPMENT][admin@Router-8.redacted.me]/root: sysctl -a | grep 'qat'
qat0: <Intel c3xxx QuickAssist> mem 0x81500000-0x8153ffff,0x81540000-0x8157ffff at device 0.0 on pci1
qat0: qat_dev0 started 6 acceleration engines
qat0: FW version: 4.18.0
qat0: Excessive clock measure delay
qat_ocf0: <QAT engine>
irq174: qat0:b0:351 @cpu0(domain0): 0
irq175: qat0:b1:353 @cpu0(domain0): 0
irq176: qat0:b2:355 @cpu0(domain0): 0
irq177: qat0:b3:357 @cpu0(domain0): 0
irq178: qat0:b4:359 @cpu0(domain0): 0
irq179: qat0:b5:361 @cpu0(domain0): 0
irq180: qat0:b6:363 @cpu0(domain0): 0
irq181: qat0:b7:365 @cpu0(domain0): 0
irq182: qat0:b8:367 @cpu0(domain0): 0
irq183: qat0:b9:369 @cpu0(domain0): 0
irq184: qat0:b10:371 @cpu0(domain0): 0
irq185: qat0:b11:373 @cpu0(domain0): 0
irq186: qat0:b12:375 @cpu0(domain0): 0
irq187: qat0:b13:377 @cpu0(domain0): 0
irq188: qat0:b14:379 @cpu0(domain0): 0
irq189: qat0:b15:381 @cpu0(domain0): 0
irq190: qat0:ae:383 @cpu0(domain0): 0
dev.qat_ocf.0.enable: 1
dev.qat_ocf.0.%parent: nexus0
dev.qat_ocf.0.%pnpinfo: 
dev.qat_ocf.0.%location: 
dev.qat_ocf.0.%driver: qat_ocf
dev.qat_ocf.0.%desc: QAT engine
dev.qat_ocf.%parent: 
dev.qat.0.frequency: 685000000
dev.qat.0.cnv_error: 
dev.qat.0.fw_counters: 
dev.qat.0.mmp_version: 6.0.0
dev.qat.0.hw_version: 17
dev.qat.0.fw_version: 4.18.0
dev.qat.0.heartbeat: 1
dev.qat.0.heartbeat_failed: 0
dev.qat.0.heartbeat_sent: 2
dev.qat.0.dev_cfg: [GENERAL]
dev.qat.0.num_user_processes: 0
dev.qat.0.cfg_mode: ks
dev.qat.0.cfg_services: sym;dc
dev.qat.0.state: up
dev.qat.0.%parent: pci1
dev.qat.0.%pnpinfo: vendor=0x8086 device=0x19e2 subvendor=0x8086 subdevice=0x19e2 class=0x0b4000
dev.qat.0.%location: slot=0 function=0 dbsf=pci0:1:0:0 handle=\_SB_.PCI0.VRP2.PXSX
dev.qat.0.%driver: qat
dev.qat.0.%desc: Intel c3xxx QuickAssist
dev.qat.%parent: 
[23.09-DEVELOPMENT][admin@Router-8.redacted.me]/root: 
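
The two values that matter most in the output above are dev.qat.0.state and dev.qat_ocf.0.enable. As a minimal sketch (the function name qat_config_ok is hypothetical, not part of pfSense or the QAT driver), the check can be scripted by feeding sysctl output to a small filter:

```shell
# Hypothetical helper: reads `sysctl` output on stdin and succeeds only if
# dev.qat.0.state is "up" and dev.qat_ocf.0.enable is "1", as in the listing above.
qat_config_ok() {
    awk -F': ' '
        $1 == "dev.qat.0.state"      { state = $2 }
        $1 == "dev.qat_ocf.0.enable" { enable = $2 }
        END { exit !(state == "up" && enable == "1") }
    '
}

# Assumed usage on the firewall itself:
#   sysctl dev.qat dev.qat_ocf | qat_config_ok && echo "QAT configured" || echo "QAT not ready"
```

This only confirms the configuration; it says nothing about whether any traffic actually reaches the device.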

Kernel looks ok:

[23.09-DEVELOPMENT][admin@Router-8.redacted.me]/root:  kldstat -v | grep qat
11    1 0xffffffff84437000     4378 qat.ko (/boot/kernel/qat.ko)
        699 nexus/qat
12    6 0xffffffff8443c000    14d60 qat_hw.ko (/boot/kernel/qat_hw.ko)
        697 pci/qat_c4xxx
        692 pci/qat_200xx
        696 pci/qat_dh895xcc
        693 pci/qat_4xxx
        695 pci/qat_c3xxx
        691 pci/qat_c62x
        694 pci/qat_4xxxvf
13    9 0xffffffff84451000    2ff70 qat_common.ko (/boot/kernel/qat_common.ko)
        689 qat_common
14    8 0xffffffff84481000    68cd8 qat_api.ko (/boot/kernel/qat_api.ko)
        690 qat_api
15    1 0xffffffff844ea000   122c18 qat_c3xxx_fw.ko (/boot/kernel/qat_c3xxx_fw.ko)
        698 qat_c3xxx_fw_fw
[23.09-DEVELOPMENT][admin@Router-8.redacted.me]/root:

But zero QAT activity:

[23.09-DEVELOPMENT][admin@Router-8.redacted.me]/root: vmstat -i | grep qat
[23.09-DEVELOPMENT][admin@Router-8.redacted.me]/root:

Reversion to 23.05 removes the issue completely with QAT restored:

[23.05.1-RELEASE][admin@Router-8.redacted.me]/root: vmstat -i | grep qat
irq175: qat0:b1                      176          0
irq176: qat0:b2                      208          0
[23.05.1-RELEASE][admin@Router-8.redacted.me]/root: 
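
To compare the two versions with numbers instead of eyeballing the output, the interrupt totals can be summed before and after a test. This is a rough sketch only: the helper name qat_irq_total is hypothetical, and it assumes the vmstat -i layout shown above (total in the second-to-last column).

```shell
# Hypothetical helper: reads `vmstat -i` output on stdin and prints the sum of
# the "total" column for all qat interrupt lines (0 if there are none).
qat_irq_total() {
    awk '/qat/ { sum += $(NF-1) } END { print sum + 0 }'
}

# Assumed usage on the firewall itself:
#   before=$(vmstat -i | qat_irq_total)
#   ... generate some encrypted traffic ...
#   after=$(vmstat -i | qat_irq_total)
#   echo "QAT interrupts during test: $((after - before))"
```

A difference of zero after generating encrypted traffic would match the 23.09 behaviour described here.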

Contra-indication - JimP has reported that QAT is functioning correctly on his C3000-equipped unit:

: dmesg | grep qat
qat0: <Intel c3xxx QuickAssist> mem 0xdfd00000-0xdfd3ffff,0xdfd40000-0xdfd7ffff irq 18 at device 0.0 on pci1
qat0: qat_dev0 started 6 acceleration engines
qat0: FW version: 4.18.0
qat0: Excessive clock measure delay
qat_ocf0: <QAT engine>

: vmstat -i | grep qat
irq62: qat0:b1                     40210          6
irq63: qat0:b2                     11846          2

Original thread:

https://forum.netgate.com/topic/183123/23-09d-is-qat-broken/4?_=1696239799286#

The issue may not be confined to the 6100 and 4100. Because it only shows up if you go looking for it, other users may be affected without having noticed.


Actions #1

Updated by Jim Pingle about 1 year ago

  • Subject changed from QAT Not Functioning On Netgate 4100 & 6100 to QAT is not being used by some daemons
  • Priority changed from High to Normal
  • Target version set to 23.09

QAT isn't broken, it is working with IPsec and OpenVPN DCO which is expected since they are in the kernel.

It isn't being used by OpenSSH, nginx, or OpenVPN, and possibly others in userspace, which is also expected.

What still isn't clear is what was using QAT on 23.05.1 that appears not to be using it on 23.09, since both things we expect to use it are still using it on 23.09 here.

Actions #2

Updated by Jim Pingle about 1 year ago

  • Status changed from New to Feedback
  • Target version deleted (23.09)

Waiting on more info from the OP on the forum since it's not clear there is actually a problem yet. The items we expect to see using QAT are using QAT on 23.09 (kernel encryption for IPsec and OpenVPN DCO), but there is still some unidentified difference between 23.05.1 and 23.09. It may be there was some other problem on 23.05.1 that made it look like it was being used when it wasn't, for example, but we need more data.

EDIT: I reloaded 23.05.1 on a 4100 and I don't see any QAT activity on there at all for the GUI, ssh, curl, etc. As with 23.09, I only see activity on QAT for kernel encryption such as IPsec.

Actions #3

Updated by Rob A about 1 year ago

I still see a demonstrable difference between 23.05 and 23.09 dev with QAT. QAT is active on 23.05 for all on-device encrypted traffic, including TLS, OpenSSL, curl, GUI events, package updates, DoT, file transfers etc. The parameters recommended by Intel all increment in line with this activity. This is achieved with no VPN in use or even configured.

There is a small config difference for QAT in sysctl with an extra line set in 23.09. It may or may not be relevant but the QAT config has changed between versions. Clearly I cannot account for Jim Pingle's findings and there is no explanation as to how QAT can appear active, through appropriate interrupts, when it is not.

Actions #4

Updated by Kris Phillips about 1 year ago

Rob A wrote in #note-3:

I still see a demonstrable difference between 23.05 and 23.09 dev with QAT. QAT is active on 23.05 for all on-device encrypted traffic, including TLS, OpenSSL, curl, GUI events, package updates, DoT, file transfers etc. The parameters recommended by Intel all increment in line with this activity. This is achieved with no VPN in use or even configured.

There is a small config difference for QAT in sysctl with an extra line set in 23.09. It may or may not be relevant but the QAT config has changed between versions. Clearly I cannot account for Jim Pingle's findings and there is no explanation as to how QAT can appear active, through appropriate interrupts, when it is not.

Hello Rob,

Have you tested this on your same config using the 23.09 BETA builds to see if your issue is resolved in the next version?

Actions #5

Updated by Rob A about 1 year ago

Hi Kris,

No change with 23.09 BETA, including 23.09.b.20231020.0600 for QAT on C3xxx QAT hardware (Netgate 6100 in this case).

In this thread jimp has outlined his thoughts on what should and should not work but the documentation is sparse. I'm not sure what the intent is with QAT in the future, if any:
https://forum.netgate.com/topic/183123/23-09d-is-qat-broken/60?_=1697919695106

I have a different device with a newer series of QAT that I understand has been added in the latest beta but I have yet to test it. Hopefully will do so soon.

Actions #6

Updated by Rob A about 1 year ago

I've just tried 23.09.b.20231020.0600 on qat_200xx-equipped hardware (Xeon D-1736NT). I can see that the revision needed to utilise QAT on this newer SoC is now in place but, as with the qat_c3xxx hardware above, QAT is not being used for anything outside of VPN use.

I understand that means only 'kernel space' is being used and not 'user space' functions such as openssl's qatengine.

I still hold the view that QAT should be used to its fullest extent, even for functionality confined to the pfSense device itself.

Actions #7

Updated by Jordan G about 1 year ago

In 23.09, after enabling IIMB and rebooting (required to enable IIMB), I see nothing when running vmstat -i | grep qat or vmstat -i | grep aes, regardless of whether AES-NI or QAT is set for cryptographic hardware. With IIMB not enabled, the same commands returned a list with IRQ numbers for both the AES-NI/crypto-dev and QAT settings.

If this needs a separate Redmine issue, let me know.

Actions #8

Updated by Rob A 11 months ago

Post 23.09, is there any intent to expand QAT capabilities beyond the set currently used by pfSense, including 'user-space' capabilities?
