Regression #11524

closed

Using SHA1 or SHA256 with AES-NI may fail if AES-NI attempts to accelerate hashing

Added by Jim Pingle over 3 years ago. Updated over 3 years ago.

Status: Closed
Priority: Very High
Assignee:
Category: Hardware / Drivers
Target version:
Start date: 02/24/2021
Due date:
% Done: 0%
Estimated time:
Plus Target Version: 21.05
Release Notes: Default
Affected Version: 2.5.0
Affected Architecture: All

Description

Based on at least one report, it appears AES-NI on Plus 21.02/2.5.0 has an issue with SHA-256 and some clients, notably Android and Apple clients.

https://forum.netgate.com/topic/161268/ipsec-tunnels-using-sha256-may-not-connect

If the tunnel is switched to a different hash or if AES-NI is disabled, the problems do not occur. There is no problem when using other accelerators such as QAT; only AES-NI appears to be affected.

Per Mark J, the AES-NI driver in Plus 21.02/2.5.0 now supports accelerating SHA, so it's possible that the SHA-256 implementation in AES-NI differs from the one in the OS.

Historically there were differences with SHA-256 on FreeBSD which could lead to similar problems. It was standardized on the RFC 4868 implementation about 10 years ago (ref: http://lists.freebsd.org/pipermail/svn-src-head/2011-February/025040.html )
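
For illustration only, here is a minimal Python sketch of the kind of historical mismatch mentioned above. It assumes the difference was the length of the truncated HMAC-SHA-256 value: RFC 4868 specifies 128 bits (HMAC-SHA-256-128), while the 96-bit "legacy" length used here is an assumption based on pre-RFC behavior. The key, the payload, and the helper name `icv` are hypothetical, and this is not claimed to be the cause of the current AES-NI problem.

```python
import hashlib
import hmac


def icv(key: bytes, data: bytes, truncate_bits: int) -> bytes:
    """Return HMAC-SHA-256 over data, truncated to truncate_bits bits."""
    full = hmac.new(key, data, hashlib.sha256).digest()
    return full[: truncate_bits // 8]


# Hypothetical key and payload, for illustration only.
key = b"k" * 32
packet = b"example ESP payload"

rfc4868_icv = icv(key, packet, 128)  # length specified by RFC 4868
legacy_icv = icv(key, packet, 96)    # assumed pre-RFC truncation length

# The two peers compute the same hash but disagree on how much of it to
# send, so the integrity check fails on every packet.
print(len(rfc4868_icv), len(legacy_icv))
print(hmac.compare_digest(rfc4868_icv, legacy_icv))
```

Two peers that disagree in this way would negotiate a tunnel successfully but drop each other's traffic, which is why a difference between the accelerated and software hash paths is worth ruling out.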


Files

disable-sha.patch (474 Bytes), Jan de Groot, 04/16/2021 01:46 PM
Actions #1

Updated by Jim Pingle over 3 years ago

Specifically, the hardware from the thread above is a Netgate 5100 running pfSense Plus, but this likely affects both Plus and CE. Needs more data, however.

Actions #2

Updated by Jim Pingle over 3 years ago

Another potential report at https://forum.netgate.com/topic/161354/ipsec-packet-loss-routing-issue-with-21-02-release but for the Netgate 7100. Still waiting on more data or confirmation that moving off AES-NI helps there.

Actions #3

Updated by Kris Phillips over 3 years ago

This also affects Site to Site VPN tunnels. Please reference internal ticket 76224 for another example of this bug causing issues.

Actions #4

Updated by Kris Phillips over 3 years ago

An interesting point related to IPsec: if you lower the tunnel subnet size to something like a /30, this issue takes longer to appear. If you increase the subnet size on a tunnel to something bigger like a /17 and then restart the IPsec service, packets will pass for about 2-3 seconds and then stop. With a /30 it can take a few minutes before traffic stops passing.

Actions #5

Updated by Chris Linstruth over 3 years ago

To add to the above: it looks like TAC had a case with Plus 21.02 on an XG-7100 on one side and Azure VPN on the other. Disabling AES-NI stopped it from failing after "some traffic."

Actions #6

Updated by Jim Pingle over 3 years ago

  • Target version changed from CE-Next to 2.5.1
Actions #7

Updated by Jim Pingle over 3 years ago

There have been multiple additional confirmations of this from customers and forum users, and in each case thus far, switching to QAT or switching the hash has stabilized the IPsec behavior.

Actions #8

Updated by Michael Spears over 3 years ago

Assisted a customer with this today on a 5100

Actions #9

Updated by Yury Zaytsev over 3 years ago

We hit this after upgrading from 2.4.5 to 2.5.0 on our two SG-5100 units. It was terribly difficult to figure out, but thanks to Netgate for pointing us in the right direction!

Actions #10

Updated by Renato Botelho over 3 years ago

  • Target version changed from 2.5.1 to CE-Next

Not enough time for 2.5.1

Actions #11

Updated by Jim Pingle over 3 years ago

  • Subject changed from Using SHA256 with AES-NI may fail for some clients to Using SHA1 or SHA256 with AES-NI may fail if AES-NI attempts to accelerate hashing

Updating subject.

Note that this problem only affects CPUs which report the ability to accelerate SHA1 and SHA256.

When AES-NI is active the System Information widget on the Dashboard indicates whether or not acceleration for the affected hashes is supported. For example:

Unsupported:

Hardware crypto AES-CBC,AES-CCM,AES-GCM,AES-ICM,AES-XTS

Supported:

Hardware crypto AES-CBC,AES-CCM,AES-GCM,AES-ICM,AES-XTS,SHA1,SHA256

In the latter case, to avoid problems with SHA1 or SHA256, the cryptographic support option should be changed to QAT for those on pfSense Plus. On pfSense CE, change to an AEAD cipher such as AES-GCM, which does not utilize hashes, or switch to a different hash (e.g. SHA-512).
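
As a rough aid, the check described above can be sketched in Python. The function name `accelerated_hashes` is hypothetical, and the sketch assumes the "Hardware crypto" value has been copied by hand from the widget as a comma-separated string like the two examples above; it does not query the system itself.

```python
AFFECTED_HASHES = {"SHA1", "SHA256"}


def accelerated_hashes(widget_value: str) -> set:
    """Return the affected hashes listed in a 'Hardware crypto' string."""
    algorithms = {
        name.strip().upper()
        for name in widget_value.split(",")
        if name.strip()
    }
    return algorithms & AFFECTED_HASHES


# The two example values from the Dashboard widget shown above.
unsupported = "AES-CBC,AES-CCM,AES-GCM,AES-ICM,AES-XTS"
supported = "AES-CBC,AES-CCM,AES-GCM,AES-ICM,AES-XTS,SHA1,SHA256"

for value in (unsupported, supported):
    hits = sorted(accelerated_hashes(value))
    if hits:
        print("Potentially affected, accelerates:", ", ".join(hits))
    else:
        print("Not affected: no SHA1/SHA256 acceleration reported")
```

A non-empty result means the CPU reports SHA acceleration, so the workarounds described above apply.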

Actions #12

Updated by Jan de Groot over 3 years ago

This hit me after migrating a pfSense CE firewall for a customer. The Atom C3000 series CPU in the new firewall has SHA1/SHA256 offload; the old CPU didn't have any offloading at all but was faster than the Atom. The customer has Windows VPN clients, and Windows can't do AES-GCM, only 3DES or AES-CBC.

I applied a hotfix by disabling SHA support in the AES-NI module. This requires a kernel recompile, but after that the /boot/kernel/aesni.ko module can be replaced.

Attached is the quickfix patch.

Actions #13

Updated by Jim Pingle over 3 years ago

After inspecting the code, disabling the SHA functionality in AES-NI is the best course of action.

Actions #14

Updated by Luiz Souza over 3 years ago

  • Status changed from New to Feedback

Regression fixed in 2.6 devel.

Actions #15

Updated by Renato Botelho over 3 years ago

Another fix [1] was imported from FreeBSD and will be present in tomorrow's snapshots.

[1] https://cgit.freebsd.org/src/commit/?id=62e32cf9140e6c13663dcd69ec3b3c7ca4579782

Actions #16

Updated by Jim Pingle over 3 years ago

  • Target version changed from CE-Next to 2.6.0
Actions #17

Updated by Jim Pingle over 3 years ago

  • Plus Target Version set to 21.05
Actions #18

Updated by Jim Pingle over 3 years ago

Already in 21.05 builds.

Actions #19

Updated by Jim Pingle over 3 years ago

  • Target version changed from 2.6.0 to 2.5.2
Actions #20

Updated by Marcos M over 3 years ago

Tested with SHA256 on IPsec P1 and SHA1 on P2 on 21.05-RC built on Wed May 26 18:11:31 EDT 2021 with AES-NI selected in system settings. Traffic passed correctly.

Actions #21

Updated by Jim Pingle over 3 years ago

  • Status changed from Feedback to Closed
Actions #22

Updated by Jim Pingle over 3 years ago

  • Status changed from Closed to Feedback

Due to changes in the freebsd-src branch used to build 2.5.2 snapshots, this needs to be re-tested on a build dated after this comment.

Actions #23

Updated by Jim Pingle over 3 years ago

  • Status changed from Feedback to Closed