Project

General

Profile

Actions

Bug #12095

open

Memory leak in pcscd

Added by Steve Wheeler 7 months ago. Updated 7 days ago.

Status:
New
Priority:
Normal
Assignee:
-
Category:
Authentication
Target version:
Start date:
06/30/2021
Due date:
% Done:

0%

Estimated time:
Plus Target Version:
Plus-Next
Release Notes:
Default
Affected Version:
2.5.x
Affected Architecture:
All

Description

The PCSC daemon looks to have a memory leak even when it's not in use. Or even when there are no IPSec tunnels defined which might be tied to it.

last pid: 99559;  load averages:  0.25,  0.18,  0.11                up 12+01:45:56  18:38:11
69 processes:  1 running, 68 sleeping
CPU:  0.2% user,  0.0% nice,  1.2% system,  0.0% interrupt, 98.6% idle
Mem: 31M Active, 1047M Inact, 502M Wired, 116M Buf, 347M Free
Swap: 1908M Total, 1908M Free
  PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
 7760 root          3  20    0   990M   976M select   1  24:23   0.14% pcscd
 9887 root          1  52    0   105M    39M accept   0   1:20   0.00% php-fpm
87665 root          1  52    0   102M    36M accept   0   0:43   0.00% php-fpm
 4918 root          1  52    0   102M    36M accept   1   0:40   0.00% php-fpm
77558 root          1  52    0   102M    36M accept   1   0:38   0.00% php-fpm

Stopping and then startingthe services resets the memory use to ~4MB. If you have IPSec tunnels defined which doing that the IPSec service will also need to be stopped and then started.

There looks to be a limit at ~1GB so it could be only lower memory systems that are affected.

Tested: CE 2.5.1, Plus 21.05 and 21.09.


Files

clipboard-202108061650-gdclv.png (55.1 KB) clipboard-202108061650-gdclv.png Sean M, 08/06/2021 04:50 PM
clipboard-202108251709-me8di.png (73.1 KB) clipboard-202108251709-me8di.png Michael Smith, 08/25/2021 04:09 PM
PFsence memory leak.png (9.58 KB) PFsence memory leak.png Mr Sparkles, 01/20/2022 10:15 AM
PFsence memory leak2.png (16.5 KB) PFsence memory leak2.png Mr Sparkles, 01/20/2022 10:15 AM

Related issues

Related to Todo #11933: PC/SC Smart Card Daemon ``pcscd`` running on all devices at all times, should be optionalResolvedViktor Gurov05/17/2021

Actions
Actions #1

Updated by Jim Pingle 7 months ago

Could be partially mitigated by #11933 -- That daemon should be made optional and off by default except for the few people who need its capabilities for PKCS#11.

Actions #2

Updated by Jim Pingle 7 months ago

  • Related to Todo #11933: PC/SC Smart Card Daemon ``pcscd`` running on all devices at all times, should be optional added
Actions #3

Updated by Marcos Mendoza 7 months ago

Here are some stats on various 21.05 VMs:

  • Uptime: 6d2h
  • RAM: 1G
  • pcscd usage: 326M
  • 2 VTI IPsec tunnels
last pid: 11696;  load averages:  1.32,  0.95,  0.89                                            up 6+02:41:39  13:14:03
66 processes:  1 running, 65 sleeping
CPU:  0.0% user,  0.0% nice,  0.0% system,  0.8% interrupt, 99.2% idle
Mem: 58M Active, 398M Inact, 223M Wired, 100M Buf, 257M Free
Swap: 410M Total, 410M Free

  PID USERNAME    THR PRI NICE   SIZE    RES STATE    TIME    WCPU COMMAND
 7702 root          5  20    0   326M   320M select   3:51   0.02% pcscd
70890 root          1  20    0   103M    38M getblk   0:00   0.00% php-cgi
85503 root          1  52    0   103M    36M accept   0:00   0.00% php-fpm
15450 root          1  52    0   103M    38M accept   0:01   0.00% php-fpm
  • Uptime: 11d16h
  • RAM: 1G
  • pcscd usage: 620M
  • No IPsec tunnels
last pid: 51378;  load averages:  0.38,  0.55,  0.57                                           up 11+16:07:38  13:13:21
54 processes:  1 running, 53 sleeping
CPU:  2.7% user,  3.8% nice, 13.7% system,  0.0% interrupt, 79.8% idle
Mem: 30M Active, 246M Inact, 355M Laundry, 210M Wired, 136K Buf, 89M Free
ARC: 71M Total, 40M MFU, 26M MRU, 1573K Anon, 564K Header, 3446K Other
     38M Compressed, 95M Uncompressed, 2.50:1 Ratio
Swap: 2048M Total, 192M Used, 1856M Free, 9% Inuse

  PID USERNAME    THR PRI NICE   SIZE    RES STATE    TIME    WCPU COMMAND
 7648 root          3  20    0   620M   496M select   7:19   0.05% pcscd
  352 root          1  52    0   131M    28M accept   1:06   0.00% php-fpm
 1716 root          1  52    0   103M    28M accept   0:21   0.00% php-fpm
46386 root          1  52    0   103M    27M accept   0:15   0.00% php-fpm
  • Uptime: 5d17h
  • RAM: 8G
  • pcscd usage: 312M
  • No IPsec tunnels
last pid: 29474;  load averages:  0.50,  0.39,  0.30                                            up 5+17:21:00  13:23:51
69 processes:  2 running, 67 sleeping
CPU:  2.9% user,  0.0% nice,  2.9% system,  0.0% interrupt, 94.1% idle
Mem: 61M Active, 547M Inact, 897M Wired, 240K Buf, 6391M Free
ARC: 340M Total, 106M MFU, 224M MRU, 1344K Anon, 1224K Header, 7573K Other
     93M Compressed, 268M Uncompressed, 2.87:1 Ratio
Swap: 2048M Total, 2048M Free

  PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
13305 root          3  20    0   312M   301M select   3   3:26   0.02% pcscd
 6196 unbound       4  20    0   149M    79M kqread   2   2:08   0.31% unbound
  355 root          1  52    0   135M    48M accept   3   0:29   1.19% php-fpm
  356 root          1  20    0   135M    48M piperd   0   0:24   0.33% php-fpm
Actions #4

Updated by Marcos Mendoza 7 months ago

Additional note:

Stopping the service while IPsec is in use leads to the following log spam:

Jul 4 14:21:28     charon     11632     02[CFG] C_GetSlotInfo failed: SLOT_ID_INVALID
Jul 4 14:21:28     charon     11632     02[CFG] error in C_WaitForSlotEvent: GENERAL_ERROR 

This ends up causing continuous log rotation in very short periods which leads to quickly growing disk usage and 100% CPU usage.

Actions #5

Updated by Steve Yates 6 months ago

There looks to be a limit at ~1GB

If you meant a limit for pcscd's RAM usage, I pulled up a few...a 3100 (21.05) is at 960 MB and a couple of 4860s just over 1000, but our lone 2.5.2 has almost 2 GB:

PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
7457 root 3 20 0 1946M 1929M select 1 22:00 0.00% pcscd

That's with uptime 18 days. On all four, the dashboard shows memory usage % that is lower than even just the pcscd process.

Actions #6

Updated by Sean M 6 months ago

I haven't run into this issue before but just today I noticed swap usage at 100% and memory was very high, turns out pcscd had leaked and consumed the memory.

Currently on:
21.05-RELEASE (amd64)
FreeBSD 12.2-STABLE

Actions #7

Updated by Alexander Arques 6 months ago

Just registered to report the same issue. I have never used smart cards or IPSec tunnels and today I noticed all swap space was used up and RAM usage was at 80%, turns out the pcscd daemon was using more than 1GB of memory.

Running:
2.5.2-RELEASE (amd64)
built on Fri Jul 02 15:33:00 EDT 2021
FreeBSD 12.2-STABLE

Uptime: 34 days

Actions #8

Updated by Michael Smith 5 months ago

Can confirm the bug on my system. Was a clean upgrade from the last version.
2.5.2-RELEASE (amd64)
built on Fri Jul 02 15:33:00 EDT 2021
FreeBSD 12.2-STABLE

Actions #9

Updated by Uwe Dippel 5 months ago

Same-same. 7 days of uptime, over night it ramped up and killed DNS ('no space left'). 2.5.2-RELEASE (amd64) clean install. Though, the culprit is well-known.
To me, priority of 'normal' is questionable since my users reported 'the internet is down'. No dig @pfsense. Just restarted the service and noticed the horrendous memory usage. Disabled pcscd and immediately everything was back to normal.

Actions #10

Updated by Jim Pingle 5 months ago

Uwe Dippel wrote in #note-9:

Same-same. 7 days of uptime, over night it ramped up and killed DNS ('no space left'). 2.5.2-RELEASE (amd64) clean install. Though, the culprit is well-known.
To me, priority of 'normal' is questionable since my users reported 'the internet is down'. No dig @pfsense. Just restarted the service and noticed the horrendous memory usage. Disabled pcscd and immediately everything was back to normal.

We agree that it's a sizable problem, but the main reason it's set to "normal" is because the service is disabled by default now (See #11933 ) and very few people need to enable it. It's only necessary for those who need to use PKCS#11 Smart Cards for certificates. We have not seen any reports that the memory leak happens when pcscd is latched onto actual hardware, so it's possible that this isn't an issue for those who actively use the daemon. It's also possible that there are no reports because not enough people use it yet to notice.

Either way, the problem resides in the daemon which we do not control, so it would need to be fixed upstream, hence our focus on a more practical workaround/mitigation.

Until the next release which includes the changes from #11933 you can add the main commit on that issue as a system patch to disable the pcscd service (See #11933#note-7 )

Actions #11

Updated by Charles Ng 5 months ago

I see the same log spam as described in https://redmine.pfsense.org/issues/12095#note-4 if pcscd is stopped.

The log rotation bzip2 processes consume all the CPU cycles.

USER      PID %CPU %MEM    VSZ   RSS TT  STAT STARTED        TIME COMMAND
root     3328 16.1  0.9  18244  9512  -  R    21:59       4:28.92 bzip2 -f /var/log/ipsec.log.0
root      487 15.8  0.9  18244  9500  -  R    22:06       3:10.48 bzip2 -f /var/log/ipsec.log.6
root    28366 15.8  0.9  18244  9504  -  R    22:01       3:59.19 bzip2 -f /var/log/ipsec.log.6
root    88479 15.6  0.9  18244  9504  -  R    21:58       4:43.51 bzip2 -f /var/log/ipsec.log.4
root    80103 15.5  0.9  18244  9504  -  R    21:57       4:52.81 bzip2 -f /var/log/ipsec.log.3
root    99250 15.1  0.9  18208  9460  -  R    22:33       0:06.21 bzip2 -f /var/log/ipsec.log.0
root    55746 14.9  0.9  18244  9504  -  R    22:03       3:39.07 bzip2 -f /var/log/ipsec.log.3
root    86455 14.9  0.9  18244  9500  -  R    22:05       3:12.95 bzip2 -f /var/log/ipsec.log.5
root    50899 14.6  0.9  18244  9504  -  R    22:02       3:47.29 bzip2 -f /var/log/ipsec.log.2
root    75157 14.6  0.9  18244  9504  -  R    22:04       3:29.48 bzip2 -f /var/log/ipsec.log.4
root    27619 14.1  0.9  18244  9504  -  R    22:00       4:09.67 bzip2 -f /var/log/ipsec.log.1
root    20617 13.6  0.9  18244  9504  -  R    22:00       4:15.34 bzip2 -f /var/log/ipsec.log.5

Actions #12

Updated by Jim Pingle 5 months ago

Charles Ng wrote in #note-11:

I see the same log spam as described in https://redmine.pfsense.org/issues/12095#note-4 if pcscd is stopped.

That is known and expected. Do not just stop the service if you have IPsec enabled. Read my previous comment for a link to a viable workaround.

Actions #13

Updated by Bug Reporter 5 months ago

Just found out where 1.3GiB of my free memory went. Returned to normal as soon as I killed the pcscd.

Actions #14

Updated by Philip Cook 5 months ago

This memory leak is rather problematic.
2.5.2-RELEASE (amd64)
Uptime 64 Days 22 Hours 02 Minutes 06 Seconds

last pid:   891;  load averages:  0.90,  0.77,  0.78                                                                                                                 up 64+22:00:17  02:18:37
76 processes:  1 running, 75 sleeping
CPU:  2.3% user,  0.0% nice,  1.2% system,  0.5% interrupt, 96.0% idle
Mem: 2264M Active, 2039M Inact, 2227M Laundry, 1019M Wired, 554M Buf, 273M Free
Swap: 4096M Total, 2159M Used, 1937M Free, 52% Inuse

  PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
 9896 root          3  20    0  4557M  2389M select   4  54:08   0.04% pcscd

Actions #15

Updated by Marcos Mendoza 5 months ago

Philip Cook wrote in #note-14:

This memory leak is rather problematic.
2.5.2-RELEASE (amd64)
Uptime 64 Days 22 Hours 02 Minutes 06 Seconds

[...]

See JimP's comments https://redmine.pfsense.org/issues/12095#note-10

You can apply the patch that disables it when not needed to avoid the issue.

Actions #16

Updated by Phil K 3 months ago

The same happened to me today. I realized it when I started receiving e-mails with lines like

bzip2: I/O or other error, bailing out.  Possible reason follows.
bzip2: No such file or directory
    Input file = /var/log/ipsec.log.23, output file = /var/log/ipsec.log.23.bz2
newsyslog: `/usr/bin/bzip2 -f /var/log/ipsec.log.23' terminated with a non-zero status (1)

Apparently, this is caused by the high CPU load and bzip2 processes taking a long time to compress logfiles (I found https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=200741 about this. Seems to be another (unresolved?) issue).

Anyway, all of a sudden I was in a situation in which pcscd had been killed, not by me but by the kernel due to it consuming too much memory (Nov 10 18:15:01 neptun kernel: pid 7452 (pcscd), jid 0, uid 0, was killed: out of swap space) and then, the log spam kicked in.

I tried both manually restarting the pcscd and applying the patch mentioned in #11933, but couldn't stop it. I ended up restarting the pfSense, hoping that it would work. As this specific instance's only connection to the Internet is through an IPSec tunnel it felt a bit risky restarting it if there is an IPSec related issue.

Would there have been any other way to stop the log spam? I suppose this could be helpful for others that experience the same issue.

Actions #17

Updated by Pedro Ribeiro 2 months ago

I politely disagree with the assigned priority for this bug, particularly given that a CE release is likely months away. Initially I hit the problem with pcscd leaking memory on two pfSense boxes yesterday, with the first sign being no DNS due to unbound having been killed, probably due to out-of-memory conditions, so then I disabled pcscd on both boxes. But the next day I stumbled upon high CPU usage on both boxes, with difficulty to get both the web interface/ssh responding. In case it helps anyone, here are the steps I used to regain control.

1. I applied the patch afcc0e9c97c1993ae6b95f886665fcb4375d26c7 as suggested.
2. Stopped and started the IPSec service (charon) as clicking restart didn't seem to make a difference at all.
3. Meanwhile syslog continued misbehaving. I had to kill bzip2 from the command line several times while doing the next steps.
4. Cleared all logs.
5. Importantly, I changed the compression for log retention on purpose, as given the text WARNING: Changing this value will remove previously rotated compressed log files! I figured this would be exactly what I needed, kill any log rotation given the issue with bzip2.

The issue with syslog/bzip2 on its own seems intriguing. One would expect that disabling a service - whatever it may be - wouldn't have catastrophic consequences in terms of repeated, unbounded logging, eventually maxing out the CPU, which makes regaining control of a box challenging.

Actions #18

Updated by Jim Pingle 2 months ago

The problems you're hitting are a mix of somewhat but not really related things.

This issue being the memory leak itself, which is beyond our control, hence the other issues we have made to work around it. The workarounds are on #11933 and have already been implemented in the code base, but have not yet been included in a release. The next release will be coming sometime in the few months ahead. The snapshots are quite stable if people want to run them (in a lab first to test).

Pedro Ribeiro wrote in #note-17:

1. I applied the patch afcc0e9c97c1993ae6b95f886665fcb4375d26c7 as suggested.
2. Stopped and started the IPSec service (charon) as clicking restart didn't seem to make a difference at all.

The best thing to do there is reboot after applying the patch. Though some have seen that pressing Save on the advanced settings tab in IPsec is sufficient, it's safest to reboot. The discussion around that should stay on #11933 and not here as it isn't relevant to the memory leak, but the changes we have made to control pcscd.

3. Meanwhile syslog continued misbehaving. I had to kill bzip2 from the command line several times while doing the next steps.

This is because IPsec is logging like crazy because it can't communicate with the (stopped) pcscd daemon which its configuration still references at that point, since applying the patch doesn't do anything but adjust the code. The daemon configuration needs to be refreshed. We have another separate issue for this in #12468 so going forward the daemons are stopped and started together properly if/when pcscd is enabled.

5. Importantly, I changed the compression for log retention on purpose, as given the text WARNING: Changing this value will remove previously rotated compressed log files! I figured this would be exactly what I needed, kill any log rotation given the issue with bzip2.

Sort of, but it depends on what is causing it. The problem again is the high rate of logging at the time. Disabling compression can help there since compression will bog down the log rotation process but syslogd itself will still be taking quite a bit of load. It also doesn't kill the backlog of existing bzip2 processes. See #12011 for more there.

Actions #20

Updated by Simon Quigley 18 days ago

Not sure there's a lot of value in my post, if this service has now been set to disabled by default..

I just wanted to report that my pfsense instance was throwing a lot of errors related to swap in the kernel log, and when I looked, this daemon was eating 15G of the 16G of ram in my box.


ast pid: 72748;  load averages:  0.56,  0.41,  0.50                                      up 143+06:16:53 18:57:54
216 processes: 1 running, 215 sleeping
CPU:  4.5% user,  0.3% nice,  1.7% system,  0.2% interrupt, 93.4% idle
Mem: 455M Active, 1906M Inact, 11G Laundry, 2010M Wired, 1540M Buf, 468M Free
Swap: 4096M Total, 4088M Used, 8260K Free, 99% Inuse

  PID USERNAME    THR PRI NICE   SIZE    RES STATE    C   TIME    WCPU COMMAND
39410 root          1  52    0   133M    32M accept   3   0:39   0.59% php-fpm
85739 root          1  52    0   133M    32M accept   3   0:46   0.10% php-fpm
66234 root          1  52    0   133M    32M accept   1   0:38   0.10% php-fpm
61691 root          3  20    0    15G    11G select   2 217:40   0.00% pcscd

Which is pretty horrendous. I stopped the service, and I will disable it, until I'm able to upgrade and reboot.

Actions #21

Updated by Mr Sparkles 7 days ago

This may be redundant information, just mentioning that this mem leak is not only an issue on lower memory systems.

System:
2.5.2-RELEASE (amd64)
built on Fri Jul 02 15:33:00 EDT 2021
FreeBSD 12.2-STABLE

Uptime: 16 days


Actions

Also available in: Atom PDF