Project

General

Profile

Actions

Bug #14031

closed

Identical SMTP notifications repeat in an infinite loop under certain conditions

Added by Paul Diederich over 1 year ago. Updated over 1 year ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
Notifications
Target version:
Start date:
Due date:
% Done:

100%

Estimated time:
Plus Target Version:
23.05
Release Notes:
Default
Affected Version:
Affected Architecture:
amd64

Description

When a notification is generated, the system begins sending those notifications to the configured SMTP e-mail address about 5 times a minute. This leads to mass e-mails into the admin's mail box. Even if you go into System > Advance > Notification and check "Disable SMTP Notifications", the system continues to send e-mails. It requires rebooting the entire device to stop the e-mail.

Brief Log Output:
2023-02-25 00:29:37.228999-05:00 php-cgi 56712 notify_monitor.php: Message sent to ***.** OK
2023-02-25 00:29:24.746051-05:00 php-cgi 56712 notify_monitor.php: Message sent to ***
.** OK
2023-02-25 00:29:11.500714-05:00 php-cgi 56712 notify_monitor.php: Message sent to ***.** OK
2023-02-25 00:28:58.766996-05:00 php-cgi 56712 notify_monitor.php: Message sent to ***
.** OK
2023-02-25 00:28:46.315235-05:00 php-cgi 56712 notify_monitor.php: Message sent to ***.** OK
2023-02-25 00:28:34.293087-05:00 php-cgi 56712 notify_monitor.php: Message sent to ***
.** OK
2023-02-25 00:28:21.509579-05:00 php-cgi 56712 notify_monitor.php: Message sent to ***.** OK
2023-02-25 00:28:08.687651-05:00 php-cgi 56712 notify_monitor.php: Message sent to ***
.** OK
2023-02-25 00:27:56.493064-05:00 php-cgi 56712 notify_monitor.php: Message sent to ***.** OK
2023-02-25 00:27:44.075196-05:00 php-cgi 56712 notify_monitor.php: Message sent to ***
.** OK
2023-02-25 00:27:31.816187-05:00 php-cgi 56712 notify_monitor.php: Message sent to ***.** OK
2023-02-25 00:27:19.335701-05:00 php-cgi 56712 notify_monitor.php: Message sent to ***
.** OK
2023-02-25 00:27:06.774360-05:00 php-cgi 56712 notify_monitor.php: Message sent to ***.** OK
2023-02-25 00:26:54.402019-05:00 php-cgi 56712 notify_monitor.php: Message sent to ***
.** OK
2023-02-25 00:26:41.984310-05:00 php-cgi 56712 notify_monitor.php: Message sent to ***.** OK
2023-02-25 00:26:29.352357-05:00 php-cgi 56712 notify_monitor.php: Message sent to ***
.** OK


Related issues

Related to Bug #14061: PHP error if a non-privileged shell user attempts an operation which needs to write ``config.cache``ResolvedJim Pingle

Actions
Related to Bug #13224: Email notification flood when UPS (NUT) and WAN send notifications Duplicate

Actions
Actions #1

Updated by Jim Pingle over 1 year ago

  • Status changed from New to Feedback
  • Priority changed from High to Normal

I don't doubt it's happening as this isn't the first report we have received about it, but thus far nobody has been able to reproduce it in lab conditions and we haven't had any leads about possible causes when checking the code handling notifications.

The messages are put into a queue and the queue is cleared when they are sent, but somehow it must be failing to erase that file when this happens. It's unclear what specifically must be failing to get it into that state, however.

Has this happened more than once? Were there any other errors in the logs before it started happening?

Actions #2

Updated by Paul Diederich over 1 year ago

This has happened with a notification that acb.netgate.com couldn't be resolved. Issue was fixed by rebooting the OS. I did look into /var/db/notifyqueue.messages and saw it still has data in it:

a:1:{s:5:"mails";a:1:{s:4:"item";a:2:{i:0;a:3:{s:4:"time";i:1677139260;s:4:"type";s:4:"mail";s:3:"msg";s:118:"The following CA/Certificate entries are expiring:
Certificate: kanako-test (638dc1866bf9d): Expiring soon, in 10 days";}i:1;a:3:{s:4:"time";i:1677225660;s:4:"type";s:4:"mail";s:3:"msg";s:117:"The following CA/Certificate entries are expiring:
Certificate: kanako-test (638dc1866bf9d): Expiring soon, in 9 days";}}}}

I'm not seeing anything else in the logs; just that notify_monitor sent the messages.

Actions #3

Updated by Jim Pingle over 1 year ago

It doesn't appear related to the content of the file, I copied that into mine and triggered a notification and it sent that just fine and wiped the queue after. There must be some other component to triggering it. I don't see anything obvious in the code that looks like a problem except for maybe the locking as I mentioned on your forum thread

Anyone who has seen this happen, please install the System Patches package and then create an entry for 053f60e56d09ba711af245c7f0ce6a06673ccc89 and apply that change, then try to trigger notifications as before to see if the loop happens again. If that does fix it, then this can be combined with #14031

Actions #4

Updated by Jim Pingle over 1 year ago

  • Project changed from pfSense Plus to pfSense
  • Subject changed from System Notifications spam via SMTP to Identical SMTP notifications repeat in an infinite loop under certain conditions
  • Category changed from Notifications to Notifications
  • Target version set to 2.7.0
  • Affected Plus Version deleted (23.01)
  • Plus Target Version set to 23.05
Actions #5

Updated by Paul Diederich over 1 year ago

I applied the system package but that did not solve the issue; its still sending e-mails out constantly and not clearing /var/db/notifyqueue.messages

Actions #6

Updated by Jim Pingle over 1 year ago

Do you get an error if you try to erase /var/db/notifyqueue.messages by hand? You can try rm /var/db/notifyqueue.messages from a shell prompt.

While the current code should be OK, you can also try this change which is a slightly different approach to ensuring the message queue array gets reset.

diff --git a/src/etc/inc/notices.inc b/src/etc/inc/notices.inc
index 539698b821..aef4fdd8bd 100644
--- a/src/etc/inc/notices.inc
+++ b/src/etc/inc/notices.inc
@@ -253,7 +253,10 @@ function notices_sendqueue() {
                if (file_exists("{$g['vardb_path']}/notifyqueue.messages")) {
                        $messages = unserialize(file_get_contents("{$g['vardb_path']}/notifyqueue.messages"));
                        $messagequeue = $messages;
-                       $messages['mails']['item'] = array(); // clear all items to be send
+
+                       /* Reset message queue */
+                       $messages = [];
+                       array_set_path($messages, 'mails/item', []);
                        file_put_contents("{$g['vardb_path']}/notifyqueue.messages", serialize($messages));
                        unset($messages);
                }
Actions #7

Updated by Paul Diederich over 1 year ago

I deleted the content of the file and let some notifications go through. That appears to have solved the issue; so I'm wondering if the previous content for some reason wasn't being properly emptied? Current content of the file is:

a:1:{s:5:"mails";a:1:{s:4:"item";a:0:{}}}

Actions #8

Updated by Jim Pingle over 1 year ago

Curious though I can't see a reason why it would let you do that by hand but the code would fail. For now I'd apply that last diff I posted and keep letting notifications run to see if it happens again.

Actions #9

Updated by Jim Pingle over 1 year ago

  • Related to Bug #14061: PHP error if a non-privileged shell user attempts an operation which needs to write ``config.cache`` added
Actions #10

Updated by Jim Pingle over 1 year ago

I finally had a system in my lab get stuck doing this, it was a fresh image of a 23.05 dev snapshot, restored a config with packages, and then it kept looping sending the same notification.

Looking at the process list, there was one instance of notify_monitor.php stuck that was somehow running as the nut user:

nut     93943   0.0  1.8  97080 36696  -  S    15:33    0:00.78 /usr/local/bin/php-cgi -q /usr/local/bin/notify_monitor.php

None of the notifications were from nut. There was only one notification that was triggered by pfSense itself:

15:33:13 Package reinstall process finished successfully

Killing that one process stopped the flood but the queue wasn't emptied.

Running /usr/local/bin/php-cgi -q /usr/local/bin/notify_monitor.php by hand, it takes a long time to finish, but also doesn't get stuck in a loop. After running it manually, the queue is cleared.

Not sure why that was triggered via nut but at least that's a lead.

Actions #11

Updated by Paul Diederich over 1 year ago

I do have nut installed on my system. It hasn't repeated the process since my last report; maybe its a rare sequence of events that have to happen?

Actions #12

Updated by Jim Pingle over 1 year ago

  • Related to Bug #13224: Email notification flood when UPS (NUT) and WAN send notifications added
Actions #13

Updated by Jim Pingle over 1 year ago

  • Status changed from Feedback to Confirmed
  • Assignee set to Jim Pingle

We've run up against issues like this with NUT before but not always a loop. See https://redmine.pfsense.org/issues/12281#note-4 and #13224

I'm thinking now since the notification queue is owned by root then notify_monitor.php should terminate if it's not being run by root.

As mentioned on #13224 it may happen when NUT triggers a notification at just the wrong time corresponding to a system notification of some other type.

Actions #14

Updated by Haraldinho D over 1 year ago

I can confirm I have this issue too and I have nut installed. Actually, I am suffering from it now and can't reboot as the Misses is watching tv ;-) , hence I am looking for a solution and found this bug report.

On my side it is always a Service Watchdog message because igmpproxy has crashed. Every time igmproxy fails this issue occurs. I am able to trigger this error by removing one of the network cables, so if I need to test something, please let me know.

Come to think of it: when I remove the network cable, the nut server loses connection with the UPS, so probably it throws an error too at that moment.

Actions #15

Updated by Jim Pingle over 1 year ago

I was able to reproduce this on demand by triggering a notification from nut and a notification from the system shortly after:

require_once('service-utils.inc');
require_once('notices.inc');
restart_service('nut');
sleep(5);
file_notice('blah', 'Test message');

As I thought it was because the unprivileged user could not write the message queue, but it could read the message queue. So it would keep going in a loop thinking there were more messages to send.

I added some safety belt checks to hopefully prevent this from happening.

You can install the System Patches package and then create an entry for c5faa351c1ef6d4555478a7f50b3a16ece7e0b2a to apply the fix.

Actions #16

Updated by Jim Pingle over 1 year ago

  • Status changed from Confirmed to Feedback
  • % Done changed from 0 to 100
Actions #17

Updated by Haraldinho D over 1 year ago

I can confirm the fix is working for me. I don't see any repeats anymore. Thanks Jim!

Actions #18

Updated by Jim Pingle over 1 year ago

  • Status changed from Feedback to Resolved

All signs point to this being solved. After patching, there have been no repeats.

Actions #19

Updated by Azamat Khakimyanov over 1 year ago

Tested on 23.01

I was able to reproduce this issue.
After applying c5faa351c1ef6d4555478a7f50b3a16ece7e0b2a, the issue was fixed.

Actions #20

Updated by Denny Page over 1 year ago

Jim, I tested this patch and the end effect, at least for 23.01, is that non root notifications just silently fail.

Two things:

  • Is there some other way that this can be addressed that still allows non root notifications to function? The way it currently sits, this would require all packages that use notifications to be run as root, which seems rather undesirable.
  • When the notification does fail, it would be desirable if there were some form of error in the log.
Actions #21

Updated by Jim Pingle over 1 year ago

That's not a problem we can solve here. Somehow the unprivileged process would have to submit a message into the queue and then some other privileged process would have to come along after and send it. Right now both are handled by the same user that triggered the notification. But that means we'd have to setup some other mechanism (daemon, cron script, etc) that would run the message queue and that's way out of scope for this issue. That would be a completely fresh feature request that would require designing a proper solution to implement in a future version.

Actions #22

Updated by Denny Page over 1 year ago

Given that non-root package notifications worked previously, I think users are going to perceive this as a regression (bug).

Is there a recommendation as to how we should address the immediate issue? Should we change all packages to run as root?

Actions #23

Updated by Jim Pingle over 1 year ago

They absolutely did not work correctly (hence this bug). They failed in new/different ways on 23.01 but they also failed on older versions. The times it happened to work were purely by chance.

The only package I'm aware of that is affected is nut, however that package wants to handle things is undecided. There are no defined best practices there currently, but this particular change is complete as-is.

Actions #24

Updated by Denny Page over 1 year ago

Hmm... "the times it happened to work were purely by chance" seems quite a bit overstated. It seems to be rather the opposite--the number of times that notifications have failed appears to have been incredibly rare, as evidenced by the difficulty in reproducing this bug.

Regardless, do you have a recommendation for another approach to address the issue of email notifications from non root entities? Or do I need to change the NUT package to always run as root?

Actions #25

Updated by Jim Pingle over 1 year ago

How to properly send notifications from nut is not a topic for this Redmine. Feel free to discuss it more on the forum.

Actions #26

Updated by Denny Page over 1 year ago

Thanks. I submitted a PR that addresses the issue.

Actions #27

Updated by Denny Page over 1 year ago

I would like to propose a small change to this fix. It is a "one liner," which maintains the ability to send notifications as a non root user. I think this is a reasonable approach that protects pfSense from the spamming problem while allowing package maintainers to be able to run processes without high privilege. I would appreciate it if you could consider this. Thanks.

--- notices.inc.old    2023-04-23 09:44:28.161806000 -0700
+++ notices.inc    2023-05-02 09:36:47.548761000 -0700
@@ -356,7 +356,7 @@

     /* Store last message sent to avoid spamming */
     @file_put_contents("/var/db/notices_lastmsg.txt", $message);
-    if (!$force) {
+    if ($force == false && posix_geteuid() == 0) {
         notify_via_queue_add($message, 'mail');
         $ret = true;
     } else {

Actions #28

Updated by Jim Pingle over 1 year ago

It's too late for this issue / Plus 23.05, but if you want to open a new Redmine issue and propose it there we can consider it for 23.09/2.7.0.

Actions

Also available in: Atom PDF