Bug #8707

New PHP Error [/etc/inc/gwlb.inc]

Added by Dirk Steingäßer over 1 year ago. Updated over 1 year ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Gateways
Target version:
Start date: 07/27/2018
Due date:
% Done: 0%
Estimated time:
Affected Version: 2.4.4
Affected Architecture: All

Description

Crash report begins. Anonymous machine information:

amd64
11.2-RELEASE
FreeBSD 11.2-RELEASE #31 cd0e4c8cf25(RELENG_2_4_4): Wed Jul 25 06:22:26 EDT 2018 root@buildbot3:/builder/crossbuild-ce-master/obj/amd64/FWJoMRHc/builder/crossbuild-ce-master/pfSense/tmp/FreeBSD-src/sys/pfSense

Crash report details:

PHP Errors:
[26-Jul-2018 13:43:45 Europe/Berlin] PHP Warning: touch(): Utime failed: No such file or directory in /etc/inc/gwlb.inc on line 1213
[26-Jul-2018 23:27:25 Europe/Berlin] PHP Warning: touch(): Utime failed: No such file or directory in /etc/inc/gwlb.inc on line 1213
[27-Jul-2018 14:35:35 Europe/Berlin] PHP Warning: touch(): Utime failed: No such file or directory in /etc/inc/gwlb.inc on line 1213
[27-Jul-2018 21:53:06 Europe/Berlin] PHP Warning: touch(): Utime failed: No such file or directory in /etc/inc/gwlb.inc on line 1213

No FreeBSD crash data found.
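
To see at a glance which file and line the repeated warnings point at, the PHP error lines from a crash report can be tallied. The following is a minimal sketch, not part of pfSense: the names `LOG`, `LINE_RE`, and `summarize` are made up for illustration, and the regular expression assumes the exact line layout shown in the report above.

```python
import re
from collections import Counter

# Sample lines copied verbatim from the crash report above.
LOG = """\
[26-Jul-2018 13:43:45 Europe/Berlin] PHP Warning: touch(): Utime failed: No such file or directory in /etc/inc/gwlb.inc on line 1213
[26-Jul-2018 23:27:25 Europe/Berlin] PHP Warning: touch(): Utime failed: No such file or directory in /etc/inc/gwlb.inc on line 1213
[27-Jul-2018 14:35:35 Europe/Berlin] PHP Warning: touch(): Utime failed: No such file or directory in /etc/inc/gwlb.inc on line 1213
[27-Jul-2018 21:53:06 Europe/Berlin] PHP Warning: touch(): Utime failed: No such file or directory in /etc/inc/gwlb.inc on line 1213
"""

# Assumed layout: "[timestamp] PHP <Level>: <message> in <file> on line <n>".
LINE_RE = re.compile(
    r"^\[(?P<when>[^\]]+)\] PHP (?P<level>[^:]+):\s+(?P<message>.*)"
    r" in (?P<file>\S+) on line (?P<line>\d+)$"
)


def summarize(log_text):
    """Count PHP log entries per (file, line) pair; unparseable lines are skipped."""
    counts = Counter()
    for raw in log_text.splitlines():
        match = LINE_RE.match(raw.strip())
        if match:
            counts[(match["file"], int(match["line"]))] += 1
    return counts


if __name__ == "__main__":
    for (path, line), n in summarize(LOG).most_common():
        print(f"{path}:{line} -> {n} warning(s)")
```

For the four entries above, every warning lands on the same `touch()` call in `/etc/inc/gwlb.inc`.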

gwlb.inc.patch#8707 (963 Bytes) Tyler L, 08/01/2018 03:50 PM
gwlb-empty-gwname.diff (480 Bytes) Jim Pingle, 08/02/2018 10:54 AM
config_gateway.xml (3.81 KB) Tyler L, 08/14/2018 08:46 PM

History

#1 Updated by Tyler L over 1 year ago

Upgraded from June build, now running Sat Jul 28 19:13:31 EDT 2018 build, and have this issue.

It occurs when gateways go down. I have many gateways set up, and when they go down due to latency/PL, I get this crash report and error for each gateway that ends up involved.

It doesn't seem to affect the gateway itself; the GUI will show Down/Up with stats as usual. It gets really annoying, though.

#2 Updated by Dirk Steingäßer over 1 year ago

OK, makes sense. I have that situation with one very unstable gateway out of 4 in total.

#3 Updated by Tyler L over 1 year ago

I found and addressed the issue, though I don't know how to file a diff or anything.

The issue stems from b5e93be6; the associated references weren't changed to reflect the revision.

Lines 1212 - 1224 of gwlb.inc: references to "{$gwname}" should reflect the revised "$gwname".

Fixed locally, works fine now. Lock files show up on downed gateways.

#4 Updated by Tyler L over 1 year ago

I updated to 07.31 and found that the one gateway that kept failing now functioned fine, despite gwlb.inc being rewritten. When a different gateway went down, though, it still caused the same issue.

I have no clue why; perhaps someone smarter than me can figure it out. Until then, I made a patch file using the edits I mentioned above, which seems to fix everything for me regarding it.

#5 Updated by Jim Pingle over 1 year ago

That change would be a no-op; both forms reference the same variable in the same way. In fact, it goes against the recommended way to reference variables in strings, since the brace method ensures PHP will properly identify and replace the variable in ambiguous situations.

More likely is that there is some other issue behind the scenes and reloading or even rebooting changed the behavior somehow.

#6 Updated by Jim Pingle over 1 year ago

Try this instead.

#7 Updated by Tyler L over 1 year ago

Thanks Jim! This is why I asked for someone smart to look into it. I reverted what I did, and patched using your file. We'll see as the day goes on what happens or doesn't, and I'll report back the findings.

#8 Updated by Tyler L over 1 year ago

Well, that didn't take long. I should have just waited another 10min.

So, what happens now is this: Gateway goes high pl/latency, it takes the GW down, and I get no notification email, nor any crash reports. The dashboard GW status display shows the correct statuses, but I get nothing beyond that. No lock files are created in /tmp/ either.

I'm open to anything.

#9 Updated by Jim Pingle over 1 year ago

Can you provide a bit more information about your gateway and gateway group configuration, along with an example of which gateway fails to cause this behavior for you?

#10 Updated by Tyler L over 1 year ago

Sure! We have 5 OpenVPN client gateways and 1 WAN (ISP) gateway in total. Currently only 4 OpenVPN gateways are enabled, though. There are several gateway groups, from straight single-tiered gateways (one gw per tier) to multi-tier (two/three in a tier, the rest single-tiered). They're not set up specially, just a straight Member Down setup. It load balances just as I expect, too. It had been working excellently until the move from the June to the July build.

What can happen is one of the VPNs gets DDoSed and goes high pl/lat; pfSense then takes the gw Down until it's stable, then brings it back Up. When it goes high pl/lat/down I get an email notification, or at least, I used to. GW Groups are set up for redundancy more or less, but nothing special there either.

The issue applies when 'any' of the gateways go down. It's not picky about one in particular. Yesterday I disabled a heavily hit gw just to remove the annoyance, and the issue remained the same with another gw when it got hit for a few minutes.

If I've missed anything, please let me know.

#11 Updated by Tyler L over 1 year ago

After more days of running: you, sir, are accurate. My idea is a non-starter, but you knew that. :)

With your patch, I get no crashes nor notifications at all. With either my patch or none at all, I get php crashes, a new line for each down instance.

I also have an arpwatch error, despite the interface/gw being disabled, but I doubt that's related.

I'm open to help any way I can.

#12 Updated by Jim Pingle over 1 year ago

  • Status changed from New to This Sprint
  • Assignee set to Jim Pingle

Nothing in there seems like it should cause any issues. I'm wondering if maybe you have a stray empty gateway tag in config.xml.

Can you post the <gateways> ... </gateways> section of your config.xml? You can mask IP addresses and names but please do not remove any values entirely.

#13 Updated by Tyler L over 1 year ago

And ye shall receive. Note that VPN2 (opt6) is the gateway and interface that's disabled.

#14 Updated by Jim Pingle over 1 year ago

The only potential problem there that I see is that one of your gateway groups consists entirely of gateways that are not in your configuration. Do these gateways actually exist as dynamic gateways? Or have they been deleted?

                <gateway_group>
                        <name>iot_fallback_gateway</name>
                        <item>VPNL1_VPNV4|2|address</item>
                        <item>VPNL3_VPNV4|3|address</item>
                        <item>VPNL5_VPNV4|1|address</item>
                        <trigger>down</trigger>
                        <descr></descr>
                </gateway_group>

I still can't reproduce the error with a similar configuration here, though. Based on the error I expected one of the gateway groups to have an empty gateway name or blank item tag.

If you remove the patch and also remove that gateway group, does the error still happen?
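
The check described above (look for gateway group items with an empty gateway name, or that name gateways not present in the configuration) can be sketched in a few lines. This is an illustration only, not a pfSense tool. It assumes the `name|tier|address` item layout seen in the excerpt above and assumes statically configured gateways appear as `<gateway_item>` elements with a `<name>` child. The `WAN_GW` entry and the function name `audit_gateway_groups` are hypothetical. Dynamic gateways may legitimately be absent from the configuration, so "not defined" is a hint, not proof of a problem.

```python
import xml.etree.ElementTree as ET

# Sample <gateways> section. The gateway_group is taken from the thread;
# the gateway_item entry (WAN_GW) is a made-up placeholder.
SAMPLE = """
<gateways>
  <gateway_item>
    <name>WAN_GW</name>
  </gateway_item>
  <gateway_group>
    <name>iot_fallback_gateway</name>
    <item>VPNL1_VPNV4|2|address</item>
    <item>VPNL3_VPNV4|3|address</item>
    <item>VPNL5_VPNV4|1|address</item>
    <trigger>down</trigger>
    <descr></descr>
  </gateway_group>
</gateways>
"""


def audit_gateway_groups(xml_text):
    """Return (group, raw_item, reason) tuples for suspicious group members."""
    root = ET.fromstring(xml_text)
    # Names of gateways that are explicitly present in the configuration.
    defined = {(gw.findtext("name") or "").strip() for gw in root.iter("gateway_item")}
    defined.discard("")

    findings = []
    for group in root.iter("gateway_group"):
        group_name = (group.findtext("name") or "").strip()
        for item in group.findall("item"):
            raw = (item.text or "").strip()
            # The gateway name is the part before the first "|".
            gw_name = raw.split("|", 1)[0].strip()
            if not gw_name:
                findings.append((group_name, raw, "empty gateway name"))
            elif gw_name not in defined:
                findings.append((group_name, raw, "not defined in config (may be dynamic)"))
    return findings


if __name__ == "__main__":
    for group_name, raw, reason in audit_gateway_groups(SAMPLE):
        print(f"{group_name}: {raw!r}: {reason}")
```

Run against the group quoted above, all three members are reported as not defined in the configuration and none as empty, which matches the observation that no blank item tag was found.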

#15 Updated by Tyler L over 1 year ago

That gw group also includes wan as [5]; I'm not sure why it didn't show for you. They are dynamic, though, and are all active except vpn2/opt6.

I'll remove all patches/edits and use a clean copy from the repo for sanity, and I'll kill that gw group for testing.

Will report back after it crashes.

#16 Updated by Tyler L over 1 year ago

Well, this is wackado. I replaced gwlb.inc with a fresh new repo copy and restarted php-fpm, and proceeded to start killing gateways, only to find that notifications/emails are now working. Ummm, I'm seriously at a loss here.

#17 Updated by Jim Pingle over 1 year ago

It's entirely possible that the underlying bug was fixed between the original report and now, and that this was a side effect.

#18 Updated by Tyler L over 1 year ago

I can buy that reasoning, considering the PITA that the PHP 7 upgrade has appeared to be with bugs.

After I posted, I was bewildered and did another console update to Wed Aug 15 10:03:30 EDT 2018, no issues still regarding this. Call it a random phantom issue, I'm not sure, sounds possible though.

Thank you for the help, wish more was known, but seems okay now for whatever reason.

#19 Updated by Jim Pingle over 1 year ago

  • Status changed from This Sprint to Resolved

I'll mark this resolved for now then. If you can manage to reproduce it again, let us know.

#20 Updated by Tyler L over 1 year ago

Maybe I spoke too soon? This is simply nutty...

So, everything was working, and I went to change the monitor ip on a gateway. That worked fine. I clicked for the dashboard to verify everything and boom.. Crash..

Now it's all broken again. I'm beginning to wonder if gremlins are real. It was working fine, I make one change, and it's broken.

The kicker here is the fact that after my last post, I had changed the same thing on two other gateways and everything was still fine.

I'm at a loss. Is there, or are there, any random checks or verifies that would fubar if something was changed mid-state? ex. GW Up > GW Down > Config Change > fubar while expecting GW Up or Down?

[Edit Update]

So, while typing that, it seems to have fixed itself? However, the crash still happened as stated. I feel like I'm in an Abbott & Costello routine at this point.

[crash log]
amd64
11.2-RELEASE-p2
FreeBSD 11.2-RELEASE-p2 #67 7bb6999b14a(RELENG_2_4_4): Wed Aug 15 10:04:36 EDT 2018 root@buildbot3:/builder/crossbuild-ce-master/obj/amd64/FWJoMRHc/builder/crossbuild-ce-master/pfSense/tmp/FreeBSD-src/sys/pfSense

Crash report details:

PHP Errors:
[15-Aug-2018 11:50:06 America/Los_Angeles] PHP Warning: touch(): Utime failed: No such file or directory in /etc/inc/gwlb.inc on line 1220

No FreeBSD crash data found.
