Feature #13843
Open
Add ability to properly configure RADIUS captive portal user quotas of 4096 MB or more
Added by Reid Linnemann almost 2 years ago. Updated about 1 month ago.
0%
Description
The current vendor attribute pfSense-Max-Total-Octets, used for setting a user's traffic quota, is a 32-bit unsigned integer that overflows for quotas in excess of (2^32 - 1) bytes. Resolving this limitation will require defining a new vendor-specific attribute that either:
- Is of type string
- Has a resolution coarser than octets
- Provides an additional high word(s) to extend the existing attribute
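As a rough illustration of the three options listed above, the following sketch shows how a quota larger than 2^32 - 1 bytes could be carried in each form. The function names are hypothetical and the choice of MiB as the coarser unit is an assumption; nothing here reflects an attribute that actually exists in the pfSense dictionary.

```python
UINT32_MAX = 2**32 - 1


def encode_as_string(quota_bytes: int) -> str:
    """Option 1: carry the quota as a decimal string, with no fixed width."""
    return str(quota_bytes)


def encode_in_units(quota_bytes: int, unit_bytes: int = 1024 * 1024) -> int:
    """Option 2: carry the quota in a coarser unit (MiB assumed), rounded up."""
    value = -(-quota_bytes // unit_bytes)  # ceiling division
    if value > UINT32_MAX:
        raise OverflowError("quota does not fit in 32 bits even in this unit")
    return value


def encode_high_low(quota_bytes: int) -> tuple:
    """Option 3: split into a high word plus the existing 32-bit low word."""
    high, low = divmod(quota_bytes, 2**32)
    if high > UINT32_MAX:
        raise OverflowError("quota does not fit in 64 bits")
    return high, low


def decode_high_low(high: int, low: int) -> int:
    """Recombine the two 32-bit words into the full quota in bytes."""
    return (high << 32) | low


quota = 5 * 1024**3  # 5 GiB, which is larger than a 32-bit octet counter can hold
print(encode_as_string(quota))
print(encode_in_units(quota))
print(encode_high_low(quota))
```

The high/low split has the property that the low word is exactly what the existing attribute would carry, so a server that only understands pfSense-Max-Total-Octets would still see a well-formed (if truncated) value.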
The FreeRADIUS plugin must be modified to utilize this new attribute when setting total traffic quotas for the user as appropriate, either by direct selection or by inferring it from the size of the input value.
The net/pear-Auth_RADIUS port will need to be patched further to decode the new attribute and place it in the returned attributes list.
The captive portal radius auth code will need to be altered to properly use the new attribute if it exists in a response, and populate the user's captive portal database entry with the appropriate 'traffic_quota' value.
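The selection logic described for the captive portal could look roughly like the sketch below: prefer the new attribute when the response contains it, otherwise fall back to the existing one. The real captive portal code is PHP; this is Python for illustration only, and `Example-Max-Total-Octets-64` is a made-up attribute name (assumed here to be string-typed), not one defined by pfSense.

```python
from typing import Mapping, Optional

LEGACY_ATTR = "pfSense-Max-Total-Octets"       # existing attribute named in this issue
EXTENDED_ATTR = "Example-Max-Total-Octets-64"  # hypothetical name, for illustration only


def resolve_traffic_quota(attrs: Mapping[str, object]) -> Optional[int]:
    """Return the quota in bytes, preferring the wider attribute when present."""
    for name in (EXTENDED_ATTR, LEGACY_ATTR):
        raw = attrs.get(name)
        if raw is None:
            continue
        try:
            value = int(raw)
        except (TypeError, ValueError):
            continue  # malformed value: fall through to the next candidate
        if value >= 0:
            return value
    return None  # no usable quota attribute in the response
```

The returned value is what would populate the 'traffic_quota' field of the user's captive portal database entry; a result of `None` would mean no quota is applied.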
Documentation in the captive portal and FreeRADIUS UIs should be modified to make note of the new attribute and how it is to be used so that users with RADIUS services on other hosts may be able to properly configure user records.
Related issues
Updated by Reid Linnemann almost 2 years ago
- Related to Regression #13823: RADIUS attribute pfSense-Max-Total-Octets is not parsed correctly added
Updated by Dale Harron almost 2 years ago
If the new 4095 MB limit set in the freeRadius user file edit/create code is related to this Feature, it is INCORRECTLY preventing larger traffic limits on pfSense freeRadius authenticated users, because freeRadius handles the traffic quota itself as the authentication authority. The fact that captive portal's total traffic quota is not set correctly does not matter: the accounting packets sent back to freeRadius, whether interim or Stop/Start, contain incremental data usage that is cumulated in the /var/log/radacct/datacounter/???????/used-octets-user file for the Stop/Start setting, or in the used-octets-user-uniqueUserID file for the interim setting. freeRadius then compares it to the max-octets-user file in the same directory. When used-octets-user exceeds max-octets-user, freeRadius requests a user logout in the accounting packet and captive portal logs them out, as I understand it. The ??????? in the directory path is either forever, monthly, weekly or daily.
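The comparison described above can be sketched as follows. This is only an illustration of the idea, not the FreeRADIUS package's actual script: it assumes each counter file holds a single decimal integer in the same unit, and that the trailing "user" in the file names stands for the account name.

```python
from pathlib import Path


def read_counter(path: Path) -> int:
    """Read one integer from a counter file; a missing or empty file counts as 0."""
    try:
        return int(path.read_text().strip() or 0)
    except FileNotFoundError:
        return 0


def quota_exceeded(counter_dir: Path, user: str) -> bool:
    """True when the accumulated usage is above the configured maximum."""
    used = read_counter(counter_dir / f"used-octets-{user}")
    limit = read_counter(counter_dir / f"max-octets-{user}")
    # Python integers are arbitrary precision, so no 32-bit wraparound occurs here.
    return limit > 0 and used > limit
```

Because the comparison works on the values stored in the files rather than on a 32-bit attribute, a limit above 4 GB poses no problem on this side.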
If the concern is that the captive portal log out a user when multi-simultaneous-users have cumulated too much traffic data in total, that is legitimate because the used-octets-user-uniqueUserID files under the captive portal freeRadius interim setting do not cumulate into the used-octets-user file until the user is logged out. In a perfect world, captive portal would log them all out when the quota was reached, forcing freeRadius to cumulate all the open used-octets-user-uniqueUserID files into the used-octets-user file which would trigger freeRadius requesting a logout in the next accounting packet. I believe this is why Stop/Start exists in the first place.
I should note that for daily, weekly, monthly accounts, the intent in freeRadius is that a cron routine resets the usage to 0 at the end of the respective periods. Only freeRadius will know that the used-octets-user is now zeroed, so please ensure you accommodate this in each accounting packet, as captive portal must also reset its traffic counter to 0 at the same time.
As I understand it, the Stop/Start setting does not experience this issue, so this will only potentially fix the interim setting (assuming it needs fixing at all). I therefore strongly suggest you change the 4095 MB max restriction on freeRadius users to a warning, or tie it to the interim versus Stop/Start setting in the freeRadius captive portal setup page. I am not personally convinced it is the job of pfSense Captive Portal to enforce the traffic quota at all, but if it did, it would permit the use of the interim option, which scales better. Stop/Start every minute for a large number of users can fail to execute on time, and I have seen it slip to 10 minutes or so on a heavily loaded system (more than 200 users, for example). Under Stop/Start, the incremental up/down traffic amounts are also 32 bit unsigned integers and so are restricted to 4095 MB each; if the accounting packets are delayed excessively, it is possible they overflow and the used-octets-user? files will not contain the correct result.
The 4095 MB user data quota limit imposed in 23.01 is a disaster for many pfSense users, and they have to manually override the max-octets-user file to continue to support larger data quotas. I am running with 100% reliability and accuracy on my lightly loaded lab system under the 23.01 Feb 2nd release at 100 and 500 GB freeRadius traffic limits, with no data loss or issues at all under Stop/Start freeRadius. Even under the interim setting, once a sufficient number of simultaneous users have logged out to increase the value in the used-octets-user file above the setting in the max-octets-user file, freeRadius, through captive portal, correctly logs out all users, one at a time, after their next accounting packet. My fallback safety valve is a cron routine that simply sums the interim files into the main used-octets-user file every few hours. I only enable it if Stop/Start delays become excessive.
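The thread does not show the cron routine itself, so the following is only a guess at its core idea: add the per-session interim files to the main counter to get the account's real usage. The sketch is deliberately read-only (it returns the total rather than rewriting the main file) to avoid double counting when a session later logs out. File naming and the one-integer-per-file format are assumptions.

```python
from glob import escape
from pathlib import Path


def read_counter(path: Path) -> int:
    """Read one integer from a counter file; a missing or empty file counts as 0."""
    try:
        return int(path.read_text().strip() or 0)
    except FileNotFoundError:
        return 0


def effective_used_octets(counter_dir: Path, user: str) -> int:
    """Main counter plus every still-open per-session interim counter."""
    total = read_counter(counter_dir / f"used-octets-{user}")
    # escape() keeps characters such as '[' or '*' in a user name literal.
    # Caveat: a user named "bob" would also match files of a user "bob-smith".
    for interim in counter_dir.glob(f"used-octets-{escape(user)}-*"):
        total += read_counter(interim)
    return total
```

Comparing this total against max-octets-user would catch the multi-session case in which no single open session has yet been folded into the main file.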
Updated by Dale Harron almost 2 years ago
I believe I can now explain most of this behavior.
Redmine 13418 fixed an issue with freeRadius where freeRadius did not track data at all; it always returned 0's. This issue showed up in release 22.05. The fix, implemented in late December 2022, was for Captive Portal to only track data used between interim packets, especially in the stop/start freeRadius setting. It was in implementing that fix that this 32 bit integer problem became Redmine 13843. Because the fix to 13418 resulted in only tracking the data consumed between reporting packets, it ignores the max-octets-user value in the packets and leaves it up to freeRadius to compare the cumulation of data consumed, as represented in used-octets-user. Thus it works fine under Stop/Start freeRadius: freeRadius likely does not use a 32 bit unsigned integer to do so, it compares the actual value in octets in those two respective files.
Now, once either interim updates or re-authenticate every minute is selected, a different logic is applied to assessing total data consumed. pfSense/Captive Portal is actively managing the total data consumed and comparing it to the max-octets-total value in the packet, a value that overflows to a seemingly random, but consistent, value when represented as a 32 bit unsigned integer. This is demonstrated by the fact that 100 GB always terminates at 1.8 GB of total throughput. freeRadius still has that user authenticated, as it just checks the relative max to used values in the above, and the used value has not exceeded the max value yet, so it has not sent a false authenticate request in the handshake packet yet. It is the Captive Portal that initiated the logout in these cases, and thus the freeRadius interim packet value for max-total-octets is used to disconnect/logout the user. That is why I can simply refresh my logout tab and the connection continues (releases the suspended animation above), as Captive Portal has reset its counters on the logout. It will work for another 1.8 GB and repeat until such time as the cumulative total in used-octets-user has exceeded max-octets-user and freeRadius sends a "false" back in the interim packet for authentication and captive portal logs them out based on the freeRadius authentication status. At that point, refreshing my logout tab results in Captive Portal loading the login screen because freeRadius is the authentication authority.
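The "random, but consistent" value is simply the quota modulo 2^32. The sketch below works that out. It assumes, and this is not stated in the thread, that the 100 GB quota was entered as 100000 MB and converted with 1 MB = 1024 * 1024 bytes; under that assumption the wrapped value comes to about 1.78 GB, which would be consistent with the logouts reported at roughly 1.8 GB.

```python
UINT32_MODULUS = 2**32


def wrapped_quota(quota_bytes: int) -> int:
    """Value a 32-bit unsigned field ends up holding for the given quota."""
    return quota_bytes % UINT32_MODULUS


# Assumption: the quota was entered as 100000 MB.
entered_as_mb = 100_000 * 1024 * 1024
print(wrapped_quota(entered_as_mb))          # 1778384896 bytes, about 1.78 GB

# For comparison, exactly 100 GiB is a multiple of 2**32 and wraps to zero.
print(wrapped_quota(100 * 1024**3))          # 0
```

Either way, the value the captive portal compares against bears no useful relation to the quota configured on the RADIUS server.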
The fix to this is for freeRadius authentication to dominate. Do NOT use the max-octets-user value sent in the interim update packet in pfSense/Captive Portal; instead, do the same as you do for stop/start freeRadius and send only the data consumed since the last accounting update packet. Let freeRadius cumulate that value in its used-octets-user file and let it disconnect when the max-octets-user value is reached in freeRadius. As long as pfSense/Captive Portal sticks to tracking from packet to packet and leaves the job of tracking the total data quota to freeRadius, it will work perfectly.
This logic also fixes the interim update option, even for multiple simultaneous users on one user account. You will still have the issue of the users having to log out before the data is counted (i.e. summed into used-octets-user), but that has been there for years and is by design. My workaround of having a cron job do interim sums to used-octets-user was working fine for that in the past.
Please, Please, Please let freeRadius do its job and get out of its way. This is not an issue with the unsigned integer; it is an issue because you take over tracking total data usage and make the decision to logout yourself while ignoring the fact freeRadius is still validating that user as authenticated. Your diagnosis of the 32 bit overflow is correct but your treatment is killing the patient.
Lastly, for the record, all updates to used-octets-user are done once a minute, independent of the freeRadius interim update setting (default 10 minutes). This is not an issue unless it creates excessive traffic on the interface. It is done once a minute independent of the reauthenticate every minute setting as well.
Updated by Dale Harron almost 2 years ago
By way of clarification, the used-octets-user or used-octets-user-uniqueID files are currently correctly updated with incremental data once a minute (it should use the freeRadius incremental update value though; 600 seconds default). The problem related to 13843's 32 bit unsigned integer limitation in representing the max-octets-user value is that pfSense's support for freeRadius does not let freeRadius manage the summing of the incremental consumption values into the used-octets-user file and then compare it to max-octets-user on the freeRadius side. Instead, pfSense maintains its own total and compares it to the max-octets-user value sent in the packet; yes, the value that is wrong when above 4 GB due to the 32 bit unsigned integer limitation. pfSense should simply not check that value. It is not its job to manage this data quota; that is the job of freeRadius. Thus, freeRadius is not limited by the 4 GB 32 bit issue in 13843. pfSense is, and only because it is not minding its own business and is instead arbitrarily disconnecting the user before receiving a request to do so from freeRadius in the interim update packet.
I will cross post this observation to 13843. Please refer to Regression #13418 for the preamble to this observation.
Updated by Dale Harron almost 2 years ago
Perhaps it would help if I took a different perspective here:
You do not have to implement the following: "Documentation in the captive portal and FreeRADIUS UIs should be modified to make note of the new attribute and how it is to be used so that users with RADIUS services on other hosts may be able to properly configure user records." at this time because restricting the freeRadius created user to 4GB is not necessary with freeRadius itself AT THIS TIME. All it achieves is to make freeRadius unusable for past customers that rely on it for their livelihood.
By all means, implement this new attribute for support of other Radius applications, but your GUI is specific to freeRadius and it doesn't experience this problem with this attribute; you (pfSense) do! When you decided in the fix to #13418 to also implement a captive portal initiated disconnect based on this value, you created the issue. Suspending the implementation of that check/logout is a more practical solution that keeps 23.01 users in business while you debug the full implementation of max-octets-user on the pfSense side. Please remove the pfSense side freeRadius specific max-octets-user check and 4 GB restriction as soon as possible! Everything else works fine as is. It doesn't scale well because you reauthenticate and stop/start every minute, but that is still better than having no usable freeRadius at all.
Updated by Reid Linnemann almost 2 years ago
I understand the concern here. I think until I can improve on the pfSense-Max-Total-Octets used for preemptive logout when not doing reauth, we can just include a warning in the freeradius input rather than a limit. That should solve the problem with the artificial accounting limit in the radius server while still preventing foot shooting for users that don't want to use periodic reauth. To clarify, the captive portal only does periodic disconnects of users on its own if the captive portal is not configured for periodic reauth; otherwise it relies entirely on the auth failure to prune the captiveportal tables. I'll file another issue and link it here. You can circumvent the limitation by editing the config directly if you want, but the freeradius package will be updated shortly.
Updated by Dale Harron almost 2 years ago
If you are referring to periodic auth as both Reauth every minute checked and/or stop/start checked, I have tested both today with today's 23.01RC (8 Feb) and a 100 GB quota in freeRadius. Stop/Start works as expected all the way to 100 GB, with used-octets-user updated every minute. If Reauth every minute was on, the only change was that syslog recorded the login every minute.
With the test run under "interim" instead, with or without Reauth every minute, captive portal logs the session out every 1.7 GB until 100 GB is reached. Again, Reauth every minute only impacted the syslog. The logout is confirmed in the cap portal auth log with no entry in syslog. On the bright side, it now respects the 600 sec default for the interim setting in freeRadius.
This means that interim is unusable. With stop/start sleep(1), it seems limited to a maximum of 60 users per minute, so with more than 60 or so users it takes longer than 1 minute to update used-octets-user. We often have 200-400 logged in users and it can take many minutes and overflow the syslog so you can't actually tell if it is working. I have been using usleep(50000) with success today. We have a fast 12 core CPU and a solid state drive.
If you used the same approach for interim with reauth every minute that you do for Stop/Start, where there is no captive portal initiated quota related logout, it would provide a safe, working system again.
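To put numbers on the delay described above, here is a back-of-the-envelope sketch. It assumes the delay is applied once per logged-in user and that it dominates the time for one accounting pass; the actual loop in captiveportal.inc is not shown in this thread.

```python
def pass_duration_seconds(users: int, delay_microseconds: int) -> float:
    """Time for one accounting pass when each user costs one fixed delay."""
    return users * delay_microseconds / 1_000_000


SLEEP_1 = 1_000_000   # sleep(1) expressed in microseconds
USLEEP_50000 = 50_000  # usleep(50000)

for users in (60, 200, 400):
    print(users,
          pass_duration_seconds(users, SLEEP_1),
          pass_duration_seconds(users, USLEEP_50000))
```

Under these assumptions, 400 users take 400 seconds (over six minutes) per pass with sleep(1) but only 20 seconds with usleep(50000), which matches the observation that the one-minute update interval slips once there are more than about 60 users.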
Thanks for getting back to me. I spent a lot of time looking through captiveportal.inc at the new code but I have not decoded the captive portal logout source yet. I was hoping to enable reauth every minute behaviour that matched that of stop/start.
Updated by Dale Harron almost 2 years ago
I believe I can finally put this project in perspective for all of us.
1. The reconciliation of the 32 bit unsigned int overflow so that pfSense knows the actual value of max-octets-total in freeRadius will enable you to track a "single session" against the correct pfSense-Max-Total-Octets in pfSense/Captive Portal. From that perspective this is a valid improvement.
2. The above achievement is flawed in that total traffic in pfSense/Captive Portal is only maintained for the active session or for the duration of the pfSense powered up state. This means it tracks the traffic data on a single user session between session start and either logout or, in this case, a traffic quota based on a valid freeRadius supplied pfSense-Max-Total-Octets. A reboot or power loss will result in the pfSense Captive Portal session surviving if "Preserve connected users across reboot" is checked, i.e. True, but the traffic quota cumulated to date in pfSense will be lost on a reboot or power loss and will restart from 0. This means pfSense is not capable of tracking a single user session against the freeRadius max-octets-user value, because the traffic to date value is not persistent. It is, however, persistent to the nearest accounting interval on freeRadius, because freeRadius stores it in a file on disk, which survives a login/logout session as well as a reboot, be it deliberate or power related. freeRadius typically autosums all open user sessions to the main used-octets-user file on a restart. pfSense Captive Portal will force the creation of new used-octets-user-uniqueID files on freeRadius after the restart at the appropriate interim accounting interval.
3. If the captive portal is set to "Multiple" under Concurrent user logins, there will be multiple active sessions for the same user account under pfSense. Each separate session is checking that session's traffic against the pfSense-Max-Total-Octets value so it will not logout either user until that individual user reaches the pfSense-Max-Total-Octets value which is not the intent at freeRadius. On the freeRadius side, it will cumulate all users together into used-octets-user based either on stop/start or if set to interim, based on the interim interval value set on freeRadius, it will create a used-octets-user-uniqueID value for each user which are only summed to the main used-octets-user file when each individual user logs out. Although this is by design on the freeRadius side, it could be considered a bug because a logout will not be initiated by freeRadius until the used-octets-user exceeds the max-octets-user setting. Thus, neither pfSense, nor freeRadius will log all users out unless something intervenes and forces summation of all the used-octets-user-uniqueID files into the used-octets-user file that freeRadius uses to determine if the quota was breached.
4. Yes, this now accurate pfSense-Max-Total-Octets value tracked against a single session user for an uninterrupted session will force a captive portal traffic quota related logout, which in turn will force freeRadius to sum all the uniqueID files into the main used-octets-user file but it will not do so in captive portal: "multiple" Concurrent logins mode as it is tracking each individual user against pfSense-Max-Total-Octets instead of summing all the authenticated users on that user account together first. Thus "multiple" is not supported.
5. freeRadius has all it needs to do both multi users per account and session interruption tracking if pfSense was to force the summation of the uniqueID files into the used-octets-user file on a regular basis. This is why stop/start freeRadius solves this problem and works perfectly. Unfortunately that solution is not elegant and has serious scaling issues.
6. As we already have a working solution in stop/start all we need to do to get interim to use freeRadius to accurately invoke pfSense-Max-Total-Octets is to regularly fire one stop/start accounting packet session. I suggest when you select interim, you add a gui option to set a "cumulate every ??? seconds" option and point out it should be at least as large as the freeRadius interim setting (default 600 seconds). Once this is done, pfSense Captive portal in cooperation with freeRadius through the interim accounting period set on freeRadius will enforce the pfSense-Max-Total-Octets in all scenarios.
7. As I said at the outset, the correct determination of the pfSense-Max-Total-Octets is irrelevant, other than that if the above is implemented, it will fix the failure of either freeRadius or pfSense to accurately track pfSense-Max-Total-Octets when interim sessions are still active within this "powered session".
8. In the above scenario, reauthenticate every minute does not appear to have a purpose other than to force freeRadius to check max-octets-user against used-octets-user once a minute. As it does NOT force the cumulation of the multiple used-octets-user-uniqueID files into the used-octets-user file, it is not a tool to address the multiple user under interim setting issue. If it did, it would also be a solution to the tracking of multiuser traffic.
The above assumptions have been lab confirmed on 23.01RC 08 Feb 06:00 version, in both single and multiuser mode.
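The gap described in points 3 and 4 can be shown with a small sketch that contrasts a per-session check with a per-account check. The session identifiers and figures are invented for illustration, and the functions are hypothetical, not captive portal code.

```python
from typing import Dict, List

GB = 10**9


def sessions_over_quota(session_octets: Dict[str, int], quota: int) -> List[str]:
    """Per-session check: each session is compared to the quota on its own."""
    return [sid for sid, used in session_octets.items() if used >= quota]


def account_over_quota(session_octets: Dict[str, int], quota: int) -> bool:
    """Per-account check: all concurrent sessions are summed first."""
    return sum(session_octets.values()) >= quota


# Three concurrent sessions on one account with a 100 GB quota.
sessions = {"session-a": 40 * GB, "session-b": 40 * GB, "session-c": 40 * GB}
print(sessions_over_quota(sessions, 100 * GB))  # no single session has reached the quota
print(account_over_quota(sessions, 100 * GB))   # the account as a whole has
```

With "Multiple" concurrent logins, the per-session check never fires in this example even though the account has used 120 GB, which is the behaviour the points above describe.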
Updated by Reid Linnemann almost 2 years ago
Let's keep the notes relevant to the issue topic, please. Your concerns about interim accounting overflowing uint32, multiple session accounting, etc. are valid but this issue is not the appropriate place for them to be addressed. I recommend you file a new issue for each individual concern so that they can be tracked and action taken.
On the topic of this issue, it's unfortunately necessary at this time to have the captive portal track usage against the RADIUS user quota when not periodically sending Access-Requests, as there is no RADIUS accounting mechanism for the revocation of the network resource upon exceeding quota. Exceeding accounting limits only results in a subsequent Access-Reject response to an Access-Request. In the future I would like to see a NAS daemon supporting RFC 3576 disconnect messages to facilitate this, but that's an undertaking that won't be happening in the immediate future.
Updated by Dale Harron almost 2 years ago
From my perspective, the 32 bit overflow has broken the captive portal quota tracking with freeRadius that was working perfectly for many years; it was broken by the release of 22.05 with the data set to 0, and fixed on Dec 29th 2022 under 13418, raised by me. If the fix of the unsigned 32 bit integer necessitates that we abandon freeRadius due to a 4 GB limit on data quotas in the meantime, then the entire product becomes unusable by people who have made a commitment to pfSense for years, 15 in our case. It is critical that 23.01 be released without this 4 GB traffic quota limit on freeRadius authenticated users. The current 23.01 RC release limit of 4 GB suggests it isn't going to be fixed. That 4 GB limit is being applied to the freeRadius GUI in pfSense, not to freeRadius itself, forcing abandonment of the GUI just to function at all. Captive Portal is logging the users out based on the 32 bit unsigned integer overflow value of what is set in pfSense-Max-Total-Octets, ignoring the fact that freeRadius is providing a True response to the reauthentication check. Once that reauthentication check returns a False, the "revocation of the network resource" occurs properly as a captive portal initiated logout. In the multiuser case, all other users will then get a False on the reauthentication request, giving an orderly revocation of all network resources in pfSense. Under the interim update setting, the user must log out at the captive portal before freeRadius will sum the temporary usage file into the used-octets-user file; only then is the max-octets-user limit reached on the freeRadius side, triggering a False response to the next reauthentication request from captive portal. That is the "cause" that "effects" your concern about the revocation of the network resource by pfSense. The revocation never occurs if the user never logs out, because freeRadius has not met its quota requirements.
You are implementing a band-aid on the pfSense Captive Portal side that will track non-persistent data from pfSense and revoke the network resource without regard for the actual consumption of data to date. freeRadius has a persistent method of tracking this information. All you need do is invoke a Stop/Start to flush (cumulate) the interim quota files at freeRadius into the used-octets-user file, and freeRadius will accurately and consistently send a false on your next reauthentication request.
If you are going to place any limitation on the GUI to freeRadius, force an idle timeout minimum so that users that leave without logging out are automatically logged out by freeRadius timing out in an accounting interval. That "timed out" logout will force the cumulation to used-octets-user, in turn correctly invoking a reauthenticate false return on the next accounting interval and cleanly, accurately taking care of business as far as network resource revocation is concerned.
I do understand your concern and focus but I am forced to deal with a wider perspective that includes the result set of the impact of this solution. I simply want the patient to survive the treatment. There is no work to do, no more redmine tickets if you respect how freeRadius works and let it do its job. I believe the correction of the unsigned 32 bit integer overflow and tracking of the max-octets-total value sent to and tracked by pfSense as pfSense-Max-Total-Octets is a valid and desirable safety valve to manage network resource revocation going forward. It is a LOW priority problem if you trust freeRadius. If you deem that network resource revocation is a higher priority problem and 23.01 comes out with a forced Quota max limit of 4 GB, only then are new redmine issues necessary. The status quo of the historic long term interface requires no action at all from me, only the result of the resolution of this Redmine does.
I am absolutely and completely on topic, desperately trying to get through to you as to the real world consequences of the current status of this ticket.
Updated by Jim Pingle over 1 year ago
- Plus Target Version changed from 23.05 to 23.09
Updated by Jim Pingle about 1 year ago
- Plus Target Version changed from 23.09 to 24.01
Updated by Jim Pingle about 1 year ago
- Plus Target Version changed from 24.01 to 24.03
Updated by Dale Harron 10 months ago
When implementing this feature, please support multi-user logins, including parallel user sessions that have been started and terminated before the quota is reached. This requires using a running total to compare against the quota set in either captive portal or freeRadius as applicable. freeRadius is currently correctly cumulating multiuser data throughput in the disk file but is not tracking cumulative time for multiple users, and captive portal assumes time is for the account independent of multiuser logins. freeRadius is not limited by the captive portal 4096 MB constraint.
Updated by Jim Pingle 9 months ago
- Plus Target Version changed from 24.03 to 24.07
Updated by Jim Pingle 6 months ago
- Plus Target Version changed from 24.07 to 24.08
Updated by Jim Pingle about 2 months ago
- Plus Target Version changed from 24.08 to 24.11
Updated by Jim Pingle about 1 month ago
- Plus Target Version changed from 24.11 to 25.01