Bug #6803
openCSRF timeout occurs when it (probably) shouldn't
0%
Description
Expected behaviour and error
A feature/change in release 2.1 was that CSRF timeout was changed to be the same as session timeout [1]. When a CSRF timeout occurs, the user should be offered a "retry" option, in which a new CSRF token is generated; they can then continue without disruption.
But when CSRF fails during an AJAX call, these aren't respected and the result is a page hang or failure.
Example use-case
A file is being worked on using diag_edit.php. The session timeout is lengthy (many hours). However when the "edit file" page doesn't have any server interaction for a few (<< 8) hours, and the "save" button is clicked, a CSRF timeout error occurs which is not recoverable. No error message is produced and the page just "hangs"; however the AJAX "browse" button continues to work and other GUI pages can be opened in new tabs without re-entering username+pw, showing that the user session is also still active.
Work done to replicate and explore
diag_edit.php doesn't contain any handler for AJAX "fail" (a bug in itself), and the main symptom of this issue is hanging when one tries to save, so I made the following change to the AJAX handler in function saveFile(file):
complete: function(req) { var values = req.responseText.split("|"); $("#fileStatus").html(values[1]); }, error: function(req, text,err) { alert('AJAX ERROR DETECTED. RETURN VALUES:\n\nreq:\n ' + JSON.stringify(req) + '\n\ntext:\n ' + JSON.stringify(text) + '\n\nerr:\n ' + JSON.stringify(err)'); }, timeout: 5000
I then set user session timeout to ~ 3 days, loaded a file in diag_edit.php, and left the computer unattended for around 5 hours. Upon return I clicked "save" to save the file, and received an AJAX error notification showing that csrf_callback() had been called, confirming that the previous "hang" on saving was due to csrf timeout/token validation failure during the AJAX POST call. The return value was an entire web page of HTML.
I could still click "browse" in the diag_edit.php GUI and receive a full file listing (surely if one request isn't valid then the other shouldn't be, either?)
I could still right-click the menu/header and open other pages in new tabs (showing that the user session had not timed out and was still active).
Possible issues
1. CSRF TIMEOUT ISN'T SAME AS SESSION TIMEOUT - Although CSRF timeout should match session timeout, it doesn't always, and can time out long before the session ends. In this case the GUI page became non-responsive due to CSRF timeout while my session was still very much active.
2. SOME AJAX BASED PAGES ARE MISSING ERROR: HANDLERS AND TIMEOUTS IN THEIR AJAX CALLS - Some pages that use AJAX don't have error handling, so they'll appear to hang on failure. Presumably all ajax code should have some kind of error: handler, or the GUI code should have a default AJAX error: handler for cases when the page doesn't have one. Also ajax calls should generally have some kind of timeout so they aren't allowed to hang indefinitely if there's a problem.
3. csrf_callback() DOESN'T RETURN A USEFUL ERROR IF IT OCCURS DURING AN AJAX POST CALL - If there is an ajax POST request which fails, the GUI code doesn't handle it or display a useful error message.
4. CSRF "CLICK TO RETRY" MISHANDLES IF IT OCCURS DURING AN AJAX POST CALL - By default, if CSRF times out, the CSRF handler presents an option to click and retry the request. It returns the full html of a web page which has this effect. This works fine if the POST was a form submission. But if the POST was an AJAX request, then the return of a full HTML page offering a retry option is received as meaningless response data (not the expected data/format) so the user doesn't get an option to retry.
I'd like to fix this but not sure what fix is needed. I had tried to fix this previously (PR #2981) but this was rejected. I don't understand the PR comment "You don't want to change the CSRF timeout value based on the session timeout", in light of the statement in 2.1 release notes "Set CSRF Magic token timeout to be the same as the login expiration". Any guidance what would be needed to improve this area would be appreciated, and I would try to produce a patch for it if able, once I understand what's needed.
[1] https://doc.pfsense.org/index.php/2.1_New_Features_and_Changes
Updated by Stilez y over 8 years ago
One bug to be fixed at least - it's now clear why CSRF times out incorrectly:
In guiconfig.inc, csrf_startup() checks the value of $config['system']['webgui']['session_timeout'] - but $config isn't available at that point, and isn't made accessible via global $config either.
So "$timeout_minutes = isset($config['system']['webgui']['session_timeout']) ? ..." always fails.
Updated by Stilez y over 8 years ago
Jim Pingle - the fix above doesn't work, because $config only becomnes defined when authgui.inc is included, which is after that point
(guiconfig.inc creates csrf_startup() and then loads authgui.inc -> which loads auth.inc -> which loads config.gui.inc -> which creates $config.....)
Updated by Jim Pingle over 8 years ago
Yeah I see that now. I tested it earlier by setting $config['system']['webgui']['session_timeout']=2 at the top of the block and making sure it worked, but obviously that wasn't the best test :-)
Going to take some more thought then. Without any of the other variables defined at that point in time, it's not going to be easy to pull that.
No matter how things are changed to make it available, it still needs that var declared global though, so it's not a bad change.
Updated by Stilez y over 8 years ago
I was wondering about it too. I don't know if either of these ideas go anywhere.....
- The value set by csrf_conf("expires", XXX) is only defined and used in csrf_startup() and one piece of plain code where the time-check occurs - it's not incorporated into a shared secret or hash. So at a pinch, we could do some handwaving so that instead of setting csrf_conf("expires") with the timeout, csrf_startup() sets a short value (say 60 seconds), and also sets "deferred action" flag of some kind containing an & $pointer to the structure/array where the expiry timeout would have been written. When parse_config() is called, it notes the flag is set and sets the variable pointed to, to that value, overwriting the default. Basically, "there is a timeout and we'll grab its value when everything else is initialised". (This is also easily adaptable if multiple pages need their timeouts deferred simultaneously).
- Although csrf_magic is a package, it's been included directly into the codebase. So we could modify csrf_conf() ourselves to allow some kind of deferred setting of the value more transparently?
An easy way to do this would be to allow parse_config() to access $GLOBALS['csrf']; if $config is parsed and $GLOBAL['csrf'][] contains missing expiry items, then when parse_config() is next called, which should be almost immediately afterwards, the blank timeouts in $GLOBALS['csrf'][] are replaced by the current correct timeout.
- Maybe somewhat like a $pid file, have a read-only file which is updated when $config is parsed or changed, and which tracks the current value of the timeout (and any other early-use variables we might need). So although $config isn't accessible, we know its value as of the last time $config was parsed or modified which should be up to date.
Updated by Stilez y about 8 years ago
Got an update to this, Jim.
I just noticed that although I've configured my router (config->settings) to store 100 backup configs rather than 30, after restart the config cache still only holds 30 configs not 100 and old ones are being erased. That suggests that the early cache cleanup during boot which looks at $config for the cache count, is failing to respect the correct number of backups, I suspect like the above issue, because it's at a point where $config has not yet been created.
There may be other places which reference $config at a time it doesn't exist, and where user settings are ignored. That's 2 in as many weeks. So it looks like something that could be a generic problem affecting other things too, and may need scrutinising for, in the parts of the codebase possibly affected :(
Updated by Jim Thompson about 8 years ago
- Category set to Web Interface
- Assignee set to Anonymous