Bug #2633
Closed
Captive Portal timeouts cause users to be stuck in limbo
Description
Hi Guys,
I run the internet service for a 350+ user student residence and I'm trying out the 2.1 snapshots.
Following up on some complaints from our users, I've noticed that when timeouts are set in CP and users reach them, the system is unable to run the ipfw delete commands and leaves the users in limbo (not connected, but not redirected to the authentication screen either).
Here are some log entries I've found interesting:
Quote
Sep 13 22:05:41 php25109: : The command '/sbin/ipfw table 2 delete 10.10.2.170' returned exit code '71', the output was 'ipfw: setsockopt(IP_FW_TABLE_DEL): No such process'
Sep 13 22:05:41 php25109: : The command '/sbin/ipfw table 1 delete 10.10.2.170' returned exit code '71', the output was 'ipfw: setsockopt(IP_FW_TABLE_DEL): No such process'
Sep 13 22:05:41 php: : The command '/sbin/ipfw table 2 delete 10.10.2.143' returned exit code '71', the output was 'ipfw: setsockopt(IP_FW_TABLE_DEL): No such process'
Sep 13 22:04:40 php56526: : The command '/sbin/ipfw table 2 delete 10.10.1.145' returned exit code '71', the output was 'ipfw: setsockopt(IP_FW_TABLE_DEL): No such process'
Sep 13 22:04:40 php56526: : The command '/sbin/ipfw table 1 delete 10.10.1.145' returned exit code '71', the output was 'ipfw: setsockopt(IP_FW_TABLE_DEL): No such process'
And these ones:
Quote
Sep 13 23:51:11 php30142: : The command '/sbin/ipfw pipe 20187 delete' returned exit code '1', the output was 'ipfw: rule 1: setsockopt(IP_DUMMYNET_DEL): Invalid argument'
Sep 13 23:51:11 php30142: : The command '/sbin/ipfw pipe 20186 delete' returned exit code '1', the output was 'ipfw: rule 1: setsockopt(IP_DUMMYNET_DEL): Invalid argument'
Sep 13 23:51:10 php30142: : The command '/sbin/ipfw pipe 20003 delete' returned exit code '1', the output was 'ipfw: rule 1: setsockopt(IP_DUMMYNET_DEL): Invalid argument'
Sep 13 23:51:10 php30142: : The command '/sbin/ipfw pipe 20002 delete' returned exit code '1', the output was 'ipfw: rule 1: setsockopt(IP_DUMMYNET_DEL): Invalid argument'
Any ideas on this?
Cheers,
Carlos
Updated by Carlos Pereira about 12 years ago
After a lot of researching and poking through the code, I think I have identified the source of the problems.
It seems like it lies in the logic behind ipfw_context.
It may happen that a second instance of the captive portal's pruning script starts and changes the context while the first instance is still running, causing the first one to try to remove a rule that does not exist in the newly selected context.
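The suspected interleaving can be illustrated with a toy model. This is only a sketch: `ToyFirewall` and its methods are hypothetical stand-ins for a firewall with one shared "current context", not the real ipfw_context interface, and the zone names are made up.

```python
# Toy model: every command acts on whichever context was selected last.
# All names are hypothetical; this is not the real ipfw_context interface.
class ToyFirewall:
    def __init__(self):
        self.current = None
        self.tables = {}  # context name -> set of IP addresses

    def set_context(self, zone):
        self.current = zone

    def table_add(self, ip):
        self.tables.setdefault(self.current, set()).add(ip)

    def table_delete(self, ip):
        """Return True if the entry existed in the current context."""
        entries = self.tables.setdefault(self.current, set())
        if ip not in entries:
            return False
        entries.remove(ip)
        return True


def simulate_interleaved_prune():
    fw = ToyFirewall()
    fw.set_context("zone_a")
    fw.table_add("10.10.2.170")

    # The pruner for zone_a selects its context...
    fw.set_context("zone_a")
    # ...but before it deletes, the pruner for zone_b selects its own.
    fw.set_context("zone_b")
    # zone_a's delete now runs against zone_b, where the entry is absent.
    deleted = fw.table_delete("10.10.2.170")
    still_in_zone_a = "10.10.2.170" in fw.tables["zone_a"]
    return deleted, still_in_zone_a
```

In this model the delete fails and the entry stays behind in its own zone, which would match the "limbo" state described above if the real contexts behave similarly.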
Updated by Carlos Pereira about 12 years ago
The fix for this lies in /etc/rc.prunecaptiveportal.
The script has to check not only for running instances of the same zone but also for running instances of other zones.
If an instance for another zone is running, it should abort and wait until there is an execution window.
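The "abort and wait for a window" idea could look roughly like the following. It is a minimal sketch in Python rather than the script's own code, and it assumes a single lock file shared by every zone; the path and the function names are hypothetical.

```python
import fcntl
import time

# Hypothetical path: one lock file shared by the pruners of all zones.
LOCK_PATH = "/tmp/captiveportal_prune.lock"


def try_global_lock(path=LOCK_PATH):
    """Return an open file holding an exclusive lock, or None if it is taken."""
    handle = open(path, "w")
    try:
        # LOCK_NB makes flock fail immediately instead of blocking.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        handle.close()
        return None
    return handle


def wait_for_window(path=LOCK_PATH, attempts=10, delay=0.5):
    """Retry until no other instance holds the lock, or give up."""
    for _ in range(attempts):
        handle = try_global_lock(path)
        if handle is not None:
            return handle
        time.sleep(delay)
    return None


def release(handle):
    fcntl.flock(handle, fcntl.LOCK_UN)
    handle.close()
```

A pruner would call `wait_for_window()`, do its work only if a handle comes back, and call `release()` when done.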
Updated by Carlos Pereira about 12 years ago
- File rc.prunecaptiveportal added
Here's my suggested fix for it. I know it isn't pretty, but it helps.
Updated by Carlos Pereira about 12 years ago
- File captiveportal.inc added
- File rc.prunecaptiveportal added
After combing through all the Captive Portal code and countless hours of testing, here's what I found:
- Due to the nature of the ipfw_context implementation, when multiple captive portal zones run at the same time, tasks such as login and pruning in one zone can affect, or be affected by, other zones (e.g. users logging in or being disconnected end up in limbo because the ipfw context was changed while ipfw rules were being added or removed).
The way I decided to fix it was by reverting the execution lock logic to what it was before the multi-zone captive portal implementation, using one lock file for all zones. In addition, I've added a lock to the captiveportal_disconnect method to make sure that a disconnection completes fully during pruning or manual disconnection.
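As a rough illustration of the single-lock approach (not the attached code), the sketch below holds one lock across every step of a disconnect so that steps from different zones cannot interleave. It is written in Python; `global_lock`, `disconnect`, the journal, and the zone names are hypothetical, and the steps are only recorded rather than sent to ipfw.

```python
import fcntl
import threading
from contextlib import contextmanager


@contextmanager
def global_lock(path):
    """Block until the single lock shared by all zones is held."""
    with open(path, "w") as handle:
        fcntl.flock(handle, fcntl.LOCK_EX)
        try:
            yield
        finally:
            fcntl.flock(handle, fcntl.LOCK_UN)


def disconnect(zone, ip, journal, lock_path):
    # Hypothetical stand-in for captiveportal_disconnect: each step is
    # appended to `journal` instead of being executed.
    with global_lock(lock_path):
        journal.append((zone, "select context"))
        journal.append((zone, "table 1 delete " + ip))
        journal.append((zone, "table 2 delete " + ip))


def run_two_zones(lock_path):
    """Disconnect one user in each of two zones concurrently."""
    journal = []
    workers = [
        threading.Thread(target=disconnect,
                         args=("zone_a", "10.10.2.170", journal, lock_path)),
        threading.Thread(target=disconnect,
                         args=("zone_b", "10.10.1.145", journal, lock_path)),
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return journal
```

Whichever zone gets the lock first, its three steps appear together in the journal before the other zone's steps begin. The trade-off is that zones are serialized, which may slow pruning on busy systems.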
Also, I've revised my previous fix to something more acceptable.
I would really appreciate it if the devs could review this logic and apply it to the main trunk if it is an acceptable fix.
As a bonus, I've fixed another captive portal bug related to SSL certificates in different zones - the original code only allowed for one certificate.
Cheers,
Carlos
Updated by Carlos Pereira almost 12 years ago
Hi Ermal,
The fixes broke the captive portal entirely.
For one, DNS requests to the forwarder are completely blocked.
Two, once you allow the DNS requests through a firewall rule, the user is kept in an infinite login loop. It seems to be the same issue as here: http://forum.pfsense.org/index.php/topic,56812.0.html
Thanks,
Carlos
Updated by Renato Botelho over 11 years ago
Changes have been made to Captive Portal since last month. Could you please try a recent snapshot?
Updated by Chris Buechler over 11 years ago
- Status changed from Feedback to Resolved