Bug #2633: Captive Portal timeouts cause users to be stuck in limbo - pfSense - pfSense bugtracker


Bug #2633

closed

Captive Portal timeouts cause users to be stuck in limbo

Added by Carlos Pereira over 12 years ago. Updated over 11 years ago.

Status: Resolved
Priority: Normal
Assignee: Ermal Luçi
Category: Captive Portal
Target version: 2.1
Start date: 09/15/2012
Affected Version: 2.1
Affected Architecture: amd64

Description

Hi Guys,

I run the internet service for a 350+ user student residence and I'm trying out the 2.1 snapshots.
Following up on complaints from our users, I've noticed that when timeouts are set in the Captive Portal (CP) and a user reaches one, the system fails to run the ipfw delete commands and leaves the user in limbo: not connected, but not redirected to the authentication screen either.

Here are some log entries I've found interesting:
Quote
Sep 13 22:05:41 php[25109]: : The command '/sbin/ipfw table 2 delete 10.10.2.170' returned exit code '71', the output was 'ipfw: setsockopt(IP_FW_TABLE_DEL): No such process'
Sep 13 22:05:41 php[25109]: : The command '/sbin/ipfw table 1 delete 10.10.2.170' returned exit code '71', the output was 'ipfw: setsockopt(IP_FW_TABLE_DEL): No such process'
Sep 13 22:05:41 php: : The command '/sbin/ipfw table 2 delete 10.10.2.143' returned exit code '71', the output was 'ipfw: setsockopt(IP_FW_TABLE_DEL): No such process'
Sep 13 22:04:40 php[56526]: : The command '/sbin/ipfw table 2 delete 10.10.1.145' returned exit code '71', the output was 'ipfw: setsockopt(IP_FW_TABLE_DEL): No such process'
Sep 13 22:04:40 php[56526]: : The command '/sbin/ipfw table 1 delete 10.10.1.145' returned exit code '71', the output was 'ipfw: setsockopt(IP_FW_TABLE_DEL): No such process'

And these ones:

Quote
Sep 13 23:51:11 php[30142]: : The command '/sbin/ipfw pipe 20187 delete' returned exit code '1', the output was 'ipfw: rule 1: setsockopt(IP_DUMMYNET_DEL): Invalid argument'
Sep 13 23:51:11 php[30142]: : The command '/sbin/ipfw pipe 20186 delete' returned exit code '1', the output was 'ipfw: rule 1: setsockopt(IP_DUMMYNET_DEL): Invalid argument'
Sep 13 23:51:10 php[30142]: : The command '/sbin/ipfw pipe 20003 delete' returned exit code '1', the output was 'ipfw: rule 1: setsockopt(IP_DUMMYNET_DEL): Invalid argument'
Sep 13 23:51:10 php[30142]: : The command '/sbin/ipfw pipe 20002 delete' returned exit code '1', the output was 'ipfw: rule 1: setsockopt(IP_DUMMYNET_DEL): Invalid argument'

Any ideas on this?

Cheers,

Carlos

Files

rc.prunecaptiveportal (2.09 KB), Carlos Pereira, 09/19/2012 11:58 PM
captiveportal.inc (70.1 KB), Captive Portal Script (captiveportal.inc), Carlos Pereira, 10/22/2012 06:07 PM
rc.prunecaptiveportal (2.08 KB), Captive Portal Pruner (rc.prunecaptiveportal), Carlos Pereira, 10/22/2012 06:07 PM


Updated by Carlos Pereira over 12 years ago

After a lot of researching and poking through the code, I think I have identified the source of the problems.
It seems like it lies in the logic behind ipfw_context.

It may happen that a second instance of the captive portal's pruning script starts and changes the context while the first instance is still running, causing the first to try to remove a rule that does not exist in the now-active context.
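The suspected interleaving can be replayed with a small toy model. This is only a sketch in Python under stated assumptions: `FakeIpfw`, its methods, the zone names, and the error text are hypothetical stand-ins and do not reproduce the real ipfw interface or the pfSense PHP code. It shows a single shared "current context" being switched between one pruner's select step and its delete step.

```python
# Toy model only: FakeIpfw is a hypothetical stand-in, not the real ipfw interface.
class FakeIpfw:
    """One shared 'current context' plus a separate address table per zone."""

    def __init__(self, tables):
        self.tables = {zone: set(ips) for zone, ips in tables.items()}
        self.context = None  # shared by every caller, like a global setting

    def set_context(self, zone):
        self.context = zone

    def table_delete(self, ip):
        table = self.tables[self.context]
        if ip not in table:
            # Illustrative error; the real tool reports "No such process" in the log.
            raise LookupError(f"{ip} not in table for context {self.context!r}")
        table.remove(ip)


def interleaved_prune():
    """Replay one unlucky ordering of two pruners working on different zones."""
    fw = FakeIpfw({"zone_a": {"10.10.2.170"}, "zone_b": {"10.10.1.145"}})
    fw.set_context("zone_a")  # pruner A selects its zone
    fw.set_context("zone_b")  # pruner B runs in between and switches the context
    try:
        fw.table_delete("10.10.2.170")  # pruner A deletes, but in zone_b's context
    except LookupError as err:
        return fw, str(err)
    return fw, None


fw, error = interleaved_prune()
print(error)
```

In this model the delete fails and the address stays in zone_a's table, which is consistent with the failed `table ... delete` commands in the log above.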


Updated by Carlos Pereira over 12 years ago

The fix for this lies in /etc/rc.prunecaptiveportal

The script has to check not only for running instances of the same zone but also for running instances of other zones.
If an instance for any other zone is running, it should abort and wait until there is an execution window.
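A minimal sketch of that check, written in Python for illustration rather than in the PHP of the actual script. The lock-file naming scheme (`prune_<zone>.lock`), the directory layout, and the function names are assumptions made for this example, not what /etc/rc.prunecaptiveportal really uses.

```python
import os
import tempfile


def other_zones_running(lock_dir, my_zone):
    """Return the zones (other than my_zone) that currently hold a lock file."""
    running = []
    for name in sorted(os.listdir(lock_dir)):
        if name.startswith("prune_") and name.endswith(".lock"):
            zone = name[len("prune_"):-len(".lock")]
            if zone != my_zone:
                running.append(zone)
    return running


def try_prune(lock_dir, my_zone, prune):
    """Run prune() only when no pruner, for this or any other zone, is active."""
    my_lock = os.path.join(lock_dir, f"prune_{my_zone}.lock")
    try:
        # O_EXCL makes creation fail if this zone's lock file already exists.
        fd = os.open(my_lock, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    os.close(fd)
    try:
        if other_zones_running(lock_dir, my_zone):
            return False  # back off and wait for the next scheduled run
        prune()
        return True
    finally:
        os.remove(my_lock)


with tempfile.TemporaryDirectory() as lock_dir:
    busy = os.path.join(lock_dir, "prune_zone_b.lock")
    open(busy, "w").close()  # pretend zone_b's pruner is running
    blocked = try_prune(lock_dir, "zone_a", lambda: None)
    os.remove(busy)  # zone_b finished
    allowed = try_prune(lock_dir, "zone_a", lambda: None)

print(blocked, allowed)
```

The sketch takes its own lock before looking for others, so two zones that start at the same moment may both back off, but they should not both proceed.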


Updated by Carlos Pereira over 12 years ago

File rc.prunecaptiveportal added

Here's my suggested fix for it. I know it isn't pretty, but it helps.


Updated by Carlos Pereira over 12 years ago

File captiveportal.inc added
File rc.prunecaptiveportal added

After combing through all the Captive Portal code and countless hours of testing, here's what I found:
- Due to the nature of the ipfw_context implementation, when running multiple captive portal zones at the same time, tasks such as login and pruning in one zone can affect, or be affected by, other zones (i.e., users logging in or being disconnected end up in limbo because the ipfw context was changed while ipfw rules were being added or removed).

The way I decided to fix it was by reverting the execution lock logic to what it was before the multi-zone captive portal implementation, using one lock file for all zones. In addition, I've added a lock to the captiveportal_disconnect method to make sure that a disconnection completes fully during pruning or manual disconnection.
Also, I've revised my previous fix to something more acceptable.
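The single-lock idea can be sketched as follows. This is an illustration in Python, not the attached PHP patch: the lock path, the `global_lock` helper, and the `disconnect`/`prune` functions are hypothetical names, and `fcntl.flock` is assumed to be available (Unix-like systems only).

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager

# Hypothetical lock path for this example; one file shared by every zone.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "captiveportal_example.lock")


@contextmanager
def global_lock(path=LOCK_PATH):
    """Hold one exclusive lock shared by all zones for the duration of the block."""
    with open(path, "w") as handle:
        fcntl.flock(handle, fcntl.LOCK_EX)  # blocks until no other holder remains
        try:
            yield
        finally:
            fcntl.flock(handle, fcntl.LOCK_UN)


def disconnect(sessions, zone, ip, log):
    """Remove one session; selecting the zone and deleting happen under the lock."""
    with global_lock():
        log.append(("select", zone))
        sessions[zone].discard(ip)
        log.append(("delete", zone, ip))


def prune(sessions, expired, log):
    """Disconnect every expired (zone, ip) pair, one locked step at a time."""
    for zone, ip in expired:
        disconnect(sessions, zone, ip, log)


sessions = {"zone_a": {"10.10.2.170", "10.10.2.143"}, "zone_b": {"10.10.1.145"}}
log = []
prune(sessions, [("zone_a", "10.10.2.170"), ("zone_b", "10.10.1.145")], log)
print(sessions)
```

Because every zone takes the same lock, no other zone's select step should be able to slip in between one zone's select and delete. The trade-off is that zones are serialized and cannot prune in parallel.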

I would really appreciate it if the devs could review this logic and apply it to the main trunk if it is an acceptable fix.

As a bonus, I've fixed another captive portal bug related to SSL certificates in different zones - the original code only allowed for one certificate.

Cheers,

Carlos


Updated by Ermal Luçi over 12 years ago

Assignee set to Ermal Luçi


Updated by Ermal Luçi about 12 years ago

Status changed from New to Feedback

Test later snapshots.


Updated by Carlos Pereira about 12 years ago

Hi Ermal,

The fixes broke the captive portal entirely.
For one, DNS requests to the forwarder are completely blocked.
Two, once you allow the DNS requests through a firewall rule, the user is kept in an infinite login loop. This seems to be the same issue as here: http://forum.pfsense.org/index.php/topic,56812.0.html

Thanks,

Carlos
