Bug #2633

closed

Captive Portal timeouts cause users to be stuck in limbo

Added by Carlos Pereira over 11 years ago. Updated over 10 years ago.

Status: Resolved
Priority: Normal
Assignee: Ermal Luçi
Category: Captive Portal
Target version:
Start date: 09/15/2012
Due date:
% Done: 0%
Estimated time:
Plus Target Version:
Release Notes:
Affected Version: 2.1
Affected Architecture: amd64

Description

Hi Guys,

I run the internet service for a 350+ user student residence and I'm trying out the 2.1 snapshots.
Following up on complaints from our users, I've noticed that when timeouts are set in the Captive Portal (CP) and a user reaches the timeout, the system fails to run the ipfw delete commands and leaves the user in limbo (not connected, but not redirected to the authentication screen either).

Here are some log entries I've found interesting:
Quote
Sep 13 22:05:41 php25109: : The command '/sbin/ipfw table 2 delete 10.10.2.170' returned exit code '71', the output was 'ipfw: setsockopt(IP_FW_TABLE_DEL): No such process'
Sep 13 22:05:41 php25109: : The command '/sbin/ipfw table 1 delete 10.10.2.170' returned exit code '71', the output was 'ipfw: setsockopt(IP_FW_TABLE_DEL): No such process'
Sep 13 22:05:41 php: : The command '/sbin/ipfw table 2 delete 10.10.2.143' returned exit code '71', the output was 'ipfw: setsockopt(IP_FW_TABLE_DEL): No such process'
Sep 13 22:04:40 php56526: : The command '/sbin/ipfw table 2 delete 10.10.1.145' returned exit code '71', the output was 'ipfw: setsockopt(IP_FW_TABLE_DEL): No such process'
Sep 13 22:04:40 php56526: : The command '/sbin/ipfw table 1 delete 10.10.1.145' returned exit code '71', the output was 'ipfw: setsockopt(IP_FW_TABLE_DEL): No such process'

And these ones:

Quote
Sep 13 23:51:11 php30142: : The command '/sbin/ipfw pipe 20187 delete' returned exit code '1', the output was 'ipfw: rule 1: setsockopt(IP_DUMMYNET_DEL): Invalid argument'
Sep 13 23:51:11 php30142: : The command '/sbin/ipfw pipe 20186 delete' returned exit code '1', the output was 'ipfw: rule 1: setsockopt(IP_DUMMYNET_DEL): Invalid argument'
Sep 13 23:51:10 php30142: : The command '/sbin/ipfw pipe 20003 delete' returned exit code '1', the output was 'ipfw: rule 1: setsockopt(IP_DUMMYNET_DEL): Invalid argument'
Sep 13 23:51:10 php30142: : The command '/sbin/ipfw pipe 20002 delete' returned exit code '1', the output was 'ipfw: rule 1: setsockopt(IP_DUMMYNET_DEL): Invalid argument'
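To see which clients were affected, the quoted log lines can be grouped by client IP and pipe number. This is a minimal Python sketch, not part of pfSense: the regular expressions are assumptions derived only from the sample lines above, and `summarize` is a hypothetical helper name.

```python
import re
from collections import defaultdict

# Patterns assumed from the sample log lines quoted above only.
LINE_RE = re.compile(
    r"The command '(?P<cmd>[^']+)' returned exit code '(?P<code>\d+)', "
    r"the output was '(?P<output>[^']*)'"
)
TABLE_RE = re.compile(r"^/sbin/ipfw table (?P<table>\d+) delete (?P<ip>\S+)$")
PIPE_RE = re.compile(r"^/sbin/ipfw pipe (?P<pipe>\d+) delete$")


def summarize(lines):
    """Return ({client_ip: {table numbers}}, [pipe numbers]) for failed deletes."""
    tables_by_ip = defaultdict(set)
    pipes = set()
    for line in lines:
        m = LINE_RE.search(line)
        if m is None or m.group("code") == "0":
            continue  # not an ipfw command line, or the command succeeded
        cmd = m.group("cmd")
        table_match = TABLE_RE.match(cmd)
        pipe_match = PIPE_RE.match(cmd)
        if table_match:
            tables_by_ip[table_match.group("ip")].add(int(table_match.group("table")))
        elif pipe_match:
            pipes.add(int(pipe_match.group("pipe")))
    return dict(tables_by_ip), sorted(pipes)
```

Run over both excerpts, it would report 10.10.2.170 and 10.10.1.145 failing in tables 1 and 2, 10.10.2.143 in table 2 (only that line appears in the excerpt), and pipes 20002, 20003, 20186 and 20187.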

Any ideas on this?

Cheers,

Carlos


Files

rc.prunecaptiveportal (2.09 KB) rc.prunecaptiveportal Carlos Pereira, 09/19/2012 11:58 PM
captiveportal.inc (70.1 KB) captiveportal.inc Captive Portal Script (captiveportal.inc) Carlos Pereira, 10/22/2012 06:07 PM
rc.prunecaptiveportal (2.08 KB) rc.prunecaptiveportal Captive Portal Pruner (rc.prunecaptiveportal) Carlos Pereira, 10/22/2012 06:07 PM
Actions #1

Updated by Carlos Pereira over 11 years ago

After a lot of research and poking through the code, I think I have identified the source of the problem.
It seems to lie in the logic behind ipfw_context.

It can happen that one instance of the captive portal's pruning script changes the ipfw context while another instance is still running, causing the second instance to try to remove a nonexistent rule from the wrong context.
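The race can be illustrated with a toy model. In this Python sketch (an illustration only, not the real ipfw_context code), `Firewall`, `pruner` and `run_interleaved` are hypothetical names: each zone has its own table, but there is a single shared "current context", and each `yield` marks a point where another pruning process could be scheduled.

```python
class Firewall:
    """Toy model: one table per zone, but only ONE shared current context."""

    def __init__(self, zones):
        self.context = None
        self.tables = {zone: set() for zone in zones}

    def set_context(self, zone):
        self.context = zone

    def table_delete(self, ip):
        """Delete ip from whichever zone's table is currently selected."""
        table = self.tables[self.context]
        if ip not in table:
            return "No such process"  # mimics the error text seen in the logs
        table.remove(ip)
        return None


def pruner(fw, zone, ip, errors):
    """One pruning run: select its zone, (get preempted), then delete."""
    fw.set_context(zone)
    yield  # another pruner may run here and switch the context
    err = fw.table_delete(ip)
    if err:
        errors.append((zone, ip, err))


def run_interleaved():
    fw = Firewall(["zone_a", "zone_b"])
    fw.tables["zone_a"].add("10.10.2.170")
    fw.tables["zone_b"].add("10.10.1.145")
    errors = []
    a = pruner(fw, "zone_a", "10.10.2.170", errors)
    b = pruner(fw, "zone_b", "10.10.1.145", errors)
    next(a)        # A selects zone_a
    next(b)        # B switches the shared context to zone_b
    next(a, None)  # A resumes and deletes from zone_b's table: fails
    next(b, None)  # B deletes its own client successfully
    return fw, errors
```

With this interleaving, `run_interleaved()` records one "No such process" error for 10.10.2.170 and leaves that address in zone_a's table: the client's entry is never removed, which matches the limbo state described in the report.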

Actions #2

Updated by Carlos Pereira over 11 years ago

The fix for this lies in /etc/rc.prunecaptiveportal

The script must check not only for running instances for the same zone but also for running instances for other zones.
If instances for other zones are running, it should abort and wait for the next execution window.
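One way to express "abort if a pruner for any zone is running" is a single non-blocking lock shared by all zones. This Python sketch assumes a Unix system with `fcntl.flock`; the lock path and `try_prune` are hypothetical, and the actual rc.prunecaptiveportal is a PHP script.

```python
import errno
import fcntl
import os

# Hypothetical path: ONE lock file for all zones instead of one per zone.
GLOBAL_LOCK = "/tmp/captiveportal_prune.lock"


def try_prune(prune, lock_path=GLOBAL_LOCK):
    """Run prune() only if no pruner for any zone currently holds the lock.

    Returns True if prune() ran, or False if another instance was running and
    this run was skipped, to be retried at the next execution window.
    """
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        try:
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError as exc:
            if exc.errno in (errno.EAGAIN, errno.EACCES):
                return False  # another zone's pruner is active: abort
            raise
        try:
            prune()
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
        return True
    finally:
        os.close(fd)
```

Because the lock is non-blocking, an overlapping run returns immediately instead of queuing up, and the next scheduled run gets a clean execution window.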

Actions #3

Updated by Carlos Pereira over 11 years ago

Here's my suggested fix. I know it isn't pretty, but it helps.

Actions #4

Updated by Carlos Pereira over 11 years ago

After combing through all the Captive Portal code and countless hours of testing, here's what I found:
- Due to the nature of the ipfw_context implementation, when multiple captive portal zones run at the same time, tasks such as login and pruning in one zone can affect, and be affected by, other zones (e.g. users logging in or being disconnected end up in limbo because the ipfw context was changed while ipfw rules were being added or removed).

The way I decided to fix it was by reverting the execution lock logic to what it was before the multi-zone captive portal implementation, using a single lock file for all zones. In addition, I've added a lock to the captiveportal_disconnect method to make sure that the disconnection completes fully during pruning or manual disconnection.
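The idea of holding one lock across the context switch and every deletion can be sketched as follows. This is a Python toy, not the PHP captiveportal_disconnect code: `CP_LOCK`, `Firewall` and `disconnect` are hypothetical names, and `threading.Lock` only covers threads in one process, so separate PHP processes would need a file-based lock instead.

```python
import threading

# Hypothetical: ONE lock shared by every zone (threads in a single process only).
CP_LOCK = threading.Lock()


class Firewall:
    """Toy stand-in for ipfw: per-zone tables and pipes, one shared context."""

    def __init__(self, zones):
        self.context = None
        self.tables = {zone: set() for zone in zones}  # client IPs
        self.pipes = {zone: set() for zone in zones}   # dummynet pipe numbers

    def delete(self, kind, key):
        """Delete key from 'tables' or 'pipes' of the CURRENT context."""
        store = getattr(self, kind)[self.context]
        if key not in store:
            return False  # would be an ipfw error in the real system
        store.remove(key)
        return True


def disconnect(fw, zone, ip, pipe_numbers):
    """Hypothetical disconnect: context switch plus every deletion under one lock.

    Returns True only if the table entry and all pipes were removed.
    """
    with CP_LOCK:
        fw.context = zone
        ok = fw.delete("tables", ip)
        for number in pipe_numbers:
            ok = fw.delete("pipes", number) and ok
        return ok
```

Since the context switch and all deletions happen inside the same critical section, no other zone can change `fw.context` halfway through a disconnect.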
Also, I've revised my previous fix to something more acceptable.

I would really appreciate it if the devs could review this logic and apply it to the main trunk if it is an acceptable fix.

As a bonus, I've fixed another captive portal bug related to SSL certificates in different zones - the original code only allowed for one certificate.

Cheers,

Carlos

Actions #5

Updated by Ermal Luçi over 11 years ago

  • Assignee set to Ermal Luçi
Actions #6

Updated by Ermal Luçi about 11 years ago

  • Status changed from New to Feedback

Test later snapshots.

Actions #7

Updated by Carlos Pereira about 11 years ago

Hi Ermal,

The fixes broke the captive portal entirely.
For one, DNS requests to the forwarder are completely blocked.
Two, once you allow DNS requests through a firewall rule, the user is kept in an infinite login loop; it seems to be the same issue as here: http://forum.pfsense.org/index.php/topic,56812.0.html

Thanks,

Carlos

Actions #8

Updated by Renato Botelho about 11 years ago

Changes have been made to the Captive Portal since last month. Could you please try a recent snapshot?

Actions #9

Updated by Chris Buechler over 10 years ago

  • Status changed from Feedback to Resolved