
Bug #16745

open

Special characters are escaped as HTML entities then escaped again

Added by Manuel Carrera 1 day ago. Updated about 7 hours ago.

Status:
Pull Request Review
Priority:
Normal
Assignee:
Category:
Backup / Restore
Target version:
Start date:
Due date:
% Done:
0%

Estimated time:
Plus Target Version:
26.03
Release Notes:
Force Exclusion
Affected Version:
2.9.0
Affected Architecture:

Description

Hello, I had to upgrade to the latest beta of pfSense Plus to fix another bug, and this completely broke the descriptions in my configuration.

I write descriptions in my language with special characters (é, è, à...), and after the upgrade they are displayed as HTML entities in the GUI. For example...

DNS sécurisé

... becomes ...

DNS s&eacute;curis&eacute;

I made a backup and checked how this description is stored:

<descr><![CDATA[DNS s&amp;eacute;curis&amp;eacute;]]></descr>

I tried to fix the encoding in the backup (replacing every "&amp;eacute;" with "&eacute;") and restored it, but the descriptions were broken again immediately after the restore. If I make another backup, every "&eacute;" has reverted to "&amp;eacute;".
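To illustrate how a stored value ends up as "&amp;eacute;", here is a minimal PHP sketch (an illustration, not pfSense's actual code path) that encodes the same text twice: once with HTML 4.01 named entities, as older saved values apparently were, and once more with the ENT_XML1 flags seen in the commit quoted later in this thread. The second pass escapes the "&" of the existing entity.

```php
<?php
// Illustration only: how one value becomes double-escaped when it is
// entity-encoded twice.
$original = "DNS sécurisé";

// First pass with HTML 4.01 named entities: é becomes &eacute;
$once = htmlentities($original, ENT_QUOTES | ENT_HTML401, 'UTF-8');

// Second pass with XML 1.0 flags: the "&" of &eacute; is escaped to &amp;
$twice = htmlentities($once, ENT_QUOTES | ENT_SUBSTITUTE | ENT_XML1, 'UTF-8');

echo $once, "\n";   // DNS s&eacute;curis&eacute;
echo $twice, "\n";  // DNS s&amp;eacute;curis&amp;eacute;
```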

I tried to upgrade again to the latest version 26.03 RC, but the result is exactly the same.

Actions #1

Updated by Manuel Carrera 1 day ago

I re-entered the description in the GUI, and it now displays correctly.

I made a backup to check how it's stored:

<descr><![CDATA[DNS sécurisé]]></descr>

There is no HTML entity, just the special character directly in the file, which seems more correct since the description is wrapped in <![CDATA[]]>.

So I think the GUI expects the characters unescaped, but the upgrade and restore processes escape all descriptions.
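The observation about CDATA can be checked with a small PHP sketch. It uses PHP's DOM extension rather than pfSense's own config parser (an assumption for illustration): an XML parser returns the contents of a CDATA section verbatim, so entity text stored there is not decoded by the parser, and any decoding has to happen in the application. The helper name read_descr is hypothetical.

```php
<?php
// Sketch: a CDATA section is not entity-decoded by an XML parser, so any
// entity text written inside it comes back verbatim.
$escaped = '<descr><![CDATA[DNS s&amp;eacute;curis&amp;eacute;]]></descr>';
$raw     = '<descr><![CDATA[DNS sécurisé]]></descr>';

// Hypothetical helper: parse the element and return its text content.
function read_descr(string $xml): string {
    $doc = new DOMDocument();
    $doc->loadXML($xml);
    // textContent includes CDATA text exactly as written
    return $doc->documentElement->textContent;
}

echo read_descr($escaped), "\n"; // DNS s&amp;eacute;curis&amp;eacute;
echo read_descr($raw), "\n";     // DNS sécurisé
```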

Actions #2

Updated by Jim Pingle about 14 hours ago

  • Project changed from pfSense Plus to pfSense
  • Category changed from Backup / Restore to Backup / Restore
  • Assignee set to Marcos M
  • Affected Plus Version deleted (26.03)

This seems like it may be from the fixes put in for #16661.

What specific areas of the configuration were affected? Parts of the changes for #16661 specifically were to avoid double encoding.

Actions #3

Updated by Manuel Carrera about 12 hours ago

Jim Pingle wrote in #note-2:

What specific areas of the configuration were affected? Parts of the changes for #16661 specifically were to avoid double encoding.

Searching my backups for double-escaped HTML entities, I found these affected areas:
  • Descriptions of static mapping on the DHCP server (IPv4)
  • Descriptions of NAT rules
  • Descriptions of firewall rules
  • Texts of separators in firewall rules
  • Descriptions of aliases
  • Descriptions of individual items in aliases
  • Descriptions of Wake-on-LAN entries
  • Titles of picture widgets
  • Descriptions of VLANs
  • Descriptions of gateways
  • Descriptions of gateway groups
  • Descriptions of items in "pfBlockerNG > IP > IPv4"
  • Descriptions of items in "pfBlockerNG > IP > IPv6"

This list is probably not exhaustive; I may not have used special characters everywhere or used every feature of pfSense.

But there is a very clear pattern: it's everything that uses "<![CDATA[]]>" sections to store text.

To me it seems you handle special characters in the configuration with two different methods that conflict with each other:
  • Encapsulate text in "<![CDATA[]]>" so special characters become plain data
  • Escape special characters as HTML entities
I suppose the solution is to commit to only one method:
  • Either remove the "<![CDATA[]]>" sections, so that escaping special characters in the configuration actually serves a purpose and the GUI parses them correctly
  • Or make sure that any text stored in "<![CDATA[]]>" sections is not escaped (or is unescaped before being stored), so the text content doesn't get corrupted
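The first option in the list above can be sketched in PHP as follows. This is a minimal sketch under the assumption that values are written as plain text nodes with no CDATA; write_element and read_element are hypothetical helpers, not pfSense functions. Only XML's five special characters are escaped, accented letters stay as UTF-8, and the parser decodes the escapes exactly once, so the value round-trips unchanged.

```php
<?php
// Sketch of option 1: no CDATA, escape only XML's special characters.
// Accented letters stay as plain UTF-8; only & < > " ' become entities.
function write_element(string $name, string $val): string {
    return "<{$name}>" . htmlspecialchars($val, ENT_QUOTES | ENT_XML1, 'UTF-8') . "</{$name}>";
}

function read_element(string $xml): string {
    $doc = new DOMDocument();
    $doc->loadXML($xml);
    // The parser decodes &amp;, &lt;, &gt;, &quot; and &apos; exactly once.
    return $doc->documentElement->textContent;
}

$value = 'DNS sécurisé <R&D> "test"';
$xml = write_element('descr', $value);
echo $xml, "\n";                          // <descr>DNS sécurisé &lt;R&amp;D&gt; &quot;test&quot;</descr>
var_dump(read_element($xml) === $value);  // bool(true)
```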
Actions #4

Updated by Manuel Carrera about 11 hours ago

Admittedly I'm a C# dev, not a PHP web developer so I may simply be wrong...

But I looked at the changes for #16661, specifically commit:16d57928, and something looks wrong to me. :/ The following is done twice:

if (is_cdata_entity($ent)) {
    $xmlconfig .= "<{$ent}><![CDATA[" . htmlentities($val, ENT_QUOTES | ENT_SUBSTITUTE | ENT_XML1) . "]]></{$ent}>\n";
} else {
    $xmlconfig .= "<{$ent}>" . htmlentities($val, ENT_QUOTES | ENT_SUBSTITUTE | ENT_XML1) . "</{$ent}>\n";
}

Based on https://developer.mozilla.org/en-US/docs/Web/API/CDATASection, the first branch of the if puts the value in a "<![CDATA[]]>" section. So shouldn't you only escape the closing delimiter instead of using htmlentities()? For example, by replacing "]]>" with "]] >" or something like that?
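The suggestion above can be sketched in PHP. Instead of "]] >", which would alter the stored text, this sketch uses a common variant that keeps the value intact: end the CDATA section just before the ">" and open a new one, so the parser concatenates both parts back into the original text. cdata_wrap and read_element are hypothetical helper names, and this is not the approach taken in the merge request below.

```php
<?php
// Sketch: write the value raw inside CDATA and only neutralise the "]]>"
// terminator, by closing the section and reopening it.
function cdata_wrap(string $val): string {
    return '<![CDATA[' . str_replace(']]>', ']]]]><![CDATA[>', $val) . ']]>';
}

function read_element(string $xml): string {
    $doc = new DOMDocument();
    $doc->loadXML($xml);
    // Adjacent CDATA sections are concatenated in textContent.
    return $doc->documentElement->textContent;
}

$value = 'DNS sécurisé & a]]>b';
$xml = '<descr>' . cdata_wrap($value) . '</descr>';
echo $xml, "\n";                          // <descr><![CDATA[DNS sécurisé & a]]]]><![CDATA[>b]]></descr>
var_dump(read_element($xml) === $value);  // bool(true)
```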

Actions #5

Updated by Marcos M about 7 hours ago

  • Tracker changed from Regression to Bug
  • Status changed from New to Pull Request Review
  • Target version set to 2.9.0
  • Plus Target Version set to 26.03
  • Release Notes changed from Default to Force Exclusion
  • Affected Version set to 2.9.0

MR: https://gitlab.netgate.com/pfSense/pfSense/-/merge_requests/1275

The previous ENT_HTML401-encoded values need to be encoded as ENT_XML1. This can be done during config revision upgrades. The following PHP code can be run at Diagnostics > Command Prompt which should fix the old values.

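// Recursively decode HTML 4.01 entities (e.g. &eacute;) left in string values
function migrate_config_encoding_244_to_245(&$array) {
    foreach ($array as &$value) {
        if (is_array($value)) {
            // Descend into nested configuration sections
            migrate_config_encoding_244_to_245($value);
        } elseif (is_string($value)) {
            // Turn entities such as &eacute; back into the literal character
            $value = html_entity_decode($value, ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML401);
        }
    }
}

// Load the whole configuration, decode it, and save it as a new revision
$full_config = config_get_path('', []);
migrate_config_encoding_244_to_245($full_config);
config_set_path('', $full_config);
write_config('Migrate XML encoding');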
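To see what the migration does without touching a live configuration, the same recursive walk can be tried on a plain PHP array. This sketch leaves out config_get_path(), config_set_path() and write_config(), which are pfSense functions; the function name decode_html401_recursive and the sample array layout are hypothetical.

```php
<?php
// Standalone sketch of the same recursive walk, applied to a sample array
// instead of the live configuration.
function decode_html401_recursive(array &$array): void {
    foreach ($array as &$value) {
        if (is_array($value)) {
            decode_html401_recursive($value);
        } elseif (is_string($value)) {
            // Decode HTML 4.01 named entities such as &eacute; and &agrave;
            $value = html_entity_decode($value, ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML401, 'UTF-8');
        }
    }
    unset($value); // break the reference left by foreach
}

$sample = [
    'aliases' => ['alias' => [['name' => 'dns', 'descr' => 'DNS s&eacute;curis&eacute;']]],
    'filter'  => ['rule' => [['descr' => 'D&eacute;j&agrave; vu']]],
];
decode_html401_recursive($sample);
echo $sample['aliases']['alias'][0]['descr'], "\n"; // DNS sécurisé
echo $sample['filter']['rule'][0]['descr'], "\n";   // Déjà vu
```

Running the walk a second time leaves already-decoded text such as "DNS sécurisé" unchanged, although a description that intentionally contains entity-like text such as "&amp;" would also be decoded.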
Actions #6

Updated by Manuel Carrera about 7 hours ago

Marcos M wrote in #note-5:

The following PHP code can be run at Diagnostics > Command Prompt which should fix the old values.

[...]

This code worked like a charm, my configuration file is clean again. Thanks!
