Bug #16745
Special characters are escaped as HTML entities then escaped again
Description
Hello, I had to upgrade to the latest Beta of pfSense Plus to fix another bug, and this has completely broken the descriptions in my configuration.
I write descriptions in my language with special characters (é, è, à...), and after the upgrade they are displayed as HTML entities in the GUI. For example...
DNS sécurisé
... becomes ...
DNS s&eacute;curis&eacute;
I made a backup and checked how this description is stored:
<descr><![CDATA[DNS s&eacute;curis&eacute;]]></descr>
I tried to fix the encoding in the backup (replacing every "&amp;eacute;" with "&eacute;") and then restoring it, but the descriptions are broken again immediately after the restore. If I make another backup, every "&eacute;" has reverted to "&amp;eacute;".
I tried to upgrade again to the latest version 26.03 RC, but the result is exactly the same.
Updated by Manuel Carrera 1 day ago
I tried to rewrite the description from the GUI and it now displays correctly.
I made a backup to check how it's stored:
<descr><![CDATA[DNS sécurisé]]></descr>
There is no HTML entity, just the special character directly in the file, which seems more correct since the description uses <![CDATA[]]>.
So I think the GUI expects the characters unescaped, but the upgrade and restore processes escape all descriptions.
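The double escaping described here can be reproduced outside pfSense with plain PHP. The following is a minimal sketch, not the pfSense code path: it assumes the description is passed through `htmlentities()` with the HTML 4.01 entity table once per write, so a second write escapes the result of the first.

```php
<?php
// Minimal reproduction of double escaping, independent of pfSense.
// Assumption: the text goes through htmlentities() once per write.
$flags = ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML401;
$raw = "DNS sécurisé";

// First pass: é becomes the named entity &eacute;
$once = htmlentities($raw, $flags, 'UTF-8');

// Second pass: the & of each entity is itself escaped to &amp;
$twice = htmlentities($once, $flags, 'UTF-8');

// With double_encode set to false, existing entities are left alone.
$guarded = htmlentities($once, $flags, 'UTF-8', false);

echo $once, "\n";    // DNS s&eacute;curis&eacute;
echo $twice, "\n";   // DNS s&amp;eacute;curis&amp;eacute;
echo $guarded, "\n"; // DNS s&eacute;curis&eacute;
```

The `double_encode` argument only hides the symptom; it cannot tell an entity the user typed on purpose from one produced by an earlier pass.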
Updated by Jim Pingle about 12 hours ago
- Project changed from pfSense Plus to pfSense
- Category changed from Backup / Restore to Backup / Restore
- Assignee set to Marcos M
- Affected Plus Version deleted (26.03)
Updated by Manuel Carrera about 10 hours ago
Jim Pingle wrote in #note-2:
What specific areas of the configuration were affected? Parts of the changes for #16661 specifically were to avoid double encoding.
By searching for double-escaped HTML entities in my backups:
- Descriptions of static mapping on the DHCP server (IPv4)
- Descriptions of NAT rules
- Descriptions of firewall rules
- Texts of separators in firewall rules
- Descriptions of aliases
- Descriptions of individual items in aliases
- Descriptions of Wake-on-LAN entries
- Titles of picture widgets
- Descriptions of VLANs
- Descriptions of gateways
- Descriptions of gateway groups
- Descriptions of items in "pfBlockerNG > IP > IPv4"
- Descriptions of items in "pfBlockerNG > IP > IPv6"
This list is probably not exhaustive: I may not have used special characters everywhere, or used every feature of pfSense.
But there is a very clear pattern: It's everything that uses "<![CDATA[]]>" sections to store text.
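A search like this one can be scripted. The following is a rough sketch, assuming the backup is available as a string; the function name `find_double_escaped` and the sample XML are hypothetical. It counts, per tag, every `&amp;` that is immediately followed by something that looks like an entity name or numeric reference.

```php
<?php
// Hypothetical helper: count double-escaped entities per XML tag in a backup.
// It uses regular expressions for a quick audit and is not a real XML parser:
// only leaf elements whose text contains no "<" are inspected.
function find_double_escaped(string $xml): array {
    $counts = [];
    $leaf = '~<(\w+)>(?:<!\[CDATA\[)?([^<]*?)(?:\]\]>)?</\1>~';
    $entity = '/&amp;(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);/';
    preg_match_all($leaf, $xml, $elements, PREG_SET_ORDER);
    foreach ($elements as $m) {
        // $m[1] is the tag name, $m[2] the text inside it.
        $n = preg_match_all($entity, $m[2]);
        if ($n > 0) {
            $counts[$m[1]] = ($counts[$m[1]] ?? 0) + $n;
        }
    }
    return $counts;
}

$sample = <<<'XML'
<rule>
<descr><![CDATA[DNS s&amp;eacute;curis&amp;eacute;]]></descr>
<interface>lan</interface>
</rule>
<alias>
<descr><![CDATA[R&amp;eacute;seau priv&amp;eacute;]]></descr>
</alias>
XML;

print_r(find_double_escaped($sample)); // descr => 4
```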
To me it seems you manage special characters in the configuration with 2 different methods that conflict with each other:
- Encapsulate text in "<![CDATA[]]>" so special characters become simple data
- Escape special characters as HTML entities
Either you remove the "<![CDATA[]]>" sections, so that special characters are escaped in the configuration for a good reason and parsed correctly by the GUI.
Or you make sure that any text stored in "<![CDATA[]]>" sections is not escaped when it is written to the configuration, or is unescaped before being stored, so that the text content doesn't get corrupted.
Updated by Manuel Carrera about 9 hours ago
Admittedly I'm a C# dev, not a PHP web developer so I may simply be wrong...
But I looked at changes for #16661, specifically commit:16d57928 and something looks wrong to me. :/ The following is done 2 times:
if (is_cdata_entity($ent)) {
    $xmlconfig .= "<{$ent}><![CDATA[" . htmlentities($val, ENT_QUOTES | ENT_SUBSTITUTE | ENT_XML1) . "]]></{$ent}>\n";
} else {
    $xmlconfig .= "<{$ent}>" . htmlentities($val, ENT_QUOTES | ENT_SUBSTITUTE | ENT_XML1) . "</{$ent}>\n";
}
Based on https://developer.mozilla.org/en-US/docs/Web/API/CDATASection -> In the first part of the IF you put the value in a "<![CDATA[]]>" section. So shouldn't you be only trying to escape the closing part instead of using "htmlentities"? Like replacing "]]>" with "]] >" or something like that?
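The idea raised here, leaving the text unescaped inside the CDATA section and only neutralising the closing sequence, can be sketched as follows. This is not what pfSense does and is not taken from the commit; `cdata_wrap` is a hypothetical helper. It uses the common technique of splitting "]]>" across two adjacent CDATA sections, then reads the value back with PHP's DOM extension to check the round trip.

```php
<?php
// Hypothetical helper, not pfSense code: wrap raw text in a CDATA section.
// The only sequence that cannot appear inside CDATA is "]]>", so it is
// split across two adjacent sections instead of entity-escaping the text.
function cdata_wrap(string $tag, string $value): string {
    $safe = str_replace(']]>', ']]]]><![CDATA[>', $value);
    return "<{$tag}><![CDATA[{$safe}]]></{$tag}>";
}

$value = 'DNS sécurisé & <b>test</b> ]]> end';
$xml = cdata_wrap('descr', $value);

// Read it back with a real XML parser to confirm nothing was altered.
$doc = new DOMDocument();
$doc->loadXML($xml);
$roundTrip = $doc->documentElement->textContent;

var_dump($roundTrip === $value); // bool(true)
```

With this approach the stored text is byte-for-byte what the user typed, so any escaping for display has to happen when the GUI renders the value.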
Updated by Marcos M about 6 hours ago
- Tracker changed from Regression to Bug
- Status changed from New to Pull Request Review
- Target version set to 2.9.0
- Plus Target Version set to 26.03
- Release Notes changed from Default to Force Exclusion
- Affected Version set to 2.9.0
MR: https://gitlab.netgate.com/pfSense/pfSense/-/merge_requests/1275
The previous ENT_HTML401-encoded values need to be encoded as ENT_XML1. This can be done during config revision upgrades. The following PHP code can be run at Diagnostics > Command Prompt which should fix the old values.
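// Recursively decode HTML 4.01 entities in every string of the config array.
function migrate_config_encoding_244_to_245(&$array) {
    foreach ($array as &$value) {
        if (is_array($value)) {
            // Descend into nested sections.
            migrate_config_encoding_244_to_245($value);
        } elseif (is_string($value)) {
            // Turn old entities such as &eacute; back into the raw character.
            $value = html_entity_decode($value, ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML401);
        }
    }
    // Drop the reference left over from the loop.
    unset($value);
}

// Load the whole configuration, decode it in place, then store and save it.
$full_config = config_get_path('', []);
migrate_config_encoding_244_to_245($full_config);
config_set_path('', $full_config);
write_config('Migrate XML encoding');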
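Before writing the change, it can be useful to preview what the decoding step would alter. The sketch below is a dry run over a plain PHP array, so it makes no pfSense calls; `preview_entity_decode` and the sample array are hypothetical, and the array only stands in for the structure returned by the configuration.

```php
<?php
// Hypothetical dry run: list the values html_entity_decode() would change,
// keyed by a slash-separated path, without modifying anything.
function preview_entity_decode(array $config, string $path = ''): array {
    $changes = [];
    foreach ($config as $key => $value) {
        $here = $path === '' ? (string) $key : "{$path}/{$key}";
        if (is_array($value)) {
            // Paths are unique, so the array union never drops an entry.
            $changes += preview_entity_decode($value, $here);
        } elseif (is_string($value)) {
            $decoded = html_entity_decode($value, ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML401, 'UTF-8');
            if ($decoded !== $value) {
                $changes[$here] = ['before' => $value, 'after' => $decoded];
            }
        }
    }
    return $changes;
}

// Made-up sample data: one entity-encoded description, one already clean.
$sample = [
    'aliases' => ['alias' => [['name' => 'dns', 'descr' => 'DNS s&eacute;curis&eacute;']]],
    'vlans' => ['vlan' => [['descr' => 'Invités']]],
];

print_r(preview_entity_decode($sample));
```

Only the alias description is reported here, since the VLAN description already contains the raw character.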
Updated by Manuel Carrera about 5 hours ago
Marcos M wrote in #note-5:
The following PHP code can be run at Diagnostics > Command Prompt which should fix the old values.
[...]
This code worked like a charm, my configuration file is clean again. Thanks!