Critical · XML · .xml
XML Encoding Mismatch — Declaration vs Actual Encoding
An XML file that declares UTF-8 encoding in its header but actually contains ISO-8859-1 (Latin-1) encoded characters, causing parser failures on non-ASCII characters.
Why It Fails
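XML parsers trust the encoding declaration and decode the byte stream accordingly. In ISO-8859-1 every non-ASCII character is a single byte in the range 0x80-0xFF. Read as UTF-8, such a byte is either an invalid start byte or a lead byte that is not followed by the continuation bytes it requires, so the parser usually fails at the first non-ASCII character.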
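The byte-level difference can be seen with a short Python sketch. It uses only the standard library; the helper name `decodes_as_utf8` is made up for illustration and is not part of any parser.

```python
# Encode the same text both ways and compare the bytes.
text = "Zürich"
latin1_bytes = text.encode("iso-8859-1")  # ü becomes the single byte 0xFC
utf8_bytes = text.encode("utf-8")         # ü becomes the two bytes 0xC3 0xBC


def decodes_as_utf8(data: bytes) -> bool:
    """Hypothetical helper: report whether data is valid UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(latin1_bytes, decodes_as_utf8(latin1_bytes))
print(utf8_bytes, decodes_as_utf8(utf8_bytes))
```

The byte 0xFC is never a valid start byte in UTF-8, which is why a decoder that was told to expect UTF-8 rejects the Latin-1 form.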
Broken Example
<?xml version="1.0" encoding="UTF-8"?>
<data>
  <!-- File is actually ISO-8859-1 encoded -->
  <city>Zürich</city>
  <name>José García</name>
</data>
Expected Error Behavior
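Strict parsers reject the document with an 'invalid byte sequence' or 'not well-formed' error. Lenient tools may instead display characters like ü, é, í as garbage, and applications that do not handle the parse error may crash.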
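As a rough illustration of this behavior, the sketch below rebuilds the broken example in memory and feeds it to Python's standard `xml.etree.ElementTree` parser, which is used here as a stand-in and is not one of the parsers listed under Affected Software. The helper `try_parse` is hypothetical, and the exact error text varies by parser and version.

```python
import xml.etree.ElementTree as ET

xml_text = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    "<data>\n"
    "  <city>Zürich</city>\n"
    "  <name>José García</name>\n"
    "</data>\n"
)

# Same text, two byte encodings; both declare UTF-8 in the header.
broken = xml_text.encode("iso-8859-1")  # declaration does not match the bytes
correct = xml_text.encode("utf-8")      # declaration matches the bytes


def try_parse(data: bytes) -> str:
    """Hypothetical helper: return the <city> text, or a description of the error."""
    try:
        return ET.fromstring(data).findtext("city")
    except ET.ParseError as exc:
        return f"ParseError: {exc}"


print(try_parse(correct))
print(try_parse(broken))
```

The second call is expected to report a parse error located on the line that contains the first non-ASCII character.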
Affected Software
libxml2, Java SAX/DOM, Python lxml, PHP SimpleXML, C# XmlDocument
How to Fix
Detect the actual encoding with chardet or the file command. Then either convert the file to the declared encoding (for example with iconv) or change the declaration to match the actual bytes. Use encoding='auto' where supported.
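A minimal sketch of the conversion step, using only the Python standard library. It assumes the only two candidates are UTF-8 and ISO-8859-1, so it tries strict UTF-8 first and treats a failure as Latin-1. That is a heuristic, not real detection: ISO-8859-1 decoding never fails, so any non-UTF-8 input is accepted as Latin-1. The function name `to_utf8` and its `fallback` parameter are hypothetical.

```python
def to_utf8(data: bytes, fallback: str = "iso-8859-1") -> bytes:
    """Return data re-encoded as UTF-8.

    Assumption: input is either valid UTF-8 already or encoded in `fallback`.
    """
    try:
        text = data.decode("utf-8")   # already matches a UTF-8 declaration
    except UnicodeDecodeError:
        text = data.decode(fallback)  # guess: legacy single-byte encoding
    return text.encode("utf-8")


# Usage: repair a Latin-1 fragment so it matches a UTF-8 declaration.
broken_fragment = "<city>Zürich</city>".encode("iso-8859-1")
fixed_fragment = to_utf8(broken_fragment)
print(fixed_fragment.decode("utf-8"))
```

For files whose encoding is genuinely unknown, a detector such as chardet gives a better guess than this two-way fallback, at the cost of a third-party dependency.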