Invalid characters when creating an XML CData node from a UnicodeString

Updated: 2024-10-28 02:23:19

Problem description

IDE: Embarcadero C++Builder XE5.

I am trying to dump UnicodeStrings into an XML CDATA section.

Small extract of such a string:

u"‰PNG\r\n\x1A\n\0\0\0\rIHDR\0\0\0õ\0\0\02\b\x06\0\0\0„\\i\0\0\0\x01sRGB\0®Î\x1Cé\0\0\0\x04gAMA\0\0±\vüa\x05\0\0\0\tpHYs\0\0\x0EÃ\0\0\x0EÃ\x01Ço¨d\0\0\v¼IDATxÚíœypUÕ\x19ÀO\x06…°¤\x04D$ˆ²\b1š\b\x18@...etc"

I know an XML document can contain non-ASCII characters, and I thought the content of an XML CDATA section is not parsed by the XML parser (with the exception of the end-of-section indicator "]]>", which I checked is not present in my data).

When creating (writing) a CDATA section, I'm still getting the "an invalid character was found in text content when creating node" error.

Code example:

    _di_IXMLDocument pXMLDocument = NewXMLDocument("1.0");

    // I've played around with the document encoding with no success,
    // guessing it's only applicable while reading the document.
    // pXMLDocument->SetEncoding(L"iso-8859-1");

    String myString; // Unicode, contains my data string.

    // 1st param of CreateNode method is of type UnicodeString.
    _di_IXMLNode pCDataNode = pXMLDocument->CreateNode( myString, ntCData );

Any thoughts on why this is failing? Encoding problem?

Recommended answer

If you read Section 2.7 of the XML specification, it describes the format of a CDATA section:

CDATA Sections
[18] CDSect  ::= CDStart CData CDEnd
[19] CDStart ::= '<![CDATA['
[20] CData   ::= (Char* - (Char* ']]>' Char*))
[21] CDEnd   ::= ']]>'
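
Production [20] says the content of a CDATA section may be any run of Char values that does not contain "]]>". As a rough illustration in standard C++ (the function name wrapInCData is hypothetical and not part of any XML library), text that does contain the terminator can be split across two adjacent sections. This sketch only deals with the terminator; it does nothing about characters that fall outside Char.

```cpp
#include <cstddef>
#include <string>

// Wraps text in one or more CDATA sections. Any "]]>" inside the text is
// split across two sections so the terminator never appears in the content.
// Note: this does not make illegal characters legal; it only handles "]]>".
std::string wrapInCData(const std::string& text) {
    const std::string terminator = "]]>";
    std::string out = "<![CDATA[";
    std::size_t pos = 0;
    for (;;) {
        std::size_t hit = text.find(terminator, pos);
        if (hit == std::string::npos) {
            // No further terminator: copy the rest of the text.
            out.append(text, pos, std::string::npos);
            break;
        }
        // Copy up to and including "]]", close the section, open a new one,
        // and continue from the ">" that completed the terminator.
        out.append(text, pos, hit + 2 - pos);
        out += "]]><![CDATA[";
        pos = hit + 2;
    }
    out += "]]>";
    return out;
}
```

For example, wrapInCData("x]]>y") yields `<![CDATA[x]]]]><![CDATA[>y]]>`, which a parser reads back as the text `x]]>y`.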

Char is defined in Section 2.2:

Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] /* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */

If you look at your raw data, it contains over a dozen character values that are excluded from that range (specifically #x0, #x1, #x2, #x4, #x5, #x6, #x8, #xB, #xE, #x18, #x19, #x1A, and #x1C). That is why you are getting errors about illegal characters: your data really does contain illegal characters.
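
To see which characters trip the check, the Char production can be written as a small predicate and run over the string before it is handed to the XML library. The following is a minimal sketch in standard C++ that uses std::u16string as a stand-in for UnicodeString (both hold UTF-16 code units); the names isXmlChar and findInvalidXmlChars are made up for this illustration.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// True if cp is allowed by the XML 1.0 Char production.
bool isXmlChar(char32_t cp) {
    return cp == 0x9 || cp == 0xA || cp == 0xD ||
           (cp >= 0x20 && cp <= 0xD7FF) ||
           (cp >= 0xE000 && cp <= 0xFFFD) ||
           (cp >= 0x10000 && cp <= 0x10FFFF);
}

// Returns the UTF-16 code unit indices at which an illegal character starts.
std::vector<std::size_t> findInvalidXmlChars(const std::u16string& s) {
    std::vector<std::size_t> bad;
    for (std::size_t i = 0; i < s.size(); ++i) {
        char32_t cp = s[i];
        bool isPair = cp >= 0xD800 && cp <= 0xDBFF && i + 1 < s.size() &&
                      s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF;
        if (isPair) {
            // Combine a valid surrogate pair into a single code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (s[i + 1] - 0xDC00);
        }
        // Lone surrogates stay in the D800-DFFF range and are rejected here.
        if (!isXmlChar(cp)) bad.push_back(i);
        if (isPair) ++i; // skip the low surrogate that was just consumed
    }
    return bad;
}
```

Run against the start of the PNG data from the question, this would flag the #x1A in the file signature and every #x0 in the chunk headers.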

A CDATA section does not give you permission to put arbitrary binary data into an XML document. A CDATA section is meant to be used when text content contains characters that are normally reserved for XML markup, so that they do not have to be escaped or encoded as entities. The only way to put binary data into an XML document is to encode it in an XML-compatible (typically 7-bit ASCII) format, such as Base64 (other encodings exist as well, such as yEnc, provided the output contains only characters that XML permits).
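
As an illustration of the Base64 route, here is a minimal encoder sketch in standard C++. It assumes the image is available as raw bytes (a std::vector<std::uint8_t> here, as a stand-in for whatever buffer the application uses) rather than as a UnicodeString; base64Encode is a hypothetical helper name, and a real project may prefer an encoder shipped with its framework.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Encodes raw bytes with the standard Base64 alphabet and '=' padding.
std::string base64Encode(const std::vector<std::uint8_t>& data) {
    static const char kAlphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    out.reserve((data.size() + 2) / 3 * 4);
    std::size_t i = 0;
    // Each full group of 3 bytes becomes 4 output characters.
    for (; i + 2 < data.size(); i += 3) {
        std::uint32_t n = (data[i] << 16) | (data[i + 1] << 8) | data[i + 2];
        out += kAlphabet[(n >> 18) & 63];
        out += kAlphabet[(n >> 12) & 63];
        out += kAlphabet[(n >> 6) & 63];
        out += kAlphabet[n & 63];
    }
    // One or two leftover bytes are encoded and padded with '='.
    if (i < data.size()) {
        std::uint32_t n = data[i] << 16;
        bool two = (i + 1 < data.size());
        if (two) n |= data[i + 1] << 8;
        out += kAlphabet[(n >> 18) & 63];
        out += kAlphabet[(n >> 12) & 63];
        out += two ? kAlphabet[(n >> 6) & 63] : '=';
        out += '=';
    }
    return out;
}
```

The output uses only letters, digits, '+', '/', and '=', all of which are legal Char values, so the encoded string can go into an ordinary text node or a CDATA section. The reader of the document then has to decode it back to bytes.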

Published: 2023-10-29 12:52:52

Article link: https://www.elefans.com/category/jswz/34/1539740.html