What is the regular expression for the set of strings that validate exactly the same for xsd:token and xsd:string?

Updated: 2024-10-14 12:23:29

I want to write an XSD that restricts the content of valid XML elements of type xsd:token so that, at validation, they would be indistinguishable from the same content typed as xsd:string.

That is, they must not contain carriage return (#xD), line feed (#xA) or tab (#x9) characters, must not begin or end with a space (#x20) character, and must not include a sequence of two or more adjacent space characters.

I think the regular expression to use is this:

\S+( \S+)*

(one or more non-whitespace characters, optionally followed by any number of [a single space followed by one or more non-whitespace characters], so the string always ends with a non-whitespace character)
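To try the pattern outside an XSD validator, here is a minimal Python sketch. It rests on two assumptions about translating XSD regex semantics: XSD patterns are implicitly anchored to the whole value, so the sketch uses re.fullmatch; and XSD's \s covers only space, tab, LF and CR, whereas Python's \s also matches form feed, vertical tab and other Unicode spaces, so \S is spelled out as [^ \t\n\r]. The names XSD_NON_WS, TOKEN_PATTERN and matches_xsd_pattern are hypothetical.

```python
import re

# Hypothetical helper names. XSD's \s is exactly [#x20\t\n\r], so \S is
# written out explicitly instead of relying on Python's broader \S.
XSD_NON_WS = r"[^ \t\n\r]"
TOKEN_PATTERN = re.compile(rf"{XSD_NON_WS}+(?: {XSD_NON_WS}+)*")


def matches_xsd_pattern(value: str) -> bool:
    """Emulate the implicit anchoring of XSD patterns with fullmatch."""
    return TOKEN_PATTERN.fullmatch(value) is not None


for sample in ["a b c", "a  b", " a", "a ", "a\tb", "a\nb", ""]:
    print(repr(sample), matches_xsd_pattern(sample))
```

Only "a b c" is accepted; the double-space, leading-space, trailing-space, tab, newline and empty samples are all rejected, which agrees with what regex testing tools report.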

This works in various regex testing tools, but I can't get it to work in oXygen XML Editor: strings containing double spaces, leading or trailing spaces, tabs, or line breaks still pass validation.

Here's the XSD implementation:

    <xs:simpleType name="Tokenized500Type">
      <xs:restriction base="xs:token">
        <xs:maxLength value="500"/>
        <xs:minLength value="1"/>
        <xs:pattern value="\S+( \S+)*"/>
      </xs:restriction>
    </xs:simpleType>

Is there some feature of XML, XSD, or oXygen XML Editor that prevents this from working?

Best answer

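Your original ([^\s])+( [^\s]+)*([^\s])* regex contains redundant parts: it matches one or more non-whitespace characters (capturing each one in turn), then zero or more sequences of a space followed by 1+ non-whitespace characters, and then again tries to match zero or more non-whitespace characters, capturing each one. That last group is redundant, since any non-whitespace characters it could match would be matched by the preceding [^\s]+ anyway.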

You may use a similar but shorter expression:

\S+( \S+)*
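To check that the shorter expression accepts the same strings as the original, a brute-force comparison in Python can help. This sketch assumes [^\s] translates to [^ \t\n\r] (XSD's definition of whitespace) and uses full-string matching to stand in for XSD's implicit anchoring. It only covers short strings over a small alphabet, so it is supporting evidence rather than a proof.

```python
import itertools
import re

# XSD's [^\s], spelled out because Python's \s is broader than XSD's.
NW = r"[^ \t\n\r]"
original = re.compile(rf"({NW})+( {NW}+)*({NW})*")
shorter = re.compile(rf"{NW}+( {NW}+)*")

# Every string of length 0..6 over two letters plus each XSD whitespace char.
alphabet = ["a", "b", " ", "\t", "\n", "\r"]
mismatches = 0
for n in range(7):
    for chars in itertools.product(alphabet, repeat=n):
        s = "".join(chars)
        # fullmatch emulates the implicit anchoring of XSD patterns
        if (original.fullmatch(s) is None) != (shorter.fullmatch(s) is None):
            mismatches += 1
print("mismatches:", mismatches)
```

It reports zero mismatches: the trailing ([^\s])* can only append non-whitespace characters to a string that already ends in one, and [^\s]+ would have matched those characters anyway.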

Since XML Schema regexes are implicitly anchored (the pattern must match the entire value), the expression matches:

- \S+ : one or more characters other than whitespace, where XML Schema whitespace is exactly the space (#x20), tab (\t), newline (\n) and carriage return (\r);
- ( \S+)* : zero or more sequences of a single space followed by one or more non-whitespace characters.

This expression disallows consecutive spaces as well as leading and trailing spaces, and because \S excludes tabs and line breaks, it rejects those too.
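The same property can be phrased the other way round: a string matches \S+( \S+)* exactly when it is non-empty and xs:token's whiteSpace="collapse" normalization would leave it unchanged, which is what the question asks for. The Python sketch below checks this exhaustively for short strings. The collapse function is a hand-written approximation of the XSD collapse rule (tab, LF and CR become spaces, runs of spaces become one space, leading and trailing spaces are removed), not a library call, and \S is again spelled out as [^ \t\n\r].

```python
import itertools
import re

NW = r"[^ \t\n\r]"  # XSD's \S
PATTERN = re.compile(rf"{NW}+( {NW}+)*")


def collapse(value: str) -> str:
    # Approximates XSD whiteSpace="collapse": map tab/LF/CR to space,
    # squeeze runs of spaces to one, then strip leading/trailing spaces.
    value = re.sub(r"[\t\n\r]", " ", value)
    value = re.sub(r" {2,}", " ", value)
    return value.strip(" ")


# Claim: the pattern accepts exactly the non-empty strings that collapsing
# leaves unchanged. Test every string of length 0..6 over a small alphabet.
alphabet = ["x", " ", "\t", "\n", "\r"]
counterexamples = []
for n in range(7):
    for chars in itertools.product(alphabet, repeat=n):
        s = "".join(chars)
        by_pattern = PATTERN.fullmatch(s) is not None
        by_collapse = s != "" and collapse(s) == s
        if by_pattern != by_collapse:
            counterexamples.append(s)
print("counterexamples:", len(counterexamples))
```

No counterexamples are found, which is consistent with the pattern selecting exactly the strings for which xs:token and xs:string behave identically (apart from the empty string, which minLength already excludes).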

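Here is how the regex should be used. Note that the restriction is based on xs:string, not xs:token: xs:token has whiteSpace="collapse", so the validator first converts tabs, carriage returns and line feeds to spaces, collapses runs of spaces into one, and trims leading and trailing spaces, and only then checks the pattern. That normalization is why strings with double spaces, tabs and line breaks passed validation in the original schema. With xs:string (whiteSpace="preserve"), the pattern sees the value as written: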

    <xs:simpleType name="Tokenized500Type">
      <xs:restriction base="xs:string">
        <xs:pattern value="\S+( \S+)*"/>
        <xs:maxLength value="500"/>
        <xs:minLength value="1"/>
      </xs:restriction>
    </xs:simpleType>
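To see why the base type matters, here is a simplified Python model of checking Tokenized500Type. It is an assumption-laden sketch, not how oXygen or any real validator is implemented: the function is_valid_tokenized500 and its base parameter are hypothetical, and the model covers only the whiteSpace facet (collapse for xs:token, preserve for xs:string) followed by the length and pattern facets, all applied to the normalized value.

```python
import re

NW = r"[^ \t\n\r]"  # XSD \S: anything except space, tab, LF, CR
PATTERN = re.compile(rf"{NW}+( {NW}+)*")


def collapse(value: str) -> str:
    """Approximation of XSD whiteSpace="collapse" (fixed on xs:token)."""
    value = re.sub(r"[\t\n\r]", " ", value)  # tab/LF/CR -> space
    value = re.sub(r" {2,}", " ", value)     # squeeze runs of spaces
    return value.strip(" ")                  # trim leading/trailing spaces


def is_valid_tokenized500(lexical: str, base: str) -> bool:
    """Hypothetical model: base is "token" (collapse) or "string" (preserve).

    Length and pattern facets are checked on the whitespace-normalized value.
    """
    value = collapse(lexical) if base == "token" else lexical
    return 1 <= len(value) <= 500 and PATTERN.fullmatch(value) is not None


for text in ["hello world", "hello  world", " hello", "hello\tworld"]:
    print(repr(text),
          "token:", is_valid_tokenized500(text, "token"),
          "string:", is_valid_tokenized500(text, "string"))
```

With base "token", all four samples are accepted, reproducing the behavior described in the question; with base "string", only "hello world" is accepted.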

This article was published on 2023-08-04 11:48:00.

Article link: https://www.elefans.com/category/jswz/34/1413938.html