
Updated: 2024-10-14 12:23:29
What is the regular expression for the set of strings that validate exactly the same for xsd:token and xsd:string?

I want to write an XSD that restricts the content of XML elements of type xsd:token so that, at validation, they would be indistinguishable from the same content typed as xsd:string.

I.e. they do not contain carriage return (#xD), line feed (#xA), or tab (#x9) characters, do not begin or end with a space (#x20) character, and do not include a sequence of two or more adjacent space characters.

I think the regular expression to use is this:

\S+( \S+)*

(some non-whitespace, then optionally [a single space followed by one or more non-whitespace characters], repeated, so the string always ends with non-whitespace)

This works with various regex testing tools, but I can't seem to verify it in oXygen XML Editor: XML instances whose strings contain double spaces, leading and trailing spaces, tabs, or line breaks still pass validation.

Here's the XSD implementation:

<xs:simpleType name="Tokenized500Type">
  <xs:restriction base="xs:token">
    <xs:maxLength value="500"/>
    <xs:minLength value="1"/>
    <xs:pattern value="\S+( \S+)*"/>
  </xs:restriction>
</xs:simpleType>

Is there some feature of XML, XSD, or oXygen XML Editor that prevents this from working?

Best answer

Your original ([^\s])+( [^\s]+)*([^\s])* regex contains some redundant patterns: it matches and captures each iteration of a non-whitespace character (one or more times), then matches zero or more sequences of a space followed by 1+ non-whitespace characters, and then tries again to match and capture each iteration of a non-whitespace character.

You may use a similar, but shorter

\S+( \S+)*
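The claim that the trailing group is redundant can be spot-checked outside of XSD. The following is a minimal sketch in Python, not part of the original answer: it uses `re.fullmatch` to imitate the implicit anchoring of XML Schema patterns, and the names `ORIGINAL`, `SHORTER`, and `same_verdict` are made up for illustration. It compares both expressions on every string of up to five characters drawn from a small alphabet.

```python
import itertools
import re

# The two expressions from the answer, compiled with Python's regex engine.
ORIGINAL = re.compile(r"([^\s])+( [^\s]+)*([^\s])*")
SHORTER = re.compile(r"\S+( \S+)*")


def same_verdict(text: str) -> bool:
    """True when both expressions accept or both reject the whole string."""
    return bool(ORIGINAL.fullmatch(text)) == bool(SHORTER.fullmatch(text))


# Exhaustively try every string of length 0..5 over a tiny alphabet:
# one ordinary character plus space, tab, and newline.
ALPHABET = "a \t\n"
candidates = [
    "".join(chars)
    for length in range(6)
    for chars in itertools.product(ALPHABET, repeat=length)
]

all_agree = all(same_verdict(candidate) for candidate in candidates)
print(all_agree)
```

This is only an empirical check over short strings, not a proof, but it is consistent with the trailing `([^\s])*` adding nothing that the preceding `+` quantifiers do not already cover.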

Since XML Schema regexes are anchored by default, the expression matches the whole string as follows:

\S+ - one or more characters other than whitespace, where whitespace means &#x20; (space), \t (tab), \n (newline) and \r (return)
( \S+)* - zero or more sequences of a space followed by 1+ non-whitespace characters.

This expression disallows consecutive spaces as well as spaces in the leading/trailing positions.
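To make the accepted and rejected cases concrete, here is a small Python sketch (an illustration, not the validator's actual implementation). It spells out XSD's notion of non-whitespace as the explicit class `[^ \t\n\r]`, because Python's own `\S` treats additional Unicode characters as whitespace, and it relies on `re.fullmatch` to stand in for the implicit anchoring. The helper name `is_token_like` is hypothetical.

```python
import re

# XSD's \S means "anything except space, tab, newline, carriage return".
NON_SPACE = r"[^ \t\n\r]"
TOKEN_LIKE = re.compile(rf"{NON_SPACE}+( {NON_SPACE}+)*")


def is_token_like(text: str) -> bool:
    """Check the whole string, as an anchored XML Schema pattern would."""
    return TOKEN_LIKE.fullmatch(text) is not None


samples = ["a b c", "a  b", " a", "a ", "a\tb", "a\nb", ""]
for sample in samples:
    print(repr(sample), is_token_like(sample))
```

Only the first sample, with single interior spaces, is accepted; the double space, leading or trailing space, tab, newline, and empty string are all rejected.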

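Here is how the regex should be used. Note that the base type is xs:string: with base xs:token, whitespace is collapsed before the value is checked against the pattern, so the offending whitespace is already gone by the time the regex runs.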

<xs:simpleType name="Tokenized500Type">
  <xs:restriction base="xs:string">
    <xs:pattern value="\S+( \S+)*"/>
    <xs:maxLength value="500"/>
    <xs:minLength value="1"/>
  </xs:restriction>
</xs:simpleType>
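The difference between the two base types can be simulated outside a schema processor. The sketch below is a rough Python model and rests on one assumption: the processor applies the type's whiteSpace normalization (collapse for xs:token, preserve for xs:string) before it checks the pattern facet. The function names are hypothetical, and the minLength/maxLength facets are not modelled.

```python
import re

# Same pattern as in the schema, with XSD's \S written out explicitly.
XSD_PATTERN = re.compile(r"[^ \t\n\r]+( [^ \t\n\r]+)*")


def collapse(raw: str) -> str:
    """Model whiteSpace="collapse": tab/LF/CR become spaces, runs are
    squeezed to one space, and leading/trailing spaces are trimmed."""
    replaced = re.sub(r"[\t\n\r]", " ", raw)
    return re.sub(r" {2,}", " ", replaced).strip(" ")


def valid_as_token(raw: str) -> bool:
    """base="xs:token": the pattern sees the collapsed value."""
    return XSD_PATTERN.fullmatch(collapse(raw)) is not None


def valid_as_string(raw: str) -> bool:
    """base="xs:string": whitespace is preserved, the pattern sees raw text."""
    return XSD_PATTERN.fullmatch(raw) is not None


messy = "  a \t b\n"
print(repr(collapse(messy)))
print(valid_as_token(messy), valid_as_string(messy))
```

Under this model the messy value passes when the restriction is based on xs:token, because the pattern only ever sees `a b`, and fails when it is based on xs:string, which matches the behaviour described in the question.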

Published: 2023-08-04 11:48:00
Article link: https://www.elefans.com/category/jswz/34/1413938.html