This regular expression:
<IMG\s([^"'>]+|'[^']*'|"[^"]*")+>
seems to run endlessly when given this text:
<img src=http://www.blahblahblah.com/houses/Images/ single_and_multi/roof/feb09/01_img_trrnjks_vol2009.jpg' />
I would expect it to fail to match, and to do so quickly, because the text contains only one single quote. I have seen this happen in C# and also in the Expresso regex tool. If the text is a lot shorter, it seems to work.
Best answer
Other commenters have mentioned complexity as the likely cause of the performance problem. I'd add that if you're trying to match something resembling an IMG tag, I think you want a regex more like this:
<IMG(\s+[a-z]+=('[^']*'|"[^"]*"|[^\s'">]+))+>
Of course, there are still valid HTML variations that this regex won't catch, such as a closing / (required in XHTML) or whitespace before the closing bracket. It will also pass some invalid cases, such as unsupported attribute names.
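As a rough check of the suggested pattern, here is a minimal sketch. It uses Python's `re` module as a stand-in for .NET's `Regex` (the question was about C#), on the assumption that this particular pattern behaves the same in both engines. The `good` sample tag and the helper name `is_img_tag` are made up for illustration, and the pattern is compiled case-insensitively so that `IMG` and `[a-z]` also match `img` and mixed-case attribute names.

```python
import re

# Pattern from the answer, compiled case-insensitively (an assumption:
# the original post does not say which options were used).
IMG_TAG = re.compile(
    r"""<IMG(\s+[a-z]+=('[^']*'|"[^"]*"|[^\s'">]+))+>""",
    re.IGNORECASE,
)

# The text from the question: unquoted URL, one stray single quote.
bad = ("<img src=http://www.blahblahblah.com/houses/Images/ "
       "single_and_multi/roof/feb09/01_img_trrnjks_vol2009.jpg' />")

# A well-formed tag, invented for this illustration.
good = "<img src='roof.jpg' alt=\"Roof\" width=100>"


def is_img_tag(text):
    """Return True if the pattern matches somewhere in text."""
    return IMG_TAG.search(text) is not None


print(is_img_tag(bad))                    # False
print(is_img_tag(good))                   # True
print(is_img_tag("<img src='a.jpg' />"))  # False: closing " />" is not handled
```

Each attribute here has to start with whitespace, a name, and `=`, so there is only one way to split the input into repetitions of the group. The original `([^"'>]+|...)+` lets a run of ordinary characters be divided between repetitions in many different ways, which is the likely source of the heavy backtracking.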