I have a list of words that should be replaced on an HTML page, but only when a word is not inside one of a list of tags (such as A, B, I).

So if the text is:

    <p> some text and XXX term <a href="http://some-XXX-bla.com">good morning XXX world</a> other text and XXX term <b>another XXX inside other sentance</b> </p>

and XXX should be replaced with YYY, then the final text should be:

    <p> some text and YYY term <a href="http://some-XXX-bla.com">good morning XXX world</a> other text and YYY term <b>another XXX inside other sentance</b> </p>

YYY replaces XXX only where XXX is not inside one of the restricted tags (A, I, B).

This should be done somehow with a C# regex.

Thanks a lot for the help :)
Best answer
You could use a MatchEvaluator. The idea is to match either a complete element of one of the types on your list, or the target string. If you match a complete element, you just plug it back in unchanged; you don't care whether it contains the target string. Otherwise, you insert the replacement text.

    // Called once per match: group 1 holds a complete <a>, <b>, or <i> element.
    public string GetReplacement(Match m)
    {
        // Put a protected element back as-is; otherwise the match was XXX.
        return m.Groups[1].Success ? m.Groups[1].Value : "YYY";
    }

    // (?is) = ignore case, and let '.' match newlines.
    // \2 refers back to the tag name, so <b> is only closed by </b>.
    Regex r = new Regex(
        @"(?is)(<([abi]\b)[^<>]*>.*?</\2>)|XXX"
    );
    string newString = r.Replace(oldString, new MatchEvaluator(GetReplacement));

But be aware that there are many circumstances in which this code fails, even on valid (X)HTML. For example, an element could be nested inside another element of the same kind, like this:

    <i>blah <i>blah</i> XXX</i>

Here the lazy match stops at the first </i>, so the XXX that follows is treated as outside the element and gets replaced. A start or end tag inside a comment could trip you up as well:

    <b>blah <!-- </b> --> XXX</b>

You could handle many of the potential problems by making the regex and the MatchEvaluator code more complicated, but eventually you either have to accept a few flaws or switch to a dedicated HTML parser like the one Noldorin recommended.
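Since the question mentions a whole list of words, here is a minimal sketch of how the same MatchEvaluator idea might be extended to several words at once. The names `TagAwareReplacer` and `ReplaceOutsideTags` are hypothetical and not part of the original answer. The pattern is also a variation on the one above rather than a copy: it has an extra alternative that skips any single tag, so that words inside attribute values of unprotected tags are left alone. Treat it as an illustration under those assumptions, not a robust HTML rewriter.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class TagAwareReplacer
{
    // Hypothetical helper: replaces each key of 'words' with its value,
    // except inside complete <a>, <b>, or <i> elements and inside tags.
    public static string ReplaceOutsideTags(string html, IDictionary<string, string> words)
    {
        // Case-insensitive copy, because the regex below ignores case too.
        var lookup = new Dictionary<string, string>(words, StringComparer.OrdinalIgnoreCase);

        // Escape each word so regex metacharacters are matched literally;
        // longer words go first so a shorter prefix cannot win the alternation.
        string alternatives = string.Join("|",
            lookup.Keys.OrderByDescending(w => w.Length).Select(Regex.Escape));

        // Group 1: a whole protected element, or else any single tag.
        // Group 3: one of the target words.
        var pattern = new Regex(
            @"(<([abi])\b[^<>]*>.*?</\2\s*>|<[^<>]*>)|(" + alternatives + ")",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);

        return pattern.Replace(html, m =>
            m.Groups[1].Success
                ? m.Value                       // protected markup: keep unchanged
                : lookup[m.Groups[3].Value]);   // target word: insert its replacement
    }
}
```

Calling `TagAwareReplacer.ReplaceOutsideTags(html, new Dictionary<string, string> { { "XXX", "YYY" } })` on the sample from the question should change only the two occurrences of XXX that sit directly inside the `<p>`. The nested-element and comment cases described above still defeat this version, because it relies on the same lazy match up to the first closing tag.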