Replace text if it's not inside certain specified HTML tags

Updated: 2024-06-14 16:57:18

I have a list of words that should be replaced on an HTML page, but only where a word is not inside one of a list of tags (such as A, B, I).

So if the text is:

<p> some text and XXX term <a href="http://some-XXX-bla.com">good morning XXX world</a> other text and XXX term <b>another XXX inside other sentance</b> </p>

and XXX should be replaced with YYY, then the final text should be:

<p> some text and YYY term <a href="http://some-XXX-bla.com">good morning XXX world</a> other text and YYY term <b>another XXX inside other sentance</b> </p>

YYY replaces XXX only where XXX is not inside one of the restricted tags (A, I, B).

This should be done somehow with a C# regex.

Thanks a lot for your help :)

Best answer

You could use a MatchEvaluator. The idea is that you match either a complete element of one of the types on your list, or the target string. If you match a complete element, you just plug it back in unchanged; you don't care whether it contains the target string. Otherwise, you insert the replacement text.

    // Group 1 is set only when a whole <a>, <b> or <i> element matched.
    public string GetReplacement(Match m)
    {
        return m.Groups[1].Success ? m.Groups[1].Value : "YYY";
    }

    // (?is): ignore case, and let '.' match newlines.
    Regex r = new Regex(@"(?is)(<([abi]\b)[^<>]*>.*?</\2>)|XXX");
    string newString = r.Replace(oldString, new MatchEvaluator(GetReplacement));
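
The question mentions a list of words, while the pattern above handles a single one. One possible generalization is sketched here. The class name TagAwareReplacer, its Replace method, and the parameter names are hypothetical and not part of the original answer. The sketch builds the same "protected element or target word" alternation from escaped lists and keeps the case-insensitive, dot-matches-newline behavior of (?is).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: a sketch, not a robust HTML processor.
public static class TagAwareReplacer
{
    // Replaces every key of `replacements` with its value, except inside
    // elements whose tag name appears in `protectedTags`.
    public static string Replace(
        string html,
        IDictionary<string, string> replacements,
        IEnumerable<string> protectedTags)
    {
        if (replacements.Count == 0) return html;

        // Escape the inputs so they are matched literally.
        string tags = string.Join("|", protectedTags.Select(t => Regex.Escape(t)));
        // Longer words first, so a longer listed word wins over its own prefix.
        string words = string.Join("|", replacements.Keys
            .OrderByDescending(w => w.Length)
            .Select(w => Regex.Escape(w)));

        // Group 1: a complete protected element (group 2 is its tag name).
        // Group "word": one of the target words.
        string pattern =
            @"(<(" + tags + @")\b[^<>]*>.*?</\2\s*>)|(?<word>" + words + ")";

        // Case-insensitive lookup, because the regex also ignores case.
        var lookup = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        foreach (var pair in replacements) lookup[pair.Key] = pair.Value;

        return Regex.Replace(html, pattern, m =>
        {
            // A protected element is put back unchanged.
            if (m.Groups[1].Success) return m.Value;
            // Otherwise a word matched; leave it alone if the lookup misses.
            return lookup.TryGetValue(m.Value, out string replacement)
                ? replacement
                : m.Value;
        }, RegexOptions.IgnoreCase | RegexOptions.Singleline | RegexOptions.CultureInvariant);
    }
}
```

A call such as TagAwareReplacer.Replace(html, new Dictionary<string, string> { { "XXX", "YYY" } }, new[] { "a", "b", "i" }) should reproduce the example from the question. Like the single-word pattern, this sketch would also replace a word that appears in an attribute of a tag that is not on the protected list.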

But be aware that there are many circumstances where this code would fail, even in valid (X)HTML. For example, an element could be nested inside another element of the same kind, like this:

<i>blah <i>blah</i> XXX</i>

Or a start or end tag inside a comment could trip you up:

<b>blah <!-- </b> --> XXX</b>

You could handle many of the potential problems by making the regex and the MatchEvaluator code more complicated, but eventually you either have to accept a few flaws or switch to a dedicated HTML parser like the one Noldorin recommended.
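
As an illustration of what "more complicated" might look like, here is a minimal sketch that handles the two failure cases shown above (nesting and tags inside comments). It is not from the original answer: the class name DepthTrackingReplacer and its Replace method are hypothetical. Instead of matching whole elements, it matches comments, individual tags, and the target word, and the MatchEvaluator keeps a counter of how many restricted elements are currently open. It assumes reasonably balanced markup and no '>' characters inside attribute values.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: a sketch under the assumptions stated above,
// not a replacement for a real HTML parser.
public static class DepthTrackingReplacer
{
    // Tokens of interest: an HTML comment, a single start or end tag,
    // or the target word. Everything else is copied through untouched.
    private static readonly Regex Token = new Regex(
        @"(?<comment><!--.*?-->)" +
        @"|(?<tag><(?<close>/)?(?<name>[a-z][a-z0-9-]*)[^<>]*>)" +
        @"|(?<word>XXX)",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    public static string Replace(string html, string replacement = "YYY")
    {
        int depth = 0; // number of currently open <a>, <b> or <i> elements

        // The evaluator is called for each match from left to right,
        // so it can carry state from one match to the next.
        return Token.Replace(html, m =>
        {
            // Comments are kept as they are; tags inside them are never seen.
            if (m.Groups["comment"].Success) return m.Value;

            if (m.Groups["tag"].Success)
            {
                string name = m.Groups["name"].Value.ToLowerInvariant();
                if (name == "a" || name == "b" || name == "i")
                {
                    if (m.Groups["close"].Success)
                    {
                        if (depth > 0) depth--;
                    }
                    else if (!m.Value.EndsWith("/>", StringComparison.Ordinal))
                    {
                        depth++; // a start tag that is not self-closing
                    }
                }
                return m.Value; // tags, including their attributes, stay unchanged
            }

            // The target word: replace it only outside restricted elements.
            return depth == 0 ? replacement : m.Value;
        });
    }
}
```

Because each tag is consumed as one token, a target word inside any attribute value is left alone as well. The trade-off is that an unclosed restricted tag leaves the counter above zero and protects the rest of the document, which is one of the flaws a dedicated parser would avoid.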

Published: 2023-04-12 20:18:00
Link: https://www.elefans.com/category/dzcp/2c2fefadad96704afc16cf382657def9.html