Replace text if it's not inside certain specified HTML tags

Updated: 2024-06-14 16:57:18

I have a list of words that should be replaced in an HTML page, but only where the word is not inside one of a list of tags (such as A, B, I).

So if the text is:

some text and XXX term <a href="http://some-XXX-bla.com">good morning XXX world</a> other text and XXX term another XXX inside other sentance

and XXX should be replaced with YYY, then the final text should be:

some text and YYY term <a href="http://some-XXX-bla.com">good morning XXX world</a> other text and YYY term another XXX inside other sentance

YYY replaces XXX only where XXX is not inside one of the restricted tags (A, I, B).

This should be done somehow with a C# regex.

Thanks a lot for any help :)

Best answer

    using System.Text.RegularExpressions;

    // Group 1 holds a whole <a>, <b> or <i> element; if it matched, keep it as-is.
    public string GetReplacement(Match m)
    {
        return m.Groups[1].Success ? m.Groups[1].Value : "YYY";
    }

    Regex r = new Regex(@"(?is)(<([abi]\b)[^<>]*>.*?</\2>)|XXX");
    string newString = r.Replace(oldString, new MatchEvaluator(GetReplacement));

You could use a MatchEvaluator, as the code above does. The idea is that you match either a complete element of one of the types on your list, or the target string. If you match a complete element, you just plug it back in; you don't care whether it contains the target string. Otherwise, you insert the replacement text.
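Since the question mentions a list of words, the same idea can be sketched for several words at once. This is only an illustration, not part of the original answer: the helper name `ReplaceOutsideTags`, the variable `wordMap` and the extra `foo`/`bar` pair are made up, the code is written as a C# 9 top-level program, and, unlike the answer's `(?is)` regex, the words here are matched case-sensitively so that each match can be looked up in the dictionary.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: replaces each key of `replacements` with its value,
// except where the key occurs inside an <a>, <b> or <i> element.
static string ReplaceOutsideTags(string html, IDictionary<string, string> replacements)
{
    if (replacements.Count == 0) return html;

    // Escape the words so regex metacharacters are taken literally, and try
    // longer words first so that a word is not shadowed by its own prefix.
    string words = string.Join("|",
        replacements.Keys.OrderByDescending(w => w.Length).Select(Regex.Escape));

    // "element": a whole protected element (tags case-insensitive, '.' spans newlines).
    // "word": one of the target words, matched case-sensitively.
    var pattern = new Regex(
        @"(?<element>(?is:<(?<tag>[abi])\b[^<>]*>.*?</\k<tag>>))|(?<word>" + words + ")");

    return pattern.Replace(html, m =>
        m.Groups["element"].Success
            ? m.Value                   // protected element: copy it back unchanged
            : replacements[m.Value]);   // target word: look up its replacement
}

var wordMap = new Dictionary<string, string> { ["XXX"] = "YYY", ["foo"] = "bar" };
string sample = "some text and XXX term <a href=\"http://some-XXX-bla.com\">good morning XXX world</a>"
              + " other text and XXX term, foo and <i>foo</i>";
Console.WriteLine(ReplaceOutsideTags(sample, wordMap));
// prints: some text and YYY term <a href="http://some-XXX-bla.com">good morning XXX world</a> other text and YYY term, bar and <i>foo</i>
```

Passing a lambda to `Regex.Replace` is equivalent to the explicit `new MatchEvaluator(...)` in the answer; the named groups only make the two alternatives easier to tell apart.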

But be aware that there are many circumstances where this code would fail, even in valid (X)HTML. For example, an element could be nested inside another element of the same kind, like this:

<b>blah <b>blah</b> XXX</b>

Or a start or end tag inside a comment could trip you up:

<b>blah <!-- </b> --> XXX</b>
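To see these two failure modes concretely, the answer's regex can be run on a pair of small inputs. The inputs below are illustrative assumptions (they may not match the answer's original examples exactly), and `KeepElementOrReplace` is simply the answer's `GetReplacement` rewritten as a local function so that the snippet runs as a C# 9 top-level program.

```csharp
using System;
using System.Text.RegularExpressions;

// The regex from the answer, unchanged.
var simple = new Regex(@"(?is)(<([abi]\b)[^<>]*>.*?</\2>)|XXX");

// Same logic as GetReplacement in the answer.
static string KeepElementOrReplace(Match m) =>
    m.Groups[1].Success ? m.Groups[1].Value : "YYY";

// Assumed inputs: in both, XXX sits inside the outer <b> element
// and therefore should NOT be replaced.
string nested = "<b>blah <b>blah</b> XXX</b>";
string commented = "<b>blah <!-- </b> --> XXX</b>";

// The lazy .*? stops at the first </b> it sees, so the protected match
// ends before XXX and the second alternative then replaces it.
Console.WriteLine(simple.Replace(nested, KeepElementOrReplace));
// prints: <b>blah <b>blah</b> YYY</b>
Console.WriteLine(simple.Replace(commented, KeepElementOrReplace));
// prints: <b>blah <!-- </b> --> YYY</b>
```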

You could handle many of the potential problems by making the regex and the MatchEvaluator code more complicated, but eventually you either have to accept a few flaws or switch to a dedicated HTML parser like the one Noldorin recommended.

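As one possible direction for "more complicated", .NET balancing groups can count nested start and end tags, and an extra alternative can skip comments. The following is a rough sketch rather than a robust solution: the group names `skip`, `tag` and `depth` and the function `ReplaceOutsideProtected` are invented here, leaving XXX untouched inside comments is an assumption about the desired behavior, and malformed markup, attribute values containing `<` or `>`, CDATA sections and script content are all ignored. It is written as a C# 9 top-level program.

```csharp
using System;
using System.Text.RegularExpressions;

var balanced = new Regex(@"
    (?<skip>
        <!--.*?-->                          # a whole comment, copied back unchanged
      |
        <(?<tag>[abi])\b[^<>]*>             # start tag of a protected element
        (?>                                 # each step is atomic, so it is never re-split
            <!--.*?-->                      #   comment inside the element
          | <\k<tag>\b[^<>]*> (?<depth>)    #   nested start tag of the same kind: push
          | </\k<tag>> (?<-depth>)          #   nested end tag: pop (fails if nothing to pop)
          | (?!</?\k<tag>\b) .              #   any other character
        )*
        (?(depth)(?!))                      # every nested start tag must have been closed
        </\k<tag>>                          # end tag of the outer element
    )
  | XXX",
    RegexOptions.IgnoreCase | RegexOptions.Singleline | RegexOptions.IgnorePatternWhitespace);

// Keep anything matched by the skip group; replace a bare XXX.
string ReplaceOutsideProtected(string html) =>
    balanced.Replace(html, m => m.Groups["skip"].Success ? m.Value : "YYY");

Console.WriteLine(ReplaceOutsideProtected("<b>blah <b>blah</b> XXX</b> and XXX"));
// prints: <b>blah <b>blah</b> XXX</b> and YYY
Console.WriteLine(ReplaceOutsideProtected("<b>blah <!-- </b> --> XXX</b> and XXX"));
// prints: <b>blah <!-- </b> --> XXX</b> and YYY
Console.WriteLine(ReplaceOutsideProtected("<!-- <b> --> XXX <b>XXX</b>"));
// prints: <!-- <b> --> YYY <b>XXX</b>
```

Even this version only covers the two cases mentioned above, which illustrates the trade-off the answer describes: each extra case makes the pattern harder to read, and a dedicated parser remains the safer choice for arbitrary HTML.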
Published: 2023-04-12 20:18:00

Article link: https://www.elefans.com/category/dzcp/2c2fefadad96704afc16cf382657def9.html