How to remove duplicate br tags that exceed a certain number of occurrences?

Updated: 2024-10-28 16:19:40

I want to keep no more than two <br> tags at the end of each paragraph. This is what I tried with HtmlAgilityPack:

    using System.Linq;
    using HtmlAgilityPack;

    string html = @"paragraph 1 a dkahdk ahkdhadk.<br><br><br> <br> paragraph 2 adshkad hkasdhkasdh.<br> <br> paragraph 3 akdash dkjahiewry iwery.<br> <br><br> paragraph 4 ljsdlfjsldfj.<br> <br> <br> <br>";
    HtmlAgilityPack.HtmlDocument doc = new HtmlDocument();
    doc.LoadHtml(html);

    var xpath = "//text()[not(normalize-space())]";
    var emptyNodes = doc.DocumentNode.SelectNodes(xpath);
    foreach (HtmlNode emptyNode in emptyNodes)
    {
        emptyNode.Remove(); // remove \r\n
    }

    var nodes = doc.DocumentNode.SelectNodes("//br[following-sibling::br[3]]").ToList();
    foreach (var node in nodes)
    {
        node.Remove();
    }

Somehow the output ends up with all the <br> tags removed. The correct output should be:

paragraph 1 a dkahdk ahkdhadk.<br><br> paragraph 2 adshkad hkasdhkasdh.<br><br> paragraph 3 akdash dkjahiewry iwery.<br><br> paragraph 4 ljsdlfjsldfj.<br><br>
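A likely reason the XPath approach misbehaves: `//br[following-sibling::br[3]]` selects every `<br>` that has at least three later `<br>` siblings under the same parent, not three in a row. Here all the tags share one parent, so it selects every `<br>` except the last three. A run-aware alternative is to remove each `<br>` whose two immediately preceding sibling nodes are both `<br>`. The sketch below uses only System.Xml.Linq on a hypothetical, well-formed stand-in for the input, with the whitespace-only text nodes already removed as the question's first step does. This lets it run without HtmlAgilityPack. Passing the same expression to HtmlAgilityPack's `SelectNodes` is assumed, not verified, to behave the same way. `BrXPathDemo`, `ExtraBrXPath` and `RemainingBrsAfterRemoving` are illustrative names.

```csharp
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

public static class BrXPathDemo
{
    // Hypothetical well-formed stand-in for the question's markup after the
    // whitespace-only text nodes between the <br> tags have been removed.
    public const string Sample =
        "<body>paragraph 1.<br/><br/><br/><br/> paragraph 2.<br/><br/>" +
        " paragraph 3.<br/><br/><br/> paragraph 4.<br/><br/><br/><br/></body>";

    // The question's expression: any <br> with at least three later sibling
    // <br> elements anywhere under the same parent (not necessarily adjacent).
    public const string OriginalXPath = "//br[following-sibling::br[3]]";

    // A <br> whose two immediately preceding sibling nodes are both <br>,
    // i.e. the third, fourth, ... tag of each consecutive run.
    public const string ExtraBrXPath =
        "//br[preceding-sibling::node()[1][self::br] and preceding-sibling::node()[2][self::br]]";

    // Removes every element matched by xpath and returns how many <br> remain.
    public static int RemainingBrsAfterRemoving(string xpath)
    {
        XDocument doc = XDocument.Parse(Sample);
        // Materialize the matches first so every node is selected against the
        // original tree before any removal happens.
        foreach (XElement br in doc.XPathSelectElements(xpath).ToList())
        {
            br.Remove();
        }
        return doc.Descendants("br").Count();
    }
}
```

`RemainingBrsAfterRemoving(OriginalXPath)` leaves 3 `<br>` elements, all at the end of paragraph 4. `RemainingBrsAfterRemoving(ExtraBrXPath)` leaves 8, two per paragraph.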

Best answer

A simple regex replacement is enough here; there is no need for HtmlAgilityPack. For example, as a multi-step process:

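    using System;
    using System.Text.RegularExpressions;

    // Use a regex to find <br>, <br >, <br/> or <br /> tags:
    //var toNewLines = new Regex(@"<br\s?/?>");
    //var onlyNewLines = toNewLines.Replace(html, Environment.NewLine);
    // or, since all br tags in this input are plain <br>:
    var onlyNewLines = html.Replace("<br>", Environment.NewLine);

    // Collapse each run of line breaks, including the spaces or tabs between
    // them (the input contains "<br> <br>"), into a single line break.
    var regex = new Regex(@"[ \t]*(?:\r?\n[ \t]*)+");
    var result = regex.Replace(onlyNewLines, Environment.NewLine);
    // Turn each remaining line break into exactly two <br /> tags.
    var finalResult = result.Replace(Environment.NewLine, "<br /><br />" + Environment.NewLine);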
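If you prefer a single step that reproduces the expected output exactly (plain `<br><br>`, with the space before the next paragraph kept), one option is to replace each run of two or more `<br>` tags separated only by whitespace with exactly two. This is a sketch that assumes simple markup: it does not skip `<br>` text inside comments, attribute values or `<script>` blocks. `BrCollapser` and `CollapseBrRuns` are hypothetical names.

```csharp
using System.Text.RegularExpressions;

public static class BrCollapser
{
    // Two or more <br>, <br/> or <br /> tags separated only by whitespace.
    private static readonly Regex BrRun =
        new Regex(@"<br\s*/?>(?:\s*<br\s*/?>)+", RegexOptions.IgnoreCase);

    // Replaces every run of two or more <br> tags with exactly two.
    // A lone <br> is left alone, so no paragraph ends up with more than two.
    public static string CollapseBrRuns(string html) => BrRun.Replace(html, "<br><br>");
}
```

For the question's `html` string, `BrCollapser.CollapseBrRuns(html)` returns the expected output shown in the question. Because a single `<br>` is not touched, this enforces "no more than two" rather than forcing exactly two everywhere.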

Published: 2023-08-07 20:04:00
Source: https://www.elefans.com/category/jswz/34/1465685.html