How do I replace elements in lxml?

Updated: 2024-10-27 20:34:45
Problem description

I have text (data entered by CRM users) that I get from a web service, which returns it in a "terrifying format". I filter it with Python before using the data, but when I remove the line breaks (br), the text is removed as well. The code is as follows:

from lxml import html

description = '''
<div id="highlight" class="section">
  <p> text............... </p>
  <br>
  <h1>TITLE</h1>
  <p>Multiple text <br>&nbsp; </p>
  <ul> <li>bad layer....</li> </ul>
  <p> <br>subTitle </p>
  <p>&nbsp;</p>
  <p style="text-align: center;"> <br>Text1 <br>Text2 <br>Text3 <br>Text4 <br>Text5 <br>Text6 </p>
  <p style="text-align: center;"> <strong>small title</strong> <br>Text small</p>
  <p style="text-align: center;"> <strong>highlighted text</strong> <br> <br><strong>Text1</strong> <br>Text2 <br>Text3 <br>Text4 </p>
  <p style="text-align: center;"> <strong>small text</strong> <br>Text1 <br>Text2 </p>
  <p style="text-align: center;"> <strong>small text</strong> <br>description </p>
  <p style="text-align: center;"> <br>&nbsp;</p>
  <p><strong>description two</strong></p>
  <p> <br>&nbsp;</p>
</div>
'''

tree = html.fragment_fromstring(description)
for element in tree.xpath('//br'):
    # element.getparent().remove(element)
    print(element.text)                      # a br has no text of its own, so this is None
    print(element.getparent().getchildren())
    # print(element)
    # print(element.getparent())
    # print(element.getchildren())
    # print(element.getnext())
    # print('--------------------------------')

I tried removing the br elements with element.getparent().remove(element), but that also deletes the text. I ran tests to see whether the text belongs to any node, but it does not seem to.

I have thought about replacing each br with an li and turning each styled p into a ul, but I can't work out how to do it. The result would look something like this (for the messy markup above):

..........
..........
<ul>
  <li>Text1</li>
  <li>Text2</li>
  <li>Text3</li>
  <li>Text4</li>
  <li>Text5</li>
  <li>Text6</li>
</ul>
<ul>
  <li><strong>small title</strong></li>
  <li>Text small</li>
</ul>
<ul>
  <li><strong>highlighted text</strong></li>
  <li><strong>Text1</strong></li>
  <li>Text2</li>
  <li>Text3</li>
  <li>Text4</li>
</ul>
<ul>
  <li><strong>small text</strong></li>
  <li>Text1</li>
  <li>Text2</li>
</ul>
<ul>
  <li><strong>small text</strong></li>
  <li>description</li>
</ul>
<ul>
  <li>&nbsp;</li>
</ul>
........

I can't work out how to get at the text. My idea was to select the p nodes by XPath using the style attribute and its value, create li child nodes under a parent ul, and drop the p.

Is this possible? Thanks.

Regards

Recommended answer

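In lxml, the text that follows an element is stored in that element's tail, so removing a br with remove() also removes the text after it. You can use lxml.etree.strip_elements with with_tail=False, which removes the elements but keeps their tail text, like so: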

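from lxml import html
from lxml import etree

# description is the HTML string from the question
tree = html.fragment_fromstring(description)
# Remove every br element but keep the text stored in its tail
etree.strip_elements(tree, 'br', with_tail=False)
print(etree.tostring(tree, pretty_print=True, encoding='unicode'))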
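To see why the text disappeared in the question, it may help to look at where lxml stores it. The sketch below is only an illustration on a made-up three-item snippet (the `snippet` string and the variable names are assumptions, not part of the original post). It compares remove(), strip_elements with with_tail=False, and the drop_tree() method that lxml.html elements provide.

```python
from lxml import etree, html

# Hypothetical, simplified input: one paragraph with two line breaks.
snippet = '<p>Title<br>Text1<br>Text2</p>'

# The text after an element lives on that element as .tail.
p = html.fragment_fromstring(snippet)
first_br = p.find('br')
tail_of_first_br = first_br.tail

# remove() takes the element out together with its tail,
# so the text after the first <br> is lost.
p.remove(first_br)
text_after_remove = p.text_content()

# strip_elements(..., with_tail=False) removes the elements but keeps the tails.
p2 = html.fragment_fromstring(snippet)
etree.strip_elements(p2, 'br', with_tail=False)
text_after_strip = p2.text_content()

# drop_tree() on an lxml.html element also joins the tail to the
# preceding text before removing the element.
p3 = html.fragment_fromstring(snippet)
for br in p3.findall('br'):      # findall returns a list, safe to modify the tree
    br.drop_tree()
text_after_drop = p3.text_content()

print(repr(tail_of_first_br))
print(text_after_remove)
print(text_after_strip)
print(text_after_drop)
```

strip_elements is convenient when every br in the tree should go. drop_tree() is a reasonable alternative when only some elements should be removed, since it works on one element at a time.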

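The question also asks about turning each styled p into a ul with one li per line. The answer above does not cover that, so the following is only a rough sketch of one possible approach, not the original answerer's method. The helper name `br_paragraph_to_list` and the sample markup are assumptions made for illustration, and dropping whitespace-only items (including &nbsp;) is a choice of this sketch, which differs slightly from the desired output in the question.

```python
from lxml import etree, html


def br_paragraph_to_list(p):
    """Replace p in place with a ul; each br-separated run becomes an li."""
    ul = etree.Element('ul')
    li = etree.SubElement(ul, 'li')
    li.text = p.text                      # text before the first child
    for child in list(p):                 # copy: children are moved while looping
        if child.tag == 'br':
            li = etree.SubElement(ul, 'li')
            li.text = child.tail          # text after <br> is stored in its tail
        else:
            li.append(child)              # moves the child together with its tail
    # Drop items that contain only whitespace and no child elements.
    for item in list(ul):
        if len(item) == 0 and not (item.text or '').strip():
            ul.remove(item)
    ul.tail = p.tail                      # keep whatever followed the paragraph
    p.getparent().replace(p, ul)
    return ul


# Hypothetical sample input modelled on the markup in the question.
doc = html.fragment_fromstring(
    '<div><p style="text-align: center;">'
    '<strong>small title</strong><br>Text small<br>&nbsp;</p></div>'
)
for p in doc.xpath('.//p[@style="text-align: center;"]'):
    br_paragraph_to_list(p)

result = html.tostring(doc, encoding='unicode')
print(result)
```

Matching on the exact style string is brittle. If the real data varies in spacing, an XPath such as `.//p[contains(@style, "text-align")]` might be a better fit.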

Published: 2023-11-10 15:39:22
Article link: https://www.elefans.com/category/jswz/34/1575769.html