I've run into a problem while extracting the content of the current node, including all of its child nodes.
As in the code below, I want to get the string abcdefg<b>b1b2b3</b> from inside the pre tag.
But I cannot get it with "child::*". If I use "/text()", I lose the b tag formatting information. Please help me out.
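    # -*- coding: utf-8 -*-
    from lxml import html
    import lxml.etree as le

    input = "<pre>abcdefg<b>b1b2b3</b></pre>"
    input_xpath = "//pre/child::*"  # selects element children only, so the leading text is skipped

    tree = html.fromstring(input)
    result = tree.xpath(input_xpath)
    # encoding="unicode" makes tostring() return text rather than bytes
    result1 = [le.tostring(item, encoding="unicode") for item in result]
    result2 = ''.join(result1)
    print(result2)

output:

    <b>b1b2b3</b>

Recommended answer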
To get an XML node's content markup (sometimes referred to as "innerXML"), start by selecting the node itself rather than its children or its text content:
    from lxml import html
    import lxml.etree as le

    input = "<pre>abcdefg<b>b1b2b3</b></pre>"
    tree = html.fromstring(input)
    node = tree.xpath("//pre")[0]  # the <pre> element itself

Then combine the node's text content with the markup of all its child nodes:
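    # node.text is the text before the first child (None if there is none);
    # tostring() also keeps the text that follows each child element
    result = (node.text or '') + ''.join(le.tostring(e, encoding="unicode") for e in node)
    print(result)

Output:

    abcdefg<b>b1b2b3</b>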