This article covers how to parse HTML with XPath.
Problem description
In .NET, I found a great library, HtmlAgilityPack, that lets you easily parse non-well-formed HTML using XPath. I've used it for a couple of years in my .NET sites, but I've had to settle for more painful libraries in my Python, Ruby, and other projects. Is anyone aware of similar libraries for other languages?
Solution
In Python, ElementTidy parses tag soup and produces an element tree, which can then be queried with XPath:
>>> from elementtidy.TidyHTMLTreeBuilder import TidyHTMLTreeBuilder as TB
>>> tb = TB()
>>> tb.feed("<p>Hello world")
>>> e = tb.close()
>>> e.find(".//{http://www.w3.org/1999/xhtml}p")
<Element {http://www.w3.org/1999/xhtml}p at 264eb8>
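The query above uses a namespace-qualified tag name in braces, because the tidied tree is XHTML. As a minimal sketch of that query style alone, the following uses the standard library's xml.etree.ElementTree instead of ElementTidy. It assumes the input is already well-formed XHTML (the stdlib parser does not repair tag soup), and the sample document and the "x" prefix are made up for illustration.

```python
import xml.etree.ElementTree as ET

# The XHTML namespace that a tidying parser attaches to every element.
XHTML_NS = "http://www.w3.org/1999/xhtml"

# Assumption: a hand-written, well-formed stand-in for what a tidying
# parser might produce from the fragment "<p>Hello world".
doc = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title></title></head>"
    "<body><p>Hello world</p></body>"
    "</html>"
)

root = ET.fromstring(doc)

# Namespace written inline in braces, as in the ElementTidy query.
p = root.find(".//{%s}p" % XHTML_NS)

# Equivalent lookup using a prefix map; "x" is an arbitrary prefix.
p_prefixed = root.find(".//x:p", {"x": XHTML_NS})

print(p.tag)
print(p.text)
```

The prefix-map form is usually easier to read once a query touches more than one element, since the namespace URI is written only once.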