XPath (1.0) query for fetching text until a specific tag

Updated: 2024-10-28 20:27:04
I'm building an in-house, reader-style PHP app that fetches text from our pages and then performs various manipulations on it. The text on most of our HTML pages is unordered, so the app has to be able to grab text without relying on class names or other navigation anchors, since there are none. Only the title text is usable as an anchor.

I would like to fetch text starting from a given node (the title) and stop when I reach an img tag. The img tag may or may not exist; if it doesn't, all of the text should be fetched. So far, with XPath, I've only managed to get this working for pages without an image.

Here's a sample of the HTML:

<b>Some title</b> <br/> Important text <br/> More important text <p> More text which should be fetched</p> <p><img src="foo.jpg"/></p> <p> Unimportant text, don't want it!</p>

This is the XPath query I'm currently using:

//*[text()="Some title"]/following::text()

The above does fetch the relevant text, but it also keeps going past the image. I would like it to stop at the img tag if one exists. Any idea how to do this?
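For anyone wanting to reproduce this, here is a minimal PHP sketch, assuming the page is parsed with DOMDocument::loadHTML and queried through DOMXPath; fetchTexts is a hypothetical helper name, not part of any library. Run against the sample HTML, the current query also returns the text that comes after the image.

```php
<?php
// Hypothetical helper: run an XPath 1.0 query over an HTML string and
// return the trimmed, non-blank text nodes it selects.
function fetchTexts(string $html, string $query): array
{
    libxml_use_internal_errors(true); // HTML fragments may trigger parser warnings
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $texts = [];
    foreach ((new DOMXPath($doc))->query($query) as $node) {
        $text = trim($node->nodeValue);
        if ($text !== '') { // drop whitespace-only nodes between tags
            $texts[] = $text;
        }
    }
    return $texts;
}

$html = <<<'HTML'
<b>Some title</b> <br/> Important text <br/> More important text <p> More text which should be fetched</p> <p><img src="foo.jpg"/></p> <p> Unimportant text, don't want it!</p>
HTML;

// The query from the question: every text node after the title.
$current = '//*[text()="Some title"]/following::text()';
echo implode(' | ', fetchTexts($html, $current)), PHP_EOL;
// Important text | More important text | More text which should be fetched | Unimportant text, don't want it!
```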

Best answer

Fetch only the text nodes that are not preceded by an image:

//*[text()="Some title"]/following::text()[not(preceding::img)]
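A minimal PHP sketch of this query applied to the sample HTML, again assuming DOMDocument/DOMXPath and the hypothetical fetchTexts helper. It also checks the no-image case from the question, where every text node after the title should come back.

```php
<?php
// Hypothetical helper: trimmed, non-blank text nodes selected by an
// XPath 1.0 query over an HTML string.
function fetchTexts(string $html, string $query): array
{
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $texts = [];
    foreach ((new DOMXPath($doc))->query($query) as $node) {
        $text = trim($node->nodeValue);
        if ($text !== '') {
            $texts[] = $text;
        }
    }
    return $texts;
}

// The answer's query: text nodes after the title with no <img> before them.
$query = '//*[text()="Some title"]/following::text()[not(preceding::img)]';

$withImage = '<b>Some title</b> <br/> Important text <br/> More important text '
    . '<p> More text which should be fetched</p> <p><img src="foo.jpg"/></p> '
    . "<p> Unimportant text, don't want it!</p>";
// Same page without the image: nothing to stop at, so all text is returned.
$withoutImage = str_replace('<p><img src="foo.jpg"/></p>', '', $withImage);

echo implode(' | ', fetchTexts($withImage, $query)), PHP_EOL;
// Important text | More important text | More text which should be fetched
echo implode(' | ', fetchTexts($withoutImage, $query)), PHP_EOL;
// Important text | More important text | More text which should be fetched | Unimportant text, don't want it!
```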

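Note that preceding::img matches any img earlier in the document, including one that appears before the title, so you may want to further restrict which images to stop at.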
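One possible restriction, sketched under the same assumptions (PHP DOM, hypothetical fetchTexts helper): only stop at images that themselves come after the title. The page below, with a logo.png image placed before the title, is a hypothetical example; with the unrestricted query that earlier image would filter out every text node.

```php
<?php
// Hypothetical helper: trimmed, non-blank text nodes selected by an
// XPath 1.0 query over an HTML string.
function fetchTexts(string $html, string $query): array
{
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $texts = [];
    foreach ((new DOMXPath($doc))->query($query) as $node) {
        $text = trim($node->nodeValue);
        if ($text !== '') {
            $texts[] = $text;
        }
    }
    return $texts;
}

// Hypothetical page with an image *before* the title (e.g. a logo).
$html = '<p><img src="logo.png"/></p><b>Some title</b> <br/> Important text '
    . '<p><img src="foo.jpg"/></p> <p>Unimportant text</p>';

// Unrestricted: the logo precedes every text node, so nothing is returned.
$any = '//*[text()="Some title"]/following::text()[not(preceding::img)]';
// Restricted: only images that have the title before them count as stop points.
$afterTitle = '//*[text()="Some title"]/following::text()'
    . '[not(preceding::img[preceding::*[text()="Some title"]])]';

echo count(fetchTexts($html, $any)), PHP_EOL;                 // 0
echo implode(' | ', fetchTexts($html, $afterTitle)), PHP_EOL; // Important text
```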

Published: 2023-07-23 04:43:00
Source: https://www.elefans.com/category/jswz/34/1227588.html