Regular expression for markup after XPath?

Updated: 2024-10-23 19:33:07
Problem description

I have been searching for a solution to my problem for a while now and have been experimenting on regex101, but I cannot find one.

The problem I am facing is that I have to extract values from several different input strings, so I wanted to use regular expressions to pull the wanted data out of them. The regular expression for each string comes from a configuration, set separately per string (since the strings differ).

The string below is obtained with the XPath //body/div/table/tbody/tr/td/p[5], but I cannot dig any deeper with it to retrieve the right data. Or can I?

The string I am currently using as an example is the following:

<strong>Kontaktdaten des Absenders:</strong> <br> <strong>Name:</strong> Wanted data <br> <strong>Telefon:</strong> <a dir='ltr' href='tel:XXXXXXXXX' x-apple-data-detectors='true' x-apple-data-detectors-type='telephone' x-apple-data-detectors-result='3'>XXXXXXXXX</a> <br>

From this string I am trying to get the "Wanted data".

My regular expression so far is the following:

(?<=<\/strong> )(.*)(?= <br>)

But this returns everything between the first </strong> and the last <br>:

<br> <strong>Name:</strong> Wanted data <br> <strong>Telefon:</strong> <a dir='ltr' href='tel:XXXXXXXXX' x-apple-data-detectors='true' x-apple-data-detectors-type='telephone' x-apple-data-detectors-result='3'>XXXXXXXXX</a>

I thought I could solve this with a repeating group:

((:?(?<=<\/strong> )(.*)(?= <br>))+)

But this returns the same output as the pattern without the repeating group.
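To illustrate why both attempts return the same long string, here is a small sketch using Python's re module (chosen only for illustration; the question does not say which regex engine is in use). The greedy .* runs to the end of the input and backs off only as far as the last " <br>", so after one iteration a repeating group has nothing left to match. Making the quantifier lazy is not enough on its own, because the lookbehind still fires after the first </strong> (the "Kontaktdaten des Absenders:" label); anchoring the lookbehind on the Name: label is what isolates the value.

```python
import re

# Content of p[5] from the question.
html = (
    "<strong>Kontaktdaten des Absenders:</strong> <br> "
    "<strong>Name:</strong> Wanted data <br> "
    "<strong>Telefon:</strong> <a dir='ltr' href='tel:XXXXXXXXX' "
    "x-apple-data-detectors='true' x-apple-data-detectors-type='telephone' "
    "x-apple-data-detectors-result='3'>XXXXXXXXX</a> <br>"
)

# 1) Original pattern: greedy .* runs to the end, then backs off only to the
#    LAST " <br>" in the string.
greedy = re.search(r"(?<=</strong> )(.*)(?= <br>)", html).group(1)

# 2) Lazy .*? stops at the FIRST " <br>", but the match still starts right
#    after the first "</strong> " (the "Kontaktdaten des Absenders:" label).
lazy = re.search(r"(?<=</strong> )(.*?)(?= <br>)", html).group(1)

# 3) Anchoring the fixed-width lookbehind on the specific label isolates the value.
anchored = re.search(r"(?<=<strong>Name:</strong> )(.*?)(?= <br>)", html).group(1)

print(greedy)
print(lazy)      # <br> <strong>Name:</strong> Wanted data
print(anchored)  # Wanted data
```

The label-anchored pattern is brittle: it depends on the exact whitespace and tag spelling around the value, which is why the answer below recommends finishing the job with XPath instead.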

I know I could wrap this regex in a for { } loop to get the result I want, but since this is the only expression that needs it (and a loop would mean changing the handling for all the other data as well), I was wondering whether it is possible to do this within the regular expression itself.

Thank you in advance for your support.

Solution

Regex is the wrong tool for parsing markup. You have a proper XML parsing tool, XPath, in hand. Finish the job with it:

This XPath,

strong[.='Name:']/following-sibling::text()[1]

when appended to your original XPath,

//body/div/table/tbody/tr/td/p[5]/strong[.='Name:']/following-sibling::text()[1]

will finish the job of selecting the text node immediately following the <strong>Name:</strong> label, as requested, with no regex hacks over markup required.
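As a rough standard-library-only illustration in Python (an assumption; the question does not say which language or XPath engine is used): xml.etree.ElementTree supports the [.='Name:'] predicate but not the following-sibling axis or text() node tests. In ElementTree, the text that directly follows an element is stored in that element's tail attribute, which plays the role of following-sibling::text()[1] when, as here, the text comes right after the label. The sketch assumes the paragraph is well-formed XML, so the bare <br> tags are rewritten as <br/>; with a full XPath 1.0 engine such as lxml, the XPath expression above can be used directly.

```python
import xml.etree.ElementTree as ET

# Content of p[5] from the question, wrapped in <p>. Assumption: bare <br>
# tags are written as <br/> so that the markup is well-formed XML.
p_xml = (
    "<p><strong>Kontaktdaten des Absenders:</strong> <br/> "
    "<strong>Name:</strong> Wanted data <br/> "
    "<strong>Telefon:</strong> <a dir='ltr' href='tel:XXXXXXXXX' "
    "x-apple-data-detectors='true' x-apple-data-detectors-type='telephone' "
    "x-apple-data-detectors-result='3'>XXXXXXXXX</a> <br/></p>"
)

p = ET.fromstring(p_xml)

# [.='Name:'] is part of ElementTree's limited XPath support (Python 3.7+);
# it matches a <strong> whose full text content is exactly "Name:".
label = p.find("strong[.='Name:']")

# ElementTree has no text() nodes: the text right after an element lives in
# its .tail, which corresponds to following-sibling::text()[1] here.
wanted = label.tail.strip() if label is not None and label.tail else None
print(wanted)  # Wanted data
```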

Published: 2023-11-29 23:18:17