I have been searching for a solution to this problem for a while and have been experimenting on regex101, but I cannot find one.
The problem is that I have to select a substring from several different inputs, so I wanted to use regular expressions to extract the wanted data from these strings. Because the strings differ, the regular expression for each one will come from its own configuration entry.
The string below is obtained with an XPath: //body/div/table/tbody/tr/td/p[5] but I cannot dig any deeper with it to retrieve the right data. Or can I?
The string I am currently using as an example is the following:
<strong>Kontaktdaten des Absenders:</strong> <br> <strong>Name:</strong> Wanted data <br> <strong>Telefon:</strong> <a dir='ltr' href='tel:XXXXXXXXX' x-apple-data-detectors='true' x-apple-data-detectors-type='telephone' x-apple-data-detectors-result='3'>XXXXXXXXX</a> <br>
From this string I am trying to get the "Wanted data".
My regular expression so far is the following:
(?<=<\/strong> )(.*)(?= <br>)
But this returns the whole:
<br> <strong>Name:</strong> Wanted data <br> <strong>Telefon:</strong> <a dir='ltr' href='tel:XXXXXXXXX' x-apple-data-detectors='true' x-apple-data-detectors-type='telephone' x-apple-data-detectors-result='3'>XXXXXXXXX</a>
I thought I could solve this with a repeat group:
((:?(?<=<\/strong> )(.*)(?= <br>))+)
But this returns the same output as without the repeat group.
I know I could build a for { } loop around this regex to get the same output, but this is the only regular expression I would need that for, and it would mean changing the handling for all the other data as well. So I was wondering whether this can be done within the regular expression itself.
Thank you for the support so far.
Solution
Regex is the wrong tool for parsing markup. You already have a proper XML parsing tool, XPath, in hand. Finish the job with it.
This XPath,
strong[.='Name:']/following-sibling::text()[1]
when appended to your original XPath,
//body/div/table/tbody/tr/td/p[5]/strong[.='Name:']/following-sibling::text()[1]
will select the text node immediately following the <strong>Name:</strong> label, as requested, with no regex hacks over markup required.
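As a rough illustration of the same idea, here is a minimal Python sketch. It rests on several assumptions that are not in the question: the language, the helper name text_after_label, and a shortened, well-formed copy of the fragment (<br> written as <br/>, most attributes dropped). Python's standard xml.etree.ElementTree only implements a subset of XPath and has no following-sibling axis, so the sketch finds the label with strong[.='Name:'] and then reads the element's .tail, which holds the text directly after the closing tag. With a full XPath 1.0 engine the expression above could be used as-is, and wrapping it in normalize-space() would trim the surrounding whitespace.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed copy of the <p> fragment from the question
# (<br> written as <br/> so the stdlib XML parser accepts it).
FRAGMENT = (
    "<p><strong>Kontaktdaten des Absenders:</strong> <br/> "
    "<strong>Name:</strong> Wanted data <br/> "
    "<strong>Telefon:</strong> <a href='tel:XXXXXXXXX'>XXXXXXXXX</a> <br/></p>"
)


def text_after_label(root, label):
    """Return the stripped text directly following <strong>label</strong>.

    Returns None when the label is not found or no text follows it.
    """
    # ElementTree supports the [.='text'] predicate (Python 3.7+).
    node = root.find(f".//strong[.='{label}']")
    if node is None or node.tail is None:
        return None
    # .tail is the text between </strong> and the next element.
    return node.tail.strip()


p = ET.fromstring(FRAGMENT)
print(text_after_label(p, "Name:"))
```

Because the label is passed in as a parameter, only the label text would need to come from the per-string configuration, rather than a separate regular expression for each field.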