In a previous post I asked for some help rewriting a regex without negation.
Starting regex:
https?:\/\/(?:.(?!https?:\/\/))+$
Ended up with:
https?:[^:]*$
This works fine, but I've noticed that if the URL contains a : besides the one after http or https, it does not match.
Here is a string for which it does not work:
sometextsometexhttp://websites.com/path/subpath/#query1sometexthttp://websites.com/path/subpath/:query2
Note the :query2 at the end.
How can I modify the second regex listed here so that it selects URLs that contain a : ?
Expected output:
http://websites.com/path/subpath/cc:query2
Also, I would like to select everything up to the first occurrence of ?=param
Input: sometextsometexhttp://websites.com/path/subpath/#query1sometexthttp://websites.com/path/subpath/cc:query2/text/?=param
Output:
http://websites.com/path/subpath/cc:query2/text/
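For reference, the failure can be reproduced with a short sketch. It assumes Go's regexp package (the engine the answer talks about). The name colonFree and the first test string, a variant of the sample without the extra colon, are illustrative assumptions and not part of the original post.

```go
package main

import (
	"fmt"
	"regexp"
)

// colonFree is the second regex from the question. [^:]* cannot cross
// a colon, so any ':' after the scheme prevents a match up to the end.
var colonFree = regexp.MustCompile(`https?:[^:]*$`)

func main() {
	// Assumed variant of the sample: the last URL has no extra colon.
	plain := "sometextsometexhttp://websites.com/path/subpath/#query1sometexthttp://websites.com/path/subpath/query2"
	// The failing sample from the question: the last URL contains ":query2".
	withColon := "sometextsometexhttp://websites.com/path/subpath/#query1sometexthttp://websites.com/path/subpath/:query2"

	// FindString returns the empty string when there is no match.
	fmt.Printf("%q\n", colonFree.FindString(plain))
	fmt.Printf("%q\n", colonFree.FindString(withColon))
}
```

The first call finds the last URL, while the second finds nothing, because every candidate http: is followed by another colon before the end of the string.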
Best answer
It is a pity that Go regex does not support lookarounds. However, you can get the last link with a trick: greedily match all possible links and other characters, and capture the last link with a capturing group:
^(?:https?://|.)*(https?://\S+?)(?:\?=|$)
Because \S+? matches non-whitespace characters lazily, this also lets you capture the link only up to the ?=.
See regex demo and Go demo
package main

import (
	"fmt"
	"regexp"
)

// Group 1 holds the last link, cut off before "?=" when it is present.
var r = regexp.MustCompile(`^(?:https?://|.)*(https?://\S+?)(?:\?=|$)`)

func main() {
	fmt.Printf("%q\n", r.FindAllStringSubmatch("sometextsometexhttp://websites.com/path/subpath/#query1sometexthttp://websites.com/path/subpath/:query2", -1)[0][1])
	fmt.Printf("%q\n", r.FindAllStringSubmatch("sometextsometexhttp://websites.com/path/subpath/#query1sometexthttp://websites.com/path/subpath/cc:query2/text/?=param", -1)[0][1])
}
Results:
"http://websites.com/path/subpath/:query2"
"http://websites.com/path/subpath/cc:query2/text/"
If the last link can contain spaces, use .+? instead:
^(?:https?://|.)*(https?://.+?)(?:\?=|$)
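The snippet in the answer indexes the result with [0][1], which panics when the input contains no link at all. One way to guard against that is sketched below. The helper name lastLink and the variable lastLinkRe are hypothetical and introduced only for illustration; the pattern is the one from the answer, and FindStringSubmatch is used because only the single anchored match is needed.

```go
package main

import (
	"fmt"
	"regexp"
)

// lastLinkRe is the pattern from the answer; group 1 is the last link.
var lastLinkRe = regexp.MustCompile(`^(?:https?://|.)*(https?://\S+?)(?:\?=|$)`)

// lastLink (hypothetical helper) returns the last http(s) link in s,
// cut off before a following "?=" if there is one.
// ok is false when the pattern does not match, e.g. s has no link.
func lastLink(s string) (link string, ok bool) {
	m := lastLinkRe.FindStringSubmatch(s)
	if m == nil {
		return "", false
	}
	return m[1], true
}

func main() {
	inputs := []string{
		"sometextsometexhttp://websites.com/path/subpath/#query1sometexthttp://websites.com/path/subpath/:query2",
		"sometextsometexhttp://websites.com/path/subpath/#query1sometexthttp://websites.com/path/subpath/cc:query2/text/?=param",
		"no links here",
	}
	for _, in := range inputs {
		if link, ok := lastLink(in); ok {
			fmt.Printf("%q\n", link)
		} else {
			fmt.Println("no link found")
		}
	}
}
```

Note that in Go's regexp syntax . does not match a newline unless the s flag is set, so for multi-line input the pattern would likely need a (?s) prefix.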