Write a regex without negations

Updated: 2024-10-23 02:47:32

In a previous post I asked for help rewriting a regex without the negative lookahead.

Starting regex:

https?:\/\/(?:.(?!https?:\/\/))+$

Ended up with:

https?:[^:]*$

This works, but I've noticed that it does not match when my URL contains a ":" other than the one after http(s).

Here is a string for which it does not work:

sometextsometexhttp://websites.com/path/subpath/#query1sometexthttp://websites.com/path/subpath/:query2

Note the :query2 part.
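To make both problems concrete, here is a minimal sketch in Go, assuming Go as the regex flavor because that is what the answer below targets (the question itself does not name one). It checks that Go's regexp package rejects the negative lookahead in the starting regex, and that https?:[^:]*$ finds nothing in the string above: [^:]* cannot cross the colon in :query2, and $ forces the match to run to the end of the string. The shorter colon-free input is made up for comparison; compiles and noColonRe are illustrative names, not part of the question.

```go
package main

import (
	"fmt"
	"regexp"
)

// compiles reports whether Go's RE2-based regexp package accepts the pattern.
func compiles(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

// noColonRe is the workaround from the question: everything after the
// scheme's colon must be colon-free all the way to the end of the string.
var noColonRe = regexp.MustCompile(`https?:[^:]*$`)

func main() {
	// The negative lookahead (?!...) is unsupported, so this prints false.
	fmt.Println(compiles(`https?:\/\/(?:.(?!https?:\/\/))+$`))

	// Made-up input whose last URL has no extra colon:
	// prints "http://websites.com/path/subpath/".
	fmt.Printf("%q\n", noColonRe.FindString("sometexthttp://websites.com/path/#query1sometexthttp://websites.com/path/subpath/"))

	// The failing input from the question: ":query2" adds a colon, so
	// there is no match and "" is printed.
	fmt.Printf("%q\n", noColonRe.FindString("sometextsometexhttp://websites.com/path/subpath/#query1sometexthttp://websites.com/path/subpath/:query2"))
}
```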

How can I modify the second regex so that it also matches URLs that contain a ":"?

Expected output:

http://websites.com/path/subpath/:query2

I would also like to select everything up to the first occurrence of ?=param.

Input: sometextsometexhttp://websites.com/path/subpath/#query1sometexthttp://websites.com/path/subpath/cc:query2/text/?=param

Output:

http://websites.com/path/subpath/cc:query2/text/

Best answer

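It is a pity that Go's regexp package (RE2 syntax) does not support lookarounds. However, you can obtain the last link with a trick: greedily match all possible links and other characters, and capture the last link with a capturing group: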

^(?:https?://|.)*(https?://\S+?)(?:\?=|$)

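Because \S+? lazily matches non-whitespace characters and is followed by (?:\?=|$), the captured link stops right before the first ?= (or at the end of the string if there is no ?=).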
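The lazy quantifier is what makes the capture stop at ?=. A minimal Go sketch comparing the answer's pattern with a hypothetical greedy variant (\S+ instead of \S+?) on the second input: the greedy quantifier runs to the end of the string, where $ matches, so ?=param ends up inside the capture. The names lazyRe, greedyRe and input are illustrative.

```go
package main

import (
	"fmt"
	"regexp"
)

const input = "sometextsometexhttp://websites.com/path/subpath/#query1sometexthttp://websites.com/path/subpath/cc:query2/text/?=param"

var (
	// The answer's pattern: the lazy \S+? stops at the first "?=".
	lazyRe = regexp.MustCompile(`^(?:https?://|.)*(https?://\S+?)(?:\?=|$)`)
	// Hypothetical greedy variant: \S+ prefers to run to the end, where $ matches.
	greedyRe = regexp.MustCompile(`^(?:https?://|.)*(https?://\S+)(?:\?=|$)`)
)

func main() {
	// Prints "http://websites.com/path/subpath/cc:query2/text/".
	fmt.Printf("%q\n", lazyRe.FindStringSubmatch(input)[1])
	// Prints "http://websites.com/path/subpath/cc:query2/text/?=param".
	fmt.Printf("%q\n", greedyRe.FindStringSubmatch(input)[1])
}
```

In both patterns the greedy (?:https?://|.)* prefix still consumes everything before the last http://, so only the end of the capture differs.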

See the regex demo and the Go demo:

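```go
package main

import (
	"fmt"
	"regexp"
)

func main() {
	// The greedy (?:https?://|.)* skips ahead to the last http(s)://, and the
	// capturing group lazily takes the link up to the first ?= or the end.
	var r = regexp.MustCompile(`^(?:https?://|.)*(https?://\S+?)(?:\?=|$)`)
	fmt.Printf("%q\n", r.FindAllStringSubmatch("sometextsometexhttp://websites.com/path/subpath/#query1sometexthttp://websites.com/path/subpath/:query2", -1)[0][1])
	fmt.Printf("%q\n", r.FindAllStringSubmatch("sometextsometexhttp://websites.com/path/subpath/#query1sometexthttp://websites.com/path/subpath/cc:query2/text/?=param", -1)[0][1])
}
```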

Results:

"http://websites.com/path/subpath/:query2"
"http://websites.com/path/subpath/cc:query2/text/"

If the last link can contain spaces, use .+? instead:

^(?:https?://|.)*(https?://.+?)(?:\?=|$)
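A small Go sketch of the difference between the two variants, using a made-up input whose last link contains a space: the \S+? pattern finds no match because \S cannot cross the space before reaching ?= or the end, while the .+? pattern captures the whole link. The hypothetical capture helper uses FindStringSubmatch and checks for nil, so unlike indexing FindAllStringSubmatch(...)[0][1] it does not panic when there is no match.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// The answer's main pattern: the captured link may not contain whitespace.
	noSpaceRe = regexp.MustCompile(`^(?:https?://|.)*(https?://\S+?)(?:\?=|$)`)
	// The variant for links that may contain spaces.
	withSpaceRe = regexp.MustCompile(`^(?:https?://|.)*(https?://.+?)(?:\?=|$)`)
)

// capture is a hypothetical helper returning the first capturing group;
// ok is false when FindStringSubmatch reports no match (nil).
func capture(re *regexp.Regexp, s string) (string, bool) {
	m := re.FindStringSubmatch(s)
	if m == nil {
		return "", false
	}
	return m[1], true
}

func main() {
	// Made-up input whose last link contains a space.
	in := "sometexthttp://websites.com/my file/?=param"

	link, ok := capture(noSpaceRe, in)
	fmt.Printf("%q %v\n", link, ok) // "" false: \S+? cannot cross the space

	link, ok = capture(withSpaceRe, in)
	fmt.Printf("%q %v\n", link, ok) // "http://websites.com/my file/" true
}
```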

Published: 2023-07-04 11:05:00

Article link: https://www.elefans.com/category/jswz/34/1023585.html