Multiline Matching in Haskell Posix

Updated: 2024-10-26 16:22:15

I can't seem to find decent documentation on Haskell's POSIX regex implementation, specifically the module Text.Regex.Posix.

Can anyone point me in the right direction of using multiline matching on a string?

A snippet for the curious:

> extractToken body = body =~ "<textarea[^>]*id=\"wpTextbox1\"[^>]*>(.*)</textarea>" :: String

I'm trying to extract the source of Wikipedia pages, but this approach clearly fails when the textarea content spans more than one line.

Best answer

You may need to import Text.Regex.Base.RegexLike for access to makeRegexOpts and friends.

    extractToken :: String -> String
    extractToken body = match regex body
      where
        regex :: Regex
        regex = makeRegexOpts (defaultCompOpt - compNewline) defaultExecOpt
                  "<textarea[^>]*id=\"wpTextbox1\"[^>]*>(.*)</textarea>"
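The difference is easy to see on a tiny input. compNewline corresponds to POSIX REG_NEWLINE, under which `.` does not match a newline, and `=~` always compiles with defaultCompOpt. The sketch below assumes the regex-base and regex-posix packages are installed; sample, pat, withDefaults and withoutNewline are hypothetical names, and the two-line string only stands in for a real Wikipedia edit page.

```haskell
import Text.Regex.Base.RegexLike (defaultCompOpt, defaultExecOpt, makeRegexOpts, match)
import Text.Regex.Posix (Regex, compNewline, (=~))

-- Hypothetical input: the textarea body contains a '\n'.
sample :: String
sample = "<textarea id=\"wpTextbox1\">line one\nline two</textarea>"

-- The pattern from the question.
pat :: String
pat = "<textarea[^>]*id=\"wpTextbox1\"[^>]*>(.*)</textarea>"

-- (=~) compiles with defaultCompOpt, which includes compNewline,
-- so '.' cannot match '\n'; the String result of a failed match is "".
withDefaults :: String
withDefaults = sample =~ pat

-- Same pattern with the compNewline bit removed, as in the answer,
-- so '.' may cross line breaks; the String result is the whole match.
withoutNewline :: String
withoutNewline = match regex sample
  where
    regex :: Regex
    regex = makeRegexOpts (defaultCompOpt - compNewline) defaultExecOpt pat
```

In GHCi, withDefaults should be the empty string, while withoutNewline should be the entire sample.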

Well, since Text.Regex.Posix's defaultCompOpt is compExtended + compNewline, that works out to be equivalent to

    extractToken :: String -> String
    extractToken body = match regex body
      where
        regex :: Regex
        regex = makeRegexOpts compExtended defaultExecOpt
                  "<textarea[^>]*id=\"wpTextbox1\"[^>]*>(.*)</textarea>"
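The equivalence can be checked directly. This sketch assumes, as in current regex-posix, that CompOption derives Eq, Num and Bits; viaSubtraction, viaMask and sameAsExtended are hypothetical names. Clearing the bit with `.&. complement` is the safer idiom in general, since subtraction is only correct when the bit is known to be set.

```haskell
import Data.Bits (complement, (.&.))
import Text.Regex.Base.RegexLike (defaultCompOpt)
import Text.Regex.Posix (CompOption, compExtended, compNewline)

-- Arithmetic form used in the answer: only correct because the
-- compNewline bit is set in defaultCompOpt.
viaSubtraction :: CompOption
viaSubtraction = defaultCompOpt - compNewline

-- Bitwise form: clears the compNewline bit whether or not it was set.
viaMask :: CompOption
viaMask = defaultCompOpt .&. complement compNewline

-- True if defaultCompOpt is compExtended + compNewline, as stated above.
sameAsExtended :: Bool
sameAsExtended = viaSubtraction == compExtended && viaMask == compExtended
```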

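To pull out just the first capture group, ask match for a different result type, i.e. use one of the other RegexContext instances. One possibility is the 4-tuple (before, match, after, groups):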

    extractToken :: String -> String
    extractToken body = head groups
      where
        -- (text before the match, whole match, text after, capture groups)
        (preMatch, inMatch, postMatch, groups) =
          match regex body :: (String, String, String, [String])
        regex :: Regex
        regex = makeRegexOpts compExtended defaultExecOpt
                  "<textarea[^>]*id=\"wpTextbox1\"[^>]*>(.*)</textarea>"
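The head in that version throws an exception when the page has no wpTextbox1 textarea, because the 4-tuple instance then returns an empty group list. A minimal total variant, assuming the regex-base and regex-posix packages, requests the same 4-tuple through matchM in the Maybe monad, where a failed match becomes Nothing. extractTokenMaybe is a hypothetical name, not part of either library.

```haskell
import Text.Regex.Base.RegexLike (defaultExecOpt, makeRegexOpts, matchM)
import Text.Regex.Posix (Regex, compExtended)

-- Hypothetical total variant of extractToken: Nothing instead of a crash.
extractTokenMaybe :: String -> Maybe String
extractTokenMaybe body =
  case matchM regex body :: Maybe (String, String, String, [String]) of
    -- Matched and the capture group list is non-empty: return group 1.
    Just (_, _, _, firstGroup : _) -> Just firstGroup
    -- No match (matchM failed in Maybe) or no groups.
    _                              -> Nothing
  where
    -- compExtended without compNewline, so '.' also matches '\n'.
    regex :: Regex
    regex = makeRegexOpts compExtended defaultExecOpt
              "<textarea[^>]*id=\"wpTextbox1\"[^>]*>(.*)</textarea>"
```

For a page whose textarea holds `first line\nsecond line`, this should return `Just "first line\nsecond line"`; for a page without the textarea it returns Nothing.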

Published: 2023-07-19 02:41:00

Article link: https://www.elefans.com/category/jswz/34/1171512.html