Multiline Matching in Haskell Posix

Updated: 2024-10-26 16:22:15

I can't seem to find decent documentation on Haskell's POSIX regex implementation, specifically the module Text.Regex.Posix.

Can anyone point me in the right direction for multiline matching on a string?

A snippet for the curious:

> extractToken body = body =~ "<textarea[^>]*id=\"wpTextbox1\"[^>]*>(.*)</textarea>" :: String

I'm trying to extract the source of Wikipedia pages, but this method clearly falls over when more than one line is involved.

Best answer

You may need to import Text.Regex.Base.RegexLike for access to makeRegexOpts and friends.

extractToken body = match regex body
  where
    regex = makeRegexOpts (defaultCompOpt - compNewline) defaultExecOpt
              "<textarea[^>]*id=\"wpTextbox1\"[^>]*>(.*)</textarea>"

Since Text.Regex.Posix's defaultCompOpt = compExtended + compNewline, subtracting compNewline leaves just compExtended, so the above is equivalent to

extractToken body = match regex body
  where
    regex = makeRegexOpts compExtended defaultExecOpt
              "<textarea[^>]*id=\"wpTextbox1\"[^>]*>(.*)</textarea>"
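To see what dropping compNewline changes, here is a minimal sketch that compiles one pattern twice and tests it against a two-line string. The names sample, pat, lineBound and spanning are made up for illustration, and the input is a hypothetical stand-in for a real page body. With compNewline set, `.` should refuse to cross a newline, so only the second regex is expected to match.

```haskell
import Text.Regex.Posix

-- Hypothetical two-line input; not taken from a real Wikipedia page.
sample :: String
sample = "<b>first\nsecond</b>"

-- Illustrative pattern whose group has to span the newline to match.
pat :: String
pat = "<b>(.*)</b>"

-- Default options: compExtended together with compNewline,
-- so '.' stops at line boundaries.
lineBound :: Regex
lineBound = makeRegexOpts defaultCompOpt defaultExecOpt pat

-- compExtended only: '.' is allowed to match '\n'.
spanning :: Regex
spanning = makeRegexOpts compExtended defaultExecOpt pat

main :: IO ()
main = do
  -- Bool is one of the available result types for match.
  print (match lineBound sample :: Bool)
  print (match spanning sample :: Bool)
```

Importing Text.Regex.Posix alone should be enough here, since it re-exports the Text.Regex.Base names such as makeRegexOpts and match.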

To pull out just the first group, use one of the other instances of RegexLike. One possibility is

extractToken body = head groups
  where
    (preMatch, inMatch, postMatch, groups) =
      match regex body :: (String, String, String, [String])
    regex = makeRegexOpts compExtended defaultExecOpt
              "<textarea[^>]*id=\"wpTextbox1\"[^>]*>(.*)</textarea>"
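Note that head fails at runtime when the pattern does not match, because the group list is then empty. A possible variant, sketched here under the assumption that returning Maybe String is acceptable to the caller, pattern-matches on the group list instead. The name firstGroup and the sample input are hypothetical.

```haskell
import Text.Regex.Posix

-- Variant of extractToken that returns Nothing instead of crashing
-- when the textarea is not found. firstGroup is an illustrative name.
firstGroup :: String -> Maybe String
firstGroup body =
  case match regex body :: (String, String, String, [String]) of
    (_, _, _, g : _) -> Just g   -- first captured group
    _                -> Nothing  -- no match: the group list is empty
  where
    regex :: Regex
    regex = makeRegexOpts compExtended defaultExecOpt
              "<textarea[^>]*id=\"wpTextbox1\"[^>]*>(.*)</textarea>"

-- Hypothetical input with a newline inside the textarea.
sample :: String
sample = "<textarea id=\"wpTextbox1\" rows=\"2\">line one\nline two</textarea>"

main :: IO ()
main = do
  print (firstGroup sample)
  print (firstGroup "no textarea here")
```

POSIX matching is leftmost-longest, so if the input contains several closing textarea tags, the group will extend to the last one.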

Published: 2023-07-19 02:41:00
Link: https://www.elefans.com/category/jswz/34/1171512.html