I can't seem to find decent documentation on Haskell's POSIX regex implementation, specifically the module Text.Regex.Posix.
Can anyone point me in the right direction for multiline matching on a string?
A snippet for the curious:
> extractToken body = body =~ "<textarea[^>]*id=\"wpTextbox1\"[^>]*>(.*)</textarea>" :: String
I'm trying to extract the source of Wikipedia pages; however, this method clearly falls over when more than one line is involved.
Best answer
You may need to import Text.Regex.Base.RegexLike for access to makeRegexOpts and friends.

    extractToken body = match regex body
      where
        regex = makeRegexOpts (defaultCompOpt - compNewline) defaultExecOpt
                  "<textarea[^>]*id=\"wpTextbox1\"[^>]*>(.*)</textarea>"

With compNewline (POSIX REG_NEWLINE) cleared, . and bracket expressions such as [^>] are allowed to match newline characters, so the group can span several lines. Since Text.Regex.Posix's defaultCompOpt = compExtended + compNewline, that works out equivalently as

    extractToken body = match regex body
      where
        regex = makeRegexOpts compExtended defaultExecOpt
                  "<textarea[^>]*id=\"wpTextbox1\"[^>]*>(.*)</textarea>"

To pull out just the first group, ask match for one of the other result types (the RegexContext instances). One possibility is

    extractToken body = head groups
      where
        (preMatch, inMatch, postMatch, groups) =
          match regex body :: (String, String, String, [String])
        regex = makeRegexOpts compExtended defaultExecOpt
                  "<textarea[^>]*id=\"wpTextbox1\"[^>]*>(.*)</textarea>"

Note that head raises an error when the pattern does not match, because groups is then empty.
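For completeness, here is a minimal sketch of a total variant that returns Nothing instead of crashing when the textarea is not found. The names textareaPattern, multilineRegex and extractTokenMaybe are made up for this illustration, and it assumes the regex-posix package, whose Text.Regex.Posix module re-exports makeRegexOpts, defaultExecOpt and match from regex-base.

```haskell
import Text.Regex.Posix

-- The pattern from the question, unchanged.
textareaPattern :: String
textareaPattern = "<textarea[^>]*id=\"wpTextbox1\"[^>]*>(.*)</textarea>"

-- Compiled with compExtended only (no compNewline), so '.' and '[^>]'
-- are allowed to match '\n'.
multilineRegex :: Regex
multilineRegex = makeRegexOpts compExtended defaultExecOpt textareaPattern

-- Hypothetical helper: the first capture group, or Nothing if the
-- pattern does not match at all.
extractTokenMaybe :: String -> Maybe String
extractTokenMaybe body =
  case match multilineRegex body :: (String, String, String, [String]) of
    (_, _, _, firstGroup : _) -> Just firstGroup
    _                         -> Nothing
```

Given a body such as "<textarea id=\"wpTextbox1\">line one\nline two</textarea>", extractTokenMaybe should return the two lines inside the tag wrapped in Just, while a body without the textarea gives Nothing. Keep in mind that POSIX matching is greedy, so if the page contains several closing </textarea> tags, (.*) runs up to the last one.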