I am new to Perl and regular expressions, and I need to extract all the strings from a text file. A string is anything wrapped in double quotes.
Examples of strings:
1. "This is string"
2. "1!=2"
3. "This is \"string\""
4. "string1"."string2"
5. "S t r i n g"
Code:
    my $fh;
    open($fh, '<', 'text.txt') or die "$!";
    undef $/;                               # slurp mode: read the whole file at once
    my $text = <$fh>;
    my @strings  = $text =~ m/".*"/g;       # greedy: returns the outermost quotes in example 4
    my @strings2 = $text =~ m/"[^"]*"/g;    # fixes the issue above, but does not handle example 3
Edit: I want to match (1) a double quote, followed by (2) zero or more occurrences of either a character that is neither a double quote nor a backslash, or a backslash followed by any character, followed by (3) a double quote. (2) can be anything but "
I tried the regex provided below, m/"(?:\\.|[^"])*"/g. However, when there is a line with "string1".string2."string2", it returns "string1" string2 "string3".
Is there any way to skip the previously matched word?
Can anyone help?
Recommended answer
One possible approach:
/"(?:\\.|[^"])*"/
... which reads as:
- match a double quotation mark,
followed by any number of...
--- either any escaped character (any symbol preceded by \)
--- or any character that is not a double quotation mark
- and then match the closing double quotation mark.
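The breakdown above can be written out with Perl's /x modifier, which lets each part of the pattern carry its own comment. This is only a sketch of the same pattern; the variable names ($string_re, @found) and the sample input are made up for illustration.

```perl
use strict;
use warnings;

# Same pattern as /"(?:\\.|[^"])*"/, spread over several lines with /x.
my $string_re = qr{
    "            # opening double quotation mark
    (?:
        \\ .     # a backslash followed by any one character
      |
        [^"]     # or any one character that is not a double quotation mark
    )*           # ... repeated any number of times
    "            # closing double quotation mark
}x;

# Hypothetical sample input; in list context, /g returns every match.
my @found = q{say "hello" and "bye"} =~ /$string_re/g;
print "$_\n" for @found;
```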
The key trick here is the alternation: its first branch consumes any escaped symbol, including an escaped double quotation mark, as a single unit.
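As a rough check, the pattern can be run against text assembled from the question's examples. The sample text below is an assumption put together for illustration (it is not the asker's actual file), and the variable names are arbitrary.

```perl
use strict;
use warnings;

# Sample text built from the question's examples. The heredoc is
# single-quoted, so backslashes and quotes are taken literally.
my $text = <<'END';
"This is string" "1!=2"
"This is \"string\""
"string1".string2."string2"
"S t r i n g"
END

# In list context, m//g returns every match; each search resumes
# where the previous match ended.
my @strings = $text =~ /"(?:\\.|[^"])*"/g;
print "$_\n" for @strings;
```

On this sample the unquoted .string2. is not returned, because each match starts at a double quote and the next search begins after the previous closing quote. If the input can contain unterminated strings, a stricter variant that follows the wording of the question's edit would use [^"\\] instead of [^"], so that a backslash can only be consumed together with the character it escapes.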