Recognize multiple line comments within a single line with ANTLR4

Updated: 2024-10-27 20:38:45

I want to parse PostScript code with ANTLR4. I have finished the grammar, but one particular language extension (which was introduced by someone else) is proving troublesome to recognize.

A short example:

1: % This is a line comment
2: % The next line just pushes the value 10 onto the stack
3: 10
4:
5: %?description This is the special line-comment in question
6: /procedure {
7:   /var1 30 def %This just creates a variable
8:   /var2 10 def %?description A description associated with var2 %?default 20
9:   /var3 (a string value) def %?description I am even allowed to use % signs %?default (another value)
10: }

Line comments, such as those in lines 1, 2 and 7, can be recognized with the lexer rules

LINE_COMMENT: '%' .*? NEWLINE;
NEWLINE: '\r'? '\n';

which simply match everything after a % until the end of the line.

The problem I have is with the special line comments that start with something like %?description or %?default. Those should be recognized as well, but in contrast to LINE_COMMENT, several of them can appear on a single line (as in lines 8 and 9). So line 8 contains two special comments: %?description A description associated with var2 and %?default 20.

Think of it as something like this (although this won't work):

SPECIAL_COMMENT: '%?' .*? (SPECIAL_COMMENT|NEWLINE);

Now comes the really tricky part: You should be allowed to put arbitrary text after %?description including % while still being able to split the individual comments.

So in short, the issue can be reduced to splitting a line of the form

(%?<keyword> <content with % allowed in it>)+ NEWLINE

e.g.

%?description descr. with % in in %?default (my default value for 100%) %?rest more

into

1.) %?description descr. with % in in
2.) %?default (my default value for 100%)
3.) %?rest more
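For illustration only, the intended split can be sketched in plain Java outside of ANTLR. This is not a lexer rule, just a statement of the desired result: the class name SpecialCommentSplit and the helper splitSpecialComments are hypothetical, and the sketch assumes that every occurrence of %? starts a new special comment.

```java
import java.util.ArrayList;
import java.util.List;

class SpecialCommentSplit {
    // Hypothetical helper: returns the %?-comments found in one line, in order.
    // Assumption: every "%?" starts a new special comment; a lone "%" is content.
    static List<String> splitSpecialComments(String line) {
        List<String> result = new ArrayList<>();
        // Zero-width lookahead: split *before* each "%?" without consuming it.
        for (String part : line.split("(?=%\\?)")) {
            String trimmed = part.trim();
            // Anything before the first "%?" (e.g. code) is not a special comment.
            if (trimmed.startsWith("%?")) {
                result.add(trimmed);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String line = "%?description descr. with % in in "
                + "%?default (my default value for 100%) %?rest more";
        for (String comment : splitSpecialComments(line)) {
            System.out.println(comment);
        }
    }
}
```

Run on the example line, this yields the three pieces listed above, with surrounding whitespace trimmed.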

Any ideas on how to formulate lexer or parser rules to achieve this?

Best answer

Given those rules, I think you'll have to use a predicate in the lexer to check the input stream for occurrences of %?. You'll also have to make sure a normal comment starts with a % that is not followed by a ? (or a line break char).

Given the grammar:

grammar T;

@lexer::members {
  boolean ahead(String text) {
    for (int i = 0; i < text.length(); i++) {
      if (text.charAt(i) != _input.LA(i + 1)) {
        return false;
      }
    }
    return true;
  }
}

parse
 : token* EOF
 ;

token
 : t=SPECIAL_COMMENT {System.out.println("special : " + $t.getText());}
 | t=COMMENT         {System.out.println("normal  : " + $t.getText());}
 ;

SPECIAL_COMMENT
 : '%?' ( {!ahead("%?")}? ~[\r\n] )*
 ;

COMMENT
 : '%' ( ~[?\r\n] ~[\r\n]* )?
 ;

SPACES
 : [ \t\r\n]+ -> skip
 ;

which can be tested as follows:

String source = "% normal comment\n" +
    "%?description I am even allowed to use % signs %?default (another value)\n" +
    "% another normal comment (without a line break!)";
TLexer lexer = new TLexer(new ANTLRInputStream(source));
TParser parser = new TParser(new CommonTokenStream(lexer));
parser.parse();
 

and will print the following:

normal  : % normal comment
special : %?description I am even allowed to use % signs 
special : %?default (another value)
normal  : % another normal comment (without a line break!)
 

The part ( {!ahead("%?")}? ~[\r\n] )* can be read as follows: if there's no "%?" ahead, match any char other than \r and \n, and do this zero or more times.
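To make the predicate loop more concrete, here is a minimal hand-written sketch in plain Java that mimics what the two lexer rules do, without any ANTLR dependency. It is only an approximation of the generated lexer: the class CommentScanner and its methods are hypothetical names, ahead here works on a String position instead of _input.LA, and any character that is not part of a comment is silently skipped (the grammar above only skips whitespace).

```java
import java.util.ArrayList;
import java.util.List;

class CommentScanner {
    // Counterpart of ahead(String) in the grammar, on a plain String position.
    static boolean ahead(String input, int pos, String text) {
        return input.startsWith(text, pos);
    }

    static boolean isLineBreak(char c) {
        return c == '\r' || c == '\n';
    }

    // Hypothetical scanner: returns lines in the same format the parser prints.
    static List<String> scan(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            int start = pos;
            if (ahead(input, pos, "%?")) {
                pos += 2; // consume the leading "%?"
                // ( {!ahead("%?")}? ~[\r\n] )* : stop at the next "%?" or line break
                while (pos < input.length()
                        && !ahead(input, pos, "%?")
                        && !isLineBreak(input.charAt(pos))) {
                    pos++;
                }
                tokens.add("special : " + input.substring(start, pos));
            } else if (input.charAt(pos) == '%') {
                pos++; // a '%' not followed by '?': normal comment to end of line
                while (pos < input.length() && !isLineBreak(input.charAt(pos))) {
                    pos++;
                }
                tokens.add("normal  : " + input.substring(start, pos));
            } else {
                pos++; // simplification: skip everything outside comments
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String source = "% normal comment\n"
                + "%?description I am even allowed to use % signs %?default (another value)\n"
                + "% another normal comment (without a line break!)";
        scan(source).forEach(System.out::println);
    }
}
```

For the test string from the answer, this sketch produces the same four lines as the grammar, including the trailing space after "% signs", because the first special comment only ends where the next %? begins.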

Published: 2023-08-08 00:37:00
Article link: https://www.elefans.com/category/jswz/34/1466783.html