ANTLR rule priority

Updated: 2024-10-12 03:26:28
Question

First, I know this grammar doesn't make sense, but it was created to test ANTLR's rule-priority behaviour:

    grammar test;

    options {
      output=AST;
      backtrack=true;
      memoize=true;
    }

    rule_list_in_order
      :  ( first_rule
         | second_rule
         | any_left_over_tokens
         )+
      ;

    first_rule
      :  FIRST_TOKEN
      ;

    second_rule
      :  FIRST_TOKEN NEW_LINE SECOND_TOKEN NEW_LINE
      ;

    any_left_over_tokens
      :  NEW_LINE
      |  FIRST_TOKEN
      |  SECOND_TOKEN
      ;

    FIRST_TOKEN  : 'First token here';
    SECOND_TOKEN : 'Second token here';
    NEW_LINE     : ('\r'? '\n');
    WS           : (' ' | '\t' | '\u000C') {$channel=HIDDEN;};

When I give this grammar the input 'First token here\nSecond token here', it matches the second_rule.

I would have expected it to match first_rule and then any_left_over_tokens, because first_rule appears before second_rule in rule_list_in_order, which is the start rule. Can anyone explain why this happens?

Cheers

Recommended answer

First of all, ANTLR's lexer gives priority to its token rules in the order they are defined: tokens defined first have a higher precedence than the ones below them. When rules overlap, however, the rule that matches the most characters takes precedence (greedy match).
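The two lexer principles above (the longest match wins, and definition order breaks ties) can be illustrated with a small hand-rolled matcher. This is a sketch for illustration only, not ANTLR's generated lexer code; the class LongestMatchLexer and the token names IF, ID and WS are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LongestMatchLexer {
    // A token rule: its name and the pattern it matches. List order = definition order.
    record Rule(String name, Pattern pattern) {}

    // Splits 'input' into tokens: at each position the longest match wins;
    // on a tie, the rule defined first wins (a later rule must be strictly longer).
    static List<String> tokenize(List<Rule> rules, String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            Rule best = null;
            int bestLen = 0;
            for (Rule rule : rules) {
                Matcher m = rule.pattern().matcher(input).region(pos, input.length());
                if (m.lookingAt() && m.end() - pos > bestLen) {
                    best = rule;
                    bestLen = m.end() - pos;
                }
            }
            if (best == null) {
                throw new IllegalArgumentException("no token rule matches at offset " + pos);
            }
            tokens.add(best.name() + "[" + input.substring(pos, pos + bestLen) + "]");
            pos += bestLen;
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<Rule> rules = List.of(
            new Rule("IF", Pattern.compile("if")),      // keyword, defined first
            new Rule("ID", Pattern.compile("[a-z]+")),  // identifier, overlaps with IF
            new Rule("WS", Pattern.compile(" +")));
        System.out.println(tokenize(rules, "if iffy")); // [IF[if], WS[ ], ID[iffy]]
    }
}
```

For "if", both IF and ID match two characters, so IF wins because it is defined first; for "iffy", ID matches more characters, so ID wins even though IF comes first.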

The same principle holds within parser rules. Rules defined first will also be matched first. For example, in rule foo, sub-rule a will first be tried before b:

foo : a | b ;
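The ordering of alternatives can be modelled as an ordered choice: try the alternatives in the order they are written and take the first one that succeeds. The Java sketch below is a simplified model under that assumption (roughly how backtracking, PEG-style parsing resolves alternatives, not a description of ANTLR's LL(*) prediction); the names OrderedChoice, Rule, seq and choose, and the token types X and Y, are made up for illustration.

```java
import java.util.List;
import java.util.Optional;

public class OrderedChoice {
    // A parser rule, modelled as: given the tokens and a start position,
    // return the end position if it matches, or Optional.empty() if it fails.
    interface Rule {
        Optional<Integer> match(List<String> tokens, int start);
    }

    // Builds a rule that matches a fixed sequence of token types.
    static Rule seq(String... expected) {
        return (tokens, start) -> {
            int pos = start;
            for (String e : expected) {
                if (pos >= tokens.size() || !tokens.get(pos).equals(e)) return Optional.empty();
                pos++;
            }
            return Optional.of(pos);
        };
    }

    // Ordered choice: try the alternatives in the order they are listed and return
    // the index of the first one that matches, or -1 if none does.
    static int choose(List<Rule> alternatives, List<String> tokens, int start) {
        for (int i = 0; i < alternatives.size(); i++) {
            if (alternatives.get(i).match(tokens, start).isPresent()) return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        Rule a = seq("X");
        Rule b = seq("X", "Y");
        List<String> tokens = List.of("X", "Y");
        // foo : a | b ;  both alternatives could match here, but a is listed first, so a wins
        System.out.println(choose(List.of(a, b), tokens, 0)); // prints 0
    }
}
```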

Note that in your case, second_rule is not actually matched: the parser tries it and fails because there is no trailing line break, producing the error:

line 0:-1 mismatched input '<EOF>' expecting NEW_LINE
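To see where second_rule runs out of input, one can line its expected token sequence up against the tokens the lexer produces for "First token here\nSecond token here". The sketch below does that in plain Java; the class SecondRuleCheck, the method firstMismatch and its message format are hypothetical and only imitate the wording of ANTLR's error.

```java
import java.util.List;

public class SecondRuleCheck {
    // Returns null if 'tokens' starts with 'expected', otherwise a message
    // describing the first position where they differ (running out of tokens = <EOF>).
    static String firstMismatch(List<String> expected, List<String> tokens) {
        for (int i = 0; i < expected.size(); i++) {
            String actual = i < tokens.size() ? tokens.get(i) : "<EOF>";
            if (!actual.equals(expected.get(i))) {
                return "mismatched input '" + actual + "' expecting " + expected.get(i);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Token types for "First token here\nSecond token here"; no WS tokens appear,
        // because the spaces are part of the FIRST_TOKEN and SECOND_TOKEN literals.
        List<String> tokens = List.of("FIRST_TOKEN", "NEW_LINE", "SECOND_TOKEN");
        // second_rule : FIRST_TOKEN NEW_LINE SECOND_TOKEN NEW_LINE ;
        List<String> secondRule = List.of("FIRST_TOKEN", "NEW_LINE", "SECOND_TOKEN", "NEW_LINE");
        System.out.println(firstMismatch(secondRule, tokens));
        // prints: mismatched input '<EOF>' expecting NEW_LINE
    }
}
```

The first three tokens line up, and the fourth expected token (the trailing NEW_LINE) finds only the end of input.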

So nothing is matched at all. But that is odd: because you've set backtrack=true, it should at least backtrack and match:

  • first_rule ("First token here")
  • any_left_over_tokens (the line break)
  • any_left_over_tokens ("Second token here")

Or rather, it should simply match first_rule in the first place and not even try second_rule.
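The behaviour expected from backtrack=true can be modelled like this: on each iteration of the ( ... )+ loop, try the alternatives in order, rewind to the same token position when one fails, and commit to the first one that succeeds. The Java sketch below assumes the token stream FIRST_TOKEN NEW_LINE SECOND_TOKEN for the question's input; it is a toy model, not the code ANTLR generates, and the names BacktrackingLoop, Alt, match and parse are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class BacktrackingLoop {
    // One alternative of the (...)+ loop: the rule it belongs to and the token types it expects.
    record Alt(String rule, List<String> body) {}

    // Tries to match 'body' at 'start'; returns the end position, or -1 on failure.
    static int match(List<String> body, List<String> tokens, int start) {
        int pos = start;
        for (String expected : body) {
            if (pos >= tokens.size() || !tokens.get(pos).equals(expected)) return -1;
            pos++;
        }
        return pos;
    }

    // Each loop iteration tries the alternatives in order. A failed attempt leaves 'pos'
    // untouched (this is the "rewind"), and the first alternative that succeeds is committed.
    static List<String> parse(List<Alt> alts, List<String> tokens) {
        List<String> matched = new ArrayList<>();
        int pos = 0;
        while (pos < tokens.size()) {
            int end = -1;
            for (Alt alt : alts) {
                end = match(alt.body(), tokens, pos);
                if (end >= 0) {
                    matched.add(alt.rule() + " " + tokens.subList(pos, end));
                    break;
                }
            }
            if (end < 0) throw new IllegalStateException("no alternative matches at token " + pos);
            pos = end;
        }
        return matched;
    }

    public static void main(String[] args) {
        // any_left_over_tokens has three alternatives, so it contributes three entries.
        List<Alt> alts = List.of(
            new Alt("first_rule", List.of("FIRST_TOKEN")),
            new Alt("second_rule", List.of("FIRST_TOKEN", "NEW_LINE", "SECOND_TOKEN", "NEW_LINE")),
            new Alt("any_left_over_tokens", List.of("NEW_LINE")),
            new Alt("any_left_over_tokens", List.of("FIRST_TOKEN")),
            new Alt("any_left_over_tokens", List.of("SECOND_TOKEN")));
        List<String> tokens = List.of("FIRST_TOKEN", "NEW_LINE", "SECOND_TOKEN");
        parse(alts, tokens).forEach(System.out::println);
    }
}
```

In this model first_rule is listed first and succeeds on the first token, so second_rule is never attempted, and the result is the three matches listed above.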

A quick demo that adds the syntactic predicates manually (and disables backtrack in the options { ... } section) would look like this:

    grammar T;

    options {
      output=AST;
      //backtrack=true;
      memoize=true;
    }

    rule_list_in_order
      :  ( (first_rule)=>  first_rule  {System.out.println("first_rule=[" + $first_rule.text + "]");}
         | (second_rule)=> second_rule {System.out.println("second_rule=[" + $second_rule.text + "]");}
         | any_left_over_tokens        {System.out.println("any_left_over_tokens=[" + $any_left_over_tokens.text + "]");}
         )+
      ;

    first_rule
      :  FIRST_TOKEN
      ;

    second_rule
      :  FIRST_TOKEN NEW_LINE SECOND_TOKEN NEW_LINE
      ;

    any_left_over_tokens
      :  NEW_LINE
      |  FIRST_TOKEN
      |  SECOND_TOKEN
      ;

    FIRST_TOKEN  : 'First token here';
    SECOND_TOKEN : 'Second token here';
    NEW_LINE     : ('\r'? '\n');
    WS           : (' ' | '\t' | '\u000C') {$channel=HIDDEN;};

which can be tested with the class:

    import org.antlr.runtime.*;

    public class Main {
        public static void main(String[] args) throws Exception {
            String source = "First token here\nSecond token here";
            // TLexer and TParser are the classes ANTLR generates from grammar T
            ANTLRStringStream in = new ANTLRStringStream(source);
            TLexer lexer = new TLexer(in);
            CommonTokenStream tokens = new CommonTokenStream(lexer);
            TParser parser = new TParser(tokens);
            // invoke the start rule; its embedded actions print each match
            parser.rule_list_in_order();
        }
    }

which produces the expected output:

    first_rule=[First token here]
    any_left_over_tokens=[
    ]
    any_left_over_tokens=[Second token here]

Note that it doesn't matter whether you use:

    rule_list_in_order
      :  ( (first_rule)=> first_rule
         | (second_rule)=> second_rule
         | any_left_over_tokens
         )+
      ;

or:

    rule_list_in_order
      :  ( (second_rule)=> second_rule  // <--+--- swapped
         | (first_rule)=> first_rule    // <-/
         | any_left_over_tokens
         )+
      ;

Both produce the expected output.
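Why swapping the two predicated alternatives makes no difference for this input can be checked with a quick sketch. Assumption: a syntactic predicate such as (second_rule)=> is modelled here as a lookahead test that only compares token types and consumes nothing; ANTLR actually runs the rule speculatively, but for rules that are plain token sequences the yes/no answer is the same. The class PredicateOrder and its predicate method are hypothetical.

```java
import java.util.List;

public class PredicateOrder {
    // Models a syntactic predicate (rule)=> for a rule that is a plain token sequence:
    // "would this rule match at 'start'?", answered without consuming any input.
    static boolean predicate(List<String> body, List<String> tokens, int start) {
        if (start + body.size() > tokens.size()) return false; // not enough tokens left
        return tokens.subList(start, start + body.size()).equals(body);
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("FIRST_TOKEN", "NEW_LINE", "SECOND_TOKEN");
        List<String> secondRule = List.of("FIRST_TOKEN", "NEW_LINE", "SECOND_TOKEN", "NEW_LINE");
        List<String> firstRule = List.of("FIRST_TOKEN");
        for (int pos = 0; pos < tokens.size(); pos++) {
            System.out.println(pos + ": (second_rule)=> " + predicate(secondRule, tokens, pos)
                + ", (first_rule)=> " + predicate(firstRule, tokens, pos));
        }
    }
}
```

(second_rule)=> is false at every position, because second_rule needs four tokens and the stream only has three, so whichever order the two alternatives are listed in, first_rule is the one chosen for the first token.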

So, my guess is that you may have found a bug.

You could try the ANTLR mailing list if you want a definitive answer (Terence Parr frequents it more often than he does here).

Good luck!

PS. I tested this with ANTLR v3.2

Published: 2023-11-25 10:31:09
Link: https://www.elefans.com/category/jswz/34/1629401.html