使用%glr

编程入门 行业动态 更新时间:2024-10-28 18:24:22
使用%glr-parser和%merge规则减少有害野牛(Faulty bison reduction using %glr-parser and %merge rules)

目前我正在尝试为VHDL构建一个解析器,它具有C ++ - Parsers必须面对的一些问题。 VHDL的无上下文语法产生一个解析林而不是一个解析树,因为它关于函数调用和数组订阅的含糊不清

foo := fun(3) + arr(5);

如果解析器带有一个符合语言的,类型感知的符号表,它可以用来在运行中解决模糊性,那么这个赋值只能被明确地解析。

我不想这样做,因为对于前面提到的语句,解析林不会呈指数级增长,而是线性增长,具体取决于函数调用和数组订阅的数量。

(当然,除了用语句来折磨解析器之外)

foo := fun(fun(fun(fun(fun(4)))));

由于野牛强迫用户只创建一个单独的解析树,我使用%merge属性递归收集所有子树,并在单例AST中的所谓AMBIG节点下添加这些子树。

结果看起来像这样 。

为了产生上述内容,我解析了令牌流“I = I(N);”。 我在parse.y文件中使用的语法的内容收集如下。 它试图类似VHDL的模糊部分:

start: prog ; /* I cut out every semantic action to make this text more readable */ prog: assignment ';' | prog assignment ';' ; assignment: 'I' '=' expression ; expression: function_call %merge <stmtmerge2> | array_indexing %merge <stmtmerge2> | 'N' ; function_call: 'I' '(' expression ')' | 'I' ; array_indexing: function_call '(' expression ')' %merge <stmtmerge> | 'I' '(' expression ')' %merge <stmtmerge> ;

可以在此github存储库中读取整个源代码。

现在,让我们来看看实际的问题。 正如您在上面生成的解析树中所看到的,节点FCALL1和ARRIDX1指的是相同的单个节点EXPR1,后者又指两次N1。

无论如何,这不应该发生,我不知道为什么。 相反应该有路径

FCALL1 -> EXPR2 -> N2 ARRIDX1 -> EXPR1 -> N1

你知道为什么bison重用上述节点吗?

我还在野牛的官方gnu邮件列表上写了一个bug报告,但没有回复这一点。 不幸的是,由于新的stackoverflow用户的restictions,我不能提供这个错误报告的链接...

Currently I'm trying to build a parser for VHDL which has some of the problems C++-Parsers have to face. The context-free grammar of VHDL produces a parse forest rather than a single parse tree because of it's ambiguity regarding function calls and array subscriptions

foo := fun(3) + arr(5);

This assignment can only be parsed unambiguous if the parser would carry around a hirachically, type-aware symbol table which it'd use to resolve the ambiguities somewhat on-the-fly.

I don't want to do this, because for statements like the aforementioned, the parse forest would not grow exponentially, but rather linear depending on the amount of function calls and array subscriptions.

(Except, of course, one would torture the parser with statements like)

foo := fun(fun(fun(fun(fun(4)))));

Since bison forces the user to just create one single parse-tree, I used %merge attributes to collect all subtrees recursively and added those subtrees under so called AMBIG nodes in the singleton AST.

The result looks like this.

In order to produce the above, I parsed the token stream "I=I(N);". The substance of the grammar I used inside the parse.y file, is collected below. It tries to resemble the ambiguous parts of VHDL:

start: prog ; /* I cut out every semantic action to make this text more readable */ prog: assignment ';' | prog assignment ';' ; assignment: 'I' '=' expression ; expression: function_call %merge <stmtmerge2> | array_indexing %merge <stmtmerge2> | 'N' ; function_call: 'I' '(' expression ')' | 'I' ; array_indexing: function_call '(' expression ')' %merge <stmtmerge> | 'I' '(' expression ')' %merge <stmtmerge> ;

The whole sourcecode can be read at this github repository.

And now, let's get down to the actual Problem. As you can see in the generated parse tree above, the nodes FCALL1 and ARRIDX1 refer to the same single node EXPR1 which in turn refers to N1 twice.

This, by all means, should not have happened and I don't know why. Instead there should be the paths

FCALL1 -> EXPR2 -> N2 ARRIDX1 -> EXPR1 -> N1

Do you have any idea why bison reuses the aforementioned nodes?

I also wrote a bugreport on the official gnu mailing list for bison, without a reply to this point though. Unfortunately, due to the restictions for new stackoverflow users, I can't provide no link to this bug report...

最满意答案

这种行为是预期的。

expression可以明确地减少,并且减少的值被包括该值的可能的模糊减少使用。 请记住,GLR与LR一样,是一个从左到右的解析器。 执行减少操作时,所有子减少都已发生。 效果与在右侧使用终端没有区别; 终端不会被人工复制,以便在使用它的模糊产品中产生不同的实例。

对于大多数人来说,这将是一个功能而不是一个错误,我并不是说这是一个笑话。 没有图形结构堆栈,GLR具有指数运行时。 如果你真的想在合并解析树时做一个共享AST节点的深层复制,你必须自己做,但我建议你找到一种方法来利用解析林真的是一个有向无环的事实。图而不是树; 你可能会利用缺乏重复的优势。

That behaviour is expected.

expression can be unambiguously reduced, and that reduced value is used by both possible ambiguous reductions which include the value. Remember that GLR, like LR, is a left-to-right parser. When a reduction action is executed, all of the child reductions have already happened. The effect is not different from the use of a terminal in a right-hand side; the terminal will not be artificially copied in order to produce different instances in the ambiguous productions which use it.

For most people, this would be a feature rather than a bug, and I don't mean that as a joke. Without the graph-structured stack, GLR has exponential run-time. If you really want to do a deep copy of shared AST nodes when you merge parse trees, you will have to do it yourself, but I suggest that you find a way to make use of the fact that the parse forest is really an directed acyclic graph rather than a tree; you will probably be able to take advantage of the lack of duplication.

更多推荐

本文发布于:2023-08-07 20:19:00,感谢您对本站的认可!
本文链接:https://www.elefans.com/category/jswz/34/1465906.html
版权声明:本站内容均来自互联网,仅供演示用,请勿用于商业和其他非法用途。如果侵犯了您的权益请与我们联系,我们将在24小时内删除。
本文标签:glr

发布评论

评论列表 (有 0 条评论)
草根站长

>www.elefans.com

编程频道|电子爱好者 - 技术资讯及电子产品介绍!