LR(0)和SLR解析有什么区别?(What is the difference between LR(0) and SLR parsing?)

系统教程 行业动态 更新时间:2024-06-14 16:57:40
LR(0)和SLR解析有什么区别?(What is the difference between LR(0) and SLR parsing?)

我正在处理我的编译器概念,但我有点困惑...谷歌让我无所适从的答案。

SLR和LR(0)解析器是一样的吗? 如果没有,有什么区别?

I am working on my compilers concepts however I am a little confused... Googling got me nowhere to a definite answer.

Is SLR and LR(0) parsers one and same? If not, whats the difference?

最满意答案

LR(0)和SLR(1)解析器都是自下而上的方向性预测解析器 。 这意味着

解析器尝试反向应用生成,将输入句减少到起始符号( 自下而上 ) 解析器从左到右扫描输入( 方向 ) 解析器试图预测什么减少应用,而不一定看到所有的输入( 预测

LR(0)和SLR(1)都是移位/减少解析器 ,这意味着它们通过将输入流放置在堆栈上来处理令牌,并且在每个点处,通过令牌推入堆栈或减少一些终端和非终结点的序列返回到一些非终结符号。 可以看出,任何语法都可以使用shift / reduce解析器从自下而上解析,但该解析器可能不是确定性的 。 也就是说,解析器可能必须“猜测”是否应用转移或减少,并且可能最终不得不回溯到实现错误的选择。 无论确定性的转换/减少解析器的构造有多么强大,它将无法解析所有的语法。

当使用确定性的移位/减少解析器来解析不能处理的语法时,会导致移位/减少冲突减少/减少冲突 ,解析器可能进入无法指定采取什么行动的状态。 在移位/减少冲突中,它不能判断是否应该向堆栈添加另一个符号,或者对堆栈的顶部符号执行一些减少。 在减少/减少冲突中,解析器知道它需要用一些非终结符代替堆栈的顶部符号,但是它不能说明要使用的还原。

如果这是一个漫长的论述,我很抱歉,但是我们需要这样才能解决LR(0)和SLR(1)解析之间的区别。 LR(0)解析器是一个移位/减少解析器,它使用零令牌的前瞻来确定采取什么动作(因此为0)。 这意味着在解析器的任何配置中,解析器必须有一个明确的动作来选择 - 它会转移特定的符号或应用特定的缩减。 如果有两个或更多的选择,解析器失败,我们说语法不是LR(0)。

回想一下,两个可能的LR冲突是转移/减少和减少/减少。 在这两种情况下,LR(0)自动机都可以采取至少两个动作,不能区分哪一个。 由于至少有一个冲突的行为是减少的,所以合理的攻击路线是尝试使解析器在执行特定的减少时更加小心。 更具体地说,假设解析器被允许查看输入的下一个标记来确定它是应该移位还是减少。 如果我们只允许解析器在“有意义”的情况下减少(对于“有意义”的一些定义),那么我们可能能够通过使自动机专门选择在一个或多个特别的步骤。

在SLR(1)(“简化LR(1)”)中,解析器在决定是否应该移动或缩小时,允许查看前瞻性的一个令牌。 特别地,当解析器想要尝试减少A→W形式的东西(对于非终结符A和字符串w)时,它会查看输入的下一个标记。 如果在某些派生中该令牌可以合法地出现在非终结点A之后,解析器就会减少。 否则,它不会。 这里的直觉是,在某些情况下,尝试减少是没有意义的,因为考虑到我们迄今为止所看到的令牌和即将到来的令牌,没有可能的方法是减少可能是正确的。

LR(0)和SLR(1)之间的唯一区别是这种额外的能力,有助于决定在发生冲突时采取什么行动。 因此,可以由LR(0)解析器解析的任何语法都可以由SLR(1)解析器解析。 然而,SLR(1)解析器可以解析比LR(0)更多的语法。

实际上,SLR(1)仍然是一个相当弱的解析方法。 更常见的是,您将看到使用的LALR(1)(“Lookahead LR(1)”)解析器。 他们也通过尝试解决LR(0)解析器中的冲突来工作,但是它们用于解决冲突的规则比SLR(1)中使用的规则要精确得多,因此大量的语法是LALR(1)比单反(1)。 要更具体地说,SLR(1)解析器试图通过查看语法的结构来解决冲突,以了解有关什么时候转移和何时减少的更多信息。 LALR(1)解析器查看语法和LR(0)解析器,以获得关于何时移位和何时减少的更具体的信息。 因为LALR(1)可以查看LR(0)解析器的结构,所以它可以更精确地识别某些冲突是否是假的。 默认情况下,Linux实用程序yacc和bison生成LALR(1)解析器。

历史上,LALR(1)解析器通常通过依赖于更强大的LR(1)解析器的不同方法构建,因此您将经常看到LALR(1)。 要理解这一点,我们需要谈谈LR(1)解析器。 在LR(0)解析器中,解析器通过跟踪生产中间可能位于哪里。 一旦发现已经到达生产的终点,它就知道要尝试减少。 然而,解析器可能无法知道它是在一个生产的结束和另一个生产的中间,这导致转移/减少冲突,或者两个不同的生产中哪一个已经到了(一个减少/减少冲突)。 在LR(0)中,这会立即导致冲突,解析器失败。 在SLR(1)或LALR(1)中,解析器然后基于下一个前瞻标记来做出移位或减少的决定。

在LR(1)解析器中,解析器在运行时跟踪附加信息。 除了跟踪解析器相信使用的生产之外,还会跟踪生产完成后可能出现的令牌。 因为解析器在每个步骤跟踪这个信息,而不仅仅是当需要做出决定时,LR(1)解析器比LR(0),SLR(1)或LALR(1)我们已经谈到的解析器到目前为止。 LR(1)是一个非常强大的解析技术,它可以使用一些棘手的数学表示,任何可以通过任何 shift / reduce解析器确定性地解析的语言都有一些语法,可以用LR(1)自动机解析。 (请注意,这并不意味着所有可以解析的语法都是LR(1);这只能说可以解析的语言具有一些LR(1)语法)。 然而,这种权力是有代价的,并且生成的LR(1)解析器可能需要这么多信息来操作,这在实践中不可能被使用。 例如,用于真正编程语言的LR(1)解析器可能需要数十到数百兆的附加信息才能正常运行。 因此,LR(1)在实践中通常不被使用,而使用较弱的解析器,如LALR(1)或SLR(1)。

最近,一种称为GLR(0)(“Generalized LR(0)”)的新解析算法已经普及。 GLR(0)解析器不是试图解决出现在LR(0)解析器中的冲突,而是通过并行尝试所有可能的选项来工作。 使用一些聪明的技巧,这可以使许多语法运行得非常有效。 此外,GLR(0)可以解析任何上下文无关的语法 ,甚至是任何k的LR(k)解析器无法解析的语法。 其他解析器也能够这样做(例如,Earley解析器或CYK解析器),尽管GLR(0)在实践中往往更快。

如果您有兴趣了解更多信息,那么在今年夏天,我教了一个介绍性的编译器课程,花了两个多星期的时间来谈论解析技术。 如果您想要对LR(0),SLR(1)和其他强大的解析技术进行更严格的介绍,您可以欣赏我的演讲幻灯片和关于解析的作业作业。 所有的课程材料都可以在我的个人网站上找到 。

希望这可以帮助!

Both LR(0) and SLR(1) parsers are bottom-up, directional, predictive parsers. This means that

The parsers attempt to apply productions in reverse to reduce the input sentence back to the start symbol (bottom-up) The parsers scan the input from left-to-right (directional) The parsers attempt to predict what reductions to apply without necessarily seeing all of the input (predictive)

Both LR(0) and SLR(1) are shift/reduce parsers, meaning that they process the tokens of the input stream by placing them on a stack, and at each point either shifting a token by pushing it onto the stack or reducing some sequence of terminals and nonterminals atop the stack back to some nonterminal symbol. It can be shown that any grammar can be parsed bottom-up using a shift/reduce parser, but that parser might not be deterministic. That is, the parser may have to "guess" whether to apply a shift or reduction, and may end up having to backtrack to realize that it made the wrong choice. No matter how powerful a deterministic shift/reduce parser you construct, it will never be able to parse all grammars.

When a deterministic shift/reduce parser is used to parse a grammar that it cannot handle, it results in shift/reduce conflicts or reduce/reduce conflicts, where the parser may enter a state in which it cannot tell what action to take. In a shift/reduce conflict, it cannot tell whether it should add another symbol to the stack or perform some reduction on the top symbols of the stack. In a reduce/reduce conflict, the parser knows that it needs to replace the top symbols of the stack with some nonterminal, but it can't tell what reduction to use.

I apologize if this is a lengthy exposition, but we need this to be able to address the difference between LR(0) and SLR(1) parsing. An LR(0) parser is a shift/reduce parser that uses zero tokens of lookahead to determine what action to take (hence the 0). This means that in any configuration of the parser, the parser must have an unambiguous action to choose - either it shifts a specific symbol or applies a specific reduction. If there are ever two or more choices to make, the parser fails and we say that the grammar is not LR(0).

Recall that the two possible LR conflicts are shift/reduce and reduce/reduce. In both of these cases, there are at least two actions that the LR(0) automaton could be taking, and it can't tell which of them to use. Since at least one of the conflicting actions is a reduction, a reasonable line of attack would be to try to have the parser be more careful about when it performs a particular reduction. More specifically, let's suppose that the parser is allowed to look at the next token of input to determine whether it should shift or reduce. If we only allow the parser to reduce when it "makes sense" to do so (for some definition of "makes sense"), then we may be able to eliminate the conflict by having the automaton specifically choose to either shift or reduce in a particular step.

In SLR(1) ("Simplified LR(1)"), the parser is allowed to look at one token of lookahead when deciding whether it should shift or reduce. In particular, when the parser wants to try reducing something of the form A → w (for nonterminal A and string w), it looks at the next token of input. If that token could legally appear after the nonterminal A in some derivation, the parser reduces. Otherwise, it does not. The intuition here is that in some cases it makes no sense to attempt a reduction, because given the tokens we've seen so far and the upcoming token, there is no possible way that the reduction could ever be correct.

The only difference between LR(0) and SLR(1) is this extra ability to help decide what action to take when there are conflicts. Because of this, any grammar that can be parsed by an LR(0) parser can be parsed by an SLR(1) parser. However, SLR(1) parsers can parse a larger number of grammars than LR(0).

In practice, though, SLR(1) is still a fairly weak parsing method. More commonly, you will see LALR(1) ("Lookahead LR(1)") parsers being used. They too work by trying to resolve conflicts in an LR(0) parser, but the rules they use for resolving conflicts are far more precise than those used in SLR(1), and consequently a much larger number of grammars are LALR(1) than are SLR(1). To be a bit more specific, SLR(1) parsers try to resolve conflicts by looking at the structure of the grammar to learn more information about when to shift and when to reduce. LALR(1) parsers look at both the grammar and the LR(0) parser to get even more specific information about when to shift and when to reduce. Because LALR(1) can look at the structure of the LR(0) parser, it can more precisely identify when certain conflicts are spurious. The Linux utilities yacc and bison, by default, produce LALR(1) parsers.

Historically, LALR(1) parsers were typically constructed through a different method that relied on the far more powerful LR(1) parser, so you will often see LALR(1) described that way. To understand this, we need to talk about LR(1) parsers. In an LR(0) parser, the parser works by keeping track of where it might be in the middle of a production. Once it has found that it's reached the end of a production, it knows to try to reduce. However, the parser might not be able to tell whether it's in at the end of one production and the middle of another, which leads to a shift/reduce conflict, or which of two different productions it has reached the end of (a reduce/reduce conflict). In LR(0), this immediately leads to a conflict and the parser fails. In SLR(1) or LALR(1), the parser then makes the decision to shift or reduce based on the next token of lookahead.

In an LR(1) parser, the parser keeps track of additional information as it operates. In addition to keeping track of what production the parser believes is being used, it keeps track of what possible tokens might appear after that production is completed. Because the parser keeps track of this information at each step, and not just when it needs to make the decision, the LR(1) parser is substantially more powerful and precise than any of the LR(0), SLR(1), or LALR(1) parsers we've talked about so far. LR(1) is an extremely powerful parsing technique, and it can be shown using some tricky math that any language that could be parsed deterministically by any shift/reduce parser has some grammar that could be parsed with an LR(1) automaton. (Note that this does not mean that all grammars that can be parsed deterministically are LR(1); this only says that a language that could be parsed deterministically has some LR(1) grammar). However, this power comes at a price, and a generated LR(1) parser may require so much information to operate that it can't possibly be used in practice. An LR(1) parser for a real programming language, for example, might require tens to hundreds of megabytes of additional information to operate correctly. For this reason, LR(1) isn't typically used in practice, and weaker parsers like LALR(1) or SLR(1) are used instead.

More recently, a new parsing algorithm called GLR(0) ("Generalized LR(0)") has gained popularity. Rather than trying to resolve the conflicts that appear in an LR(0) parser, the GLR(0) parser instead works by trying all possible options in parallel. Using some clever tricks, this can be made to run very efficiently for many grammars. Moreover, GLR(0) can parse any context-free grammar at all, even grammars that can't be parsed by an LR(k) parser for any k. Other parsers are capable of doing this as well (for example, the Earley parser or a CYK parser), though GLR(0) tends to be faster in practice.

If you're interested in learning more, over this summer I taught an introductory compilers course and spent just under two weeks talking about parsing techniques. If you'd like to get a more rigorous introduction to LR(0), SLR(1), and a host of other powerful parsing techniques, you might enjoy my lecture slides and homework assignments about parsing. All of the course materials are available here on my personal site.

Hope this helps!

更多推荐

本文发布于:2023-04-13 12:25:00,感谢您对本站的认可!
本文链接:https://www.elefans.com/category/dzcp/903fde8d83de5844c9784ba577e13bdf.html
版权声明:本站内容均来自互联网,仅供演示用,请勿用于商业和其他非法用途。如果侵犯了您的权益请与我们联系,我们将在24小时内删除。
本文标签:有什么区别   SLR   LR   difference   parsing

发布评论

评论列表 (有 0 条评论)
草根站长

>www.elefans.com

编程频道|电子爱好者 - 技术资讯及电子产品介绍!