Case-insensitive keyword matching

Updated: 2024-10-10 21:33:40

Problem description

I'm writing a grammar for parsing a computer language, to be used with Parse::Eyapp. This is a Perl package that simplifies writing parsers for regular languages. It is similar to yacc and other LALR parser generators, but has some useful extensions, such as defining tokens in terms of regular expressions.

The language I want to parse uses keywords to denote sections and describe control flow. It also supports identifiers that serve as placeholders for data. An identifier can never have the same name as a keyword.

Now, here comes the tricky part: I need to separate keywords from identifiers, but they may look similar, so I need a regular expression pattern that matches a keyword case-insensitively, and nothing else.

The solution I came up with is the following:

  • Each keyword is identified by a token of the following form: /((?i)keyword)(?!\w)/
    • (?i) applies case-insensitive matching to the subpattern that follows it
    • (?!\w) rejects the match if any word character (a-z, 0-9, etc.) follows the keyword
    • those characters are not part of the match
The token definitions and the part of the grammar I have written work well so far, but there is still a lot to do. However, that is not my question.

What I want to ask is: am I on the right track here? Are there better, simpler regular expressions for matching those keywords? Should I stop and use a different approach to language parsing altogether?

By the way, the idea of using the tokenizer to match whole strings instead of single characters came from the Parse::Eyapp documentation. I started with a character-by-character grammar, but that approach was not very elegant and seemed to contradict the flexible nature of the parser generator. It was also very cumbersome to write.
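To make the whole-token idea concrete, the following is a small, generic tokenizer sketch in Python. It does not use Parse::Eyapp's interface; the keyword list, the token names `KEYWORD` and `IDENT`, and the function `tokenize` are all assumptions chosen for illustration. Keywords are tried before identifiers, and the lookahead keeps a keyword from swallowing the front of a longer identifier.

```python
import re

KEYWORDS = ["begin", "end", "if", "else"]  # hypothetical keyword set

# One alternative per token kind. The keyword branch comes first; its
# (?!\w) lookahead makes it fail on inputs such as "endpoint", which
# then fall through to the identifier branch.
TOKEN_RE = re.compile(
    r"\s*(?:"
    r"(?P<KEYWORD>(?i:" + "|".join(map(re.escape, KEYWORDS)) + r"))(?!\w)"
    r"|(?P<IDENT>[A-Za-z_]\w*)"
    r")"
)

def tokenize(text: str) -> list[tuple[str, str]]:
    """Split text into (kind, value) pairs, skipping whitespace."""
    tokens = []
    pos = 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at position {pos}")
        kind = m.lastgroup
        value = m.group(kind)
        # Lower-case keywords so the parser sees one spelling per keyword.
        tokens.append((kind, value.lower() if kind == "KEYWORD" else value))
        pos = m.end()
    return tokens

print(tokenize("BEGIN endpoint End"))
```

Lower-casing keywords in the tokenizer is one possible design choice: the grammar can then refer to a single spelling of each keyword, while identifiers keep their original case.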

Recommended answer

If you would like to parse a language, Marpa may be much better suited for you. Here's a tutorial. You could also use regexp grammars.

Published: 2023-11-12 11:13:14
Link: https://www.elefans.com/category/jswz/34/1581335.html