antlr4 Python target cannot recognize Unicode

Updated: 2024-10-27 11:27:00

I have an ID token rule:

ID : ([A-Z_]|'\u0100'..'\uFFFE') ([A-Z_0-9]|'\u0100'..'\uFFFE')*;
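To make the rule's intent concrete, here is a minimal sketch that mirrors the ID rule with Python 3's re module. It is only a stand-in for illustration, not how the ANTLR runtime matches tokens, and the helper name is_id is hypothetical. The first character must be an uppercase letter, an underscore, or a code point in U+0100..U+FFFE. Later characters may also be digits.

```python
import re

# Regex stand-in for the ANTLR lexer rule:
#   ID : ([A-Z_]|'\u0100'..'\uFFFE') ([A-Z_0-9]|'\u0100'..'\uFFFE')*;
ID_PATTERN = re.compile(r'[A-Z_\u0100-\uFFFE][A-Z_0-9\u0100-\uFFFE]*')


def is_id(text):
    """Return True if the whole string matches the ID rule (hypothetical helper)."""
    return ID_PATTERN.fullmatch(text) is not None


if __name__ == '__main__':
    for candidate in ['均60', 'MA', 'C', '60', 'ma']:
        print(candidate, is_id(candidate))
```

'均' lies inside the CJK block, well within U+0100..U+FFFE, so by the grammar's definition '均60' is a valid ID. Lowercase names and names starting with a digit are not.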

and a sample .txt file to parse:

均60:=MA(C,60);
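As a rough picture of what a lexer should produce for this line, here is a toy regex tokenizer in Python 3. Only the ID pattern comes from the question. INT, ASSIGN and PUNCT are hypothetical token types, assumed because the rest of TDX.g4 is not shown. Python's alternation is also first-match rather than ANTLR's longest-match, so the order of TOKEN_SPEC matters here.

```python
import re

# Only ID comes from the question; ASSIGN, INT and PUNCT are hypothetical
# stand-ins for whatever TDX.g4 actually defines.
TOKEN_SPEC = [
    ('ASSIGN', r':='),
    ('ID', r'[A-Z_\u0100-\uFFFE][A-Z_0-9\u0100-\uFFFE]*'),
    ('INT', r'[0-9]+'),
    ('PUNCT', r'[(),;]'),
    ('WS', r'\s+'),
]
MASTER = re.compile('|'.join('(?P<%s>%s)' % pair for pair in TOKEN_SPEC))


def tokenize(text):
    """Yield (type, text) pairs; raise ValueError on an unmatched character."""
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            # Analogous to ANTLR's "token recognition error at: ..."
            raise ValueError('token recognition error at: %r' % text[pos])
        if m.lastgroup != 'WS':
            yield m.lastgroup, m.group()
        pos = m.end()


if __name__ == '__main__':
    for token in tokenize('均60:=MA(C,60);'):
        print(token)
```

Under these assumptions the sample splits into ID '均60', ASSIGN, ID 'MA', '(', ID 'C', ',', INT '60', ')' and ';', with no recognition error on '均'.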

I generated the Java and Python2 targets and tested each against the sample file. The Java target can parse this file, but the Python2 target can't: it throws token recognition error at: '均'. I also tested the Python2 target against other valid inputs, and everything works except input that contains Unicode characters. Did I miss something, or does the Python target not support Unicode parsing?

Java target commands:

    mkdir -p java
    java -jar /usr/local/lib/antlr-4.5.3-complete.jar TDX.g4 -o ./java
    cd ./java
    javac TDX*.java
    java org.antlr.v4.gui.TestRig TDX prog -gui ../samples/1.txt

Python target generation command:

java -jar /usr/local/lib/antlr-4.5.3-complete.jar -Dlanguage=Python2 TDX.g4 -o ./tdx_py/antlrgen -visitor

Python code:

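    import sys
    from antlr4 import *
    # Each generated module is named after its class, so import the classes explicitly
    from tdx_py.antlrgen.TDXLexer import TDXLexer
    from tdx_py.antlrgen.TDXParser import TDXParser

    def executefile(path):
        # Read the input file, decoding it as UTF-8
        input_stream = FileStream(path, encoding='utf-8')
        lexer = TDXLexer(input_stream)
        stream = CommonTokenStream(lexer)
        parser = TDXParser(stream)
        tree = parser.prog()  # 'prog' is the grammar's start rule
        return tree

    if __name__ == '__main__':
        executefile(sys.argv[1])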
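Before blaming the runtime, you can confirm that the input file really decodes as UTF-8 and see exactly which characters fall outside ASCII. Here is a small Python 3 helper for that. The function name non_ascii_chars is hypothetical and not part of the ANTLR runtime.

```python
import io
import sys


def non_ascii_chars(path, encoding='utf-8'):
    """Return (offset, char, 'U+XXXX') for every non-ASCII character in the file.

    Decoding explicitly raises UnicodeDecodeError if the file is not valid
    in the given encoding, so a clean result also confirms the encoding.
    """
    with io.open(path, encoding=encoding) as handle:
        text = handle.read()
    return [(i, ch, 'U+%04X' % ord(ch))
            for i, ch in enumerate(text) if ord(ch) > 0x7F]


if __name__ == '__main__' and len(sys.argv) > 1:
    for offset, ch, code in non_ascii_chars(sys.argv[1]):
        print(offset, code)
```

For the sample line, this should report a single character at offset 0, namely '均', whose code point lies inside the range the ID rule accepts. That points at the Python2 target rather than at the input file.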

Best answer

This is a bug in ANTLR4; see https://github.com/antlr/antlr4/issues/1925.

Published: 2023-07-28 04:30:00
Link: https://www.elefans.com/category/jswz/34/1300363.html