Trying to parse free-form ANSI text.

Updated: 2024-10-07 12:18:07
Problem description


Alright... I am attempting to find a way to parse ANSI text from a telnet application. However, I am experiencing a bit of trouble.

What I want to do is have all ANSI sequences _removed_ from the output, save for those that manage color codes or text presentation (in short, the ones that are ESC[#m, with additional #s separated by ; characters). The ones that are left, the ones that are the color codes, I want to act on, and remove from the text stream, and display the text.

I am using wxPython's TextCtrl as output, so when I "get" an ANSI color control sequence, I want to basically turn it into a call to wxWidgets' TextCtrl.SetDefaultStyle method for the control, adding the appropriate color/brightness/italic/bold/etc. settings to the TextCtrl until the next ANSI code comes in to alter it.

It would *seem* easy, but I cannot seem to wrap my mind around the idea. :-/

I have a source tarball up at fd0man.theunixplace/Tmud.tar which contains the code in question. In short, the information is coming in over a TCP/IP socket that is traditionally connected to with a telnet client, so things can be broken mid-line (or even mid-control sequence). If anyone has any ideas as to what I am doing, expecting, or assuming that is wrong, I would be delighted to hear it. The code that is not behaving as I would expect it to is in src/AnsiTextCtrl.py, but I have included the entire project as it stands for completeness.

Any help would be appreciated! Thanks!

-- Mike

Solution


I have no experience with reading from TCP/IP. But looking at your program with a candid mind I'd say that it is written to process a chunk of data in memory. If, as you say, the chunks you get from TCP/IP may start and end anywhere and, presumably, you pass each chunk through AppendText, then you have a synchronization problem, as each call resets your escape flag, even if the new chunk starts in the middle of an escape sequence. Perhaps you should cut off incomplete escapes at the end and prepend them to the next chunk.

And:

    if(len(buffer) > 0):
        wx.TextCtrl.AppendText(self, buffer)      <<< Are you sure text goes into the same place as the controls?

    if(len(AnsiBuffer) > 0):
        wx.TextCtrl.AppendText(self, AnsiBuffer)  <<< You say you want to strip the control sequences

Frederic
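The suggestion to cut off an incomplete escape at the end of a chunk and prepend it to the next one could be sketched roughly as follows. This is only an illustration using the standard `re` module, not code from the Tmud project; the class name `ChunkJoiner` and its `feed` method are made up for this example, and it assumes only sequences of the form ESC [ digits/semicolons letter need to be handled.

```python
import re

# Matches an escape sequence that is cut off at the very end of the data:
# a lone ESC, "ESC[", or "ESC[" followed by digits/semicolons with no
# terminating letter yet.
_INCOMPLETE_TAIL = re.compile(r"\x1b(?:\[[0-9;]*)?\Z")


class ChunkJoiner:
    """Hypothetical helper: holds back a trailing partial escape sequence."""

    def __init__(self):
        self._pending = ""

    def feed(self, chunk):
        """Return the part of the stream that is safe to parse right now."""
        data = self._pending + chunk
        match = _INCOMPLETE_TAIL.search(data)
        if match:
            # Keep the unfinished escape for the next call.
            self._pending = data[match.start():]
            return data[:match.start()]
        self._pending = ""
        return data
```

Each chunk read from the socket would go through `feed()` first, and only the returned text would be handed to the parsing and display code, so the escape state never has to survive between calls.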

"Michael B. Trausch" <"mike #at^&nospam!%trauschus"> wrote in message news:Gs******************************@comcast. ..

> Alright... I am attempting to find a way to parse ANSI text from a telnet application. However, I am experiencing a bit of trouble. What I want to do is have all ANSI sequences _removed_ from the output, save for those that manage color codes or text presentation (in short, the ones that are ESC[#m, with additional #s separated by ; characters). The ones that are left, the ones that are the color codes, I want to act on, and remove from the text stream, and display the text.

Here is a pyparsing-based scanner/converter, along with some test code at the end. It takes care of partial escape sequences, and strips any sequences of the form "<ESC>[##;##;...<alpha>", unless the trailing alpha is 'm'.

The pyparsing project wiki is at pyparsing.wikispaces.

-- Paul

    from pyparsing import (Literal, Word, nums, alphas, Combine, Optional,
                           Suppress, StringEnd, delimitedList, oneOf)

    ESC = chr(27)
    escIntro = Literal(ESC + '[').suppress()
    integer = Word(nums)

    # color/text-attribute codes: ESC [ n ; n ; ... m
    colorCode = Combine(escIntro +
                        Optional(delimitedList(integer, delim=';')) +
                        Suppress('m')).setResultsName("colorCode")

    # define search pattern that will match non-color ANSI command
    # codes - these will just get dropped on the floor
    otherAnsiCode = Suppress(Combine(escIntro +
                        Optional(delimitedList(integer, delim=';')) +
                        oneOf(list(alphas))))

    # an escape sequence that is cut off by the end of the input string
    partialAnsiCode = Combine(Literal(ESC) +
                        Optional('[') +
                        Optional(delimitedList(integer, delim=';') + Optional(';')) +
                        StringEnd()).setResultsName("partialCode")

    ansiSearchPattern = colorCode | otherAnsiCode | partialAnsiCode

    # preserve tabs in incoming text
    ansiSearchPattern.parseWithTabs()

    def processInputString(inputString):
        lastEnd = 0
        for t, start, end in ansiSearchPattern.scanString(inputString):
            # pass inputString[lastEnd:start] to wxTextControl - font styles
            # were set in parse action
            print(inputString[lastEnd:start])

            # process color codes, if any:
            if t.getName() == "colorCode":
                if t:
                    print("<change color attributes to %s>" % t.asList())
                else:
                    print("<empty color sequence detected>")
            elif t.getName() == "partialCode":
                print("<found partial escape sequence %s, tack it on front of next>" % t)
                # return partial code, to be prepended to the next string
                # sent to processInputString
                return t[0]
            else:
                # other kind of ANSI code found, do nothing
                pass

            lastEnd = end

        # pass inputString[lastEnd:] to wxTextControl - this is the last bit
        # of the input string after the last escape sequence
        print(inputString[lastEnd:])

    test = """\
    This is a test string containing some ANSI sequences.
    Sequence 1: ~[10;12m
    Sequence 2: ~[3;4h
    Sequence 3: ~[4;5m
    Sequence 4; ~[m
    Sequence 5; ~[24HNo more escape sequences.
    ~[7""".replace('~', chr(27))

    leftOver = processInputString(test)

Prints:

    This is a test string containing some ANSI sequences.
    Sequence 1:
    <change color attributes to ['1012']>
    Sequence 2:
    Sequence 3:
    <change color attributes to ['45']>
    Sequence 4;
    <change color attributes to ['']>
    Sequence 5;
    No more escape sequences.
    <found partial escape sequence ['\x1b[7'], tack it on front of next>
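Once a color sequence has been picked out, its numeric parameters still have to be turned into style settings before anything like TextCtrl.SetDefaultStyle can be called. The following is a rough sketch of that step, not part of Paul's scanner. `TextStyle`, `ANSI_COLORS` and `apply_sgr` are hypothetical names invented here, only a small subset of the standard attribute codes is covered (0 reset, 1 bold, 3 italic, 4 underline, 22/23/24 to switch them off, 30-37 and 40-47 colors, 39/49 default colors), and it assumes the caller has the parameter text with its semicolons intact (the output above shows the parameters joined together as '1012', so the scanner would need a small change to provide that).

```python
from dataclasses import dataclass, replace

# The eight standard ANSI colors, indexed by (code - 30) or (code - 40).
ANSI_COLORS = ("black", "red", "green", "yellow",
               "blue", "magenta", "cyan", "white")


@dataclass(frozen=True)
class TextStyle:
    """Hypothetical style record; a wx front end could map it to a wx.TextAttr."""
    bold: bool = False
    italic: bool = False
    underline: bool = False
    foreground: str = "default"
    background: str = "default"


def apply_sgr(style, params):
    """Return a new TextStyle after applying the text between ESC[ and m.

    `params` is a string such as "1;31". Unknown codes are ignored.
    """
    # An empty parameter (as in ESC[m) counts as 0, which means "reset".
    codes = [int(p) if p else 0 for p in params.split(";")]
    for code in codes:
        if code == 0:
            style = TextStyle()
        elif code == 1:
            style = replace(style, bold=True)
        elif code == 3:
            style = replace(style, italic=True)
        elif code == 4:
            style = replace(style, underline=True)
        elif code == 22:
            style = replace(style, bold=False)
        elif code == 23:
            style = replace(style, italic=False)
        elif code == 24:
            style = replace(style, underline=False)
        elif 30 <= code <= 37:
            style = replace(style, foreground=ANSI_COLORS[code - 30])
        elif code == 39:
            style = replace(style, foreground="default")
        elif 40 <= code <= 47:
            style = replace(style, background=ANSI_COLORS[code - 40])
        elif code == 49:
            style = replace(style, background="default")
    return style
```

Keeping the style as a plain immutable record means the current state simply carries forward until the next color code arrives, which is the behavior the original question asks for; the translation into actual wx calls is left out here on purpose.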

Published: 2023-11-28 08:12:23
Article link: https://www.elefans.com/category/jswz/34/1641617.html