Regular expression to parse log file

Updated: 2024-10-09 22:19:16

I have a SonicWall syslog file with this format:

<134>id=firewall sn=C0EAE470F7D0 time="2014-08-13 04:31:27" fw=10.2.3.4 pri=6 c=1024 m=537 msg="Connection Closed" n=301541 src=172.16.1.43:50581:X0 dst=172.16.1.1:192:X0 proto=udp/192 sent=46

I am trying to create a regular expression that will return a list of tuples that are split on the = sign. If a value contains spaces, it will have double quotes. I don't care if the values returned have the quotes returned or not, as long as the entire value with spaces is returned. For example, I want the time key to contain both the date & time. Desired output:

("<134>id", "firewall"), ("sn", "C0EAE470F7D0"), ("time", '"2014-08-13 04:31:27"'), ("fw", "10.2.3.4"), ("pri", "6"), ... ("msg", '"Connection Closed"'), ("n", "301541"), ("src", "172.16.1.43:50581:X0"), ... ("sent", "46")

This is what I have so far, but it fails when it encounters a field with double quotes. Also, the last field ("sent" in this case) is not returned. I have experimented with the regex for a few hours, trying various combinations, but can't quite get it to work. Any help would be greatly appreciated.

import re

fname = "syslog.log"
with open(fname) as fp:
    lines = fp.read().splitlines()

q = re.compile('(.*?)=(.*?)[\s"]', re.S|re.M)
for line in lines:
    print(line)
    key_val = q.findall(line)
    print(key_val)

This is what this code returns:

[('<134>id', 'firewall'), ('sn', 'C0EAE470F7D0'), ('time', ''), ('2014-08-13 04:31:27" fw', '10.2.3.4'), ('pri', '6'), ('c', '1024'), ('m', '537'), ('msg', ''), ('Connection Closed" n', '301541'), ('src', '172.16.1.43:50581:X0'), ('dst', '172.16.1.1:192:X0'), ('proto', 'udp/192')]

If this can't be accomplished with a regular expression, what would be the best way to achieve the desired result in Python 3.3?

Best answer

http://regex101.com/r/wS5lX2/3

(.+?)=("[^"]*"|\S*)\s*

What it does

1. Lazily match one or more characters up to the first equals sign (the key), then the equals sign itself.
2. Match the value, which is either a double-quoted string that contains no inner quotes, or a run of non-whitespace characters.
3. Match any trailing whitespace.
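As a rough illustration, the pattern above could be dropped into the question's code roughly as follows. This is only a sketch: the names `LINE`, `PAIR_RE`, and `parse_pairs` are made up for this example, and the sample line is copied from the question rather than read from a file.

```python
import re

# Sample line copied from the question (assumption: one record per line).
LINE = ('<134>id=firewall sn=C0EAE470F7D0 time="2014-08-13 04:31:27" '
        'fw=10.2.3.4 pri=6 c=1024 m=537 msg="Connection Closed" n=301541 '
        'src=172.16.1.43:50581:X0 dst=172.16.1.1:192:X0 proto=udp/192 sent=46')

# Key (lazy, up to the first "="), then a quoted or unquoted value,
# then any trailing whitespace.
PAIR_RE = re.compile(r'(.+?)=("[^"]*"|\S*)\s*')


def parse_pairs(line):
    """Return a list of (key, value) tuples; quoted values keep their quotes."""
    return PAIR_RE.findall(line)


pairs = parse_pairs(LINE)
for key, value in pairs:
    print(key, value)
```

Because the trailing `\s*` can match an empty string, the last field ("sent") is returned as well, which the original pattern in the question missed.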

If you also want to strip the quotes from the matched value, you can use this instead:

http://regex101.com/r/wS5lX2/4

(.+?)=(?:"(.*?)(?<!\\)"|(\S*))\s*

This strips the double quotes from the captured value. The key is in group 1, and the value is in group 2 (quoted) or group 3 (unquoted). It also allows backslash-escaped quotes inside a quoted value.
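Since the value lands in either group 2 or group 3, the two groups have to be merged after matching. One possible way to do that in Python is sketched below; `PAIR_RE` and `parse_unquoted` are hypothetical names chosen for this example, and the sample line is again taken from the question.

```python
import re

# Sample line copied from the question.
LINE = ('<134>id=firewall sn=C0EAE470F7D0 time="2014-08-13 04:31:27" '
        'fw=10.2.3.4 pri=6 c=1024 m=537 msg="Connection Closed" n=301541 '
        'src=172.16.1.43:50581:X0 dst=172.16.1.1:192:X0 proto=udp/192 sent=46')

# Group 1: key. Group 2: quoted value without the surrounding quotes.
# Group 3: unquoted value. The lookbehind (?<!\\) skips escaped quotes.
PAIR_RE = re.compile(r'(.+?)=(?:"(.*?)(?<!\\)"|(\S*))\s*')


def parse_unquoted(line):
    """Return (key, value) tuples with surrounding quotes stripped."""
    result = []
    for m in PAIR_RE.finditer(line):
        # Exactly one of groups 2 and 3 participates in each match;
        # compare against None so an empty quoted value ("") is kept.
        value = m.group(2) if m.group(2) is not None else m.group(3)
        result.append((m.group(1), value))
    return result


fields = dict(parse_unquoted(LINE))
print(fields["time"])
print(fields["msg"])
```

Note that backslashes in front of escaped quotes are left in the captured value; removing them would need a separate step.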

Published: 2023-08-04 20:54:00
Article link: https://www.elefans.com/category/jswz/34/1421488.html