What's a short way in Linux to extract a pattern string and another pattern string later?

Updated: 2024-10-19 10:21:45

Suppose we have one line of text stored in a file:

    // In the actual file this will be one line
    {unrelated_text1,ID:13,unrelated_text2,TIMESTAMP:1476280500,unrelated_text3},
    {other_unrelated_text1,other_unrelated_text2,ID:25,TIMESTAMP:1476280600},
    {ID:30,more_unrelated_text1,TIMESTAMP:1476280700},
    {ID:40,final_unrelated_text}

For this particular input, I want to extract 3 entries:

    // The details, such as whether to put { character in front or not do not matter.
    // Any form of output which extracts only these 3 entries and groups them in a
    // visually nice way will do the job.
    {ID:13, TIMESTAMP:1476280500}
    {ID:25, TIMESTAMP:1476280600}
    {ID:30, TIMESTAMP:1476280700}
    // I do not want the last entry, because it does not contain timestamp field.

So far the closest command I found is

    grep -Po '{ID:[0-9]+(.+?)}' input_file

which gives the output

    {unrelated_text1,ID:13,unrelated_text2,TIMESTAMP:1476280500,unrelated_text3}
    {other_unrelated_text1,other_unrelated_text2,ID:25,TIMESTAMP:1476280600}
    {ID:30,more_unrelated_text1,TIMESTAMP:1476280700}
    {ID:40,final_unrelated_text}

The next improvement I am searching for is how to remove unrelated_text from each entry and also remove the last entry.

Question: what's the shortest way to do that in Linux?

Best answer

With GNU awk for multi-char RS and RT and word boundaries:

    $ awk -v RS='\\<(ID|TIMESTAMP):[0-9]+' 'NR%2{id=RT;next} RT{printf "{%s, %s}\n", id, RT}' file
    {ID:13, TIMESTAMP:1476280500}
    {ID:25, TIMESTAMP:1476280600}
    {ID:30, TIMESTAMP:1476280700}
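To see why the NR%2 test pairs things up, it can help to print what GNU awk stores in RT for each record. The following is only an illustrative sketch: the sample string and the helper name trace_rt are made up for this example, and it calls gawk explicitly on the assumption that GNU awk is installed under that name.

```shell
# Hypothetical sample: one ID and one TIMESTAMP surrounded by unrelated text.
sample='x,ID:7,y,TIMESTAMP:99,z'

trace_rt() {
    # RS is a regex, so every ID:<digits> or TIMESTAMP:<digits> match ends a record.
    # RT holds the text that actually matched RS for the current record.
    gawk -v RS='\\<(ID|TIMESTAMP):[0-9]+' '{ printf "NR=%d RT=[%s]\n", NR, RT }'
}

# Requires GNU awk; skip quietly where gawk is not installed.
if command -v gawk >/dev/null 2>&1; then
    printf '%s\n' "$sample" | trace_rt
fi
```

With this sample, record 1 ends at the ID match, record 2 ends at the TIMESTAMP match, and the final record has an empty RT because no further match follows it. That is why the one-liner saves RT on odd records and prints on even records with a non-empty RT.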

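This awk command works whether the input is on one line or spread across several lines, and regardless of what other text the file contains. All it relies on is that the ID and TIMESTAMP matches strictly alternate, with each ID appearing before its related TIMESTAMP (an ID without a TIMESTAMP is harmless only at the very end of the input, as in the sample). That's not hard to change if necessary.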
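The awk command pairs matches purely by position (odd match is the ID, even match is the TIMESTAMP), so an entry lacking a TIMESTAMP in the middle of the input would shift the pairing. One possible variation, which is not part of the original answer, is to pair the fields inside each {...} entry instead. This sketch assumes that entries contain no nested braces, that each entry sits on a single line, and that ID comes before TIMESTAMP within an entry. The function name extract_pairs and the sample variable are hypothetical.

```shell
# Sample data mirroring the question's one-line input.
sample='{unrelated_text1,ID:13,unrelated_text2,TIMESTAMP:1476280500,unrelated_text3}, {other_unrelated_text1,other_unrelated_text2,ID:25,TIMESTAMP:1476280600}, {ID:30,more_unrelated_text1,TIMESTAMP:1476280700}, {ID:40,final_unrelated_text}'

extract_pairs() {
    # Step 1: grep -o prints each brace-delimited entry on its own line.
    # Step 2: sed keeps only entries where ID:<digits> is followed later by
    #         TIMESTAMP:<digits>, and prints just those two fields.
    #         The [^[:alnum:]_] before each keyword stands in for a word
    #         boundary, so that e.g. VALID:3 is not taken for ID:3.
    grep -oE '[{][^{}]*[}]' |
        sed -nE 's/^.*[^[:alnum:]_](ID:[0-9]+).*[^[:alnum:]_](TIMESTAMP:[0-9]+).*$/{\1, \2}/p'
}

printf '%s\n' "$sample" | extract_pairs
```

Because each entry is handled on its own, an entry such as {ID:40,final_unrelated_text} is dropped wherever it appears. The trade-off is that, unlike the awk command, this version does not handle an entry that spans several lines.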

Published: 2023-07-26 00:21:00
Article link: https://www.elefans.com/category/jswz/34/1268375.html