awk match() multiple matches

Updated: 2024-10-24 19:27:23

I have the following:

echo AS:i:0 UQ:i:0 ZZ:Z:mus.sup NM:i:0 MD:Z:50 ZZ:Z:cas.sup CO:Z:endOfLine|awk '{match($0,/ZZ:Z[^ ]*/,m); print m[0], m[1]}'

which unfortunately outputs only the first entry (out of two):

ZZ:Z:mus.sup

It looks to me as though the match() function is incapable of storing more than one match in its array. Unless I'm missing something here?

If this is indeed the case, would someone kindly suggest an awk-based 'matching' alternative that will allow me to obtain the two ZZ:Z entries? Note that these are NOT located in the same column each time(!) - hence the need for the match() function.

The general idea here is to obtain, in the same awk command, some values that appear at known column positions (e.g. col1, col2) and some values (fetched based on their unique signature "ZZ:Z") that are located in columns whose index is unknown.

In addition, the following attempt using gensub() also fails to print the two ZZ:Z entries: it identifies only one of the two (the other is found only once the first has been removed):

echo AS:i:0 UQ:i:0 ZZ:Z:mus.sup NM:i:0 MD:Z:50 ZZ:Z:cas.sup CO:Z:endOfLine|awk '{val= gensub(/.*(ZZ:Z[^ ]*).*/,"\\1 \\2","g",$0);print val}'

the result in this case is:

ZZ:Z:cas.sup

but I'd like to have as result:

ZZ:Z:mus.sup ZZ:Z:cas.sup
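
Why only one entry comes back can be seen from what match() reports. It finds the leftmost match only, and in GNU awk the optional array holds that whole match in element 0 and any parenthesized subexpressions in elements 1..n, never later occurrences. The pattern above has no parentheses, so m[1] is empty. The sketch below is only an illustration (the variable names `line` and `first` are made up for it); it uses the portable RSTART and RLENGTH variables so that it should run in any POSIX awk:

```shell
line='AS:i:0 UQ:i:0 ZZ:Z:mus.sup NM:i:0 MD:Z:50 ZZ:Z:cas.sup CO:Z:endOfLine'

# match() sets RSTART/RLENGTH for the leftmost match only;
# the second ZZ:Z entry is never reported by a single call.
first=$(printf '%s\n' "$line" | awk '{
    if (match($0, /ZZ:Z[^ ]*/))
        print RSTART, RLENGTH, substr($0, RSTART, RLENGTH)
}')

printf '%s\n' "$first"   # 15 12 ZZ:Z:mus.sup
```

The gensub() attempt returns the last entry for a related reason: the leading `.*` is greedy and consumes as much of the line as it can, so the single capture group ends up on the final ZZ:Z entry, and there is no second group for `\\2` to refer to.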

Best answer

You were just calling the wrong function; you should be using split(), not match():

$ echo AS:i:0 UQ:i:0 ZZ:Z:mus.sup NM:i:0 MD:Z:50 ZZ:Z:cas.sup CO:Z:endOfLine| awk '{split($0,t,/ZZ:Z[^ ]*/,m); print m[1], m[2]}'
ZZ:Z:mus.sup ZZ:Z:cas.sup

Or, to print any number of occurrences in the order they appear in the input:

$ echo AS:i:0 UQ:i:0 ZZ:Z:mus.sup NM:i:0 MD:Z:50 ZZ:Z:cas.sup CO:Z:endOfLine| awk '{split($0,t,/ZZ:Z[^ ]*/,m); for (i=1; i in m; i++) print m[i]}'
ZZ:Z:mus.sup
ZZ:Z:cas.sup

That uses GNU awk for the 4th arg to split(), just as you were using GNU awk for the 3rd arg to match().

If you had to do this in a non-GNU awk, it would just be:

$ echo AS:i:0 UQ:i:0 ZZ:Z:mus.sup NM:i:0 MD:Z:50 ZZ:Z:cas.sup CO:Z:endOfLine| awk '{while(match($0,/ZZ:Z[^ ]*/)) {print substr($0,RSTART,RLENGTH); $0=substr($0,RSTART+RLENGTH)}}'
ZZ:Z:mus.sup
ZZ:Z:cas.sup

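The question also asks for values from known columns in the same command. The non-GNU loop overwrites $0, which re-splits the record and changes $1 and $2, so one possible variation is to run the loop on a copy of the line instead. This is a sketch under that assumption, not part of the original answer; the names `s`, `tags`, `out`, `line` and `result` are illustrative:

```shell
line='AS:i:0 UQ:i:0 ZZ:Z:mus.sup NM:i:0 MD:Z:50 ZZ:Z:cas.sup CO:Z:endOfLine'

result=$(printf '%s\n' "$line" | awk '{
    s = $0                      # work on a copy so $1, $2 stay intact
    n = 0
    # The pattern always matches at least "ZZ:Z", so the loop
    # consumes input on every pass and terminates.
    while (match(s, /ZZ:Z[^ ]*/)) {
        tags[++n] = substr(s, RSTART, RLENGTH)
        s = substr(s, RSTART + RLENGTH)
    }
    out = $1 " " $2             # values at known column positions
    for (i = 1; i <= n; i++)    # values found by signature
        out = out " " tags[i]
    print out
}')

printf '%s\n' "$result"   # AS:i:0 UQ:i:0 ZZ:Z:mus.sup ZZ:Z:cas.sup
```

Because only tags[1..n] is read back, stale entries left over from a longer previous line are ignored when the input has several records.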

Published: 2023-08-07 21:45:00

Article link: https://www.elefans.com/category/jswz/34/1466429.html