How to make awk not skip empty columns?

Updated: 2024-10-28 08:16:01

Given this input_file:

    1234 1234 abcd
    1234      abcd

awk doesn't recognise the empty column. When I run:

    awk '{print $1,$2}' input_file

I get:

    1234 1234
    1234 abcd

How can I make awk give me:

    1234 1234
    1234

Accepted answer

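awk normally uses field separators to decide which characters belong to which field. If the second column of your second line contains only spaces, that method cannot split the line the way you want, because a run of whitespace counts as a single separator.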
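
The effect of default splitting can be checked directly. The following is a minimal sketch, assuming the sample rows are laid out as fixed-width columns with spaces filling the empty cell (the variable names `sample` and `out` are only for illustration):

```shell
# Sample data rebuilt from the question; row 2 has spaces where column 2 would be.
sample=$(printf '1234 1234 abcd\n1234      abcd')

# Default splitting: print the field count and what awk considers field 2.
out=$(printf '%s\n' "$sample" | awk '{ print NF ": [" $2 "]" }')
printf '%s\n' "$out"
```

This prints `3: [1234]` for the first row and `2: [abcd]` for the second: the run of spaces is treated as one separator, so `abcd` slides into field 2.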

However, GNU awk lets you set a FIELDWIDTHS variable, which is better suited to fixed-width data, and that appears to be what you have:

    pax> cat infile
    1234 5678 abcd
    1234      abcd
    pax> awk 'BEGIN{FIELDWIDTHS="4 1 4"}{print "<"$1","$3">"}' infile
    <1234,5678>
    <1234,    >

Fields one and three are used here because field two is the space between the first and second real columns:

    1234 5678 abcd
    \__/|\__/|\__/
      1 2  3 4  5

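I usually do it that way because I don't want the space to become part of the data (in case I want a different character in the output, as in my example). If you're passing the space through anyway, you could use the simpler: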

    pax> awk 'BEGIN{FIELDWIDTHS="5 4"}{print "<"$1$2">"}' infile
    <1234 5678>
    <1234     >

In that case, field 1 is the five characters 1234<space>.
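
FIELDWIDTHS is a gawk extension. Where only a POSIX awk is available, similar fixed-width slicing can be approximated with `substr()`. This is a sketch rather than part of the original answer, and the column positions (characters 1-4 and 6-9) are assumptions read off the sample data:

```shell
# Same two rows as infile above; the middle column of row 2 is four spaces.
sample=$(printf '1234 5678 abcd\n1234      abcd')

# substr(s, start, length) uses 1-based positions:
# column 1 is characters 1-4, column 2 is characters 6-9.
out=$(printf '%s\n' "$sample" | awk '{ print "<" substr($0, 1, 4) "," substr($0, 6, 4) ">" }')
printf '%s\n' "$out"
```

The output is `<1234,5678>` followed by `<1234,    >`, where the second value is the four spaces of the empty column.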

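If you want fixed-width processing that adapts easily to later width changes, you can modify the awk script so it gets the widths from the file itself.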

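The widths can't come from the actual data lines, since the fields there may contain spaces, but you can add a header line that fully specifies the widths to use (making sure the header line isn't treated as data, of course).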

The following transcript shows this in action (the awk script is now in a file since it's getting complex):

    pax> cat infile
    #### ###### ####
    1234 567890 abcd
    1234        abcd
    pax> cat awkfile.awk
    NR == 1 {
        # Header: construct field widths string
        #     "a 1 b 1 c 1 d ... z"
        # where a..z are lengths of fields.
        # The loop runs to i <= NF so the last column is included.
        FIELDWIDTHS = length($1)
        for (i = 2; i <= NF; i++) {
            FIELDWIDTHS = FIELDWIDTHS " 1 " length($i)
        }
        next
    }
    {
        # Then use that FIELDWIDTHS string for
        # all other records.
        print "<" $1 "," $3 ">"
    }
    pax> awk -f awkfile.awk infile
    <1234,567890>
    <1234,      >

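You'll find that you can change the field lengths as much as you want and, provided the header line is correct, the script will adapt.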
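
To illustrate that adaptability without relying on gawk, here is a hedged, portable variant of the header idea that slices with `substr()` instead of FIELDWIDTHS. It is a sketch, not the answer's own script: the widths (2, 8, 3), the array names `start` and `width`, and the rule that columns are separated by exactly one space are all assumptions made for this example.

```shell
# Header row gives the column widths; row 3 has an all-space second column.
sample=$(printf '## ######## ###\n12 34567890 abc\n12          abc')

out=$(printf '%s\n' "$sample" | awk '
NR == 1 {
    # Header: record where each column starts and how wide it is,
    # assuming columns are separated by exactly one space.
    pos = 1
    for (i = 1; i <= NF; i++) {
        start[i] = pos
        width[i] = length($i)
        pos += width[i] + 1
    }
    next
}
{
    # Data rows: slice columns 1 and 2 by position, not by separator.
    print "<" substr($0, start[1], width[1]) "," substr($0, start[2], width[2]) ">"
}')
printf '%s\n' "$out"
```

With this header the output is `<12,34567890>` and then `<12,        >`. Changing the run of `#` characters in the header changes the slice positions without touching the script.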

Published: 2023-07-19 20:49:00

Article link: https://www.elefans.com/category/jswz/34/1186785.html