I have a CSV file that I'm totaling up two ways: one using Excel and the other using awk. Here are the totals of my first 8 columns in Excel:
1) 2640502474.00
2) 1272849386284.00
3) 36785.00
4)
5) 107.00
6) 239259.00
7) 0.00
8) 7418570893330.00

Here is my awk output:
$ cat /home/jason/import.csv | awk -F "\"*,\"*" '{s+=$1} END {printf("%01.2f\n", s)}'
2640502474.00
$ cat /home/jason/import.csv | awk -F "\"*,\"*" '{s+=$2} END {printf("%01.2f\n", s)}'
1272849386284.00
$ cat /home/jason/import.csv | awk -F "\"*,\"*" '{s+=$8} END {printf("%01.2f\n", s)}'
7411306364347.00

Notice how columns 1 and 2 match exactly, but column 8 is off by about 7.26 billion (7418570893330.00 in Excel vs. 7411306364347.00 in awk). I'm assuming Excel's total is correct, so why does awk process this file differently?
Recommended answer:
You likely have a comma-formatted number enclosed in quotes (for example "1,000"). Excel correctly treats that number as a single field. Your awk field-separator regex does not: according to that regex, a comma inside a number is a valid separator. It is very hard (and mostly futile) to handle, with a regex, the optional quoting and nested escaping that CSV allows.
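To check this diagnosis against the real file, one option is to count fields per row under the same separator regex: a quoted value such as "1,000" adds one extra field to its row. The sketch below is a hypothetical helper (the name find_split_rows is made up for illustration), and it assumes the first line of the file has no embedded commas, so that its field count can serve as the expected one.

```shell
# Hypothetical helper: find_split_rows FILE
# Reports rows whose field count under the question's separator regex
# differs from the first row's count.
# Assumption: line 1 has no commas inside values, so its NF is the true
# column count.
find_split_rows() {
  awk -F '"*,"*' '
    NR == 1 { expected = NF; next }          # reference field count
    NF != expected {                         # extra fields => split inside a value
      printf("line %d: %d fields (expected %d)\n", NR, NF, expected)
    }
  ' "$1"
}
```

Run as `find_split_rows /home/jason/import.csv`. On any reported line, a comma-formatted value in column 8 itself is cut off at its first comma, and one in an earlier column shifts every later column one place to the right; either way, `$8` no longer holds the value Excel sums.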
Compare the following to see what is likely going on:
$ echo '"1","10","15","1,000","14"' | awk -F "\"*,\"*" '{print $4}'
1
$ echo '"1","10","15","1,000","14"' | awk -F "\",\"" '{print $4}'
1,000
Note that the second regex above still leaves a stray " at the start of the first field and at the end of the last field, and it only works at all if every field is consistently quoted. It is for illustration purposes only.
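If you want to stay in awk, one way around the separator-regex problem is to walk each line character by character and track whether you are inside quotes. The following is a minimal sketch in POSIX awk, not a complete CSV parser: it assumes RFC 4180-style quoting (a literal quote inside a quoted field is written as two quotes), no line breaks inside quoted fields, and that any comma inside a value is a thousands separator to strip before summing. The function name sum_csv_column is hypothetical. GNU awk users could instead look at gawk's FPAT variable, which describes what a field looks like rather than what separates fields.

```shell
# Hypothetical helper: sum_csv_column FILE COLUMN
# Parses each line with a small quote-aware splitter instead of a separator
# regex, so a quoted value such as "1,000" stays one field.
# Assumptions: RFC 4180-style quoting, no line breaks inside quoted fields,
# and any comma inside a value is a thousands separator.
sum_csv_column() {
  awk -v col="$2" '
    # Split line into fields[1..n], honoring double quotes; returns n.
    function parse_csv(line, fields,    n, i, c, inq, field) {
      split("", fields)                     # clear values left from the previous line
      n = 0; field = ""; inq = 0
      for (i = 1; i <= length(line); i++) {
        c = substr(line, i, 1)
        if (inq) {
          if (c == "\"") {
            if (substr(line, i + 1, 1) == "\"") {
              field = field "\""; i++       # doubled quote -> literal quote
            } else {
              inq = 0                       # closing quote
            }
          } else {
            field = field c
          }
        } else if (c == "\"") {
          inq = 1                           # opening quote
        } else if (c == ",") {
          fields[++n] = field; field = ""   # separator outside quotes
        } else {
          field = field c
        }
      }
      fields[++n] = field
      return n
    }
    {
      sub(/\r$/, "")                        # tolerate CRLF line endings
      parse_csv($0, f)
      v = f[col]
      gsub(/,/, "", v)                      # "1,000" -> 1000
      s += v
    }
    END { printf("%.2f\n", s) }
  ' "$1"
}
```

For example, `sum_csv_column /home/jason/import.csv 8` should then agree with Excel's column 8 total, provided the file follows the quoting assumptions above.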