Subsetting a CSV by unique column values

Updated: 2024-10-28 12:27:22
I am fairly new to Linux and feel this should be a fairly simple task, but I cannot quite figure it out. I have a large data file with millions of rows, and I want to break it into smaller files based on date. I have a time column that contains YYMMDDHH data, and I want to create sub-files based on the DD. For each new DD, I want a new file created with all entries for that day. The file is a CSV and is already sorted by time.

From what I have read it looks like I should be able to use cat, awk and possibly grep to perform what I want.

To elaborate further, there are 14 columns per row. One column contains YYMMDDHH data (i.e. 14071000, 14071000 ... 14071022, 14071022 ... 14071100 ... 14071200 ...)

I can manually subset with

cat trial | awk 'NR>=1 && NR<=100 {print}' >output.txt

This gives me the rows between 1 and 100. I was wondering if there is a command that allows me to extract based off the YYMMDDHH column, so that all data points on 140710 could be put in a single file. Hope that helps explain my problem a little better.

Best answer

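You should be able to use something like this (assuming the YYMMDDHH column is the first field; change $1 if it is not):

awk -F, '{ line_date = int($1 / 100); print > ("out_" line_date ".txt") }' trial

-F, is needed because the file is comma-separated. int() makes dropping the HH part explicit; without it the file name depends on awk's default number-to-string conversion (CONVFMT, "%.6g") happening to round the fraction away. The parentheses around the file name keep the redirection unambiguous across awk implementations.

BTW, you might want to avoid the 'useless use of cat' by running awk directly on your file instead of piping into it.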
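Since the question says the file is already sorted by time, a variant of the same idea can close each day's file as soon as the day changes, so only one output file is open at a time. This is a minimal sketch, not the answer's exact command. The sample rows, the three-column layout, the column position (col=2), the absence of a header line, and the out_ file naming are all assumptions made up for illustration; the real file has 14 columns and the timestamp column is not stated.

```shell
# Hypothetical sample data: timestamp (YYMMDDHH) assumed to be in column 2.
workdir=$(mktemp -d)
printf '%s\n' \
  'a,14071000,1.5' \
  'b,14071022,2.5' \
  'c,14071100,3.5' \
  'd,14071200,4.5' > "$workdir/trial"

# -F, splits fields on commas; col selects the timestamp column (assumed).
# substr() keeps the first six characters (YYMMDD) of the timestamp.
# When the day changes, the previous day's file is closed before the
# next one is opened, which relies on the input being sorted by time.
awk -F, -v col=2 -v dir="$workdir" '
{
    day = substr($col, 1, 6)
    if (day != prev) {
        if (prev != "") close(out)
        out = dir "/out_" day ".txt"
        prev = day
    }
    print > out
}' "$workdir/trial"

ls "$workdir"
```

If the input were not sorted, reopening a closed file with > would overwrite it, so >> (append) would be the safer redirection in that case.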

Published: 2023-07-26 08:08:00
Link: https://www.elefans.com/category/jswz/34/1272879.html