I am fairly new to Linux and feel this should be a fairly simple task, but I cannot quite figure it out. I have a large data file with millions of rows, and I want to break the file into smaller files based on date. I have a time column that contains YYMMDDHH data, and I want to create subfiles based on the DD. For each new DD, I want a new file created with all entries for that day. The file is a CSV and is already sorted by time.
From what I have read it looks like I should be able to use cat, awk and possibly grep to perform what I want.
To elaborate further, there are 14 columns per row. One column has data in YYMMDDHH form (i.e. 14071000, 14071000, ..., 14071022, 14071022, ..., 14071100, ..., 14071200, ...).
I can manually subset with

cat trial | awk 'NR>=1 && NR<=100 {print}' > output.txt

This gives me rows 1 through 100. I was wondering if there is a command that allows me to extract based on the YYMMDDHH column, so that all data points from 140710 could be put in a single file. Hope that helps explain my problem a little better.
Best answer
You should be able to use something like this:

awk -F, '{ d = int($1 / 100); print > ("out_" d ".txt") }' trial

The int() call truncates the trailing HH digits, so 14071022 becomes 140710; without it, 14071022 / 100 yields 140710.22 and rows from the same day get scattered into files like out_140710.22.txt. Adjust -F and the field number ($1 here) to wherever the timestamp column sits in your CSV. By the way, you might want to avoid a "useless use of cat" by not piping but using awk directly on your file.
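To see the split end to end, here is a minimal sketch. It assumes the timestamp is the first comma-separated field and fabricates a tiny trial file; the filenames trial, out_140710.txt, etc. are just for illustration:

```shell
# Build a small sample "trial" file: three rows, two distinct days.
printf '%s\n' \
  '14071000,a' \
  '14071022,b' \
  '14071100,c' > trial

# Split by day: int($1/100) drops the HH digits, so each distinct
# YYMMDD value gets its own output file.
awk -F, '{ d = int($1 / 100); print > ("out_" d ".txt") }' trial

# out_140710.txt now holds the first two rows,
# out_140711.txt holds the third.
```

Because the input is already sorted by time, awk writes each output file in one contiguous run, so this stays fast even on millions of rows.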