I have a text file with tab-delimited columns in the following format:

    fileName  Type  sc1    sc2  sc3  sc4  sc5    sc6
    file1     abc   0      0.2  0    0    0      0
    file1     xyz   0      0.8  0    0    0.8    0.2
    file2     abc   0.5    0    0    0.1  0      0
    file2     xyz   0      0    0    0.7  0.003  0.1
    file3     abc   0.002  0    0    0    0.04   0
    file3     xyz   0.5    0    0    0    0      0.3
    .
    .

The first row is the header row. sc1, sc2, sc3, etc. are score 1, score 2, score 3, and so on (they are not all zeros).
There are more than two types, and each file has the same number of types.
How can I find the fileName that has the lowest sc6 for the xyz type? Alternatively, how can I create another text file from this one that contains the fileName and sc6 for every xyz row?
I really don't want to load this into a database or anything like that. I was wondering whether I can do this fairly quickly using Unix's cut, sort, or grep commands. Any perl or awk solution is acceptable too.
Let me know if the question is not very clear.
P.S. Please feel free to suggest a different heading for this question. This is the best I could come up with.
Best answer
To find the fileName with the lowest sc6 for type xyz:

    awk -v lowest=9999999 '$2 == "xyz" && $8 < lowest { lowest = $8; lowfile = $1 } END { print lowfile "\t" lowest }' infile

or:

    awk '$2 == "xyz"' infile | sort -k8,8n | head -n 1 | cut -f1,8

To create a file with just filename and sc6 for all xyz:

    awk '$2 == "xyz" { print $1 "\t" $8 }' infile > outfile

Note the == in the last command: with a single =, awk assigns "xyz" to $2 on every line, so the condition is always true and every row (including the header) is printed.
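A variant of the commands above that sets the field separator to a tab explicitly and does not rely on a sentinel value can be sketched as follows. It assumes fields are separated by single tabs and that sc6 is the eighth column, as in the question; the file names scores.tsv and xyz_sc6.tsv are placeholders, and the sample rows are copied from the question only so that the script runs on its own.

```shell
#!/bin/sh
# Sample input copied from the question; scores.tsv is a placeholder name.
{
    printf 'fileName\tType\tsc1\tsc2\tsc3\tsc4\tsc5\tsc6\n'
    printf 'file1\tabc\t0\t0.2\t0\t0\t0\t0\n'
    printf 'file1\txyz\t0\t0.8\t0\t0\t0.8\t0.2\n'
    printf 'file2\tabc\t0.5\t0\t0\t0.1\t0\t0\n'
    printf 'file2\txyz\t0\t0\t0\t0.7\t0.003\t0.1\n'
    printf 'file3\tabc\t0.002\t0\t0\t0\t0.04\t0\n'
    printf 'file3\txyz\t0.5\t0\t0\t0\t0\t0.3\n'
} > scores.tsv

# Lowest sc6 among xyz rows. The first xyz row seeds the minimum, so no
# sentinel such as 9999999 is needed; "+ 0" forces a numeric comparison.
lowest=$(awk -F'\t' '
    $2 == "xyz" && (!seen || $8 + 0 < min) { min = $8 + 0; score = $8; file = $1; seen = 1 }
    END { if (seen) print file "\t" score }
' scores.tsv)
printf '%s\n' "$lowest"

# fileName and sc6 for every xyz row, tab-separated, written to a new file.
awk -F'\t' -v OFS='\t' '$2 == "xyz" { print $1, $8 }' scores.tsv > xyz_sc6.tsv
cat xyz_sc6.tsv
```

On the sample rows, the first command reports file2 with an sc6 of 0.1, and xyz_sc6.tsv ends up with one line each for file1, file2, and file3. Setting -F'\t' matters only if a field can contain spaces; with the data shown, awk's default whitespace splitting gives the same result.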