I have several large text files and want to extract some values from them. The values appear in two places in every file: before and after a specified line of text. I want only the values that come after the specified text. I wrote the following script.

    #!/usr/bin/env python
    import sys, re, os, glob

    path = "./"
    files = os.listdir(path)
    for finding in glob.glob('*.txt'):
        file = os.path.join(path, finding)
        text = open(file, "r")
        CH = []
        for line in text:
            if re.match("(.*)(XX)(.*)", line):
                CH.append(line)
        print CH

But, as expected, this script prints all the XX values. How can I edit it to get the desired output? The following is part of one of the large text files.

    ..................
    ..................
    XX   1   -0.01910
    XX   2    1.34832
    XX   3   -2.36329
    XX   4   -5.94807
    XX   5    6.34862
    XX   6   core
    Texts which I want to specify like (Normal)..........
    XX   1   -0.61910
    XX   2    2.34832
    XX   3   -0.06329
    XX   4   -0.34807
    XX   5    0.36862
    XX   6   [coreed
    ..................
    ..................

The desired output is as follows: the XX values that come after the text 'Normal', in decreasing order.

    XX 2.34832
    XX 0.36862
    XX -0.06329
    XX -0.34807
    XX -0.61910

Thanks a lot in advance.
Best answer
Firstly, I'm confused by the regex you have written, '(.*)(XX)(.*)'. Am I correct that you want the third field from every line that begins with (optional whitespace and then) XX? Or rather only from those lines after "Texts which I want to specify"?
The most straightforward way I can think of is to carry around a boolean to indicate if you've found this special line of text "Texts which I want to specify like (Normal).........." yet. For example...

    #!/usr/bin/env python
    import glob
    import re

    for filename in glob.glob("*.txt"):
        CH = []
        doPayAttention = False
        with open(filename, "r") as text:
            for line in text:
                # re.search, not re.match: the marker need not start the line
                if re.search("Texts which I want to specify", line):
                    doPayAttention = True
                    continue
                if not doPayAttention:
                    continue
                mm = re.match(r"^\s*XX\s+\S+\s+(\S+)\s*$", line)
                if mm is None:
                    continue
                try:
                    CH.append(float(mm.group(1)))
                except ValueError:
                    pass  # third field is not a number, e.g. "core"
        # sort numerically, largest first (the keyword is reverse, not reversed)
        for _ch in sorted(CH, reverse=True):
            print("XX %.5f" % _ch)  # five decimals, as in the sample data

Also, depending on how much you trust your files, using str.split() ought to give you more readable code, at the cost of the power of regex. Finally, it should be noted that this is a particularly simple AWK program.
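
    awk 'found && $1 == "XX" && $3 ~ /^-?[0-9.]+$/ {print $1, $3} /Texts which I want to specify/ {found = 1}' file.txt | sort -k2,2nr

Here file.txt stands in for one of your input files, and sort -k2,2nr orders the output by the second field, numerically and largest first.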
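
For comparison, the split-based approach mentioned above might look something like the following. This is only a sketch: the function name `values_after_marker` and the inline `sample` data are made up for illustration, and it assumes that each line of interest has exactly three whitespace-separated fields, with the marker text somewhere on its own line.

```python
def values_after_marker(lines, marker="Texts which I want to specify"):
    """Return the third field of each numeric XX line after the marker, largest first."""
    values = []
    seen_marker = False
    for line in lines:
        if marker in line:
            seen_marker = True  # start collecting from the next line on
            continue
        if not seen_marker:
            continue
        fields = line.split()
        if len(fields) != 3 or fields[0] != "XX":
            continue
        try:
            number = float(fields[2])
        except ValueError:
            continue  # third field is not numeric, e.g. "core" or "[coreed"
        values.append((number, fields[2]))
    # Sort on the parsed number, not on the text, so negatives order correctly.
    values.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in values]


# Hypothetical sample, shortened from the excerpt in the question.
sample = """\
XX 1 -0.01910
XX 2 1.34832
XX 6 core
Texts which I want to specify like (Normal)..........
XX 1 -0.61910
XX 2 2.34832
XX 3 -0.06329
XX 4 -0.34807
XX 5 0.36862
XX 6 [coreed
""".splitlines()

for text in values_after_marker(sample):
    print("XX " + text)
```

Keeping the original text of each value alongside its parsed float means the output is printed exactly as it appeared in the file, while the ordering is still numeric. To run it on real files, you could pass an open file object in place of `sample`.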