I want to count a specific word in the file.
For example, how many times does 'apple' appear in the file? I tried the following, replacing 'word' with 'apple', but it still counts every word in my file:

    #!/usr/bin/env python
    import re

    logfile = open("log_file", "r")
    wordcount = {}
    for word in logfile.read().split():
        if word not in wordcount:
            wordcount[word] = 1
        else:
            wordcount[word] += 1
    for k, v in wordcount.items():
        print k, v
Any advice would be greatly appreciated. :)
Best answer
You could just use str.count(), since you only care about occurrences of a single word:

    with open("log_file") as f:
        contents = f.read()
        count = contents.count("apple")

However, to avoid some corner cases, such as erroneously counting words like "applejack", I suggest that you use a regex:

    import re

    with open("log_file") as f:
        contents = f.read()
        count = sum(1 for match in re.finditer(r"\bapple\b", contents))

\b in the regex ensures that the pattern begins and ends on a word boundary (as opposed to a substring within a longer string).
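To make the difference between the two approaches concrete, here is a minimal sketch that runs both on an in-memory string instead of a file. The helper names `count_substring` and `count_whole_word` and the sample text are made up for illustration, and the code is written for Python 3 (the question's `print k, v` is Python 2 syntax).

```python
import re


def count_substring(text, word):
    """Count raw substring hits; this also matches inside longer words."""
    return text.count(word)


def count_whole_word(text, word):
    """Count hits of `word` that start and end on a word boundary."""
    # re.escape keeps any regex metacharacters in `word` literal.
    pattern = r"\b" + re.escape(word) + r"\b"
    return len(re.findall(pattern, text))


# Hypothetical sample text standing in for the contents of log_file.
sample = "apple applejack pineapple apple, Apple apple."

print(count_substring(sample, "apple"))   # 5: includes applejack and pineapple
print(count_whole_word(sample, "apple"))  # 3: standalone lowercase "apple" only
```

Both counts are case-sensitive, so "Apple" is not matched by either. If that matters for your file, passing `flags=re.IGNORECASE` to `re.findall` is one way to handle it.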