How to prevent writing the same words into a txt file using open(text.txt,a)?

I have a question about appending to a text file. I have written a script that reads a URL returning JSON, extracts the list of titles, and writes them to the file "WordsInCategory.text".

Because this code will be run in a loop, I open the file with f1 = open('WordsInCategory.text', 'a').

The problem is that it also appends titles that already exist in the file.

I am having trouble coming up with a solution, and using 'w' would overwrite what has already been written.

My code is as follows:

    import urllib2
    import json

    url1 ='https://en.wikipedia.org/w/api.php?action=query&format=json&list=categorymembers&cmtype=page&cmtitle=Category:Geography&cmlimit=100'

    json_obj = urllib2.urlopen(url1)
    data1 = json.load(json_obj)

    f1 = open('WordsInCategory.text', 'a')

    for item in data1['query']:
        for i in data1['query']['categorymembers']:
            f1.write((i['title']).encode('utf8')+"\n")

Please advise on how I should modify my code.

Thank you.

Best answer

I would suggest collecting every title in a list before writing to the file, so that the file is written only once. You can modify your code this way:

    import urllib2
    import json

    data = []

    # Adjacent string literals are joined, so the URL contains no stray whitespace
    url1 = ('https://en.wikipedia.org/w/api.php?'
            'action=query&format=json&list=categorymembers'
            '&cmtype=page&cmtitle=Category:Geography&cmlimit=100')

    json_obj = urllib2.urlopen(url1)
    data1 = json.load(json_obj)

    # Collect every title in memory first
    for i in data1['query']['categorymembers']:
        data.append(i['title'].encode('utf8') + "\n")

    # Do additional requests, and append the new titles to the data list

    # set() drops duplicates; the file is opened and written only once
    with open('WordsInCategory.text', 'w') as f1:
        f1.write(''.join(set(data)))

Converting the list to a set removes any duplicate entries.

If keeping the titles in memory is a problem, you can check whether each title already exists in the file before writing it, but this may be very time-consuming:
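One thing to keep in mind is that a set does not preserve the order in which the titles were collected. If the order matters, the following is a minimal sketch of an order-preserving alternative. It assumes Python 3.7 or later (the code in this answer is Python 2), and `unique_titles` is a hypothetical helper name, not something from the question.

```python
def unique_titles(titles):
    """Return the titles with duplicates removed, keeping first-seen order."""
    # dict keys are unique and keep insertion order (Python 3.7+)
    return list(dict.fromkeys(titles))


titles = ["Geography", "Map", "Geography", "Atlas", "Map"]
print(unique_titles(titles))
```

The result can be joined and written to the file in the same way as the set.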

    import urllib2
    import json

    data = []

    url1 = ('https://en.wikipedia.org/w/api.php?'
            'action=query&format=json&list=categorymembers'
            '&cmtype=page&cmtitle=Category:Geography&cmlimit=100')

    json_obj = urllib2.urlopen(url1)
    data1 = json.load(json_obj)

    for i in data1['query']['categorymembers']:
        title = i['title'].encode('utf8') + "\n"
        # Re-read the file for every title: slow, but the existing titles
        # are never held in memory. The file must already exist, otherwise
        # open(..., 'r') raises IOError.
        with open('WordsInCategory.text', 'r') as title_check:
            if title not in title_check:
                data.append(title)

    # Append only the titles that were not found in the file
    with open('WordsInCategory.text', 'a') as f1:
        f1.write(''.join(set(data)))

    # Handle additional requests
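A possible middle ground, if the script is run repeatedly against the same file, is to read the existing titles into a set once per run and then append only the titles that are not in it. The following is a rough sketch under a few assumptions: it targets Python 3 rather than the Python 2 used above, it expects one title per line in the file, and `append_new_titles` is a hypothetical helper name. The titles are passed in as a plain list, so the network request is left out.

```python
import os


def append_new_titles(path, titles):
    """Append only the titles not already in the file; return how many were added."""
    seen = set()
    # Load the existing titles once, if the file is already there
    if os.path.exists(path):
        with open(path, encoding="utf8") as existing:
            seen = {line.rstrip("\n") for line in existing}

    added = 0
    with open(path, "a", encoding="utf8") as out:
        for title in titles:
            if title not in seen:
                out.write(title + "\n")
                seen.add(title)  # also guards against duplicates within this batch
                added += 1
    return added
```

Calling `append_new_titles('WordsInCategory.text', new_titles)` after each request would keep the file free of duplicates across runs, at the cost of holding the existing titles in memory for the duration of the call.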

Hope it'll be helpful.
