Quick and dirty: counting word frequencies in an English article with just two for loops

Programming knowledge | Updated: 2023-04-03 20:08:40

For example, here is aboutUN.txt, a text file I prepared that is written entirely in English:

The United Nations is an international organization founded in 1945.  It is currently made up of 193 Member States.  The mission and work of the United Nations are guided by the purposes and principles contained in its founding Charter.
Each of the 193 Member States of the United Nations is a member of the General Assembly.  States are admitted to membership in the UN by a decision of the General Assembly upon the recommendation of the Security Council.
The main organs of the UN are the General Assembly, the Security Council, the Economic and Social Council, the Trusteeship Council, the International Court of Justice, and the UN Secretariat.  All were established in 1945 when the UN was founded. 
The Secretary-General of the United Nations is a symbol of the Organization's ideals and a spokesman for the interests of the world's peoples, in particular the poor and vulnerable. The current Secretary-General of the UN, and the ninth occupant of the post, is Mr. António Guterres of Portugal, who took office on 1 January 2017. The UN Charter describes the Secretary-General as "chief administrative officer" of the Organization.
The Secretariat, one of the main organs of the UN, is organized along departmental lines, with each department or office having a distinct area of action and responsibility. Offices and departments coordinate with each other to ensure cohesion as they carry out the day to day work of the Organization in offices and duty stations around the world.  At the head of the United Nations Secretariat is the Secretary-General.
The UN system, also known unofficially as the "UN family", is made up of the UN itself and many affiliated programmes, funds, and specialized agencies, all with their own membership, leadership, and budget.  The programmes and funds are financed through voluntary rather than assessed contributions. The Specialized Agencies are independent international organizations funded by both voluntary and assessed contributions.

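The idea is simple. Calling split() with no argument splits the text on whitespace, which also gets rid of the line breaks between paragraphs. Then, for each word i in an outer for loop, a nested for loop walks through every word j again and checks whether i equals j (it always matches at least once, when j is the same word). Each match adds 1 to number. Finally, each word and its count are written to a file.
Here is my source code: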
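Why split() with no argument rather than split(' ')? With no argument, Python treats any run of spaces, tabs, or newlines as a single separator, so double spaces and paragraph breaks vanish. A minimal sketch on a short made-up sample (the sample text and the names sample and by_space are illustrative, not from aboutUN.txt) shows the difference:

```python
# Hypothetical sample: a double space after "1945." and a newline before the last sentence
sample = "The UN was founded in 1945.  The UN has 193 members.\nThe UN Charter"

words = sample.split()        # no argument: any run of whitespace, including "\n", is one separator
by_space = sample.split(' ')  # single space only: the double space yields '', and "\n" is not split

print(words)
print(by_space)
```

Note that punctuation stays attached to words either way, so "1945." and "1945" (both appear in aboutUN.txt) are counted as two different words by the programs in this post.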

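f = open('aboutUN.txt', 'r', encoding='utf-8')  # utf-8 so non-ASCII text such as "António" reads correctly
words = f.read().split()  # split() with no argument splits on any run of whitespace
writefile = open('wordsFrequence.txt', 'w', encoding='utf-8')  # output file
k = 0  # total number of words (every occurrence counts)
for i in words:  # outer loop: take each word in turn
    number = 0
    k += 1
    for j in words:  # inner loop: compare it with every word
        if i == j:
            number += 1
    print(i, number)
    line = str(i) + "\t" + str(number) + "\n"  # word<TAB>count
    writefile.write(line)
print('Total words:', k)
f.close()
writefile.close()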

Next, let's look at the output of running it.
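To see what the nested loops print, here is a minimal sketch that runs the same logic on a short made-up sentence rather than aboutUN.txt. The names results and comparisons are illustrative additions: results collects the output instead of writing wordsFrequence.txt, and comparisons counts how many i == j checks are performed.

```python
# Hypothetical sample text for illustration (not from aboutUN.txt)
words = "the cat saw the dog".split()

results = []      # (word, count) pairs, collected instead of written to a file
comparisons = 0   # how many i == j checks the nested loops perform
for i in words:           # outer loop: take each word in order
    number = 0
    for j in words:       # inner loop: compare it with every word, itself included
        comparisons += 1
        if i == j:
            number += 1
    results.append((i, number))
    print(i, number)

print('comparisons:', comparisons)
```

Here "the 2" is printed twice, once for each occurrence of "the", and 5 words cost 5 × 5 = 25 comparisons.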

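This has a bug: a word that occurs several times is printed and written once per occurrence. It is also slow, because every word is compared with every other word, so n words take n × n comparisons. Adding a dictionary fixes the duplicates, because dictionary keys are unique, and it only needs a single pass over the words.
Here is the revised version: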

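f = open('aboutUN.txt', 'r', encoding='utf-8')  # utf-8 so non-ASCII text reads correctly
words = f.read().split()  # split() with no argument splits on any run of whitespace
pac = {}  # empty dictionary: word -> count
writefile = open('wordsFrequence.txt', 'w', encoding='utf-8')  # output file
for i in words:
    pac[i] = pac.get(i, 0) + 1  # get() returns 0 for a new word; keys are unique, so no word repeats
print(pac)
k = 0  # number of distinct words (vocabulary size)
for i, j in pac.items():  # items() yields (word, count) pairs
    k += 1
    line = str(i) + "\t" + str(j) + "\n"  # word<TAB>count
    print(i, j)
    writefile.write(line)
print('Vocabulary size:', k)
f.close()
writefile.close()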
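For comparison, Python's standard library packages this dictionary-counting pattern as collections.Counter. This is a swapped-in alternative, not the code from this post: a minimal sketch on a short made-up sample, where most_common() additionally sorts words from most to least frequent (ties keep the order in which words first appeared).

```python
from collections import Counter

# Hypothetical sample text for illustration (not from aboutUN.txt)
text = "The UN is a member. The UN has members."
words = text.split()  # same whitespace splitting as above

counts = Counter(words)  # dict subclass: word -> number of occurrences
for word, count in counts.most_common():  # most frequent first
    print(word, count, sep="\t")
print('Vocabulary size:', len(counts))
```

As with the hand-written dictionary, "member." and "members." are counted as separate words, since nothing strips punctuation or merges word forms.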

Published: 2023-04-03 20:08:00
Article link: https://www.elefans.com/category/jswz/5b4bcbe523b50e30ccccf86d545af144.html