Word count frequency across documents

Updated: 2024-10-21 06:26:10
Problem description

I have a directory containing 1000 .txt files. For every word, I want to know how many of the 1000 documents it occurs in. So even if the word "cow" occurred 100 times in document X, it still counts as one. If it also occurs in a different document, the count is incremented by one. So the maximum is 1000, if "cow" appears in every single document. How do I do this the easy way, without using any external library? Here's what I have so far:

    import java.io.File;
    import java.io.FileReader;
    import java.io.IOException;
    import java.util.HashSet;
    import java.util.Hashtable;
    import java.util.Scanner;

    // fileDirectory is a String[] field of the enclosing class holding directory paths.
    private Hashtable<String, Integer> getAllWordCount() {
        Hashtable<String, Integer> result = new Hashtable<String, Integer>();
        HashSet<String> words = new HashSet<String>(); // words seen in the current file
        try {
            for (int j = 0; j < fileDirectory.length; j++) {
                File theDirectory = new File(fileDirectory[j]);
                File[] children = theDirectory.listFiles();
                if (children == null) continue; // not a directory, or unreadable
                for (int i = 0; i < children.length; i++) {
                    words.clear(); // reset for each file, not once per directory
                    try (Scanner scanner = new Scanner(new FileReader(children[i]))) {
                        while (scanner.hasNext()) {
                            String text = scanner.next().replaceAll("[^A-Za-z0-9]", "");
                            if (text.isEmpty()) continue; // token was only punctuation
                            // add() returns true only the first time a word is seen in this file
                            if (words.add(text)) {
                                Integer count = result.get(text);
                                result.put(text, count == null ? 1 : count + 1);
                            }
                        }
                    }
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
        System.out.println(result.size());
        return result;
    }

Recommended answer

You also need a HashSet<String> in which you store each unique word you've read from the current file.

Then, after reading each word, check whether it is already in the set. If it isn't, increment the corresponding value in the result map (or add a new entry if there was none, as you already do) and add the word to the set.

Don't forget to reset the set when you start to read a new file though.
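
The approach described above can be sketched roughly as follows, using only the standard library. This is an illustrative sketch, not the asker's final code: the class name DocumentFrequency and the methods addDocument and countDirectory are hypothetical names, and reading files as UTF-8 is an assumption. A fresh HashSet is created for each document, which has the same effect as resetting the set when a new file starts.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Scanner;
import java.util.Set;

// Hypothetical helper class, named for illustration only.
public class DocumentFrequency {

    // Adds 1 to counts for every distinct word found in one document's text.
    public static void addDocument(String text, Map<String, Integer> counts) {
        Set<String> seen = new HashSet<>(); // fresh set per document
        try (Scanner scanner = new Scanner(text)) {
            while (scanner.hasNext()) {
                // Same normalization as in the question: keep letters and digits only.
                String word = scanner.next().replaceAll("[^A-Za-z0-9]", "");
                // seen.add() returns false if the word already occurred in this document.
                if (!word.isEmpty() && seen.add(word)) {
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
    }

    // Counts, for each word, how many regular files directly inside dir contain it.
    // Assumption: files are small enough to read whole and are UTF-8 encoded.
    public static Map<String, Integer> countDirectory(Path dir) throws IOException {
        Map<String, Integer> counts = new HashMap<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path file : files) {
                if (Files.isRegularFile(file)) {
                    String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
                    addDocument(text, counts);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        addDocument("cow cow cow grass", counts);
        addDocument("The cow, again.", counts);
        System.out.println(counts.get("cow"));   // prints 2
        System.out.println(counts.get("grass")); // prints 1
    }
}
```

Here "cow" appears three times in the first text but contributes only one to its count, so across the two texts its count is 2. Matching is case-sensitive, as in the question's code; lowercasing each word before the set check would be one way to merge "The" and "the" if that is wanted.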

Published: 2023-11-11 07:09:51
Article link: https://www.elefans.com/category/jswz/34/1577731.html