HashMap with a single key holding a class: counting keys and retrieving the counters

Updated: 2024-10-10 21:26:37

Problem description

I am working on a personal database project. My input file comes from: http://ir.dcs.gla.ac.uk/resources/test_collections/cran/

After splitting it into 1400 separate files (named 00001.txt, ..., 01400.txt) and applying stemming to them, I store them in a specific folder, let's call it StemmedFolder, in the following format:

In StemmedFolder, 00001.txt contains:
    investig aerodynam wing slipstream brenckman experiment investig aerodynam wing

In StemmedFolder, 00756.txt contains:
    remark eddi viscos compress mix flow lu ting

And so on.

I wrote code that does the following:

1. Reads the files in StemmedFolder and counts the unique words

2. Sorts the words alphabetically

3. Adds the ID of the document

4. Saves each result to a new file, 00001.txt to 01400.txt, as described below

(I can provide my code for these 4 parts if anybody needs to see the implementation or wants to suggest changes.)

The output for each file is written to a separate file (1400 files, named 00001.txt, 00002.txt, ...) in a specific folder, let's call it FrequenceyFolder, in the following format:

In FrequenceyFolder, 00001.txt contains:
    00001,aerodynam,2
    00001,agre,3
    00001,angl,1
    00001,attack,7
    00001,basi,4
    ....

In FrequenceyFolder, 00999.txt contains:
    00999,aerodynam,5
    00999,evalu,1
    00999,lift,3
    00999,ratio,2
    00999,result,9
    ....

In FrequenceyFolder, 01400.txt contains:
    01400,subtract,1
    01400,support,1
    01400,theoret,1
    01400,theori,1
    01400,.....
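The counting, sorting, and ID-prefixing steps described above can be sketched roughly as follows. This is not the asker's actual code (which is not shown). The class name `FrequencyLines` and the method `toFrequencyLines` are made up for illustration, and the sketch assumes words are separated by whitespace and that each record goes on its own line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration; not taken from the original post.
final class FrequencyLines {

    // Turns the stemmed text of one document into "docId,word,count" lines,
    // sorted alphabetically by word.
    static List<String> toFrequencyLines(String docId, String stemmedText) {
        // A TreeMap keeps its keys in natural (alphabetical) order.
        Map<String, Integer> counts = new TreeMap<>();
        for (String word : stemmedText.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum); // insert 1 or add 1 to the existing count
            }
        }
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            lines.add(docId + "," + entry.getKey() + "," + entry.getValue());
        }
        return lines;
    }

    public static void main(String[] args) {
        String text = "investig aerodynam wing slipstream brenckman experiment investig aerodynam wing";
        toFrequencyLines("00001", text).forEach(System.out::println);
    }
}
```

For the short 00001.txt excerpt shown earlier, this prints six lines, starting with `00001,aerodynam,2` and ending with `00001,wing,2`.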

______________

Now my question:

I need to combine these 1400 files again into a single txt file that looks like the following, with some calculation:

    'aerodynam' totalFrequency=3 docs: [[Doc_00001,5],[Doc_01344,4],[Doc_00123,3]]
    'book' totalFrequency=2 docs: [[Doc_00562,6],[Doc_01111,1]]
    ....
    'result' totalFrequency=1 doc: [[Doc_00010,5]]
    ....
    'zzzz' totalFrequency=1 doc: [[Doc_01235,1]]

Thanks for taking the time to read this long post.
Solution
You can use a Map of Lists.

    Map<String, List<FileInformation>> statistics = new HashMap<>();

In the above map, the key is the word and the value is a List<FileInformation> whose elements describe the statistics of the individual files containing the word. The FileInformation class can be declared as follows:

    class FileInformation {
        int occurrenceCount;
        String fileName;
        // getters and setters
    }
To populate the above Map, use the following steps:

1. Read each file in the FrequencyFolder.

2. When you come across a word for the first time, put it as a key in the Map.

3. Create a FileInformation object, set its occurrenceCount to the number of occurrences found, and set its fileName to the name of the file it was found in. Add this object to the List<FileInformation> corresponding to the key created in step 2.

4. The next time you come across the same word in another file, create a new FileInformation object and add it to the List<FileInformation> already stored in the map for that word.
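The steps above might look roughly like this in code. It is a minimal sketch under several assumptions: each frequency file holds one `docId,word,count` record per line, malformed lines (such as the trailing `01400,.....`) are skipped, and the document ID from the record is stored in fileName. The names `IndexBuilder`, `addLine`, and `build` are hypothetical. A TreeMap is used in place of the HashMap so that words come out in alphabetical order, and `computeIfAbsent` covers steps 2 and 4 in a single call.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch; class and method names are not from the original answer.
final class IndexBuilder {

    static final class FileInformation {
        final String fileName;
        final int occurrenceCount;

        FileInformation(String fileName, int occurrenceCount) {
            this.fileName = fileName;
            this.occurrenceCount = occurrenceCount;
        }
    }

    // Adds one "docId,word,count" record to the map; malformed lines are ignored.
    static void addLine(Map<String, List<FileInformation>> statistics, String line) {
        String[] parts = line.trim().split(",");
        if (parts.length != 3) {
            return;
        }
        int count;
        try {
            count = Integer.parseInt(parts[2].trim());
        } catch (NumberFormatException e) {
            return;
        }
        // Creates the list the first time a word is seen, reuses it afterwards.
        statistics.computeIfAbsent(parts[1].trim(), k -> new ArrayList<>())
                  .add(new FileInformation(parts[0].trim(), count));
    }

    // Reads every regular file in the folder and builds the word -> files map.
    static Map<String, List<FileInformation>> build(Path folder) throws IOException {
        Map<String, List<FileInformation>> statistics = new TreeMap<>();
        List<Path> files;
        try (Stream<Path> stream = Files.list(folder)) {
            files = stream.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        for (Path file : files) {
            for (String line : Files.readAllLines(file)) {
                addLine(statistics, line);
            }
        }
        return statistics;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.out.println("usage: java IndexBuilder.java <frequency-folder>");
            return;
        }
        Map<String, List<FileInformation>> statistics = build(Paths.get(args[0]));
        System.out.println(statistics.size() + " distinct words");
    }
}
```

Keeping `addLine` separate from the file reading makes the parsing easy to check without touching the disk.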

Once you have the Map populated, printing the statistics should be a piece of cake.

    for (String word : statistics.keySet()) {
        List<FileInformation> fileInfos = statistics.get(word);
        int totalFrequency = 0;
        for (FileInformation fileInfo : fileInfos) {
            // sum up the occurrenceCount for the word to get the total frequency
            totalFrequency += fileInfo.occurrenceCount;
        }
    }
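One way to turn that loop into the requested output line is sketched below. The names `IndexPrinter` and `formatEntry` are hypothetical. Following the answer, totalFrequency is the sum of the occurrence counts. The example in the question appears to show the number of documents instead, in which case `fileInfos.size()` would be the value to print. Sorting the documents by descending count is also an assumption, read off the question's sample.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; names are not from the original answer.
final class IndexPrinter {

    static final class FileInformation {
        final String fileName;
        final int occurrenceCount;

        FileInformation(String fileName, int occurrenceCount) {
            this.fileName = fileName;
            this.occurrenceCount = occurrenceCount;
        }
    }

    // Builds one output line for a word, listing documents by descending count.
    static String formatEntry(String word, List<FileInformation> fileInfos) {
        List<FileInformation> sorted = new ArrayList<>(fileInfos);
        sorted.sort(Comparator.comparingInt((FileInformation f) -> f.occurrenceCount).reversed());

        int totalFrequency = 0; // sum of counts, as in the answer (assumption: not the document count)
        StringBuilder docs = new StringBuilder();
        for (FileInformation info : sorted) {
            totalFrequency += info.occurrenceCount;
            if (docs.length() > 0) {
                docs.append(",");
            }
            docs.append("[Doc_").append(info.fileName).append(",")
                .append(info.occurrenceCount).append("]");
        }
        return "'" + word + "' totalFrequency=" + totalFrequency + " docs: [" + docs + "]";
    }

    public static void main(String[] args) {
        // TreeMap so that words are printed in alphabetical order.
        Map<String, List<FileInformation>> statistics = new TreeMap<>();
        statistics.put("aerodynam", Arrays.asList(
                new FileInformation("00123", 3),
                new FileInformation("00001", 5),
                new FileInformation("01344", 4)));
        statistics.put("result", Arrays.asList(new FileInformation("00010", 5)));

        for (Map.Entry<String, List<FileInformation>> entry : statistics.entrySet()) {
            System.out.println(formatEntry(entry.getKey(), entry.getValue()));
        }
    }
}
```

With the sample data in `main`, the first line printed is `'aerodynam' totalFrequency=12 docs: [[Doc_00001,5],[Doc_01344,4],[Doc_00123,3]]`.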


Published: 2023-10-18 15:36:34

Article link: https://www.elefans.com/category/jswz/34/1504650.html
