MIPS: counting unique words in a plain text file and finding their frequency

This is my first time asking a question here. I am very frustrated by a MIPS homework assignment. The assignment states: the input will be a plain text file, and the output will be another file containing a list of words with their frequencies. The output file has two columns: the left column is a word, and the right column is the number of times that word occurs in the input file. For example, the output file could look like this:

    have: 2
    they: 3
    is: 4
    i: 5

We are supposed to code this in MIPS assembly. I do not understand how I should approach this kind of problem. I was thinking of first reading all the characters from the input file into an array in memory, and then figuring out a way to construct a second array containing all the unique words and their frequencies.

So far, I am only able to read the original file into an array:

            .data
    chars:      .space  1024            # buffer for the file contents
    fin:        .asciiz "chill.txt"     # file name to read from (must be null-terminated)
    uniqueWord: .space  1024            # buffer for the unique words

            .text
    main:
            # Open the file
            li   $v0, 13                # syscall 13: open file
            la   $a0, fin               # input file name
            li   $a1, 0                 # flags: open for reading
            li   $a2, 0                 # mode (ignored)
            syscall
            move $s6, $v0               # save the file descriptor

            # Read from the file that was just opened
            li   $v0, 14                # syscall 14: read from file
            move $a0, $s6               # file descriptor
            la   $a1, chars             # address of the input buffer
            li   $a2, 1023              # max bytes to read; the last byte stays 0 as a terminator
            syscall

I try to use these to find the beginning and the end of a word:

            addi $t4, $zero, 0          # i = 0
            addi $t0, $zero, 0          # total = 0
            addi $t1, $zero, 44         # delimiter ','
            addi $t2, $zero, 32         # delimiter ' '
            addi $t3, $zero, 46         # delimiter '.'
    loop:
            lb   $t5, chars($t4)        # c = chars[i]
            beq  $t5, $zero, endloop    # stop at the terminating zero byte
            beq  $t5, $t3, wordEnd      # if c == '.', the current word has ended
            beq  $t5, $t1, wordEnd      # if c == ',', the current word has ended
            beq  $t5, $t2, wordEnd      # if c == ' ', the current word has ended
            addi $t4, $t4, 1            # i += 1
            addi $t0, $t0, 1            # total += 1
            j    loop
            # wordEnd and endloop still have to be written

I would really appreciate it if anyone could give me some direction on this assignment. Thanks a million.

Best answer

With any complex assignment like this, you should first write and debug a simple solution in C, then translate it into assembly by hand. You can develop a working solution quickly in C. Then all you need to do is verify that you have translated it correctly and fix any bugs you may have introduced in the assembly.

It is too complicated to think about the solution to a problem and about the implementation of that solution in assembler at the same time. You will just get bogged down in details and get lost. Working it out in C first is how I write assembly in my day job.

The simplest way to solve this would be to have an array containing all the words that have been seen so far, and a separate array with the use count for each word, so that the use count for WORDS[0] is kept in COUNTS[0]. Then, when you read in a new word:

    compare it to each of the words in the array
    if it matches a word, increment the use count for that word
    if you reach the end of the array without a match:
        add the current word to the end of the array with a use count of 1
        increment the array length
        if the array length exceeds the space you initially allocated for the array, give up and exit with an error
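To make the steps above concrete, here is a minimal C sketch of the lookup-or-append logic. The names (`record_word`, `words`, `counts`, `num_words`) and the limits `MAX_WORDS` and `MAX_LEN` are made up for this example and are not prescribed by the assignment. Instead of exiting on overflow, this version returns -1 so the caller can decide how to report the error.

```c
#include <string.h>

#define MAX_WORDS 256   /* assumed capacity of the table */
#define MAX_LEN   32    /* assumed maximum word length, including the '\0' */

char words[MAX_WORDS][MAX_LEN];  /* WORDS[i]: the i-th distinct word seen */
int  counts[MAX_WORDS];          /* COUNTS[i]: use count of WORDS[i] */
int  num_words = 0;              /* current array length */

/* Record one occurrence of `word`.
 * Returns the index of the word in the table,
 * or -1 if the word is too long or the table is full. */
int record_word(const char *word)
{
    if (strlen(word) >= MAX_LEN)
        return -1;                       /* would not fit in one row */

    /* Compare against every word seen so far. */
    for (int i = 0; i < num_words; i++) {
        if (strcmp(words[i], word) == 0) {
            counts[i]++;                 /* match: bump its use count */
            return i;
        }
    }

    /* No match: append, unless the table is already full. */
    if (num_words == MAX_WORDS)
        return -1;

    strcpy(words[num_words], word);
    counts[num_words] = 1;
    return num_words++;
}
```

Giving every word a fixed-size row is a deliberate simplification: in assembly the address of word i is then just the base address plus i * MAX_LEN (a shift by 5 when MAX_LEN is 32), which should make the hand translation easier than a table of pointers.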

This is just a simple suggestion that won't be too hard to translate into assembler. There are obviously many faster algorithms for this problem, such as trees and hash tables.
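Whatever table you use, you also need to cut the input buffer into words, which is the part the loop in the question was attempting. Here is one possible C sketch of that step, assuming the buffer is zero-terminated. The function names `is_delimiter` and `next_word` are hypothetical, and the delimiter set (the question's space, comma and period, plus newline, carriage return and tab) is my assumption about what the input file contains.

```c
#include <stddef.h>

/* Returns 1 if c separates words. The set of delimiters is an assumption. */
static int is_delimiter(char c)
{
    return c == ' ' || c == ',' || c == '.' ||
           c == '\n' || c == '\r' || c == '\t';
}

/* Copy the next word found at or after buf[*pos] into out.
 * out_size must be at least 1; longer words are truncated to
 * out_size - 1 characters, and out is always zero-terminated.
 * *pos is advanced past the word.
 * Returns 1 if a word was found, 0 at the end of the buffer. */
int next_word(const char *buf, size_t *pos, char *out, size_t out_size)
{
    size_t i = *pos;
    size_t n = 0;

    /* Skip any delimiters in front of the word. */
    while (buf[i] != '\0' && is_delimiter(buf[i]))
        i++;

    /* Copy characters until the next delimiter or the end of the buffer. */
    while (buf[i] != '\0' && !is_delimiter(buf[i])) {
        if (n + 1 < out_size)
            out[n++] = buf[i];
        i++;
    }

    out[n] = '\0';
    *pos = i;
    return n > 0;
}
```

A caller would start with a position of 0 and keep calling `next_word` on the file buffer until it returns 0, handing each extracted word to the counting step. The two inner loops correspond closely to two small branch loops in MIPS, with `is_delimiter` becoming a short chain of `beq` instructions like the one in the question.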
