Counting the occurrences of each word in a file with bash

Updated: 2024-10-17 13:32:30

Problem description

I want to count the occurrences of every word in a file, but the result is wrong.

#!/bin/bash
#usage: count.sh file
declare -a dict
for word in $(cat $1)
do
    if [ ${dict[$word]} == "" ] ;then
        dict[$word]=0
    else
        dict[$word]=$[${dict[$word]} + 1]
    fi
done
for word in ${!dict[@]}
do
    echo $word: ${dict[$word]}
done

Using the following test file:

learning the bash shell
this is second line
this is the last line

Running bash -x count.sh file gives this result:

+ declare -a dict
++ cat book
+ for word in '$(cat $1)'
+ '[' '' == '' ']'
+ dict[$word]=0
+ for word in '$(cat $1)'
+ '[' 0 == '' ']'
+ dict[$word]=1
+ for word in '$(cat $1)'
+ '[' 1 == '' ']'
+ dict[$word]=2
+ for word in '$(cat $1)'
+ '[' 2 == '' ']'
+ dict[$word]=3
+ for word in '$(cat $1)'
+ '[' 3 == '' ']'
+ dict[$word]=4
+ for word in '$(cat $1)'
+ '[' 4 == '' ']'
+ dict[$word]=5
+ for word in '$(cat $1)'
+ '[' 5 == '' ']'
+ dict[$word]=6
+ for word in '$(cat $1)'
+ '[' 6 == '' ']'
+ dict[$word]=7
+ for word in '$(cat $1)'
+ '[' 7 == '' ']'
+ dict[$word]=8
+ for word in '$(cat $1)'
+ '[' 8 == '' ']'
+ dict[$word]=9
+ for word in '$(cat $1)'
+ '[' 9 == '' ']'
+ dict[$word]=10
+ for word in '$(cat $1)'
+ '[' 10 == '' ']'
+ dict[$word]=11
+ for word in '$(cat $1)'
+ '[' 11 == '' ']'
+ dict[$word]=12
+ for word in '${!dict[@]}'
+ echo 0: 12
0: 12

Recommended answer

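Using declare -a dict creates an indexed array, so each key is evaluated as a numeric value, which is then used as the index. In the script above every word is read as the name of an unset variable, which evaluates to 0, so all the counts accumulate in dict[0]; that is why the only output line is 0: 12. That's not what you want if you're storing things by word. Use declare -A instead (associative arrays require bash 4.0 or later).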
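
The difference is easy to see in a small experiment. The sketch below is illustrative only: the array names indexed and assoc and the three sample words are made up for the demonstration, and declare -A assumes bash 4.0 or newer.

```shell
#!/usr/bin/env bash
# Illustrative sketch: array names and sample words are assumptions,
# not part of the original question.

declare -a indexed   # indexed array: subscripts are arithmetic expressions
declare -A assoc     # associative array (bash 4.0+): subscripts are strings

for word in learning the bash; do
    # In the indexed array, $word is evaluated arithmetically. An unset
    # variable name evaluates to 0, so every assignment lands on index 0.
    indexed[$word]=$word
    assoc[$word]=$word
done

indexed_size=${#indexed[@]}   # only index 0 exists
assoc_size=${#assoc[@]}       # one entry per distinct word

echo "indexed array entries: $indexed_size"
echo "associative array entries: $assoc_size"
```

The indexed array ends up with a single element, overwritten on every pass, while the associative array keeps one element per word.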

Also, $[ ] is an exceedingly outdated syntax for math. Even modern POSIX sh supports $(( )), which you should use instead:

dict[$word]=$(( ${dict[$word]} + 1 ))

Or, to take advantage of bash-only math syntax:

(( dict[$word]++ ))
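
Both forms can be tried side by side. This is a minimal sketch with a made-up word list and hypothetical array names (posix_style, bash_style). The :-0 default and the trailing || true are additions for this example, not part of the answer's code.

```shell
#!/usr/bin/env bash
# Illustrative sketch: array names and the word list are assumptions.

declare -A posix_style bash_style   # associative arrays need bash 4.0+

for word in this is the this; do
    # Arithmetic expansion; :-0 supplies 0 for a word not seen yet.
    posix_style[$word]=$(( ${posix_style[$word]:-0} + 1 ))

    # Bash-only arithmetic command; an unset element counts as 0.
    # (( )) returns status 1 when the expression's value is 0, which is
    # the case for the first post-increment, so "|| true" keeps the line
    # from aborting a script that runs under set -e.
    (( bash_style[$word]++ )) || true
done

echo "this: ${posix_style[this]} ${bash_style[this]}"
echo "the: ${posix_style[the]} ${bash_style[the]}"
```

Either style produces the same counts; the choice mostly comes down to whether the script has to stay close to POSIX sh syntax.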

Also, using for word in $(cat $1) is broken in a few ways:

  • It doesn't quote $1, so for a filename containing spaces, the name is split into several words and cat tries to open each one as a separate file. To fix only this, use $(cat "$1") or $(<"$1") (which is more efficient, as it doesn't start the external program cat).
  • It tries to expand the words in the file as globs: if the file contains *, it is replaced by the name of every file in the current directory, and each name is treated as a word.
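
The globbing problem can be reproduced safely in a scratch directory. The sketch below is a demonstration under assumptions: the file names (a.txt, b.txt, words), the array names, and the use of mktemp -d are all made up for illustration.

```shell
#!/usr/bin/env bash
# Illustrative sketch: file names and directory layout are assumptions.

workdir=$(mktemp -d)          # scratch directory so nothing real is touched
cd "$workdir" || exit 1
touch a.txt b.txt             # two files for the glob to match
printf '%s\n' 'count * stars' > words

# Unquoted command substitution: the result is word-split AND glob-expanded,
# so the literal * is replaced by the names of the files in this directory.
unsafe=()
for word in $(cat words); do
    unsafe+=("$word")
done

# read -r -a splits the line on whitespace without glob expansion.
read -r -a safe < words

echo "for loop saw ${#unsafe[@]} words: ${unsafe[*]}"
echo "read saw ${#safe[@]} words: ${safe[*]}"

cd / && rm -rf "$workdir"     # clean up the scratch directory
```

The for loop reports five words because * has turned into three file names, while read keeps the three words that are actually in the file.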

Instead, use a while loop:

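# Assumes dict was created with declare -A.
# -d ' ' ends each read at a space, so words must be separated by single spaces.
while read -r -d ' ' word; do
    if [[ -n ${dict[$word]} ]]; then
        dict[$word]=$(( ${dict[$word]} + 1 ))
    else
        dict[$word]=1
    fi
done <"$1"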
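
Putting the pieces together, a complete script might look like the sketch below. It is one possible arrangement rather than the answer's exact code: it reads whole lines and splits them with read -r -a instead of using -d ' ', because a space delimiter treats a newline as part of a word, and the loop body never runs for a final word that is not followed by a space. The function name count_words is hypothetical, and declare -A (here local -A) assumes bash 4.0 or newer.

```shell
#!/usr/bin/env bash
# usage: count.sh file
# Sketch only: count_words is a hypothetical helper name.

count_words() {
    local file=$1 line word
    local -a words
    local -A dict                      # string-keyed counts (bash 4.0+)

    # Read one line at a time; "|| [[ -n $line ]]" keeps a last line that
    # has no trailing newline.
    while IFS= read -r line || [[ -n $line ]]; do
        read -r -a words <<<"$line"    # split on whitespace, no globbing
        for word in "${words[@]}"; do
            dict[$word]=$(( ${dict[$word]:-0} + 1 ))
        done
    done <"$file"

    # Print "word: count" pairs; sort only makes the order stable.
    for word in "${!dict[@]}"; do
        printf '%s: %s\n' "$word" "${dict[$word]}"
    done | sort
}

# Run only when a file argument is given.
if [[ $# -gt 0 ]]; then
    count_words "$1"
fi
```

Run against the three-line test file from the question, this should report a count of 2 for the, this, is and line, and 1 for every other word.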

Published: 2023-11-30 17:31:45
Article link: https://www.elefans.com/category/jswz/34/1650905.html