Calculating the length of a text in UNIX

Updated: 2024-10-27 10:19:03

I have two questions:

1) I want my script to remove every character that is not an English letter.
2) I want to calculate the length of the text once punctuation, spaces, etc. have been removed, and I do not know what is wrong with this part.

Linux script:

```bash
#!/usr/bin/bash

awk '
BEGIN { FS="" } # defining a field separator in order to treat each character one by one
{
    $0 = tolower($0)            # removing case distinctions
    gsub(/[[:punct:]]/,"", $0)  # removing every punctuation mark
    gsub(/\ /, "", $0)          # removing spaces
    gsub(/[0-9]/, "", $0)       # removing digits
    gsub(/![a-z]/, "", $0)      # removing every non-English letter <- This does not work

    # After removing every possible punctuation mark, space, digit and non-English
    # letter in the user-defined text, we calculate the occurrence of each character and place it into an array
    for (i = 1; i <= NF; i++)
    {
        freq[$i]++
        length++
    }
}
```

But it shows me the following error:

```
awk: cmd. line 17: length++
awk: cmd. line 17:         ^ unexpected newline or end of string
```

Please help me with at least the second question. I cannot see what is wrong; everything seems all right. Thanks in advance!

Best answer

Using awk

```shell
awk '{gsub("[^A-Za-z]", "");i+=length}END{print i}'
```

Using tr and wc

```shell
tr -C -d "A-Za-z" | wc -c
```

They both delete all characters not in the range A-Za-z, then count the remaining characters. The tr one has the advantage, or disadvantage, of being dependent on your locale.
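As a minimal sketch of the two approaches side by side, the snippet below wraps each one-liner in a shell function and feeds both the same sample line. The function names and the sample text are made up for illustration, and the `i + 0` in the awk version is an addition so that empty input prints `0` instead of an empty line.

```shell
# Count ASCII letters with awk: strip non-letters from each line,
# then add up the lengths of what remains.
count_letters_awk() {
    awk '{ gsub("[^A-Za-z]", ""); i += length } END { print i + 0 }'
}

# Count ASCII letters with tr and wc: delete everything outside A-Za-z
# (newlines included), then count the bytes that are left.
count_letters_tr() {
    tr -C -d "A-Za-z" | wc -c
}

# "Hello, World! 123" contains 10 letters, so both calls report 10.
printf 'Hello, World! 123\n' | count_letters_awk
printf 'Hello, World! 123\n' | count_letters_tr
```

Some `wc` implementations pad the number with leading spaces, so treat the result as a number rather than comparing it as a string.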

You can also create an awk script the same way you create a shell script.

```awk
#!/usr/bin/awk -f
{ gsub("[^A-Za-z]", ""); i += length }
END { print i }
```
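The frequency-counting logic from the question could be written in the same script form. The sketch below is not the asker's final script; it assumes two changes. First, `length` is a built-in awk function, so it cannot be used as a counter variable, which is what triggers the syntax error at `length++`; the counter is called `total` here. Second, `/![a-z]/` matches a literal `!` followed by a letter, whereas negation inside a bracket expression is written `[^a-z]`. The characters are walked with `substr` rather than `FS=""`, since splitting on an empty field separator is not guaranteed by every awk. The temporary file and the sample input are hypothetical. Note the `-f` in the shebang line, which awk needs in order to read the program from the file.

```shell
# Write the awk program to a temporary file; the quoted 'EOF' stops the
# shell from expanding $0 inside the here-document.
script=$(mktemp)
cat > "$script" <<'EOF'
#!/usr/bin/awk -f
{
    $0 = tolower($0)          # remove case distinctions
    gsub(/[^a-z]/, "")        # drop everything that is not an English letter
    n = length($0)
    for (i = 1; i <= n; i++)  # walk the remaining characters one by one
        freq[substr($0, i, 1)]++
    total += n                # "total", because "length" is a built-in function
}
END {
    for (c in freq)
        print c, freq[c]
    print "total", total + 0
}
EOF

# The order of "for (c in freq)" is unspecified, so sort the output.
result=$(printf 'Hello, World! 123\n' | awk -f "$script" | sort)
printf '%s\n' "$result"
rm -f "$script"
```

After `chmod +x` the file could also be run directly, provided awk really lives at `/usr/bin/awk` on the system in question.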

For maximum portability you either need to set the locale in your script to POSIX, or list out every character.

```shell
tr -C -d "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" | wc -c
```
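The other option, setting the locale, might look like the sketch below. It assumes that prefixing the command with `LC_ALL=C` is an acceptable way to get the POSIX locale for that one `tr` call; the function name and the sample text are made up for illustration.

```shell
# Force the POSIX ("C") locale for tr only, so the ranges A-Z and a-z
# cover exactly the 52 ASCII letters regardless of the user's settings.
count_letters_c() {
    LC_ALL=C tr -C -d "A-Za-z" | wc -c
}

# The sample is "naïve café 42" with the accented letters written as
# octal escapes for their UTF-8 bytes. Only n, a, v, e, c, a, f are
# ASCII letters, so the count is 7.
printf 'na\303\257ve caf\303\251 42\n' | count_letters_c
```

Setting the variable on the command itself keeps the change local, so the rest of the script still runs under the user's own locale.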


Published: 2023-07-07 15:28:00

Link: https://www.elefans.com/category/jswz/34/1065197.html