a.txt contains words; b.txt contains strings.
I would like to know how many strings in b.txt start or end with the words from a.txt.
I found this in the grep user manual: "Suppose I want to search for a whole word, not a part of a word? grep -w 'hello' * searches only for instances of ‘hello’ that are entire words; it does not match ‘Othello’. For more control, use ‘\<’ and ‘\>’ to match the start and end of words. For example:
grep 'hello\>' * searches only for words ending in ‘hello’, so it matches the word ‘Othello’."
But I don't know how to modify it to solve my problem.
Example:

a.txt

    apple
    peach
    potato
    green
    big
    pink

b.txt

    greenapple
    bigapple
    rottenapple
    pinkpeach
    xxlpotatoxxx

Output

ends.txt

    3 apple greenapple bigapple rottenapple
    1 peach pinkpeach

starts.txt

    1 green greenapple
    1 big bigapple
    1 pink pinkpeach

I have already received some ideas here: grep two files (a.txt, b.txt) - how many lines in b.txt starts (or ends) with the words from a.txt - output: 2 files with the results
But since a.txt contains around 50K lines and b.txt has more than 100M lines, I think grep is the only solution.
Best answer
Your best bet would be to write a script that loops over every line of the file containing the patterns and greps for each pattern in the other file:
The following gets the startsWith strings:

    while read -r w; do
        # Collect the lines of b.txt that start with the current word
        start=($(grep "^${w}" b.txt))
        (( ${#start[@]} != 0 )) && echo "${#start[@]} $w ${start[@]}"
    done < a.txt

Running it on your sample input yields:

    1 green greenapple
    1 big bigapple
    1 pink pinkpeach

Similarly, you could write another loop to get the endsWith strings:

    while read -r w; do
        # Collect the lines of b.txt that end with the current word
        end=($(grep "${w}$" b.txt))
        (( ${#end[@]} != 0 )) && echo "${#end[@]} $w ${end[@]}"
    done < a.txt

which would produce:

    3 apple greenapple bigapple rottenapple
    1 peach pinkpeach
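If only the counts matter, the matching strings do not need to be collected in an array, because grep -c prints the number of matching lines directly. What follows is a minimal sketch rather than part of the original answer. It assumes one word per line in a.txt, one string per line in b.txt, and words that contain no regular-expression metacharacters. The scratch directory and the file name start_counts.txt are hypothetical, introduced only so the example runs on its own.

```shell
#!/bin/sh
# Count-only variant (illustrative sketch): no array of matches is kept.
# The sample files are recreated in a scratch directory (an assumption).
workdir=$(mktemp -d)
printf '%s\n' apple peach potato green big pink > "$workdir/a.txt"
printf '%s\n' greenapple bigapple rottenapple pinkpeach xxlpotatoxxx > "$workdir/b.txt"

while IFS= read -r w; do
    # grep -c prints how many lines of b.txt start with the word
    n=$(grep -c -- "^${w}" "$workdir/b.txt")
    [ "$n" -gt 0 ] && echo "$n $w"
done < "$workdir/a.txt" > "$workdir/start_counts.txt"

cat "$workdir/start_counts.txt"
```

Replacing "^${w}" with "${w}$" would give the corresponding endsWith counts.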
EDIT: If you want to redirect the output to separate files, you could do both parts in a single loop:

    > startswith.txt   # Truncate the output files to begin with
    > endswith.txt
    while read -r w; do
        start=($(grep "^${w}" b.txt))
        (( ${#start[@]} != 0 )) && echo "${#start[@]} $w ${start[@]}" >> startswith.txt
        end=($(grep "${w}$" b.txt))
        (( ${#end[@]} != 0 )) && echo "${#end[@]} $w ${end[@]}" >> endswith.txt
    done < a.txt
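With around 50K words and more than 100M strings, the loop runs grep over b.txt once or twice per word, which may take a very long time. One possible alternative, sketched here with awk instead of grep and not part of the original answer, is to load the words into memory and read b.txt only once, checking every prefix and suffix of each string against the word table. The sketch assumes one word per line in a.txt, one string per line in b.txt, and that the words plus all matched strings fit in memory. It compares literal text, so the words are not treated as regular expressions. The scratch directory and output paths are hypothetical.

```shell
#!/bin/sh
# Single-pass sketch: index a.txt in memory, then scan b.txt once.
# The sample files are recreated in a scratch directory (an assumption).
workdir=$(mktemp -d)
printf '%s\n' apple peach potato green big pink > "$workdir/a.txt"
printf '%s\n' greenapple bigapple rottenapple pinkpeach xxlpotatoxxx > "$workdir/b.txt"

awk -v starts="$workdir/starts.txt" -v ends="$workdir/ends.txt" '
    # First file (a.txt): remember each word once, in input order.
    NR == FNR {
        if (!($0 in seen)) { seen[$0] = 1; order[++n] = $0 }
        next
    }
    # Second file (b.txt): look up every prefix and suffix of the line.
    {
        len = length($0)
        for (i = 1; i <= len; i++) {
            p = substr($0, 1, i)             # prefix of length i
            if (p in seen) { sc[p]++; sl[p] = sl[p] " " $0 }
            s = substr($0, len - i + 1)      # suffix of length i
            if (s in seen) { ec[s]++; el[s] = el[s] " " $0 }
        }
    }
    # Write "count word matches..." for every word with at least one match.
    END {
        for (k = 1; k <= n; k++) {
            w = order[k]
            if (w in sc) print sc[w], w sl[w] > starts
            if (w in ec) print ec[w], w el[w] > ends
        }
    }
' "$workdir/a.txt" "$workdir/b.txt"

cat "$workdir/starts.txt" "$workdir/ends.txt"
```

On the sample data this writes the same lines as the grep loops. Each string costs about two table lookups per character, so the total work grows with the size of b.txt rather than with the number of words multiplied by the number of strings.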