Grep '\<' and '\>' between two files

Updated: 2024-10-14 18:21:11

a.txt contains words, b.txt contains strings.

I would like to know how many strings in b.txt start or end with the words from a.txt.

I found this in grep's user manual: "Suppose I want to search for a whole word, not a part of a word? grep -w 'hello' * searches only for instances of ‘hello’ that are entire words; it does not match ‘Othello’. For more control, use ‘\<’ and ‘\>’ to match the start and end of words. For example:

grep 'hello\>' *

searches only for words ending in ‘hello’, so it matches the word ‘Othello’."
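
To make the quoted behaviour concrete, here is a minimal sketch comparing `-w`, `\<` and `\>`. The sample words are made up for illustration, and `\<` / `\>` are assumed to be available (they are extensions supported by GNU grep). Note that these anchors work on word boundaries inside a line; when every string in b.txt sits on its own line, the line anchors `^` and `$` do the same job.

```shell
# Hypothetical sample input, one word per line
words=$(printf '%s\n' hello Othello hellos shell)

# 'hello\>' : "hello" must be followed by the end of a word
ends=$(printf '%s\n' "$words" | grep 'hello\>')

# '\<hello' : "hello" must begin at the start of a word
starts=$(printf '%s\n' "$words" | grep '\<hello')

# -w : "hello" must be an entire word
whole=$(printf '%s\n' "$words" | grep -w 'hello')

printf 'ends with hello:\n%s\n' "$ends"
printf 'starts with hello:\n%s\n' "$starts"
printf 'whole word hello:\n%s\n' "$whole"
```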

But I don't know how to modify it to solve my problem.

Example: a.txt

apple
peach
potato
green
big
pink

b.txt

greenapple
bigapple
rottenapple
pinkpeach
xxlpotatoxxx

Output

ends.txt

3 apple greenapple bigapple rottenapple
1 peach pinkpeach

starts.txt

1 green greenapple
1 big bigapple
1 pink pinkpeach

I received some ideas here: grep two files (a.txt, b.txt) - how many lines in b.txt starts (or ends) with the words from a.txt - output: 2 files with the results

But since a.txt contains around 50K lines and b.txt has more than 100M lines, I think grep is the only solution.

Best answer

Your best bet would be to write a script that loops over every line of the file containing the patterns and greps for each pattern in the other file.

The following finds the strings that start with each word:

while read -r w; do start=($(grep "^${w}" b.txt)); (( ${#start[@]} != 0 )) && echo "${#start[@]} $w ${start[@]}"; done < a.txt

Run on your sample input, it yields:

1 green greenapple
1 big bigapple
1 pink pinkpeach

Similarly, you could write another one-liner that would get the endsWith strings:

while read -r w; do end=($(grep "${w}$" b.txt)); (( ${#end[@]} != 0 )) && echo "${#end[@]} $w ${end[@]}"; done < a.txt

which would produce:

3 apple greenapple bigapple rottenapple
1 peach pinkpeach

EDIT: If you want to redirect the output to separate files, you could do both the parts in a single loop:

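: > startswith.txt     # Truncate the output files to begin with
: > endswith.txt
while IFS= read -r w; do
  [[ -n $w ]] || continue     # an empty pattern would match every line
  # Lines of b.txt that start with the word (the word is treated as a regex)
  mapfile -t start < <(grep -- "^${w}" b.txt)
  (( ${#start[@]} != 0 )) && echo "${#start[@]} $w ${start[*]}" >> startswith.txt
  # Lines of b.txt that end with the word
  mapfile -t end < <(grep -- "${w}$" b.txt)
  (( ${#end[@]} != 0 )) && echo "${#end[@]} $w ${end[*]}" >> endswith.txt
done < a.txt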
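
The loop above starts grep once or twice per word, so with roughly 50K words it rereads the 100M-line b.txt about that many times. As a possible alternative (not part of the original answer), here is a hedged sketch that reads b.txt only once with awk: it loads the words into an array and tests every prefix and suffix of each string against it. The function name `count_affixes` and its four arguments are hypothetical. Unlike the grep loop, words are matched literally rather than as regular expressions, the output order is unspecified, and all matched strings are held in memory until the end, which may be a problem if very many lines match.

```shell
# count_affixes WORDS STRINGS STARTS_OUT ENDS_OUT   (hypothetical helper)
# Writes "count word matching strings..." lines to the two output files.
# Assumes WORDS is non-empty; an output file is only created if something matches.
count_affixes() {
  awk -v starts="$3" -v ends="$4" '
    # First file: remember every non-empty word
    NR == FNR { if ($0 != "") words[$0] = 1; next }
    # Second file: test each prefix and suffix of the line against the words
    {
      n = length($0)
      for (k = 1; k <= n; k++) {
        p = substr($0, 1, k)            # prefix of length k
        if (p in words) { sc[p]++; sl[p] = sl[p] " " $0 }
        s = substr($0, n - k + 1)       # suffix of length k
        if (s in words) { ec[s]++; el[s] = el[s] " " $0 }
      }
    }
    END {
      for (w in sc) print sc[w] " " w sl[w] > starts
      for (w in ec) print ec[w] " " w el[w] > ends
    }
  ' "$1" "$2"
}
```

It could be called as `count_affixes a.txt b.txt startswith.txt endswith.txt`, and the results piped through `sort` if a stable order is needed.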


Published: 2023-07-30 17:17:00
Source: https://www.elefans.com/category/jswz/34/1338878.html