Grep ‘\<’ and ‘\>’ between two files

Updated: 2024-10-14 18:21:11

a.txt contains words, b.txt contains strings.

I would like to know how many strings in b.txt start or end with each word from a.txt.

I found this in the GNU grep manual: "Suppose I want to search for a whole word, not a part of a word? grep -w 'hello' * searches only for instances of ‘hello’ that are entire words; it does not match ‘Othello’. For more control, use ‘\<’ and ‘\>’ to match the start and end of words. For example:

grep 'hello\>' *

searches only for words ending in ‘hello’, so it matches the word ‘Othello’."

But I don't know how to adapt it to solve my problem.

Example:

a.txt

apple
peach
potato
green
big
pink

b.txt

greenapple
bigapple
rottenapple
pinkpeach
xxlpotatoxxx

Output

ends.txt

3 apple greenapple bigapple rottenapple
1 peach pinkpeach

starts.txt

1 green greenapple
1 big bigapple
1 pink pinkpeach

I received some ideas in an earlier question: "grep two files (a.txt, b.txt) - how many lines in b.txt starts (or ends) with the words from a.txt - output: 2 files with the results".

But since a.txt contains around 50K lines and b.txt has more than 100M lines, I think grep is the only solution.

Best answer

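Your best bet is to write a script that loops over every line of the file containing the patterns (a.txt) and greps for each pattern in the other file (b.txt).

The following gets the strings that start with each word:

# Assumes one word per line in a.txt and no regex metacharacters in the words.
while IFS= read -r w; do
  [[ -n $w ]] || continue                        # an empty pattern would match every line
  mapfile -t start < <(grep -- "^${w}" b.txt)    # one array element per matching line (bash 4+)
  (( ${#start[@]} )) && echo "${#start[@]} $w ${start[*]}"
done < a.txt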
 
Run on your sample input, it yields:
1 green greenapple
1 big bigapple
1 pink pinkpeach
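
The loop above anchors on `^` (start of line) rather than the `\<` operator quoted in the question, and the two are not interchangeable. A minimal sketch of the difference, assuming GNU grep (where `\<` is available) and three hypothetical sample lines that are not part of the question's data:

```shell
# Hypothetical sample lines, chosen to show where the two patterns disagree.
sample=$(printf '%s\n' 'greenapple' 'big-apple' 'applepie')

# '^apple' matches only when the line itself starts with 'apple'.
line_start=$(printf '%s\n' "$sample" | grep -c '^apple')

# '\<apple' matches at the start of any word, so 'big-apple' counts as well.
word_start=$(printf '%s\n' "$sample" | grep -c '\<apple')

echo "line-anchored: $line_start, word-anchored: $word_start"
```

With these lines the line-anchored count is 1 and the word-anchored count is 2. Neither pattern matches 'greenapple', because there 'apple' follows a word character. If every line of b.txt is a single run of letters, both patterns give the same result, but `^` and `$` state the intent (start or end of the whole string) directly.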
 
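Similarly, you could write another loop that gets the strings that end with each word:

# Same assumptions as above; only the anchor moves to the end of the line.
while IFS= read -r w; do
  [[ -n $w ]] || continue                        # an empty pattern would match every line
  mapfile -t end < <(grep -- "${w}$" b.txt)      # one array element per matching line (bash 4+)
  (( ${#end[@]} )) && echo "${#end[@]} $w ${end[*]}"
done < a.txt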
 
which would produce: 
3 apple greenapple bigapple rottenapple
1 peach pinkpeach
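
Both loops interpolate `$w` straight into the regular expression, so a word containing characters such as `.` or `*` would match more than intended. One possible way to guard against that, sketched here with a hypothetical helper named `escape_bre` and a made-up word, is to backslash-escape the basic-regex metacharacters before building the pattern:

```shell
# escape_bre is a hypothetical helper, not part of grep or bash: it puts a
# backslash in front of the characters that are special in a basic regex
# (backslash, dot, star, brackets, caret, dollar).
escape_bre() {
  printf '%s\n' "$1" | sed 's/[][\.*^$]/\\&/g'
}

word='a.c'                      # made-up word containing a metacharacter
pattern=$(escape_bre "$word")   # the dot is now escaped

# Unescaped, the dot matches any character, so 'abcde' is counted too.
loose=$(printf '%s\n' 'a.cde' 'abcde' | grep -c -- "^${word}")

# Escaped, only the literal prefix 'a.c' matches.
strict=$(printf '%s\n' 'a.cde' 'abcde' | grep -c -- "^${pattern}")

echo "unescaped: $loose, escaped: $strict"
```

Here the unescaped pattern counts both lines and the escaped one counts only 'a.cde'. If a.txt is known to hold plain alphabetic words, as in the sample, this step is unnecessary.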
 
 
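EDIT: If you want to redirect the output to separate files, you can do both parts in a single loop:

> startswith.txt     # Truncate the output files to begin with
> endswith.txt
# Assumes one word per line in a.txt and no regex metacharacters in the words.
while IFS= read -r w; do
  [[ -n $w ]] || continue                        # an empty pattern would match every line
  mapfile -t start < <(grep -- "^${w}" b.txt)    # lines of b.txt starting with the word
  (( ${#start[@]} )) && echo "${#start[@]} $w ${start[*]}" >> startswith.txt
  mapfile -t end < <(grep -- "${w}$" b.txt)      # lines of b.txt ending with the word
  (( ${#end[@]} )) && echo "${#end[@]} $w ${end[*]}" >> endswith.txt
done < a.txt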
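
Note that the loop above scans b.txt twice for every word, which gets expensive at the sizes mentioned in the question (about 50K words against more than 100M lines). As an alternative technique, not the loop-and-grep approach itself, here is a rough single-pass sketch in awk. It assumes one word per line in a.txt, one string per line in b.txt, and enough memory to hold the matched strings. The scratch directory and sample files exist only to make the sketch self-contained.

```shell
# Work in a scratch directory with the sample data from the question.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' apple peach potato green big pink > a.txt
printf '%s\n' greenapple bigapple rottenapple pinkpeach xxlpotatoxxx > b.txt

awk '
  # First file (a.txt): record each word, its order, and the longest length.
  NR == FNR {
    if ($0 != "" && !($0 in seen)) {
      seen[$0] = 1
      order[++n] = $0
      if (length($0) > maxlen) maxlen = length($0)
    }
    next
  }
  # Second file (b.txt): look up every prefix and suffix up to maxlen.
  {
    len = length($0)
    limit = (len < maxlen) ? len : maxlen
    for (i = 1; i <= limit; i++) {
      p = substr($0, 1, i)               # prefix of length i
      if (p in seen) { sc[p]++; sl[p] = sl[p] " " $0 }
      s = substr($0, len - i + 1)        # suffix of length i
      if (s in seen) { ec[s]++; el[s] = el[s] " " $0 }
    }
  }
  # Emit "count word matches..." in the order the words appear in a.txt.
  END {
    for (k = 1; k <= n; k++) {
      w = order[k]
      if (w in sc) print sc[w] " " w sl[w] > "startswith.txt"
      if (w in ec) print ec[w] " " w el[w] > "endswith.txt"
    }
  }
' a.txt b.txt
```

On the sample data this writes the same lines to startswith.txt and endswith.txt as the loop does. The words are compared as literal strings, so regex metacharacters need no escaping. Because every matched string is kept in memory until the end, dropping the `sl` and `el` arrays and printing only the counts may be the more practical variant for very large inputs.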

Published: 2023-07-30 17:17:00

Article link: https://www.elefans.com/category/jswz/34/1338878.html