Updated: 2024-10-27 04:26:18
What is the best way to speed up a find command on a huge directory tree using GNU parallel?

I've been using GNU parallel for a while, mostly to grep large files or to run the same command over many arguments when each invocation is slow and needs to be spread across cores or hosts.

Another task I would like to spread across multiple cores and hosts is finding a file in a large directory subtree. For example, something like this:

find /some/path -name 'regex'

will take a very long time if /some/path contains many files and many subdirectories that themselves hold many files. I'm not sure this is as easy to speed up. For example:

ls -R -1 /some/path | parallel --profile manyhosts --pipe egrep regex

Something like that comes to mind, but ls itself would be very slow to list the files to search. What's a good way to speed up such a find?

Best answer

If you have N hundred immediate subdirs, you can use:

parallel --gnu -n 10 find {} -name 'regex' ::: *

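to run find in parallel over them, with each find invocation handling up to ten subdirectories (-n 10 sets the number of arguments passed per job, not the number of concurrent jobs).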

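Note, however, that listing a directory tree recursively like this is an I/O-bound task, so the speedup you get depends on the backing medium. On a hard disk drive it will probably just be slower, since concurrent traversals compete for the same disk head. If you benchmark, beware of disk caching: a second run over the same tree can look fast only because the directory metadata is already cached.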
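
As a rough illustration of the batching idea, here is a minimal sketch that builds a throwaway tree and searches its immediate subdirectories in batches. It swaps in xargs -P for GNU parallel so it runs where parallel is not installed. The demo tree, the "*.txt" pattern, the batch size of 10 and the limit of 4 concurrent jobs are all assumptions made for the example, and -mindepth/-maxdepth are GNU/BSD find extensions rather than POSIX. Note that -name matches a shell glob, not a regular expression.

```shell
#!/bin/sh
# Sketch: one find per batch of top-level subdirectories, run concurrently.
# The tree, pattern, batch size and job count are illustrative assumptions.
set -eu

root=$(mktemp -d)            # throwaway tree standing in for /some/path
trap 'rm -rf "$root"' EXIT

# Build a small tree: 20 subdirectories, a matching file in every fifth one.
i=1
while [ "$i" -le 20 ]; do
    mkdir -p "$root/dir$i/sub"
    touch "$root/dir$i/sub/other.log"
    if [ $((i % 5)) -eq 0 ]; then
        touch "$root/dir$i/sub/target.txt"
    fi
    i=$((i + 1))
done

# List only the immediate subdirectories (NUL-separated), pass them
# 10 per find invocation, and run up to 4 invocations at once.
# sh -c receives each batch as "$@", so the paths land before -name.
matches=$(
    find "$root" -mindepth 1 -maxdepth 1 -type d -print0 |
        xargs -0 -n 10 -P 4 sh -c 'find "$@" -name "*.txt"' sh |
        sort
)

printf '%s\n' "$matches"
```

Files that sit directly in the top-level directory are not covered by this split and would need a separate non-recursive pass. Whether any of this is faster than a single find depends on the storage, as noted above, so it is worth timing both variants on the real tree.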

Published: 2023-07-23 01:18:00
Link: https://www.elefans.com/category/jswz/34/1225572.html