I've been using GNU parallel for a while, mostly to grep large files or run the same command for various arguments when each command/arg instance is slow and needs to be spread out across cores/hosts.
One thing which would be great to do across multiple cores and hosts as well would be to find a file on a large directory subtree. For example, something like this:
find /some/path -name 'regex'
will take a very long time if /some/path contains many files and other directories with many files. I'm not sure if this is as easy to speed up. For example:
ls -R -1 /some/path | parallel --profile manyhosts --pipe egrep regex
something like that comes to mind but ls would be very slow to come up with the files to search. What's a good way then to speed up such a find?
Accepted Answer
If you have N hundred immediate subdirs, you can use:
parallel --gnu -n 10 find {} -name 'regex' ::: *
to run find in parallel on each of them, ten at a time.
Note however that listing a directory recursively like this is an IO bound task, and the speedup you can get will depend on the backing medium. On a hard disk drive, it'll probably just be slower (if testing, beware disk caching).