Is this a valid method for CPU time comparison between C++ and Python?

I'm interested in comparing the CPU time of some code portions written in C++ versus Python (running on Linux). Will the following methods produce a "fair" comparison between the two?

Python

Using the resource module:

    import resource

    def cpu_time():
        # One getrusage call, so both values come from the same snapshot.
        usage = resource.getrusage(resource.RUSAGE_SELF)
        return usage.ru_utime + usage.ru_stime  # user-mode + system-mode time

which allows for timing like so:

    def timefunc(func):
        start = cpu_time()
        func()
        return cpu_time() - start

Then I test like:

    def f():
        for i in range(int(1e6)):
            pass

    avg = 0
    for k in range(10):
        avg += timefunc(f) / 10.0
    print(avg)  # => 0.002199700000000071

C++

Using the ctime lib:

    #include <ctime>
    #include <iostream>

    int main() {
        double avg = 0.0;
        int N = (int) 1e6;
        for (int k = 0; k < 10; k++) {
            clock_t start;
            start = clock();
            for (int i = 0; i < N; i++) continue;
            avg += (double)(clock() - start) / 10.0 / CLOCKS_PER_SEC;
        }
        std::cout << avg << '\n';
        return 0;
    }

which yields 0.002.

Concerns:

1. I've read that C++ clock() measures CPU time, which is what I'm after, but I can't find out whether it includes both user and system time.
2. The results from C++ are much less precise. Why is that?
3. Is the comparison fair overall, as asked above?

Update

Updated the C++ code as per David's suggestion in the comments:

    #include <sys/resource.h>
    #include <iostream>

    int main() {
        double avg = 0.0;
        int N = (int) 1e6;
        int tally = 0;
        struct rusage usage;
        struct timeval ustart, ustop, sstart, sstop;
        getrusage(RUSAGE_SELF, &usage);
        ustart = usage.ru_utime;
        sstart = usage.ru_stime;
        for (int k = 0; k < 10; k++) {
            // Start values reuse the most recent getrusage() snapshot.
            ustart = usage.ru_utime;
            sstart = usage.ru_stime;
            for (int i = 0; i < N; i++) continue;
            getrusage(RUSAGE_SELF, &usage);
            ustop = usage.ru_utime;
            sstop = usage.ru_stime;
            // (user + system at stop) - (user + system at start), in seconds
            avg += ( (ustop.tv_sec + ustop.tv_usec / 1e6 +
                      sstop.tv_sec + sstop.tv_usec / 1e6)
                   - (ustart.tv_sec + ustart.tv_usec / 1e6 +
                      sstart.tv_sec + sstart.tv_usec / 1e6) ) / 10.0;
        }
        std::cout << avg << '\n';
        return 0;
    }

Running:

    g++ -O0 cpptimes.cpp ; ./a.out
    => 0.0020996
    g++ -O1 cpptimes.cpp ; ./a.out
    => 0

So I suppose getrusage gets me a little bit better resolution, but I'm not sure how much I should read into it. Setting the optimization flag certainly makes a big difference.

Best answer

The documentation says:

"Returns the approximate processor time used by the process since the beginning of an implementation-defined era related to the program's execution. To convert result value to seconds divide it by CLOCKS_PER_SEC."

That's pretty vague. CLOCKS_PER_SEC is set to 10^6, and "approximate" here stands for poor resolution; it does not mean that the underlying clock ticks over 1000 times faster and the results are merely rounded. "Approximate" may not be a very technical term, but it is appropriate. The actual resolution everywhere I tested was about 100 Hz, i.e. 0.01 s. It's been like that for years. Note the date of this post: http://www.guyrutenberg.com/2007/09/10/resolution-problems-in-clock/

The documentation then continues: "On POSIX-compatible systems, clock_gettime with clock id CLOCK_PROCESS_CPUTIME_ID offers better resolution."
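For the Python side of the comparison, the same POSIX clock can be read through the standard time module. The following is only a minimal sketch, assuming Python 3 on Linux (or another Unix where time.CLOCK_PROCESS_CPUTIME_ID is available); the helper name process_cpu_seconds and the summing loop are made up for illustration and are not part of the original question.

```python
import time

def process_cpu_seconds():
    # CPU time consumed by this process, read from the POSIX
    # per-process CPU-time clock (hypothetical helper name).
    return time.clock_gettime(time.CLOCK_PROCESS_CPUTIME_ID)

# Resolution the system reports for this clock, in seconds.
resolution = time.clock_getres(time.CLOCK_PROCESS_CPUTIME_ID)

start = process_cpu_seconds()
total = 0
for i in range(10**6):  # illustrative workload
    total += i
elapsed = process_cpu_seconds() - start

print("reported resolution: %g s" % resolution)
print("CPU time for loop:   %.6f s" % elapsed)
```

Comparing the reported resolution with the measured time gives a rough idea of whether a measurement sits close to the limit of what the clock can resolve.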

So:

1. It's CPU time only. But two threads running in parallel accumulate CPU time twice as fast. See the example on cppreference.

2. It is not suited to fine-grained measurements at all, as explained above. Your measurement was at the very edge of its accuracy.

3. IMO measuring wall-clock time is the only sensible thing, but that's a rather personal opinion, especially with multithreaded applications and multiprocessing in general. Otherwise the results should be similar to system + user time anyway.
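On the point about fine-grained measurements: one common workaround is to repeat a short workload until the total measured CPU time is well above the timer resolution, then divide by the number of repetitions. The sketch below assumes Python 3, where time.process_time() returns user plus system CPU time of the current process. The names cpu_time_per_call and work, and the 0.2 s threshold, are hypothetical choices for illustration.

```python
import time

def cpu_time_per_call(func, min_total=0.2):
    """Estimate CPU seconds per call of func.

    The repeat count is doubled until one batch consumes at least
    min_total seconds of CPU time, so that the timer resolution is
    small compared with the measured interval.
    """
    repeats = 1
    while True:
        start = time.process_time()
        for _ in range(repeats):
            func()
        total = time.process_time() - start
        if total >= min_total:
            return total / repeats, repeats
        repeats *= 2

def work():
    # Hypothetical short workload, far too brief to time on its own.
    sum(range(1000))

per_call, repeats = cpu_time_per_call(work)
print("%d repeats, about %.3e CPU seconds per call" % (repeats, per_call))
```

The same idea carries over to the C++ version, with the caveat seen in the question that an optimizing compiler may remove an empty loop entirely.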

EDIT: Regarding point 3: this of course holds for computational tasks. If your process sleeps or yields execution back to the system, measuring CPU time might be more appropriate. Also, regarding the comment that the resolution of clock is, erm... bad: it is, but to be fair, one could argue that you should not be measuring such short computations. IMO it's too bad, but if you measure times over a few seconds, I guess it's fine. I would personally use other available tools.
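The difference between wall-clock and CPU time for a sleeping process versus a computing one can be seen with a small experiment. This is a sketch only, assuming Python 3; the helper names measure, sleeper and spinner and the 0.2 s durations are invented for the example.

```python
import time

def measure(func):
    # Returns (wall-clock seconds, CPU seconds) spent in func.
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    func()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return wall, cpu

def sleeper():
    # Gives the CPU back to the system for about 0.2 s.
    time.sleep(0.2)

def spinner():
    # Keeps the CPU busy for about 0.2 s of wall-clock time.
    deadline = time.perf_counter() + 0.2
    while time.perf_counter() < deadline:
        pass

sleep_wall, sleep_cpu = measure(sleeper)
spin_wall, spin_cpu = measure(spinner)

print("sleep: wall %.3f s, cpu %.3f s" % (sleep_wall, sleep_cpu))
print("spin:  wall %.3f s, cpu %.3f s" % (spin_wall, spin_cpu))
```

For the sleeping function the CPU time should come out far below the wall-clock time, while for the busy loop the two should be close on an otherwise idle machine.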
