Slow initial timing results using gettimeofday - worse under RHEL6 Server

I am using gettimeofday() to time a simple matrix multiply example, but the results I get are initially close to twice as long as they should be. On a RHEL6 Server machine, I get "bad" timing results for up to nearly 1 second (~65 individual timings in this example). All our other machines are RHEL5 Workstation boxes and this code works much better on them; I only get a couple of "bad" results initially (for the first ~20 milliseconds).

From posts on this site, I think this probably has something to do with the OS process scheduler. If I uncomment the first "for" statement below (thereby inserting an initial busy loop by repeatedly initializing the matrices a, b and c), I get zero "bad" results under both RHEL5 Workstation and RHEL6 Server. Alternatively, if I uncomment the sleep statement, I get ALL "bad" timing results for both RHEL5 & RHEL6.

For some reason, my process starts up with only about half the access to the CPU, then gets "full" access to the CPU as long as it stays busy. If it "sleeps" and then resumes timing, it is again temporarily getting only about half the full access to the CPU.

Nothing else is happening on the machine (X is not running). I have tried "chrt" to control the priority of the process, but that changed nothing. I've verified this occurs with both GCC 4.4.6 and ICC 12.1.0. I've tried "nice" as well.

Here's the code:

#include <stdio.h>
#include <unistd.h>
#include <sys/time.h>

#define N           225
#define DELAY_LOOPS 8000

int main( void )
{
    struct timeval _t0, _t1, _t2;
    double a[N][N], b[N][N], c[N][N];
    double millisec, cum_ms;
    int i, j, k, l, m = 0;

    gettimeofday( &_t0, NULL );

    /* Uncomment the next line to turn the initialization into a busy loop. */
    // for( l=0; l<DELAY_LOOPS; l++ )
    for( i=0; i<N; i++ )
        for( j=0; j<N; j++ ) {
            a[i][j] = 0;
            b[i][j] = i;
            c[i][j] = j;
        }

    for( l=0; l<75; l++ ) {
        gettimeofday( &_t1, NULL );

        /* naive N x N matrix multiply: a += b * c */
        for( i=0; i<N; i++ )
            for( j=0; j<N; j++ )
                for( k=0; k<N; k++ )
                    a[i][j] += b[i][k]*c[k][j];

        gettimeofday( &_t2, NULL );

        millisec  = 1000*(_t2.tv_sec-_t1.tv_sec);
        millisec += 1e-3*(_t2.tv_usec-_t1.tv_usec);
        cum_ms    = 1000*(_t2.tv_sec-_t0.tv_sec);
        cum_ms   += 1e-3*(_t2.tv_usec-_t0.tv_usec);
        printf( "%d: duration %fms, cumulative %fms\n", m++, millisec, cum_ms );

        // sleep( 2 );
    }

    printf( "a[%d][%d]=%f\n", N/2, N/2, a[N/2][N/2] );
    return 0;
}

and here are the results:

% icc -O2 -o test main.c; ./test
0: duration 13.049000ms, cumulative 13.677000ms
1: duration 13.026000ms, cumulative 26.753000ms
2: duration 12.911000ms, cumulative 39.668000ms
3: duration 12.913000ms, cumulative 52.584000ms
4: duration 12.914000ms, cumulative 65.501000ms
5: duration 12.911000ms, cumulative 78.415000ms
6: duration 12.912000ms, cumulative 91.331000ms
/* snip */
64: duration 12.912000ms, cumulative 840.633000ms
65: duration 10.455000ms, cumulative 851.092000ms
66: duration 5.910000ms, cumulative 857.004000ms
67: duration 5.908000ms, cumulative 862.914000ms
68: duration 5.907000ms, cumulative 868.823000ms
69: duration 5.908000ms, cumulative 874.732000ms
70: duration 5.912000ms, cumulative 880.646000ms
71: duration 5.907000ms, cumulative 886.554000ms
72: duration 5.907000ms, cumulative 892.462000ms
73: duration 5.908000ms, cumulative 898.372000ms
74: duration 5.908000ms, cumulative 904.281000ms
a[112][112]=211680000.000000

I experience the problem regardless of optimization level (-O0, -O1, -O2, etc.).

Does anyone know anything about how scheduling is done under RHEL6 Server? Is it so very different from RHEL5 Workstation? I presume the difference I'm seeing is more a function of the fact that one box is a server edition of RHEL and the other is a workstation edition (rather than the difference between release 5 vs. 6). Is there some simple way to reduce this effect under RHEL6 Server and make it act more like the RHEL5 Workstation boxes?

Any ideas? Thanks.

Accepted Answer

Could the processor be entering a low-power state? Something like powertop can tell you that (in the Frequency stats tab).
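If powertop is not installed on the box, the same information can be read straight from the cpufreq sysfs files. Below is a minimal sketch for checking that hypothesis; it assumes the kernel exposes the standard cpufreq interface (e.g. /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq), which may not be present on every configuration, and the 200-million-iteration busy loop is just an arbitrary way to keep the core loaded for a moment:

/* Sketch: sample the reported CPU frequency before and after a busy loop
 * to see whether the clock only ramps up once the process stays busy.
 * Assumes the standard cpufreq sysfs file exists; adjust the path if not. */
#include <stdio.h>

static long read_khz( const char *path )
{
    long khz = -1;
    FILE *f = fopen( path, "r" );
    if( f ) {
        if( fscanf( f, "%ld", &khz ) != 1 )
            khz = -1;              /* unreadable or unexpected format */
        fclose( f );
    }
    return khz;
}

int main( void )
{
    const char *freq = "/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq";
    volatile double x = 0.0;
    long i;

    printf( "before busy loop: %ld kHz\n", read_khz( freq ) );

    for( i = 0; i < 200000000L; i++ )  /* keep the core busy for a moment */
        x += 1e-9;

    printf( "after  busy loop: %ld kHz\n", read_khz( freq ) );
    return 0;
}

If the first reading is well below the second, the slow initial timings are consistent with the CPU starting out at a reduced clock rather than with the scheduler. In that case, forcing the scaling governor to "performance" (for example by writing performance into scaling_governor as root, or via the distribution's cpufreq utilities) should make the RHEL6 Server box behave like the RHEL5 Workstation ones.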
