The function rand_r(seedp) is seriously bottlenecking my program. Specifically, it's slowing me down by 3x when run serially, and by 4.4x when run on 16 cores. rand() is not an option because it's even worse. Is there anything I can do to streamline this? If it makes a difference, I think I can tolerate some loss of statistical randomness. Would pre-generating a list of random numbers (before execution) and then loading it onto the thread stacks be an option?
Recommended answer
The problem is that the seedp variable (and therefore its memory location) is shared among several threads. rand_r writes a new value to *seedp on every call, so the processor cores must synchronize their caches each time they access this ever-changing value, which hampers performance. The solution is to have every thread work with its own seedp, which avoids the cache synchronization.
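A minimal sketch of that idea, assuming a POSIX system where rand_r is available. The names worker_args, worker, draws and hits are hypothetical and only for illustration; the original post does not show its code. Each thread copies its starting seed into a local variable on its own stack and only touches shared memory once, when it stores its result:

```c
#define _POSIX_C_SOURCE 200809L  /* exposes rand_r() when compiling in strict ISO C mode */
#include <stdlib.h>

/* Hypothetical per-thread work descriptor (illustrative names). */
struct worker_args {
    unsigned int seed;   /* starting seed; should differ for each thread */
    long draws;          /* how many random numbers this thread draws */
    long hits;           /* result: how many of the drawn values were odd */
};

/* Signature matches a pthread start routine, so it can be handed to pthread_create(). */
void *worker(void *p)
{
    struct worker_args *args = p;
    unsigned int seed = args->seed;   /* private copy living on this thread's stack */
    long hits = 0;

    for (long i = 0; i < args->draws; i++) {
        /* rand_r() updates only the local seed, so no other core's cache is involved */
        hits += rand_r(&seed) & 1;
    }

    args->hits = hits;   /* single write to shared memory, after the loop */
    return NULL;
}
```

To use it, one would fill one worker_args per thread with a distinct seed (for example a base seed plus the thread index) and pass its address to pthread_create. Because the hot loop works on a stack copy, it should not matter much if the worker_args structs of neighbouring threads end up on the same cache line.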