Does the official OpenCL 2.2 standard support WaveFront?

Updated: 2024-10-20 05:38:41

Problem description

As is well known, AMD's OpenCL implementation supports wavefronts (August 2015): amd-dev.wpenginedna-cdn/wordpress/media/2013/12/AMD_OpenCL_Programming_Optimization_Guide2.pdf

    The AMD Radeon HD 7770 GPU, for example, supports more than 25,000 in-flight work-items and can switch to a new wavefront (containing up to 64 work-items) in a single cycle.

But why is there no mention of wavefronts in the OpenCL 1.0/2.0/2.2 standards?

None of the PDFs contains the word WaveFront: www.khronos/registry/OpenCL/specs/

I also found:

  • 2013: community.amd/thread/160658

    OpenCL is an open standard. It still does not support this swizzling concept. It does not even support wavefront/warp yet.

  • 2013: stackoverflow/a/19874984/1558037

    That's why the concept is not on the OpenCL specification itself.

  • 2011: forums.khronos/showthread.php/7211-How-can-i-split-my-work-load-in-a-GPU-with-OpenCL

    Standard OpenCL doesn't have the notion of a "wavefront"

  • 2011: www.cvg.ethz.ch/teaching/2011spring/gpgpu/GPU-Optimization.pdf

Is it really the case that the official OpenCL 2.2 standard still does not support the WaveFront?

Conclusion:

There is no WaveFront in the OpenCL standard, but OpenCL 2.0 has sub-groups, with a SIMD execution model akin to wavefronts.

  • Page 100: amd-dev.wpenginedna-cdn/wordpress/media/2013/12/AMD_OpenCL_Programming_User_Guide2.pdf

    6.4.2 Workgroup/subgroup-level functions

    OpenCL 2.0 introduces a Khronos sub-group extension. Sub-groups are a logical abstraction of the hardware SIMD execution model akin to wavefronts, warps, or vectors and permit programming closer to the hardware in a vendor-independent manner. This extension includes a set of cross-sub-group built-in functions that match the set of the cross-work-group built-in functions specified above.

Recommended answer

They must have moved to a more dynamic approach called sub-groups: www.khronos/registry/OpenCL/specs/opencl-2.2.pdf

    Sub-group: Sub-groups are an implementation-dependent grouping of work-items within a work-group. The size and number of sub-groups is implementation-defined.

    Work-groups are further divided into sub-groups, which provide an additional level of control over execution.

    The mapping of work-items to sub-groups is implementation-defined and may be queried at runtime.

So even if it is not called a wavefront, it is now queryable at run time, and

    In the absence of synchronization functions (e.g. a barrier), work-items within a sub-group may be serialized. In the presence of sub-group functions, work-items within a sub-group may be serialized before any given sub-group function, between dynamically encountered pairs of sub-group functions and between a work-group function and the end of the kernel.

so even lockstep execution may be lost at times.

Most importantly, the specification says:

    sub_group_all() and sub_group_broadcast() are described in OpenCL C++ kernel language and IL specifications. The use of these sub-group functions implies sequenced-before relationships between statements within the execution of a single work-item in order to satisfy data dependencies.

The sub-group functions named in the specification imply that some kind of intra-sub-group communication exists. OpenCL now also has a child-kernel definition:

    Device-side enqueue: A mechanism whereby a kernel-instance is enqueued by a kernel-instance running on a device without direct involvement by the host program. This produces nested parallelism; i.e. additional levels of concurrency are nested inside a running kernel-instance. The kernel-instance executing on a device (the parent kernel) enqueues a kernel-instance (the child kernel) to a device-side command queue. Child and parent kernels execute asynchronously though a parent kernel does not complete until all of its child-kernels have completed.

Ultimately, with something like

    kernel void launcher()
    {
        // 1-D range with a single work-item for the child kernel.
        ndrange_t ndrange = ndrange_1D(1);

        // Device-side enqueue of a block. With CLK_ENQUEUE_FLAGS_WAIT_KERNEL
        // the child does not begin until the parent kernel has finished executing.
        enqueue_kernel(get_default_queue(), CLK_ENQUEUE_FLAGS_WAIT_KERNEL, ndrange,
                       ^{
                           size_t id = get_global_id(0);
                       });
    }

you should be able to spawn your own (upgraded?) wavefronts of any size you need. They work concurrently with the parent kernel (and threads within a sub-group can communicate), but in my opinion they are not called wavefronts because they are not hard-coded by the hardware.

The 2.0 API specification says:

    Extreme care should be exercised when writing code that uses subgroups if the goal is to write portable OpenCL applications.

which brings to mind AMD's 16-wide SIMDs and NVIDIA's 32-wide SIMDs versus some imaginary FPGA's 95-wide compute cores. Pseudo-wavefronts, maybe?

Published: 2023-10-09 06:04:23
Link: https://www.elefans.com/category/jswz/34/1474905.html