Updated: 2024-10-24 22:30:46
Pass a Fortran derived type which contains allocatable arrays between different compilers (PGI and Intel)

We have a project which involves an Nvidia GPU and an Intel Xeon Phi. The host code and the GPU code are written in Fortran and compiled by pgfortran. To offload some of our work to the Phi, we have to make a shared library compiled by ifort (static linking does not work) and call the shared subroutine from the pgfortran part of the code. By doing so, we can offload arrays from the pgfortran part of the code to the Intel Fortran shared library, which can communicate with the Xeon Phi.

Now I'm trying to pass a derived type which contains allocatable arrays from the pgfortran part of the code to the ifort shared library. It looks like there are some problems.

Here is a simple example (no Xeon Phi offload directives here):

caller.f90:

    program caller
      type cell
        integer :: id
        real, allocatable :: a(:)
        real, allocatable :: b(:)
        real, allocatable :: c(:)
      end type cell
      integer :: n,i,j
      type(cell) :: cl(2)
      n=10
      do i=1,2
        allocate(cl(i)%a(n))
        allocate(cl(i)%b(n))
        allocate(cl(i)%c(n))
      end do
      do j=1, 2
        do i=1, n
          cl(j)%a(i)=10*j+i
          cl(j)%b(i)=10*i+j
        end do
      end do
      call offload(cl(1))
      print *, cl(1)%c
    end program caller

called.f90:

    subroutine offload(cl)
      type cell
        integer :: id
        real, allocatable :: a(:)
        real, allocatable :: b(:)
        real, allocatable :: c(:)
      end type cell
      type(cell) :: cl
      integer :: n
      print *, cl%a(1:10)
      print *, cl%b(1:10)
    end subroutine offload

Makefile:

    run: caller.o libcalled.so
    	pgfortran -L. caller.o -lcalled -o $@
    caller.o: caller.f90
    	pgfortran -c caller.f90
    libcalled.so: called.f90
    	ifort -shared -fPIC $^ -o $@

Notice the "cl%a(1:10)" here; without the "(1:10)" nothing would be printed.

This code printed the elements of cl(1)%a and then hit a segmentation fault on the next line, where I tried to print the array cl(1)%b.

If I change "cl%a(1:10)" to "cl%a(1:100)" and delete the "print *, cl%b(1:10)" line, the output shows that the elements of the b array are there, but I just cannot fetch them through "cl%b(1:10)".

I know that this may be caused by the compilers laying out the derived type differently. But I really want a way to pass this kind of derived type between compilers. Any solutions?

Thank you!

Best answer

The ABIs of the compilers can differ. You should not pass the structures directly; instead, build them inside the subroutines and use pointers, which you should pass as type(c_ptr) or as assumed-size arrays (but a copy can happen then!).
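The assumed-size alternative can be sketched roughly as follows. This is only an illustration, not code from the question: the routine name offload_arrays, the explicit size argument n, and the a + b computation are made up for the example. Only plain scalars and array addresses cross the compiler boundary, so neither compiler needs to understand the other's array descriptors. Allocatable arrays are contiguous, so a compiler would not normally need a temporary copy here; a copy becomes likely when the actual argument is a non-contiguous section.

```fortran
! Hypothetical callee: in a mixed build this unit would be compiled by
! the second compiler. It receives only scalars and bare array addresses.
subroutine offload_arrays(id, n, a, b, c)
  implicit none
  integer, intent(in)  :: id, n
  real,    intent(in)  :: a(*), b(*)   ! assumed size: no descriptor is passed
  real,    intent(out) :: c(*)
  integer :: i

  do i = 1, n
    c(i) = a(i) + b(i)                 ! placeholder work for the sketch
  end do
  print *, 'cell', id, ':', c(1:n)
end subroutine offload_arrays

program caller
  implicit none
  type cell
    integer :: id
    real, allocatable :: a(:)
    real, allocatable :: b(:)
    real, allocatable :: c(:)
  end type cell
  type(cell) :: cl
  integer :: i, n
  external :: offload_arrays

  n = 5
  allocate(cl%a(n), cl%b(n), cl%c(n))
  cl%id = 1
  do i = 1, n
    cl%a(i) = 10 + i
    cl%b(i) = 10*i + 1
  end do

  ! The derived type stays on the caller's side; only its pieces are passed.
  call offload_arrays(cl%id, n, cl%a, cl%b, cl%c)
  print *, cl%c
end program caller
```

The sketch still relies on both compilers agreeing on the external symbol name and on the default integer and real kinds, which is not guaranteed in general.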

The interoperability with C introduced in Fortran 2003 is not meant only for interacting with C, but with any other compiler that is interoperable with C. That can be a different Fortran compiler.

Be aware that it is against the rules of Fortran to declare the same type in more than one place and use it as the same type, unless the type is a sequence type or has bind(C). This is another reason why your program is not standard conforming.

called.f90:

    subroutine offload(cl_c)
      use iso_c_binding
      type, bind(C) :: cell_C
        integer :: id
        integer :: na, nb, nc
        type(c_ptr) :: a,b,c
      end type cell_C
      type cell
        integer :: id
        real, pointer :: a(:)
        real, pointer :: b(:)
        real, pointer :: c(:)
      end type cell
      type(cell) :: cl
      type(cell_C) :: cl_C
      integer :: n
      cl%id = cl_C%id
      call c_f_pointer(cl_C%a, cl%a, [cl_c%na])
      call c_f_pointer(cl_C%b, cl%b, [cl_c%nb])
      call c_f_pointer(cl_C%c, cl%c, [cl_c%nc])
      print *, cl%a(1:10)
      print *, cl%b(1:10)
    end subroutine offload

caller.f90:

    program caller
      use iso_c_binding
      type, bind(C) :: cell_C
        integer :: id
        integer :: na, nb, nc
        type(c_ptr) :: a,b,c
      end type cell_C
      type cell
        integer :: id
        real, allocatable :: a(:)
        real, allocatable :: b(:)
        real, allocatable :: c(:)
      end type cell
      integer :: n,i,j
      type(cell),target :: cl(2)
      type(cell_c) :: cl_c
      n=10
      do i=1,2
        allocate(cl(i)%a(n))
        allocate(cl(i)%b(n))
        allocate(cl(i)%c(n))
      end do
      do j=1, 2
        do i=1, n
          cl(j)%a(i)=10*j+i
          cl(j)%b(i)=10*i+j
        end do
      end do
      cl_c%a = c_loc(cl(1)%a)
      cl_c%b = c_loc(cl(1)%b)
      cl_c%c = c_loc(cl(1)%c)
      cl_c%na = size(cl(1)%a)
      cl_c%nb = size(cl(1)%b)
      cl_c%nc = size(cl(1)%c)
      cl_c%id = cl(1)%id
      call offload(cl_c)
      print *, cl(1)%c
    end program caller

Tested with gfortran and ifort:

    >gfortran called.f90 -c -o called.o
    >ifort caller.f90 -c -o caller.o
    >ifort -o a.out called.o caller.o -lgfortran
    >./a.out
    11.0000000 12.0000000 13.0000000 14.0000000 15.0000000 16.0000000 17.0000000 18.0000000 19.0000000 20.0000000
    11.0000000 21.0000000 31.0000000 41.0000000 51.0000000 61.0000000 71.0000000 81.0000000 91.0000000 101.000000
    0.0000000E+00 0.0000000E+00 0.0000000E+00 0.0000000E+00 0.0000000E+00 0.0000000E+00 0.0000000E+00 0.0000000E+00 0.0000000E+00 0.0000000E+00

No dynamic libraries are necessary here.

For 100% theoretical portability one could use c_int, c_float, and so on; the formatting could also be better, but you get the point.
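A trimmed-down sketch of what the c_int / c_float variant might look like is given below. It goes one step beyond the code above by also putting bind(C) on the subroutine itself, which is my assumption rather than part of the answer: it fixes the external symbol name and the calling convention to the C ones, so neither compiler's Fortran name mangling matters. Only the a component is kept to stay short, and the binding name "offload" is just the routine name reused.

```fortran
! Callee: in a mixed build this unit would be compiled by the second compiler.
subroutine offload(cl_c) bind(C, name="offload")
  use iso_c_binding
  implicit none
  type, bind(C) :: cell_C
    integer(c_int) :: id
    integer(c_int) :: na
    type(c_ptr)    :: a
  end type cell_C
  type(cell_C), intent(in) :: cl_c
  real(c_float), pointer :: a(:)

  ! Rebuild a Fortran array pointer from the raw address and the length.
  call c_f_pointer(cl_c%a, a, [cl_c%na])
  print *, cl_c%id, a
end subroutine offload

program caller
  use iso_c_binding
  implicit none
  ! A bind(C) type may be declared in more than one place.
  type, bind(C) :: cell_C
    integer(c_int) :: id
    integer(c_int) :: na
    type(c_ptr)    :: a
  end type cell_C

  ! Explicit interface, so the caller knows the routine uses the C ABI.
  interface
    subroutine offload(cl_c) bind(C, name="offload")
      import :: cell_C
      type(cell_C), intent(in) :: cl_c
    end subroutine offload
  end interface

  real(c_float), allocatable, target :: a(:)
  type(cell_C) :: cl_c

  allocate(a(4))
  a = [1.0, 2.0, 3.0, 4.0]
  cl_c%id = 1
  cl_c%na = size(a)
  cl_c%a  = c_loc(a)      ! address of the allocated data
  call offload(cl_c)
end program caller
```

Both units are shown in one file only so the sketch can be built with a single compiler; in the two-compiler setup each unit would live in its own source file.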

You can also overload the assignment between cell and cell_C to ease the conversion.
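A minimal sketch of such an overload, for the cell_C to cell direction only, could look like this. The module name cell_convert and the procedure name cell_from_cell_c are hypothetical. The opposite direction is deliberately left out: it needs c_loc on the components of the right-hand side, and since the right-hand side of a defined assignment is treated as an expression, an ordinary subroutine with a target dummy argument seems the safer choice there.

```fortran
module cell_convert
  use iso_c_binding
  implicit none

  ! Interoperable mirror: lengths plus raw addresses.
  type, bind(C) :: cell_C
    integer(c_int) :: id
    integer(c_int) :: na, nb, nc
    type(c_ptr)    :: a, b, c
  end type cell_C

  ! Callee-side view with pointer components.
  type cell
    integer :: id
    real(c_float), pointer :: a(:) => null()
    real(c_float), pointer :: b(:) => null()
    real(c_float), pointer :: c(:) => null()
  end type cell

  interface assignment(=)
    module procedure cell_from_cell_c
  end interface

contains

  ! cell = cell_C: associate the pointers with the data described by rhs.
  subroutine cell_from_cell_c(lhs, rhs)
    type(cell),   intent(out) :: lhs
    type(cell_C), intent(in)  :: rhs
    lhs%id = rhs%id
    call c_f_pointer(rhs%a, lhs%a, [rhs%na])
    call c_f_pointer(rhs%b, lhs%b, [rhs%nb])
    call c_f_pointer(rhs%c, lhs%c, [rhs%nc])
  end subroutine cell_from_cell_c

end module cell_convert

program demo
  use iso_c_binding
  use cell_convert
  implicit none
  real(c_float), target :: a(3), b(3), c(3)
  type(cell_C) :: cc
  type(cell)   :: cl

  a = [1.0, 2.0, 3.0]
  b = [4.0, 5.0, 6.0]
  c = 0.0

  cc%id = 7
  cc%na = size(a); cc%nb = size(b); cc%nc = size(c)
  cc%a = c_loc(a); cc%b = c_loc(b); cc%c = c_loc(c)

  cl = cc                  ! invokes cell_from_cell_c
  print *, cl%id, cl%a
  print *, cl%b
end program demo
```

Compiled module files are compiler-specific, so in a two-compiler build each compiler would have to compile the module source itself rather than reuse the other's .mod file.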

Published: 2023-08-03 09:36:00
Article link: https://www.elefans.com/category/jswz/34/1385278.html