Reading Fortran binary (stream access) with np.fromfile or open & struct

Updated: 2024-06-14 17:02:18

The following Fortran code:

    INTEGER*2 :: i, Array_A(32)
    Array_A(:) = (/ (i, i=0, 31) /)
    OPEN (unit=11, file = 'binary2.dat', form='unformatted', access='stream')
    Do i=1,32
        WRITE(11) Array_A(i)
    End Do
    CLOSE (11)

produces stream-access binary output containing the numbers 0 to 31 as 16-bit integers. Each value takes up 2 bytes, so the values start at bytes 1, 3, 5, 7 and so on. access='stream' suppresses the record markers that Fortran normally writes for each record (I need that to keep the files as small as possible).

Looking at it with a Hex-Editor, I get:

    00 00 01 00 02 00 03 00 04 00 05 00 06 00 07 00
    08 00 09 00 0A 00 0B 00 0C 00 0D 00 0E 00 0F 00
    10 00 11 00 12 00 13 00 14 00 15 00 16 00 17 00
    18 00 19 00 1A 00 1B 00 1C 00 1D 00 1E 00 1F 00

which is exactly what I expect (the second byte of each value is always zero, because the numbers in my example are too small to need it).
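As a sanity check, the layout described here can be reproduced from Python. The sketch below assumes little-endian byte order, which matches the hex dump (low byte first); the variable name expected is illustrative only.

```python
import binascii
import struct

# Pack the integers 0..31 as 32 little-endian signed 16-bit values ('<32h').
# These should be the same 64 bytes the Fortran program writes.
expected = struct.pack('<32h', *range(32))

print(len(expected))                   # 32 values x 2 bytes each
print(binascii.hexlify(expected[:8]))  # hex of the first four values
```

Comparing these bytes with the file contents is one way to confirm that the Fortran side is not the problem.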

Now I need to import these binary files into Python 2.7, but every routine I have tried so far fails.

1st attempt: np.fromfile

    with open("binary2.dat", 'r') as f:
        content = np.fromfile(f, dtype=np.int16)

returns

[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 0 0 26104 1242 0 0]
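One thing that may be worth ruling out is the file mode. 'r' opens the file in text mode, and on Windows with Python 2, text mode treats the byte 0x1A (Ctrl-Z) as end-of-file, which would be consistent with the values going wrong right after 25. Below is a minimal sketch of a binary-mode read. It writes a stand-in file first so that it is self-contained; the file name binary2_demo.dat and the explicit little-endian dtype '<i2' are assumptions, not part of the original code.

```python
import os
import struct
import tempfile

import numpy as np

# Stand-in for the Fortran output: 0..31 as little-endian 16-bit integers.
path = os.path.join(tempfile.mkdtemp(), 'binary2_demo.dat')
with open(path, 'wb') as f:
    f.write(struct.pack('<32h', *range(32)))

# Open in binary mode ('rb') so that no byte is treated as a newline or EOF.
with open(path, 'rb') as f:
    content = np.fromfile(f, dtype='<i2')

print(content)
```

np.fromfile also accepts a file name instead of an open file object, which sidesteps the question of the mode altogether.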

2nd attempt: struct

    import struct
    with open("binary2.dat", 'r') as f:
        content = f.readlines()
    struct.unpack('h' * 32, content)

raises

struct.error: unpack requires a string argument of length 64

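because

    print content
    ['\x00\x00\x01\x00\x02\x00\x03\x00\x04\x00\x05\x00\x06\x00\x07\x00\x08\x00\t\x00\n', '\x00\x0b\x00\x0c\x00\r\x00\x0e\x00\x0f\x00\x10\x00\x11\x00\x12\x00\x13\x00\x14\x00\x15\x00\x16\x00\x17\x00\x18\x00\x19\x00']

(note the delimiters, the \t and the \n, which shouldn't be there according to what Fortran's "stream" access does)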
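A sketch of how the struct route might look, assuming the file is opened in binary mode ('rb') and the data is little-endian. It uses read() rather than readlines(), because struct.unpack expects a single byte string and not a list; the stand-in file name binary2_demo.dat is hypothetical.

```python
import os
import struct
import tempfile

# Stand-in for the Fortran output: 0..31 as little-endian 16-bit integers.
path = os.path.join(tempfile.mkdtemp(), 'binary2_demo.dat')
with open(path, 'wb') as f:
    f.write(struct.pack('<32h', *range(32)))

# Read the whole file as one byte string, in binary mode.
with open(path, 'rb') as f:
    data = f.read()

# Derive the number of 2-byte values from the file size instead of hard-coding 32.
count = len(data) // 2
values = struct.unpack('<%dh' % count, data)
print(values)
```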

3rd attempt: FortranFile

    f = FortranFile("D:/Fortran/Sandbox/binary2.dat", 'r')
    print(f.read_ints(dtype=np.int16))

fails with the error:

TypeError: only length-1 arrays can be converted to Python scalars

(remember that a delimiter was detected in the middle of the file in the 2nd attempt; this call, however, also fails for shorter files that contain no line break, e.g. the numbers 0 to 8)

Some additional thoughts:

Python seems to have trouble reading parts of the binary file. np.fromfile reads everything up to hex 19 (decimal 25) correctly, but returns garbage from hex 1A (decimal 26) onwards. It looks as if it gets confused by the letters in the hex values, although 0A, 0B, ... work just fine.

For the 2nd attempt the resulting content is odd. The numbers 0 to 8 come through fine, but then there is this strange \t\x00\n sequence. What happened to hex 09?
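On the \t\x00\n question: Python's repr shows certain bytes as escape sequences, so hex 09 is displayed as \t, hex 0A as \n and hex 0D as \r; the bytes themselves are unchanged. readlines() additionally splits after every 0x0A byte, which would explain why content comes back as a list. A small sketch using an in-memory stand-in for the file (the names data and lines are illustrative):

```python
import io
import struct

# Same 64 bytes as the expected file contents (little-endian assumed).
data = struct.pack('<32h', *range(32))

# Bytes for the values 8..13; 0x09, 0x0A and 0x0D are shown as \t, \n and \r.
print(repr(data[16:28]))

# readlines() splits after each newline byte (0x0A), the low byte of the value 10.
lines = io.BytesIO(data).readlines()
print([len(line) for line in lines])  # [21, 43]
```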

I've been spending hours trying to find the logic, but I'm stuck and really need some help. Any ideas?