Reading a binary file with a NumPy structured array

Updated: 2024-10-27 22:19:54

I am reading a binary file using the following method:

numpy.fromfile(file, dtype=)

The binary file contains several data types, and I know its layout. I have therefore defined a dtype list as follows:

dtypearr = [('a','i4',1),('b','S1',8),('c','i4',1), ('d','i4',1),('e','S1',8)]

This dtype says that the first value in the binary file is one integer, followed by 8 characters, and so on.

The problem I am having is that the binary file is not the size of dtypearr. Instead, the file contains the structure defined in dtypearr repeated n times.

So far, I have repeated dtypearr with new field names until it matches the size of the binary file.

However, I was hoping to achieve this without repeating dtypearr. Instead, I want each field to hold an array. For example, I want structuredarray['a'] or structuredarray['b'] to give me an array instead of a single value.

Edit

Note that:

numpy.fromfile(file, dtype=dtypearr)

This call achieves what I want when the pattern repeats exactly. The solution below also works.

However, the pattern in the binary file I mentioned does not repeat exactly. For example, there is a header portion followed by multiple subsections, and each subsection has its own repeating pattern. f.seek() will work for the last subsection, but not for the subsections before it.

Best answer

Try:

    import string

    import numpy as np

    # Create some fake data
    N = 10
    dtype = np.dtype([('a', 'i4'), ('b', 'S8'), ('c', 'f8')])
    a = np.zeros(N, dtype)
    a['a'] = np.random.randint(0, 4, N)  # integers 0..3 inclusive
    a['b'] = list(string.ascii_lowercase[:N])
    a['c'] = np.random.normal(size=(N,))

    # Write to a binary file
    a.tofile('test.dat')

    # Read data into new array
    b = np.fromfile('test.dat', dtype=dtype)

The arrays a and b are identical (i.e. np.all(a['a'] == b['a']) is True):

    for col in a.dtype.names:
        print(col, np.all(a[col] == b[col]))

    # Prints:
    # a True
    # b True
    # c True
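To tie this back to the question, here is a minimal sketch using the question's record layout, with each field declared once ('S8' instead of eight 'S1' values, which is an assumption about what the asker wants). The file name and sample values are made up for illustration. Reading with a single dtype yields one element per record, so indexing by field name returns a whole array.

```python
import os
import tempfile

import numpy as np

# Record layout from the question: int32, 8 bytes, int32, int32, 8 bytes.
record = np.dtype([('a', 'i4'), ('b', 'S8'), ('c', 'i4'), ('d', 'i4'), ('e', 'S8')])

# Build three example records (made-up values) and write them to a temporary file.
n = 3
src = np.zeros(n, dtype=record)
src['a'] = [1, 2, 3]
src['b'] = [b'first', b'second', b'third']

path = os.path.join(tempfile.mkdtemp(), 'records.dat')
src.tofile(path)

# One dtype, no repetition: fromfile reads as many records as the file holds.
data = np.fromfile(path, dtype=record)

print(data.shape)  # one element per record
print(data['a'])   # an int32 array with n entries
print(data['b'])   # an array of 8-byte strings
```

There is no need to repeat the dtype with new field names: the number of records follows from the file size divided by record.itemsize.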

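Update:

If you have header information, you can first open the file, seek to the starting point of the data, and then read. For example:

    # header_size is the length of the header in bytes, which you must know
    with open("test.dat", "rb") as f:
        f.seek(header_size)
        b = np.fromfile(f, dtype=dtype)

You have to know the header size (header_size), but then you should be fine. If there are subsections, you can pass a count to np.fromfile to limit the number of items read. I haven't tested whether count works. If you are not bound to this binary format, I would recommend using something like HDF5 to store multiple arrays in a single file.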
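The count idea can be sketched as follows. This is only an illustration under assumptions: the file layout (a 16-byte header storing two record counts, followed by two subsections with different record types) and all field names are hypothetical, not the asker's actual format. It relies on np.fromfile reading from the current position of an open file and leaving the position just past the records it consumed.

```python
import os
import tempfile

import numpy as np

# Hypothetical layout: a 16-byte header holding two record counts,
# then n1 records of type sec1, then n2 records of type sec2.
header = np.dtype([('magic', 'S8'), ('n1', '<i4'), ('n2', '<i4')])
sec1 = np.dtype([('a', '<i4'), ('b', 'S8')])
sec2 = np.dtype([('x', '<f8'), ('y', '<f8')])

# Write a small example file so the sketch is self-contained.
hdr = np.zeros(1, dtype=header)
hdr['magic'] = b'DEMO'
hdr['n1'] = 3
hdr['n2'] = 2
s1 = np.zeros(3, dtype=sec1)
s1['a'] = [10, 20, 30]
s2 = np.zeros(2, dtype=sec2)
s2['x'] = [1.5, 2.5]

path = os.path.join(tempfile.mkdtemp(), 'sections.dat')
with open(path, 'wb') as f:
    f.write(hdr.tobytes())
    f.write(s1.tobytes())
    f.write(s2.tobytes())

# Read it back section by section: each call consumes `count` records,
# so the next call starts at the beginning of the following section.
with open(path, 'rb') as f:
    h = np.fromfile(f, dtype=header, count=1)[0]
    part1 = np.fromfile(f, dtype=sec1, count=int(h['n1']))
    part2 = np.fromfile(f, dtype=sec2, count=int(h['n2']))

print(part1['a'])
print(part2['x'])
```

If the counts are not stored in the file, they have to come from knowledge of the format. When only a fixed-size header needs to be skipped, np.fromfile in NumPy 1.17 and later also accepts an offset argument in bytes, which can replace the explicit seek.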

Published: 2023-08-05 12:27:00

Article link: https://www.elefans.com/category/jswz/34/1432705.html