Reading a binary file with a NumPy structured array

Updated: 2024-10-27 22:19:54

I am reading a binary file using the following method:

numpy.fromfile(file, dtype=)

The binary file contains several data types, and I know how they are laid out. I have therefore defined a dtype list as follows:

dtypearr = [('a','i4',1),('b','S1',8),('c','i4',1), ('d','i4',1),('e','S1',8)]

This dtype says that the first value in the binary file is one integer, followed by 8 characters, and so on.

The problem is that the binary file is not the size of a single dtypearr record: it contains the structure defined in dtypearr repeated n times.

So far, I have repeated dtypearr with new field names until it is the same size as the binary file.

However, I was hoping to achieve this without repeating dtypearr. Instead, I want each field to hold an array. For example, I want structuredarray['a'] or structuredarray['b'] to give me an array instead of a single value.

Edit

Note that

numpy.fromfile(file, dtype=dtypearr)

achieves what I want when the pattern repeats exactly. The solution below also works.

However, the pattern in the binary file I mentioned does not repeat exactly. For example, there is a header portion followed by multiple subsections, and each subsection has its own repeating pattern. f.seek() will work for the last subsection, but not for the subsections before it.

Best answer

Try:

import numpy as np
import string

# Create some fake data
N = 10
dtype = np.dtype([('a', 'i4'), ('b', 'S8'), ('c', 'f8')])
a = np.zeros(N, dtype)
a['a'] = np.random.randint(0, 4, N)  # integers in 0..3 (replaces the deprecated random_integers)
a['b'] = list(string.ascii_lowercase[:N])
a['c'] = np.random.normal(size=(N,))

# Write to a binary file
a.tofile('test.dat')

# Read data into new array
b = np.fromfile('test.dat', dtype=dtype)

The arrays a and b are identical (i.e. np.all(a['a'] == b['a']) is True):

for col in a.dtype.names:
    print(col, np.all(a[col] == b[col]))

# Prints:
# a True
# b True
# c True
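
To tie this back to the question: since the dtype describes a single record, np.fromfile repeats it until the file ends, and indexing by field name returns one array per field. The sketch below assumes a file that holds only whole records; the file name records.dat and the sample values are made up for illustration.

```python
import os
import tempfile
import numpy as np

# One record: 4-byte int, 8 bytes of text, 8-byte float (same layout as above).
record = np.dtype([('a', '<i4'), ('b', 'S8'), ('c', '<f8')])

# Hypothetical sample file with 5 records.
path = os.path.join(tempfile.mkdtemp(), 'records.dat')
sample = np.zeros(5, dtype=record)
sample['a'] = np.arange(5)
sample['b'] = [b'r0', b'r1', b'r2', b'r3', b'r4']
sample['c'] = np.linspace(0.0, 1.0, 5)
sample.tofile(path)

# The number of records follows from the file size and the record size.
n_records = os.path.getsize(path) // record.itemsize

data = np.fromfile(path, dtype=record)

# Each field is a full array with one entry per record, not a single value.
col_a = data['a']
col_b = data['b']
```

There is no need to repeat the field list with new names: the length of data['a'] is the number of records in the file.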

Update:

If you have header information, you can first open the file, seek to the starting point of the data and then read. For example:

with open("test.dat", "rb") as f:
    f.seek(header_size)  # header_size: length of the header in bytes
    b = np.fromfile(f, dtype=dtype)

You have to know the size of the header (header_size), but otherwise this should work. If there are subsections, you can pass the count argument to np.fromfile to specify the number of items to read, although I haven't tested whether it works. If you are not bound to this binary format, I would recommend using something like HDF5 to store multiple arrays in a single file.
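
As far as I can tell, count behaves as hoped here: given an open file object, np.fromfile reads count items starting at the current position and leaves the position just past them, so sections can be read one after another. The following is only a sketch under assumed conditions. The layout (a 16-byte header, then 3 records of one kind, then 4 records of another), the dtypes sec1 and sec2, and the file name are all hypothetical, and in a real file the section counts would have to be known in advance or read from the header.

```python
import os
import tempfile
import numpy as np

header_size = 16                               # assumed header length in bytes
sec1 = np.dtype([('a', '<i4'), ('b', 'S8')])   # hypothetical record type, section 1
sec2 = np.dtype([('x', '<f8'), ('y', '<f8')])  # hypothetical record type, section 2
n1, n2 = 3, 4                                  # assumed number of records per section

# Build a sample file: header, then section 1, then section 2.
part1 = np.zeros(n1, dtype=sec1)
part1['a'] = [10, 20, 30]
part1['b'] = [b'one', b'two', b'three']
part2 = np.zeros(n2, dtype=sec2)
part2['x'] = np.arange(n2)
part2['y'] = np.arange(n2) * 2.0

path = os.path.join(tempfile.mkdtemp(), 'sections.dat')
with open(path, 'wb') as f:
    f.write(b'\x00' * header_size)
    f.write(part1.tobytes())
    f.write(part2.tobytes())

# Read it back section by section; each call continues where the last one stopped.
with open(path, 'rb') as f:
    f.seek(header_size)
    s1 = np.fromfile(f, dtype=sec1, count=n1)
    s2 = np.fromfile(f, dtype=sec2, count=n2)
```

From NumPy 1.17 on, np.fromfile also accepts an offset argument (in bytes from the current position), which can take the place of the explicit seek.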

Published: 2023-08-05 12:27:00
Article link: https://www.elefans.com/category/jswz/34/1432705.html