I am reading a binary file using the following method:

    numpy.fromfile(file, dtype=)

The binary file contains multiple types, and I know how it is organized. Therefore I have defined a dtype array as follows:

    dtypearr = [('a','i4',1),('b','S1',8),('c','i4',1), ('d','i4',1),('e','S1',8)]

This dtype array says that the first value in the binary file is one integer, followed by 8 characters, etc.
The problem I am having is that the binary file is not the size of dtypearr. Instead, the structure defined in dtypearr repeats n times in the file.
So far, what I have done is repeat dtypearr with new field names until it is the same size as the binary file.
However, I was hoping I could achieve this without repeating dtypearr. Instead, I want an array to be stored in each field. For example, I want structuredarray['a'] or structuredarray['b'] to give me an array instead of a single value.
Edit
Note that numpy.fromfile(file, dtype=dtypearr) achieves what I want when the pattern repeats exactly. The solution below also works.
However, the pattern in the binary file I mentioned does not repeat exactly. For example, there is a header portion and multiple subsections, and each subsection has its own repeating pattern. f.seek() will work for the last subsection, but not for the subsections before it.
Best answer
Try:

    import string

    import numpy as np

    # Create some fake data
    N = 10
    dtype = np.dtype([('a', 'i4'), ('b', 'S8'), ('c', 'f8')])
    a = np.zeros(N, dtype)
    a['a'] = np.random.randint(0, 4, N)  # integers from 0 to 3 inclusive
    a['b'] = np.array(list(string.ascii_lowercase[:N]))
    a['c'] = np.random.normal(size=(N,))

    # Write to a binary file
    a.tofile('test.dat')

    # Read data into new array
    b = np.fromfile('test.dat', dtype=dtype)

The arrays a and b are identical (i.e. np.all(a['a'] == b['a']) is True):

    for col in a.dtype.names:
        print(col, np.all(a[col] == b[col]))

    # Prints:
    # a True
    # b True
    # c True

Update:
If you have header information, you can first open the file, seek to the starting point of the data, and then read. For example:

    # header_size is the length of the header in bytes
    with open("test.dat", "rb") as f:
        f.seek(header_size)
        b = np.fromfile(f, dtype=dtype)

You have to know the size (header_size), but then you should be good. If there are subsections, you can supply a count of the number of items to grab. I haven't tested whether count works. If you are not bound to this binary format, I would recommend using something like HDF5 to store multiple arrays in a single file.
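
As a minimal sketch of the count idea for a file with a header and several subsections: np.fromfile accepts a count argument and, when given an open file object, reads from the current position, so consecutive calls can walk through sections that use different dtypes. The layout below is hypothetical and only for illustration (a header that stores each subsection's record count, and the field names n_first and n_second are assumptions); a real file's layout has to come from its own format description.

```python
import os
import tempfile

import numpy as np

# Hypothetical layout, for illustration only: a fixed-size header that stores
# the record count of each subsection, followed by the two subsections.
header_dtype = np.dtype([('n_first', '<i4'), ('n_second', '<i4')])
first_dtype = np.dtype([('a', '<i4'), ('b', 'S8')])
second_dtype = np.dtype([('c', '<f8'), ('d', '<i4')])


def write_example(path, n_first=3, n_second=5):
    """Write a small file in the hypothetical layout described above."""
    header = np.zeros(1, dtype=header_dtype)
    header['n_first'] = n_first
    header['n_second'] = n_second

    first = np.zeros(n_first, dtype=first_dtype)
    first['a'] = np.arange(n_first)
    first['b'] = b'abcdefgh'

    second = np.zeros(n_second, dtype=second_dtype)
    second['c'] = np.linspace(0.0, 1.0, n_second)
    second['d'] = np.arange(n_second) * 10

    with open(path, 'wb') as f:
        header.tofile(f)
        first.tofile(f)
        second.tofile(f)


def read_example(path):
    """Read the header, then each subsection with its own dtype and count."""
    with open(path, 'rb') as f:
        # Each call starts where the previous one stopped.
        header = np.fromfile(f, dtype=header_dtype, count=1)[0]
        first = np.fromfile(f, dtype=first_dtype, count=int(header['n_first']))
        second = np.fromfile(f, dtype=second_dtype, count=int(header['n_second']))
    return header, first, second


path = os.path.join(tempfile.mkdtemp(), 'sections.dat')
write_example(path)
header, first, second = read_example(path)
print(first['a'])    # one array per field of the first subsection
print(second['d'])   # one array per field of the second subsection
```

If the record counts are not stored in the file, they would have to be known some other way, for example from the subsection's byte length divided by the dtype's itemsize.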