I can never seem to get NumPy arrays to work nicely for me. :(
My dataset is simple: 150 rows of 4 floats followed by one string. I tried the following:
    data = np.genfromtxt("iris.data2", delimiter=",", names=["SL", "SW", "PL", "PW", "class"], dtype=[float, float, float, float, '|S16'])
    print(data.shape)       # ---> (150, 0)
    print(data["PL"])
    print(data[:, 0:3])     # <--- error
So I changed it to just 5 floats by doing a simple file replace. I only did this because I couldn't get the non-homogeneous array to work nicely with both column-name and index access. But now that I have made it homogeneous, it still gives me back a shape of (150, 0) and an error.
    data = np.genfromtxt("iris.data", delimiter=",", names=["SL", "SW", "PL", "PW", "class"])
    print(data.shape)       # ---> (150, 0)
    print(data["PL"])
    print(data[:, 0:3])     # <--- error
When I remove the names entirely, it works for index-column access, but obviously not for names anymore.
    data = np.genfromtxt("iris.data", delimiter=",")
    print(data.shape)       # ---> (150, 5)
    # print(data["PL"])
    print(data[:, 0:3])     # ---> WORKS GREAT!!!
Why is this and how do I fix it? Ideally I would like both name and index column access without replacing the string with a float-code, but I will do it if I need to in order to get name and index column access.
Recommended answer
There's a clear distinction between the fields of a 1d structured array and the columns of a 2d array. They aren't interchangeable. Field names aren't simply column labels. If that isn't clear, you may need to read the dtype or structured array docs in more detail.
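As a rough illustration of that distinction, here is a minimal sketch. The two-row sample values and the field names `SL`, `SW`, `cls` are made up for the example and are not taken from the asker's file.

```python
import numpy as np

# Structured array: one dimension, each element is a record with named fields.
structured = np.array(
    [(5.1, 3.5, b"setosa"), (4.9, 3.0, b"setosa")],
    dtype=[("SL", float), ("SW", float), ("cls", "S16")],
)
structured_shape = structured.shape   # (2,): rows only, no column axis
sl_field = structured["SL"]           # fields are selected by name

# Plain 2d array: homogeneous values, columns selected by position.
plain = np.array([[5.1, 3.5], [4.9, 3.0]])
plain_shape = plain.shape             # (2, 2)
first_column = plain[:, 0]

# 2d-style indexing on the 1d structured array fails.
try:
    structured[:, 0:2]
    two_d_index_failed = False
except IndexError:
    two_d_index_failed = True
```

The `IndexError` at the end is the same kind of failure the question reports for `data[:, 0:3]`: the structured array has no second axis to slice.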
Define a pseudo-file:
    In [93]: txt=b"""1,2,3,4,txt
       ....: 5,6,7,8,abc"""

    In [94]: np.genfromtxt(txt.splitlines(),delimiter=',',dtype=None)
    Out[94]:
    array([(1, 2, 3, 4, 'txt'), (5, 6, 7, 8, 'abc')],
          dtype=[('f0', '<i4'), ('f1', '<i4'), ('f2', '<i4'), ('f3', '<i4'), ('f4', 'S3')])
With mixed columns the default way to load it is a structured array, with 2 rows (shape=(2,)), and 5 fields, indexed as data['f0'] or data[['f0','f2']]. The ability to index several fields at once is limited.
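A small sketch of what "limited" means in practice, reusing the same two-line sample. Passing `encoding="utf-8"` and a list of `str` lines is a choice made for this example (it assumes NumPy 1.14 or later); the session above used bytes instead.

```python
import numpy as np

lines = "1,2,3,4,txt\n5,6,7,8,abc".splitlines()
data = np.genfromtxt(lines, delimiter=",", dtype=None, encoding="utf-8")

one = data["f0"]            # a single field is an ordinary 1d array
pair = data[["f0", "f2"]]   # several fields: still a 1d structured array
pair_names = pair.dtype.names
pair_shape = pair.shape     # (2,), so pair[:, 0] would raise IndexError

# One way to get a real 2d array from chosen numeric fields:
# stack the individual field arrays as columns.
cols = np.stack([data[name] for name in ("f0", "f2")], axis=1)
```

Selecting a list of fields only narrows the record; it does not turn the fields into columns. Stacking the fields explicitly produces a new, independent `(2, 2)` array.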
But we can define a compound dtype, such as:
    In [102]: dt=np.dtype([('data',float,(4,)),('lbl','|S5')])

    In [103]: dt
    Out[103]: dtype([('data', '<f8', (4,)), ('lbl', 'S5')])

    In [104]: np.genfromtxt(txt.splitlines(),delimiter=',',dtype=dt)
    Out[104]:
    array([([1.0, 2.0, 3.0, 4.0], 'txt'), ([5.0, 6.0, 7.0, 8.0], 'abc')],
          dtype=[('data', '<f8', (4,)), ('lbl', 'S5')])

    In [105]: data=np.genfromtxt(txt.splitlines(),delimiter=',',dtype=dt)

    In [106]: data['data']
    Out[106]:
    array([[ 1.,  2.,  3.,  4.],
           [ 5.,  6.,  7.,  8.]])

    In [107]: data['lbl']
    Out[107]:
    array(['txt', 'abc'],
          dtype='|S5')

    In [108]: data[0]
    Out[108]: ([1.0, 2.0, 3.0, 4.0], 'txt')
Now data['data'] is a 2d array, containing the numeric values from the original text.
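Applied to data shaped like the question's (4 floats followed by a string), this might look like the sketch below. The two sample rows, the `U16` label type, the `encoding="utf-8"` argument, and the `col` name-to-position dictionary are all assumptions made for illustration; the dictionary is a plain Python helper, not a NumPy feature.

```python
import numpy as np

# Two made-up rows in the same layout as the question's file.
lines = [
    "5.1,3.5,1.4,0.2,Iris-setosa",
    "7.0,3.2,4.7,1.4,Iris-versicolor",
]
dt = np.dtype([("data", float, (4,)), ("lbl", "U16")])
data = np.genfromtxt(lines, delimiter=",", dtype=dt, encoding="utf-8")

# Index-style access on the numeric block.
first_three = data["data"][:, 0:3]

# Name-style access via a hypothetical name -> position mapping.
col = {"SL": 0, "SW": 1, "PL": 2, "PW": 3}
petal_length = data["data"][:, col["PL"]]

labels = data["lbl"]
```

This keeps the string labels as strings while still allowing slices such as `[:, 0:3]` on the numeric part. The trade-off is that the individual measurement names live outside the array.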
The field names can be fetched as a tuple:
    In [112]: data.dtype.names
    Out[112]: ('data', 'lbl')
so it is possible to perform the usual list/tuple indexing on them, and even do something as convoluted as viewing the fields in reverse order:
    In [115]: data[list(data.dtype.names[::-1])]
    Out[115]:
    array([('txt', [1.0, 2.0, 3.0, 4.0]), ('abc', [5.0, 6.0, 7.0, 8.0])],
          dtype=[('lbl', 'S5'), ('data', '<f8', (4,))])
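The same trick gives a form of positional access to named fields. The sketch below assumes an all-float structured array using the question's field names, with made-up values.

```python
import numpy as np

data = np.array(
    [(5.1, 3.5, 1.4, 0.2), (7.0, 3.2, 4.7, 1.4)],
    dtype=[("SL", float), ("SW", float), ("PL", float), ("PW", float)],
)

names = data.dtype.names              # ('SL', 'SW', 'PL', 'PW')
third = data[names[2]]                # same array as data["PL"]
first_three = data[list(names[0:3])]  # structured array with 3 fields
first_three_names = first_three.dtype.names
first_three_shape = first_three.shape
```

Note that `first_three` is still a 1d structured array rather than a 2d block, so this selects fields by position but does not make them sliceable as columns.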