NumPy structured arrays: field names and indexing

Updated: 2024-10-22 09:51:10

Question

I can never seem to get NumPy arrays to work nicely for me. :(

My dataset is simple: 150 rows of 4 floats followed by one string. I tried the following:

import numpy as np

data = np.genfromtxt("iris.data2", delimiter=",",
                     names=["SL", "SW", "PL", "PW", "class"],
                     dtype=[float, float, float, float, '|S16'])
print(data.shape)       # ---> (150, 0)
print(data["PL"])
print(data[:, 0:3])     # <--- error

So I changed it to just 5 floats by doing a simple find-and-replace in the file. I only did this because I couldn't get the non-homogeneous array to work nicely with both column-name and index access. But now that I have made it homogeneous, it still gives me back a shape of (150, 0) and an error.

data = np.genfromtxt("iris.data", delimiter=",",
                     names=["SL", "SW", "PL", "PW", "class"])
print(data.shape)       # ---> (150, 0)
print(data["PL"])
print(data[:, 0:3])     # <--- error

When I remove the names entirely, index-based column access works, but obviously access by name no longer does.

data = np.genfromtxt("iris.data", delimiter=",")
print(data.shape)       # ---> (150, 5)
# print(data["PL"])
print(data[:, 0:3])     # ---> WORKS GREAT!!!

Why is this and how do I fix it? Ideally I would like both name and index column access without replacing the string with a float-code, but I will do it if I need to in order to get name and index column access.

Recommended answer

There's a clear distinction between the fields of a 1d structured array and the columns of a 2d array. They aren't interchangeable. Field names aren't simply column labels. If that isn't clear, you may need to read the dtype or structured array docs in more detail.
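
To make the distinction concrete, here is a minimal sketch that builds both kinds of array in memory. The field names are borrowed from the question and the values are made up for illustration; nothing is loaded from the asker's file.

```python
import numpy as np

# A 2d float array: shape (rows, columns), indexed by position.
plain = np.array([[5.1, 3.5, 1.4, 0.2],
                  [4.9, 3.0, 1.4, 0.2]])

# A 1d structured array: one record per row, each record has named fields.
# Field names follow the question; the values are illustrative only.
dt = np.dtype([("SL", float), ("SW", float), ("PL", float), ("PW", float)])
structured = np.array([(5.1, 3.5, 1.4, 0.2),
                       (4.9, 3.0, 1.4, 0.2)], dtype=dt)

print(plain.shape)        # (2, 4)
print(structured.shape)   # (2,)
print(structured["PL"])   # field access by name works here
print(plain[:, 0:3])      # column slicing works on the 2d array

try:
    structured[:, 0:3]    # a 1d array has no second axis to slice
except IndexError as exc:
    print("IndexError:", exc)
```

The structured array has a single axis, so a two-index slice such as `[:, 0:3]` has nothing to act on, which matches the error reported in the question.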

Define a dummy file:

In [93]: txt=b"""1,2,3,4,txt
   ....: 5,6,7,8,abc"""

In [94]: np.genfromtxt(txt.splitlines(),delimiter=',',dtype=None)
Out[94]:
array([(1, 2, 3, 4, 'txt'), (5, 6, 7, 8, 'abc')],
      dtype=[('f0', '<i4'), ('f1', '<i4'), ('f2', '<i4'), ('f3', '<i4'), ('f4', 'S3')])

With mixed columns the default way to load it is a structured array, with 2 rows (shape=(2,)), and 5 fields, indexed as data['f0'] or data[['f0','f2']]. The ability to index several fields at once is limited.
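
As a rough illustration of that limit, the sketch below rebuilds the two-row array in memory (rather than through genfromtxt) and shows that a multi-field index still returns a 1d structured array. The conversion step uses `structured_to_unstructured` from `numpy.lib.recfunctions`, which is not part of the original answer; it is offered here as one possible way to get a sliceable 2d block, assuming NumPy 1.16 or later.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Same layout as the array loaded above, built directly for self-containment.
dt = np.dtype([("f0", "<i4"), ("f1", "<i4"), ("f2", "<i4"),
               ("f3", "<i4"), ("f4", "S3")])
data = np.array([(1, 2, 3, 4, b"txt"), (5, 6, 7, 8, b"abc")], dtype=dt)

single = data["f0"]            # one field: an ordinary 1d array
subset = data[["f0", "f2"]]    # several fields: still a 1d structured array

print(single)                  # [1 5]
print(subset.dtype.names)      # ('f0', 'f2')
print(subset.shape)            # (2,)

# Assumption: NumPy >= 1.16. Convert the numeric fields to a plain 2d array.
numeric = rfn.structured_to_unstructured(data[["f0", "f1", "f2", "f3"]])
print(numeric.shape)           # (2, 4)
print(numeric[:, 0:3])         # positional slicing now works
```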

But we can define a compound dtype, such as:

In [102]: dt=np.dtype([('data',float,(4,)),('lbl','|S5')])

In [103]: dt
Out[103]: dtype([('data', '<f8', (4,)), ('lbl', 'S5')])

In [104]: np.genfromtxt(txt.splitlines(),delimiter=',',dtype=dt)
Out[104]:
array([([1.0, 2.0, 3.0, 4.0], 'txt'), ([5.0, 6.0, 7.0, 8.0], 'abc')],
      dtype=[('data', '<f8', (4,)), ('lbl', 'S5')])

In [105]: data=np.genfromtxt(txt.splitlines(),delimiter=',',dtype=dt)

In [106]: data['data']
Out[106]:
array([[ 1., 2., 3., 4.],
       [ 5., 6., 7., 8.]])

In [107]: data['lbl']
Out[107]:
array(['txt', 'abc'],
      dtype='|S5')

In [108]: data[0]
Out[108]: ([1.0, 2.0, 3.0, 4.0], 'txt')

Now data['data'] is a 2d array, containing the numeric values from the original text.
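
Applied to the layout in the question, this suggests one way to get both kinds of access: slice the 2d `data['data']` block by position, and keep a small name-to-position mapping for the four measurements. The sketch below is only an illustration under assumptions: the `cols` dictionary is a hypothetical helper, the sample values are made up, and the array is built in memory instead of being read from `iris.data2`.

```python
import numpy as np

# Compound dtype as in the answer: a 4-float block plus a label field.
# 'U16' is used for the label here so the sketch prints plain strings.
dt = np.dtype([("data", float, (4,)), ("lbl", "U16")])
data = np.array([([5.1, 3.5, 1.4, 0.2], "Iris-setosa"),
                 ([7.0, 3.2, 4.7, 1.4], "Iris-versicolor")], dtype=dt)

# Positional access: slice the 2d numeric block.
first_three = data["data"][:, 0:3]

# Name-based access: hypothetical mapping from the question's column names
# to positions inside the numeric block.
cols = {"SL": 0, "SW": 1, "PL": 2, "PW": 3}
petal_length = data["data"][:, cols["PL"]]

print(first_three.shape)   # (2, 3)
print(petal_length)        # [1.4 4.7]
print(data["lbl"])         # the string column, kept as its own field
```

The string column never needs to be replaced by a float code with this layout, because it lives in a separate field from the numeric block.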

The field names can be fetched as a tuple:

In [112]: data.dtype.names
Out[112]: ('data', 'lbl')

So it is possible to perform the usual list/tuple indexing on them, and even do something as convoluted as viewing the fields in reverse order:

In [115]: data[list(data.dtype.names[::-1])]
Out[115]:
array([('txt', [1.0, 2.0, 3.0, 4.0]), ('abc', [5.0, 6.0, 7.0, 8.0])],
      dtype=[('lbl', 'S5'), ('data', '<f8', (4,))])
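
The same trick can stand in for positional access on a flat structured array like the one in the question: index the names tuple by position, then use the resulting name or names to select fields. This is a small sketch with made-up values, using the question's field names.

```python
import numpy as np

# Flat structured dtype with the question's field names; values are made up.
dt = np.dtype([("SL", float), ("SW", float), ("PL", float),
               ("PW", float), ("class", "U16")])
data = np.array([(5.1, 3.5, 1.4, 0.2, "Iris-setosa"),
                 (7.0, 3.2, 4.7, 1.4, "Iris-versicolor")], dtype=dt)

names = data.dtype.names               # ('SL', 'SW', 'PL', 'PW', 'class')

third = data[names[2]]                 # same as data['PL']
first_three = data[list(names[0:3])]   # structured array with SL, SW, PL

print(third)                           # [1.4 4.7]
print(first_three.dtype.names)         # ('SL', 'SW', 'PL')
print(first_three.shape)               # (2,)
```

Note that `first_three` is still a 1d structured array, not a (2, 3) float array, so this selects fields by position but does not turn them into 2d columns.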

Published: 2023-07-28 02:37:51

Article link: https://www.elefans.com/category/jswz/34/1226523.html