Grouping by max or min in a numpy array

Updated: 2024-10-27 04:28:11

Problem description


I have two equal-length 1D numpy arrays, id and data, where id is a sequence of repeating, ordered integers that define sub-windows on data. For example,

id  data
1   2
1   7
1   3
2   8
2   9
2   10
3   1
3   -10


I would like to aggregate data by grouping on id and taking either the max or the min. In SQL, this would be a typical aggregation query like SELECT MAX(data) FROM tablename GROUP BY id ORDER BY id. Is there a way I can avoid Python loops and do this in a vectorized manner, or do I have to drop down to C?
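To pin down what a vectorized version has to compute, here is a minimal pure-Python sketch of the loop the question wants to avoid. It uses itertools.groupby, which only groups adjacent equal keys, so it assumes equal ids are contiguous as in the example above; group_agg_loop is a hypothetical helper name, not something from the question.

```python
from itertools import groupby
from operator import itemgetter


def group_agg_loop(ids, data, agg=max):
    """Loop-based GROUP BY id with MAX (or MIN) over data.

    Assumes equal ids are adjacent, because groupby only merges consecutive keys.
    """
    keys, results = [], []
    # pair each id with its data value and walk through the runs of equal ids
    for key, pairs in groupby(zip(ids, data), key=itemgetter(0)):
        keys.append(key)
        results.append(agg(value for _, value in pairs))
    return keys, results


ids = [1, 1, 1, 2, 2, 2, 3, 3]
data = [2, 7, 3, 8, 9, 10, 1, -10]
print(group_agg_loop(ids, data))       # ([1, 2, 3], [7, 10, 1])
print(group_agg_loop(ids, data, min))  # ([1, 2, 3], [2, 8, -10])
```

This is the per-element Python loop whose overhead the vectorized answer is meant to avoid; it is mainly useful as a reference to check faster versions against.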

Recommended answer


I've been seeing some very similar questions on Stack Overflow over the last few days. The following code is very similar to the implementation of numpy.unique: it sorts the data with np.lexsort and then uses a boolean mask to mark the borders between groups. Because all of that work happens inside numpy's compiled routines rather than in the interpreter, it is most likely going to be faster than anything you can do in a Python loop.

```python
import numpy as np

def group_min(groups, data):
    # sort with major key groups, minor key data
    order = np.lexsort((data, groups))
    groups = groups[order]  # this is only needed if groups is unsorted
    data = data[order]
    # construct an index which marks borders between groups;
    # after sorting, the first element of each group is its minimum
    index = np.empty(len(groups), 'bool')
    index[0] = True
    index[1:] = groups[1:] != groups[:-1]
    return data[index]

# max is very similar
def group_max(groups, data):
    order = np.lexsort((data, groups))
    groups = groups[order]  # this is only needed if groups is unsorted
    data = data[order]
    # mark the last element of each group, which is its maximum
    index = np.empty(len(groups), 'bool')
    index[-1] = True
    index[:-1] = groups[1:] != groups[:-1]
    return data[index]
```
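When id is already sorted, as the question states, a lighter alternative (not part of the original answer) is numpy's ufunc.reduceat, which reduces over consecutive slices and skips the O(n log n) lexsort. A minimal sketch, assuming equal ids are contiguous; group_reduce is a hypothetical helper name.

```python
import numpy as np


def group_reduce(ids, data, ufunc=np.maximum):
    """Per-group reduction (MAX by default) for ids whose equal values are contiguous."""
    ids = np.asarray(ids)
    data = np.asarray(data)
    if ids.size == 0:
        return ids[:0], data[:0]
    # start offset of every run of equal ids: position 0 plus every place the id changes
    starts = np.flatnonzero(np.r_[True, ids[1:] != ids[:-1]])
    # reduceat reduces data[starts[i]:starts[i + 1]]; the last run extends to the end
    return ids[starts], ufunc.reduceat(data, starts)


ids = np.array([1, 1, 1, 2, 2, 2, 3, 3])
data = np.array([2, 7, 3, 8, 9, 10, 1, -10])
keys, maxima = group_reduce(ids, data)
_, minima = group_reduce(ids, data, np.minimum)
print(keys.tolist(), maxima.tolist(), minima.tolist())  # [1, 2, 3] [7, 10, 1] [2, 8, -10]
```

Unlike group_min and group_max above, this also returns the id of each group, and the same call works with any binary ufunc (np.add gives per-group sums). If the ids can be unsorted, the lexsort-based answer is the safer choice.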



Published: 2023-11-22 00:52:06

Article link: https://www.elefans.com/category/jswz/34/1615331.html