Fast categorization (binning)

Updated: 2024-10-28 02:33:46
Problem description

I have a huge number of entries, each of which is a float. These data x are accessible through an iterator. I need to classify all the entries using selections like 10<y<=20, 20<y<=50, ..., where y are data from another iterable. The number of entries is much larger than the number of selections. At the end I want a dictionary like:

{ 0: [all events with 10<x<=20], 1: [all events with 20<x<=50], ... }

or something similar. For example, I'm doing:

for x, y in itertools.izip(variable_values, binning_values):
    thebin = binner_function(y)
    self.data[tuple(thebin)].append(x)

y is in general multidimensional.

This is very slow. Is there a faster solution, for example with numpy? I think the problem comes from the list.append method I'm using and not from binner_function.

Recommended answer

A fast way to get the assignments in numpy is to use np.digitize:

docs.scipy/doc/numpy/reference/generated/numpy.digitize.html
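As a small illustration of the intervals from the question (10 < y <= 20, 20 < y <= 50), the sketch below calls np.digitize with right=True so that each interval is closed on the right. The edge values and the sample data are made up for illustration; they are not taken from the original post.

```python
import numpy as np

# Hypothetical bin edges matching the intervals 10 < y <= 20 and 20 < y <= 50.
edges = np.array([10.0, 20.0, 50.0])

# Made-up sample values of y.
y = np.array([5.0, 10.0, 15.0, 20.0, 35.0, 50.0, 60.0])

# right=True makes each interval closed on the right: edges[i-1] < y <= edges[i].
# Index 0 means y <= 10 (underflow) and len(edges) means y > 50 (overflow).
assign = np.digitize(y, edges, right=True)
print(assign)
```

With the default right=False the intervals would instead be closed on the left, so the choice matters for values that fall exactly on an edge.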

You'd still have to split the resulting assignments up into groups. If x or y is multidimensional, you will have to flatten the arrays first. You could then get the unique bin assignments and iterate over them in conjunction with np.where to split the assignments up into groups. This will probably be faster if the number of bins is much smaller than the number of elements that need to be binned.

As a somewhat trivial example that you will need to tweak/elaborate on for your particular problem (but that is hopefully enough to get you started with a numpy solution):

In [1]: import numpy as np

In [2]: x = np.random.normal(size=(50,))

In [3]: b = np.linspace(-20,20,50)

In [4]: assign = np.digitize(x,b)

In [5]: assign
Out[5]:
array([23, 25, 25, 25, 24, 26, 24, 26, 23, 24, 25, 23, 26, 25, 27, 25, 25,
       25, 25, 26, 26, 25, 25, 26, 24, 23, 25, 26, 26, 24, 24, 26, 27, 24,
       25, 24, 23, 23, 26, 25, 24, 25, 25, 27, 26, 25, 27, 26, 26, 24])

In [6]: uid = np.unique(assign)

In [7]: adict = {}

In [8]: for ii in uid:
   ...:     adict[ii] = np.where(assign == ii)[0]
   ...:

In [9]: adict
Out[9]:
{23: array([ 0, 8, 11, 25, 36, 37]),
 24: array([ 4, 6, 9, 24, 29, 30, 33, 35, 40, 49]),
 25: array([ 1, 2, 3, 10, 13, 15, 16, 17, 18, 21, 22, 26, 34, 39, 41, 42, 45]),
 26: array([ 5, 7, 12, 19, 20, 23, 27, 28, 31, 38, 44, 47, 48]),
 27: array([14, 32, 43, 46])}
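The session above stores the positions of the entries in each bin. If the values themselves are wanted, as in the dictionary described in the question, one possible variation is to sort once by bin index and split at the group boundaries. This swaps the np.where loop for np.argsort and np.split; the helper name group_by_bin and the sample data are hypothetical.

```python
import numpy as np

def group_by_bin(values, assign):
    """Return {bin index: array of the values assigned to that bin}."""
    values = np.asarray(values)
    assign = np.asarray(assign)
    # A stable sort keeps the original order of the values inside each bin.
    order = np.argsort(assign, kind="stable")
    sorted_assign = assign[order]
    # The first occurrence of each bin index marks where its group starts.
    uid, starts = np.unique(sorted_assign, return_index=True)
    groups = np.split(values[order], starts[1:])
    return {int(k): g for k, g in zip(uid, groups)}

# Made-up data: x holds the entries, assign holds the bin index of the
# matching y (for example, the output of np.digitize).
x = np.array([0.5, 1.5, 0.7, 2.5, 1.1])
assign = np.array([3, 7, 3, 9, 7])
grouped = group_by_bin(x, assign)
print(grouped)
```

Whether this beats the np.where loop depends on the number of bins and entries, so it is worth timing both on real data.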

For dealing with flattening and then unflattening numpy arrays, see:
docs.scipy/doc/numpy/reference/generated/numpy.unravel_index.html

docs.scipy/doc/numpy/reference/generated/numpy.ravel_multi_index.html
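For a multidimensional y, one way these two functions might fit together is sketched below: digitize each dimension separately, combine the per-dimension indices into a single flat bin number with np.ravel_multi_index, and recover them with np.unravel_index. The data and bin edges are invented for illustration, and two dimensions are assumed.

```python
import numpy as np

# Hypothetical two-dimensional binning variable: one row per entry.
y = np.array([[12.0, 0.3],
              [25.0, 0.8],
              [18.0, 0.9],
              [45.0, 0.1]])

# Made-up bin edges for each of the two dimensions.
edges0 = np.array([10.0, 20.0, 50.0])
edges1 = np.array([0.0, 0.5, 1.0])

# Bin index along each dimension (right=True gives lower < y <= upper).
i0 = np.digitize(y[:, 0], edges0, right=True)
i1 = np.digitize(y[:, 1], edges1, right=True)

# digitize can return 0..len(edges), so each axis needs len(edges) + 1 slots.
shape = (len(edges0) + 1, len(edges1) + 1)

# Collapse each pair of indices into a single flat bin number ...
flat = np.ravel_multi_index((i0, i1), shape)

# ... and recover the per-dimension indices when they are needed again.
back0, back1 = np.unravel_index(flat, shape)
print(flat, back0, back1)
```

The flat bin numbers can then be grouped in the same way as one-dimensional assignments.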

Published: 2023-11-30 12:25:37
Article link: https://www.elefans.com/category/jswz/34/1649941.html