Fast NumPy where functionality in Python?

Updated: 2024-10-28 05:19:22

Problem description


I am calling numpy's where function many times inside several for loops, and it is far too slow. Is there a faster way to get the same result? I have read that you should try in-line for loops, and bind functions to local variables before the for loops, but neither improves speed by much (< 1%). len(UNIQ_IDS) is about 800. emiss_data and obj_data are numpy ndarrays with shape = (2600,5200). I've used import profile to find the bottlenecks, and the where call inside the for loops is a big one.
    import numpy as np

    # Bind the functions to local names before the loop
    max = np.max
    where = np.where
    MAX_EMISS = [max(emiss_data[where(obj_data == i)]) for i in UNIQ_IDS]
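To make the cost of the snippet above concrete: every pass over UNIQ_IDS builds a full-size boolean mask, and np.where then turns that mask into index arrays before the indexing happens. The following is a minimal sketch on small made-up arrays (emiss_small and obj_small are illustrative names, not from the question) suggesting that the mask alone selects the same values, so the np.where call is extra work.

```python
import numpy as np

rng = np.random.default_rng(0)                 # fixed seed, for illustration only
emiss_small = rng.random((4, 6))               # hypothetical stand-in for emiss_data
obj_small = rng.integers(1, 4, size=(4, 6))    # hypothetical stand-in for obj_data, IDs 1..3

mask = obj_small == 2                          # one comparison per element, full-size result
via_where = emiss_small[np.where(mask)]        # np.where(mask) yields row and column index arrays
via_mask = emiss_small[mask]                   # boolean indexing, no intermediate index arrays

same_selection = np.array_equal(via_where, via_mask)
print(same_selection)
```

With about 800 IDs and 2600 x 5200 elements, either form repeats that full-array comparison roughly 800 times, which is where most of the time is likely to go.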

Solution
It turns out that a pure Python loop can be much faster than NumPy indexing (or calls to np.where) in this case.

Consider the following alternatives:
    import collections

    import numpy as np

    shape = (2600, 5200)
    # shape = (26, 52)  # small shape for quick checks
    emiss_data = np.random.random(shape)
    # randint's upper bound is exclusive, so this draws IDs 1..800
    obj_data = np.random.randint(1, 801, size=shape)
    UNIQ_IDS = np.unique(obj_data)

    def using_where():
        max = np.max
        where = np.where
        MAX_EMISS = [max(emiss_data[where(obj_data == i)]) for i in UNIQ_IDS]
        return MAX_EMISS

    def using_index():
        # Boolean mask indexing, without np.where
        max = np.max
        MAX_EMISS = [max(emiss_data[obj_data == i]) for i in UNIQ_IDS]
        return MAX_EMISS

    def using_max():
        # Same as using_index, but calling the ndarray.max method
        MAX_EMISS = [(emiss_data[obj_data == i]).max() for i in UNIQ_IDS]
        return MAX_EMISS

    def using_loop():
        # Single pass over both arrays, collecting values per object ID.
        # zip is lazy in Python 3 (the Python 2 equivalent is itertools.izip).
        result = collections.defaultdict(list)
        for val, idx in zip(emiss_data.ravel(), obj_data.ravel()):
            result[idx].append(val)
        return [max(result[idx]) for idx in UNIQ_IDS]

    def using_sort():
        # Group flat indices by ID with one argsort, then reduce each group
        uind = np.digitize(obj_data.ravel(), UNIQ_IDS) - 1
        vals = uind.argsort()
        count = np.bincount(uind)
        start = 0
        end = 0
        out = np.empty(count.shape[0])
        for ind, x in np.ndenumerate(count):
            end += x
            out[ind] = np.max(np.take(emiss_data, vals[start:end]))
            start += x
        return out

    def using_split():
        # Same grouping as using_sort, split into per-ID chunks
        uind = np.digitize(obj_data.ravel(), UNIQ_IDS) - 1
        vals = uind.argsort()
        count = np.bincount(uind)
        return [np.take(emiss_data, item).max()
                for item in np.split(vals, count.cumsum())[:-1]]

    # using_sort returns an ndarray, so compare with np.allclose rather than ==
    for func in (using_index, using_max, using_loop, using_sort, using_split):
        assert np.allclose(using_where(), func())
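The grouping step shared by using_sort and using_split can be hard to follow at full size. Here is a small walk-through on hand-made arrays (ids and values are hypothetical data, chosen only so the result can be checked by eye). It mirrors the digitize, argsort, bincount and split calls from the answer.

```python
import numpy as np

ids = np.array([[3, 1, 3],
                [7, 1, 7]])                   # hypothetical object IDs
values = np.array([[0.2, 0.9, 0.5],
                   [0.1, 0.4, 0.8]])          # hypothetical emission values

uniq = np.unique(ids)                         # sorted unique IDs: 1, 3, 7
group = np.digitize(ids.ravel(), uniq) - 1    # position of each ID within uniq
order = group.argsort(kind="stable")          # flat indices, arranged group by group
counts = np.bincount(group)                   # how many elements each group has

# Splitting at every cumulative count leaves an empty trailing chunk,
# which the [:-1] slice drops (as in using_split).
chunks = np.split(order, counts.cumsum())[:-1]
group_max = [float(values.ravel()[chunk].max()) for chunk in chunks]
print(group_max)  # [0.9, 0.5, 0.8]
```

Each chunk holds the flat indices of one ID, so the expensive full-array comparison is replaced by a single sort followed by cheap per-group reductions.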
Here are the benchmarks, with shape = (2600,5200):
    In [57]: %timeit using_loop()
    1 loops, best of 3: 9.15 s per loop

    In [90]: %timeit using_sort()
    1 loops, best of 3: 9.33 s per loop

    In [91]: %timeit using_split()
    1 loops, best of 3: 9.33 s per loop

    In [61]: %timeit using_index()
    1 loops, best of 3: 63.2 s per loop

    In [62]: %timeit using_max()
    1 loops, best of 3: 64.4 s per loop

    In [58]: %timeit using_where()
    1 loops, best of 3: 112 s per loop
Thus using_loop (pure Python) turns out to be more than 11x faster than using_where.
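The timings above come from IPython's %timeit. To repeat the comparison in a plain script, one option is the standard timeit module. The sketch below is only a harness: the reduced shape, the smaller ID range, and the function names with_where and with_mask are assumptions made so it finishes quickly, and its numbers will not match the ones reported above.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)
shape = (260, 520)                        # assumption: far smaller than (2600, 5200)
emiss = rng.random(shape)
obj = rng.integers(1, 81, size=shape)     # assumption: 80 IDs instead of about 800
uniq = np.unique(obj)

def with_where():
    return [emiss[np.where(obj == i)].max() for i in uniq]

def with_mask():
    return [emiss[obj == i].max() for i in uniq]

timings = {}
for func in (with_where, with_mask):
    # repeat=3, number=1 mirrors the "1 loops, best of 3" lines printed by %timeit
    timings[func.__name__] = min(timeit.repeat(func, repeat=3, number=1))

for name, seconds in timings.items():
    print(f"{name}: {seconds:.4f} s")
```

Taking the minimum over the repeats follows the "best of 3" convention, which reduces noise from other processes on the machine.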

I'm not entirely sure why pure Python is faster than NumPy here. My guess is that the pure Python version zips (yes, pun intended) through both arrays once. It leverages the fact that despite all the fancy indexing, we really just want to visit each value once. Thus it side-steps the issue with having to determine exactly which group each value in emiss_data falls in. But this is just vague speculation. I didn't know it would be faster until I benchmarked.
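The idea of visiting each value only once can also be expressed inside NumPy. The following sketch is not part of the original answer and was not benchmarked there. It swaps in a different technique: sort the labels once, find where each run of equal labels starts, and reduce each run with np.maximum.reduceat. The function name grouped_max is hypothetical, and the sketch assumes non-empty input arrays of equal shape.

```python
import numpy as np

def grouped_max(values, labels):
    """Return (unique labels, per-label maximum of values) using a single sort."""
    flat_labels = labels.ravel()
    flat_values = values.ravel()
    order = np.argsort(flat_labels, kind="stable")
    sorted_labels = flat_labels[order]
    sorted_values = flat_values[order]
    # A run starts at index 0 and wherever the sorted label changes
    is_start = np.r_[True, sorted_labels[1:] != sorted_labels[:-1]]
    starts = np.flatnonzero(is_start)
    # reduceat takes the maximum of each slice between consecutive starts
    return sorted_labels[starts], np.maximum.reduceat(sorted_values, starts)

# Check against the mask-based approach on small random data
rng = np.random.default_rng(0)
emiss = rng.random((26, 52))
obj = rng.integers(1, 9, size=(26, 52))
ids, maxima = grouped_max(emiss, obj)
expected = [emiss[obj == i].max() for i in ids]
print(np.allclose(maxima, expected))
```

Whether this beats using_loop or using_sort at shape (2600, 5200) would need to be measured. The point here is only that the per-ID full-array comparison is avoided.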

Published: 2023-11-26 04:37:58

Article link: https://www.elefans.com/category/jswz/34/1632651.html
