pandas :快速计算具有特定值的列之和

编程入门 行业动态 更新时间:2024-10-27 21:15:35
本文介绍了 pandas :快速计算具有特定值的列之和的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧! 问题描述

我有一个pandas数据框,我需要计算落入某个窗口内的一列值的总和.因此,例如,如果我有一个500的窗口,而我的初始值为1000,那么我想对所有介于499和999之间以及1001和1501之间的值求和.

I have a pandas dataframe and I need to calculate the sum of a column of values that fall within a certain window. So for instance, if I have a window of 500, and my initial value is 1000, I want to sum all values that are between 499 and 999, and also between 1001 and 1501.

用一些数据更容易解释:

This is easier to explain with some data:

chrom pos end AFR EUR pi 0 1 10177 10177 0.4909 0.4056 0.495988 1 1 10352 10352 0.4788 0.4264 0.496369 2 1 10617 10617 0.9894 0.9940 0.017083 3 1 11008 11008 0.1346 0.0885 0.203142 4 1 11012 11012 0.1346 0.0885 0.203142 5 1 13110 13110 0.0053 0.0567 0.053532 6 1 13116 13116 0.0295 0.1869 0.176091 7 1 13118 13118 0.0295 0.1869 0.176091 8 1 13273 13273 0.0204 0.1471 0.139066 9 1 13550 13550 0.0008 0.0080 0.007795 10 1 14464 14464 0.0144 0.1859 0.161422 11 1 14599 14599 0.1210 0.1610 0.238427 12 1 14604 14604 0.1210 0.1610 0.238427 13 1 14930 14930 0.4811 0.5209 0.500209 14 1 14933 14933 0.0015 0.0507 0.044505 15 1 15211 15211 0.5371 0.7316 0.470848 16 1 15585 15585 0.0008 0.0020 0.002635 17 1 15644 15644 0.0008 0.0080 0.007795 18 1 15777 15777 0.0159 0.0149 0.030470 19 1 15820 15820 0.4849 0.2714 0.477153 20 1 15903 15903 0.0431 0.4652 0.349452 21 1 16071 16071 0.0091 0.0010 0.011142 22 1 16142 16142 0.0053 0.0020 0.007721 23 1 16949 16949 0.0227 0.0159 0.038759 24 1 18643 18643 0.0023 0.0080 0.009485 25 1 18849 18849 0.8411 0.9911 0.170532 26 2 30923 30923 0.6687 0.9364 0.338400 27 2 20286 46286 0.0053 0.0010 0.006863 28 2 21698 46698 0.0015 0.0010 0.002566 29 2 42159 47159 0.0083 0.0696 0.067187

所以我需要基于前两列进行子集化.例如,如果我的窗口= 500,我的chrom = 1,我的pos = 15500,我将需要对我的df进行子集化,以仅包含chrom = 1且15000> pos< 16000.

So I need to subset based on the first two columns. For example, if my window = 500, my chrom = 1 and my pos = 15500, I will need to subset my df to include only those rows that have chrom = 1 and 15000 > pos < 16000.

然后我想对这部分数据的AFR列求和.

I would then like to sum the AFR column of this subset of data.

这是我做的功能:

#vdf is my main dataframe, #polyChrom is the chromosome to subset by, #polyPos is the position to subset by. #Distance is how far the window should be from the polyPos. #windowSize is the size of the window itself #E.g. if distance=20000 and windowSize= 500, we are looking at a window #that is (polyPos-20000)-500 to (polyPos-20000) and a window that is #(polyPos+20000) to (polyPos+20000)+500. def mafWindow(vdf, polyChrom, polyPos, distance, windowSize): #If start position becomes less than 0, set it to 0 if(polyPos - distance < 0): start1 = 0 end1 = windowSize else: start1 = polyPos - distance end1 = start1 + windowSize end2 = polyPos + distance start2 = end2 - windowSize #subset df df = vdf.loc[(vdf['chrom'] == polyChrom) & ((vdf['pos'] <= end1) & (vdf['pos'] >= start1))| ((vdf['pos'] <= end2) & (vdf['pos'] >= start2))].copy() return(df.AFR.sum())

整个方法可用于对数据框进行子集设置,当我的数据框包含约55k行时,该方法非常慢.有更快,更有效的方法吗?

This whole method works on subsetting the dataframe and is very slow when my dataframe contains ~55k rows. Is there a quicker and more efficient way of doing this?

推荐答案

诀窍是将其放到numpy数组中.熊猫索引编制和切片速度很慢.

The trick is to drop down to numpy arrays. Pandas indexing and slicing is slow.

import pandas as pd df = pd.DataFrame([[1, 10177, 0.5], [1, 10178, 0.2], [1, 20178, 0.1], [2, 10180, 0.3], [1, 10180, 0.4]], columns=['chrom', 'pos', 'AFR']) chrom = df['chrom'].values pos = df['pos'].values afr = df['AFR'].values def filter_sum(chrom_arr, pos_arr, afr_arr, chrom_val, pos_start, pos_end): return sum(k for i, j, k in zip(chrom_arr, pos_arr, afr_arr) \ if pos_start < j < pos_end and i == chrom_val) filter_sum(chrom, pos, afr, 1, 10150, 10200) # 1.1

更多推荐

pandas :快速计算具有特定值的列之和

本文发布于:2023-10-26 08:00:45,感谢您对本站的认可!
本文链接:https://www.elefans.com/category/jswz/34/1529568.html
版权声明:本站内容均来自互联网,仅供演示用,请勿用于商业和其他非法用途。如果侵犯了您的权益请与我们联系,我们将在24小时内删除。
本文标签:之和   定值   快速   pandas

发布评论

评论列表 (有 0 条评论)
草根站长

>www.elefans.com

编程频道|电子爱好者 - 技术资讯及电子产品介绍!