我有一个多级数据框df.作为列,我有不同的对象" 我分析.作为行索引,我有一个案例ID lc和时间t.
I have a multilevel dataframe df. As columns, I have different "objects" I analyze. As rows index , I have a Case ID lc, and time t.
对于每种情况lc,我都需要找到时间t(理想情况是内插,但是 最接近的值就足够了),每个对象都达到目标值.
I need to find, for each case lc, the time t (ideally interpolated, but closest value is fine enough) at which each object reached a target value.
此目标值是给定对象在时间t==0的函数.
This target value is a function of the given object at time t==0.
import pandas as pd print(pd.__version__) 0.16.2虚拟数据集示例:
data = {1: {(1014, 0.0): 20.25, (1014, 0.0991): 19.08, (1014, 0.1991): 18.43, (1014, 0.2991): 19.03, (1014, 0.3991): 18.71, (1015, 0.0): 20.22, (1015, 0.0991): 19.3, (1015, 0.1991): 18.68, (1015, 0.2991): 18.22, (1015, 0.3991): 17.84, (1016, 0.0): 21.75, (1016, 0.0991): 19.97, (1016, 0.1991): 19.65, (1016, 0.2991): 19.29, (1016, 0.3991): 18.94 }, 2: {(1014, 0.0): 29.11, (1014, 0.0991): 28.68, (1014, 0.1991): 28.27, (1014, 0.2991): 27.46, (1014, 0.3991): 26.96, (1015, 0.0): 29.22, (1015, 0.0991): 28.64, (1015, 0.1991): 28.18, (1015, 0.2991): 27.74, (1015, 0.3991): 27.25, (1016, 0.0): 29.17, (1016, 0.0991): 28.68, (1016, 0.1991): 28.17, (1016, 0.2991): 27.68, (1016, 0.3991): 27.18 }, 3: {(1014, 0.0): 22.01, (1014, 0.0991): 21.5, (1014, 0.1991): 21.18, (1014, 0.2991): 20.58, (1014, 0.3991): 20.21, (1015, 0.0): 21.81, (1015, 0.0991): 21.46, (1015, 0.1991): 21.11, (1015, 0.2991): 20.78, (1015, 0.3991): 20.42, (1016, 0.0): 21.82, (1016, 0.0991): 21.49, (1016, 0.1991): 21.11, (1016, 0.2991): 20.75, (1016, 0.3991): 20.37 }} df = pd.DataFrame(data).sort() df.index.names=['case', 't']数据框看起来像这样:
1 2 3 case t 1014 0.0000 20.25 29.11 22.01 0.0991 19.08 28.68 21.50 0.1991 18.43 28.27 21.18 0.2991 19.03 27.46 20.58 0.3991 18.71 26.96 20.21 1015 0.0000 20.22 29.22 21.81 0.0991 19.30 28.64 21.46 0.1991 18.68 28.18 21.11 0.2991 18.22 27.74 20.78 0.3991 17.84 27.25 20.42 1016 0.0000 21.75 29.17 21.82 0.0991 19.97 28.68 21.49 0.1991 19.65 28.17 21.11 0.2991 19.29 27.68 20.75 0.3991 18.94 27.18 20.37目标值是时间t==0的值的函数. 通常,对于半时段,这将是k = 0.5.对于当前样本,我们将取k = 0.926
Target values are a function of the values at time t==0. typically, this would be k=0.5 for half-time period. For the current sample,we will take k=0.926
由于对值进行了排序,因此可以针对每种情况采用第一行.
Since values are sorted, it is ok to take the first lines for each case.
targets = df.groupby(level='case').first() * 0.926 print(targets) 1 2 3 case 1014 18.75150 26.95586 20.38126 1015 18.72372 27.05772 20.19606 1016 20.14050 27.01142 20.20532现在,我该如何简单地构建以下数据框,该数据框显示了 每个对象达到上述计算的目标值的时间t?
Now, How could I simply build the following dataframe, which shows time t at wich each object reach target value calculated above?
1 2 3 case 1014 0.3991 0.3991 0.2991 1015 0.1991 0.3991 0.3991 1016 0.0991 0.3991 0.3991推荐答案
这些有点像骇客,让我们看看是否有更好的解决方案:
These are somewhat of a hack, let's see if there are better solutions:
In [36]: targets['t']=0 In [37]: df2 = df.reset_index().set_index('case') - targets In [38]: df3 = df2.groupby(df2.index).transform(lambda x: x.abs()==np.min(x.abs())) In [39]: df4 = pd.DataFrame({'1': df2.t[df3[1]], '2': df2.t[df3[2]], '3': df2.t[df3[3]]}) print df4 1 2 3 case 1014 0.3991 0.3991 0.3991 1015 0.1991 0.3991 0.3991 1016 0.0991 0.3991 0.3991更多推荐
在 pandas 数据框中查找目标值
发布评论