Aks*_*gal 5
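Here is my solution for the specific use case you mentioned -
The code for the helper functions categorical_repeat, continous_interpolate and other is provided in the Explanation > Approach section below.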
config = {'year':categorical_repeat, #shortest repeating sequence
'cat1':continous_interpolate, #curve fitting (linear)
'cat2':other} #forward fill
print(df.agg(config))
year cat1 cat2
0 2019.0 1 c1
1 2020.0 2 c1
2 2019.0 3 c1
3 2020.0 4 c2
4 2019.0 5 c2
5 2020.0 6 c2
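A note on df.agg(config): as far as I can tell, in pandas 1.x a dict of plain functions is first tried element by element, and each function only receives the whole column once that raises (which the helpers here end up doing, for example on the NaN cells). If you would rather not depend on that fallback, here is a minimal sketch that calls each function exactly once per column. apply_per_column is a hypothetical name, and the two ffill lambdas are only stand-ins for the real helpers.

```python
import numpy as np
import pandas as pd


def apply_per_column(df, config):
    """Hypothetical helper: call config[col] once on each whole column.

    Columns without an entry in `config` are passed through unchanged.
    """
    return pd.DataFrame(
        {col: config.get(col, lambda s: s)(df[col]) for col in df.columns},
        index=df.index,
    )


df = pd.DataFrame({'year': [2019, 2020, np.nan, np.nan],
                   'cat2': ['c1', 'c2', np.nan, np.nan]})

# Stand-ins for the real helpers (categorical_repeat, other, ...).
config = {'year': lambda s: s.ffill(),
          'cat2': lambda s: s.ffill()}

print(apply_per_column(df, config))
```

Because the helpers below build pd.Series(output) with a default 0..n-1 index, this relies on df having that index too, which reset_index(drop=True) guarantees.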
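Explanation:
As far as I know, there is no direct way in pandas to handle every type of pattern the way Excel does. Excel applies linear interpolation to continuous sequences, but it uses other methods for other column patterns:
- Continuous integer array -> linear interpolation
- Repeating cycle -> smallest repeating sequence
- Alphabets (and similar) -> tile a fixed sequence up to the length of the df
- Unrecognizable pattern -> forward fill

Here is a dummy dataset I tried my approach on -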
data = {'A': [2019, 2020, 2019, 2020, 2019, 2020],
'B': [1, 2, 3, 4, 5, 6],
'C': [6, 5, 4, 3, 2, 1],
'D': ['C', 'D', 'E', 'F', 'G', 'H'],
'E': ['A', 'B', 'C', 'A', 'B', 'C'],
'F': [1,2,3,3,4,2]
}
df = pd.DataFrame(data)
empty = pd.DataFrame(columns=df.columns, index=df.index)[:4]
df_new = pd.concat([df, empty]).reset_index(drop=True)  # DataFrame.append was removed in pandas 2.0
print(df_new)
A B C D E F
0 2019 1 6 C A 1
1 2020 2 5 D B 2
2 2019 3 4 E C 3
3 2020 4 3 F A 3
4 2019 5 2 G B 4
5 2020 6 1 H C 2
6 NaN NaN NaN NaN NaN NaN
7 NaN NaN NaN NaN NaN NaN
8 NaN NaN NaN NaN NaN NaN
9 NaN NaN NaN NaN NaN NaN
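Another way to add the empty rows, sketched here on a two-column subset of the dummy data, is to reindex to a longer 0..n-1 range: existing rows are kept and every new label becomes an all-NaN row. One difference to be aware of: the integer columns are upcast to float64 so they can hold NaN. n_new is just an illustrative variable name.

```python
import pandas as pd

df = pd.DataFrame({'A': [2019, 2020, 2019, 2020, 2019, 2020],
                   'B': [1, 2, 3, 4, 5, 6]})

n_new = 4  # number of empty rows to append
# Labels 6..9 do not exist yet, so reindex fills those rows with NaN.
df_new = df.reindex(range(len(df) + n_new))
print(df_new)
```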
Approach:
Let's start with some helper functions -
import numpy as np
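import scipy as sp
import scipy.optimize  # ensures sp.optimize is loaded; older SciPy releases do not import submodules on "import scipy"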
import pandas as pd
#Curve fitting (linear)
def f(x, m, c):
    return m*x+c  #Modify to extrapolate for exponential sequences etc.

#Interpolate continuous linear sequences
def continous_interpolate(s):
    clean = s.dropna()
    popt, pcov = sp.optimize.curve_fit(f, clean.index, clean)
    output = [round(i) for i in f(s.index, *popt)]  #Remove the round() for float values
    return pd.Series(output)
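For the straight-line case, numpy.polyfit with deg=1 performs the same kind of ordinary least-squares line fit as curve_fit with f above, so SciPy is not strictly needed. A minimal sketch; linear_extrapolate and its decimals parameter are my own hypothetical names. Like continous_interpolate, it uses the index as x, so it assumes a numeric index such as the 0..n-1 one from reset_index(drop=True).

```python
import numpy as np
import pandas as pd


def linear_extrapolate(s, decimals=None):
    """Least-squares fit of y = m*x + c on the non-NaN values (x = index),
    evaluated on the full index; optionally rounded to `decimals`."""
    clean = s.dropna()
    x = clean.index.to_numpy(dtype=float)
    y = clean.to_numpy(dtype=float)
    m, c = np.polyfit(x, y, deg=1)  # coefficients, highest degree first
    fitted = m * s.index.to_numpy(dtype=float) + c
    if decimals is not None:
        fitted = np.round(fitted, decimals)
    return pd.Series(fitted, index=s.index)


s = pd.Series([6, 5, 4, 3, 2, 1] + [np.nan] * 4)
print(linear_extrapolate(s, decimals=0).tolist())
```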
#Smallest repeating sub-sequence
def pattern(inputv):
    '''
    /sf/ask/421489211/
    '''
    pattern_end = 0
    for j in range(pattern_end+1, len(inputv)):
        pattern_dex = j % (pattern_end+1)
        if inputv[pattern_dex] != inputv[j]:
            pattern_end = j
            continue
        if j == len(inputv)-1:
            return inputv[0:pattern_end+1]
    return inputv
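The greedy scan in pattern() can overshoot: for [1, 2, 1, 1, 2, 1, 1, 2, 1] it returns the first 8 values, although [1, 2, 1] already repeats. If that matters for your data, one alternative (a sketch, not the approach used above) is the KMP prefix function: for a sequence of length n, n - pi[n-1] is the length of the shortest prefix that, repeated, reproduces the sequence, with the last copy possibly cut short. smallest_period is a hypothetical name and expects a plain list, e.g. s.dropna().tolist().

```python
def smallest_period(values):
    """Length of the shortest prefix whose repetition reproduces `values`
    (the final repetition may be truncated), via the KMP prefix function."""
    n = len(values)
    if n == 0:
        return 0
    # pi[i] = length of the longest proper prefix of values[:i+1]
    #         that is also a suffix of it
    pi = [0] * n
    for i in range(1, n):
        k = pi[i - 1]
        while k > 0 and values[i] != values[k]:
            k = pi[k - 1]
        if values[i] == values[k]:
            k += 1
        pi[i] = k
    return n - pi[-1]


seq = [1, 2, 1, 1, 2, 1, 1, 2, 1]
print(seq[:smallest_period(seq)])  # [1, 2, 1]
```

To use it inside categorical_repeat, cycle = pattern(clean) could become vals = clean.tolist() followed by cycle = vals[:smallest_period(vals)].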
#Categorical repeat imputation
def categorical_repeat(s):
    clean = s.dropna()
    cycle = pattern(clean)
    repetitions = (len(s)//len(cycle))+1
    output = np.tile(cycle, repetitions)[:len(s)]
    return pd.Series(output)
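The tile-then-slice step in categorical_repeat can also be written with numpy.resize, which fills a larger output by repeating its input cyclically. A small sketch (tile_to_length is a hypothetical name) that also checks it matches np.tile(...)[:n]:

```python
import numpy as np
import pandas as pd


def tile_to_length(cycle, length):
    """Repeat `cycle` cyclically until it has exactly `length` elements."""
    # np.resize wraps around the input when the new size is larger,
    # matching np.tile(cycle, length // len(cycle) + 1)[:length].
    return pd.Series(np.resize(np.asarray(cycle), length))


cycle = ['A', 'B', 'C']
n = 10
print(tile_to_length(cycle, n).tolist())
print(np.array_equal(np.resize(cycle, n), np.tile(cycle, n // len(cycle) + 1)[:n]))  # True
```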
#Continuous sequence of letters
def alphabet(s):
    alp = 'abcdefghijklmnopqrstuvwxyz'
    alp2 = alp*((len(s)//len(alp))+2)  #+2 so the slice below never runs past the end (e.g. a start near 'z')
    start = s.iloc[0]
    idx = alp2.find(start.lower())
    output = alp2[idx:idx+len(s)]
    if start.isupper():
        output = output.upper()
    return pd.Series(list(output))
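Instead of building a long repeated alphabet string, plain character arithmetic with a modulo also continues the sequence and wraps from z back to a for any length. A minimal sketch, assuming the first value is a single ASCII letter (alphabet_continue is a hypothetical name):

```python
import pandas as pd


def alphabet_continue(s):
    """Continue a letter sequence from the first value of `s`,
    keeping its case and wrapping z -> a (assumes one ASCII letter)."""
    start = s.iloc[0]
    base = ord('A') if start.isupper() else ord('a')
    offset = ord(start) - base  # position of the start letter, 0..25
    letters = [chr(base + (offset + i) % 26) for i in range(len(s))]
    return pd.Series(letters, index=s.index)


s = pd.Series(['C', 'D', 'E', 'F', 'G', 'H', None, None, None, None])
print(alphabet_continue(s).tolist())
```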
#If no pattern then just ffill
def other(s):
    return s.ffill()
Next, let's create the config based on the problem we want to solve and apply the required methods -
config = {'A':categorical_repeat,
'B':continous_interpolate,
'C':continous_interpolate,
'D':alphabet,
'E':categorical_repeat,
'F':other}
output_df = df_new.agg(config)
print(output_df)
A B C D E F
0 2019 1 6 C A 1
1 2020 2 5 D B 2
2 2019 3 4 E C 3
3 2020 4 3 F A 3
4 2019 5 2 G B 4
5 2020 6 1 H C 2
6 2019 7 0 I A 2
7 2020 8 -1 J B 2
8 2019 9 -2 K C 2
9 2020 10 -3 L A 2