Iva*_*sky 7
You can chain explode() calls and then pivot the table back into the desired output!
df = df.explode('cNames').explode('cValues')
df['cValues'] = pd.to_numeric(df['cValues'])
print(df.pivot_table(columns='cNames',index='number',values='cValues'))
Output:
cNames     a     b    c     d
number
10       2.0   2.0  2.0   NaN
20      66.0  66.0  NaN  66.0
Unfortunately, the output of explode has dtype object, so we have to convert it with pd.to_numeric() before pivoting. Otherwise there would be no numeric values to aggregate.
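One caveat about the output above: chaining two single-column explode() calls pairs every name with every value in a row, and pivot_table averages by default, so each cell shows the row mean (2.0 and 66.0) rather than the individual values. A minimal sketch of one way to keep names and values paired, assuming pandas 1.3 or later (where explode() accepts a list of columns) and using the sample data from the question:

```python
import pandas as pd

# Sample data from the question (it also appears in KJDII's answer).
df = pd.DataFrame({
    'cNames': [['a', 'b', 'c'], ['a', 'b', 'd']],
    'cValues': [[1, 2, 3], [55, 66, 77]],
    'number': [10, 20],
})

# Explode both list columns together so each name stays next to its value.
# Passing a list of columns requires pandas >= 1.3.
paired = df.explode(['cNames', 'cValues'])
paired['cValues'] = pd.to_numeric(paired['cValues'])

# Every (number, cNames) pair is now unique, so a plain pivot is enough
# and no aggregation takes place.
wide = paired.pivot(index='number', columns='cNames', values='cValues')
print(wide)
```

With the pairs kept intact, row 10 should carry the values 1, 2 and 3 under a, b and c instead of their mean.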
Qua*_*ang 7
One option is concat:
pd.concat([pd.Series(x['cValues'], x['cNames'], name=idx)
           for idx, x in df.iterrows()],
          axis=1
         ).T.join(df.iloc[:,2:])
Or the DataFrame constructor:
pd.DataFrame({idx: dict(zip(x['cNames'], x['cValues']))
              for idx, x in df.iterrows()
             }).T.join(df.iloc[:,2:])
Output:
      a     b    c     d  number
0   1.0   2.0  3.0   NaN      10
1  55.0  66.0  NaN  77.0      20
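Both snippets above pick up the remaining columns by position (df.iloc[:,2:]), which assumes the two list columns come first. As a rough sketch of a variant that selects them by name instead and skips iterrows(), assuming the sample data from the question (the variable names wide and result are illustrative only):

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    'cNames': [['a', 'b', 'c'], ['a', 'b', 'd']],
    'cValues': [[1, 2, 3], [55, 66, 77]],
    'number': [10, 20],
})

# Build one {name: value} dict per row; names missing from a row become NaN.
wide = pd.DataFrame(
    [dict(zip(names, values)) for names, values in zip(df['cNames'], df['cValues'])],
    index=df.index,
)

# Join the remaining columns selected by name rather than by position.
result = wide.join(df.drop(columns=['cNames', 'cValues']))
print(result)
```

Dropping by name keeps the join correct even if the column order of df changes.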
Update: performance, ordered by run time on the sample data
DataFrame constructor:
%%timeit
pd.DataFrame({idx: dict(zip(x['cNames'], x['cValues']))
              for idx, x in df.iterrows()
             }).T.join(df.iloc[:,2:])
1.29 ms ± 36.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
concat:
%%timeit
pd.concat([pd.Series(x['cValues'], x['cNames'], name=idx)
           for idx, x in df.iterrows()],
          axis=1
         ).T.join(df.iloc[:,2:])
2.03 ms ± 86.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
KJDII's new series:
%%timeit
df['series'] = df.apply(lambda x: dict(zip(x['cNames'], x['cValues'])), axis=1)
pd.concat([df['number'], df['series'].apply(pd.Series)], axis=1)
2.09 ms ± 65.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Scott's apply(pd.Series.explode):
%%timeit
df.apply(pd.Series.explode)\
.set_index(['number', 'cNames'], append=True)['cValues']\
.unstack()\
.reset_index()\
.drop('level_0', axis=1)
4.9 ms ± 135 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
wwnde's set_index.apply(explode):
%%timeit
g=df.set_index('number').apply(lambda x: x.explode()).reset_index()
g['cValues']=g['cValues'].astype(int)
pd.pivot_table(g, index=["number"],values=["cValues"],columns=["cNames"]).droplevel(0, axis=1).reset_index()
7.27 ms ± 162 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Celius's double explode:
%%timeit
df1 = df.explode('cNames').explode('cValues')
df1['cValues'] = pd.to_numeric(df1['cValues'])
df1.pivot_table(columns='cNames',index='number',values='cValues')
9.42 ms ± 189 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
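The %%timeit cell magic only works in IPython or Jupyter. To repeat a measurement in a plain Python script, something along these lines could be used. It is only a sketch: the helper name constructor_approach and the loop count of 100 are illustrative choices, and it reports the best run rather than the mean ± std. dev. that %%timeit prints.

```python
import timeit

import pandas as pd

# Sample data from the question: only two rows, so these timings may not
# reflect how the approaches scale on larger frames.
df = pd.DataFrame({
    'cNames': [['a', 'b', 'c'], ['a', 'b', 'd']],
    'cValues': [[1, 2, 3], [55, 66, 77]],
    'number': [10, 20],
})


def constructor_approach(frame):
    """DataFrame-constructor variant from the timings above."""
    return pd.DataFrame({idx: dict(zip(x['cNames'], x['cValues']))
                         for idx, x in frame.iterrows()}).T.join(frame.iloc[:, 2:])


# repeat=7 mirrors the "7 runs" reported by %%timeit; number=100 is arbitrary.
loops = 100
runs = timeit.repeat(lambda: constructor_approach(df), repeat=7, number=loops)
best_ms = min(runs) / loops * 1e3
print(f"best of 7 runs: {best_ms:.2f} ms per loop")
```

Absolute numbers will differ by machine and pandas version, so only the relative ordering is worth comparing.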
KJD*_*DII 5
import pandas as pd
d = {'cNames': [['a','b','c'], ['a','b','d']], 'cValues': [[1,2,3],
[55,66,77]], 'number': [10,20]}
df = pd.DataFrame(data=d)
df['series'] = df.apply(lambda x: dict(zip(x['cNames'], x['cValues'])), axis=1)
df = pd.concat([df['number'], df['series'].apply(pd.Series)], axis=1)
print(df)
   number     a     b    c     d
0      10   1.0   2.0  3.0   NaN
1      20  55.0  66.0  NaN  77.0
If the column order matters:
columns = ['a', 'b', 'c', 'd', 'number']
df = df[columns]
      a     b    c     d  number
0   1.0   2.0  3.0   NaN      10
1  55.0  66.0  NaN  77.0      20
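Hardcoding the column list works for the sample data, but it has to be updated whenever a new name shows up in cNames. A small sketch that moves 'number' to the end without listing the other columns, assuming a frame shaped like the printed result above (the names wide, key and ordered are illustrative only):

```python
import pandas as pd

# Stand-in for the wide frame printed above.
nan = float('nan')
wide = pd.DataFrame({
    'number': [10, 20],
    'a': [1.0, 55.0],
    'b': [2.0, 66.0],
    'c': [3.0, nan],
    'd': [nan, 77.0],
})

# Keep every other column in its current order and append the key column last.
key = 'number'
ordered = wide[[c for c in wide.columns if c != key] + [key]]
print(ordered)
```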