Compute daily averages of numeric and non-numeric columns in pandas
I have a dataframe with hourly time index:
                        wind_direction  relative_humidity
    dates
    2017-07-18 19:00:00              W                 88
    2017-07-18 20:00:00              N                 88
    2017-07-18 21:00:00              W                 90
    2017-07-18 22:00:00              S                 91
    2017-07-18 23:00:00              W                 93

How can I compute daily averages such that numeric columns get the daily mean and non-numeric columns get the value that occurs most often?
-- EDIT:
I did this:
    df = df.resample('D').mean()

However, this returns an error.
Best answer
Option 1

    import numpy as np
    import pandas as pd
    from cytoolz.dicttoolz import merge

    # Split the columns into numeric and non-numeric
    ncols = df.select_dtypes([np.number]).columns
    ocols = df.columns.difference(ncols)

    df.index = pd.to_datetime(df.index)

    # Mean for numeric columns, most frequent value for the rest
    d = merge(
        {c: 'mean' for c in ncols},
        {c: lambda x: x.value_counts().index[0] for c in ocols}
    )

    df.resample('D').agg(d)

                relative_humidity wind_direction
    dates
    2017-07-18                 90              W

Option 2

    df.index = pd.to_datetime(df.index)
    g = df.resample('D')

    # numeric_only=True is needed on pandas 2.0+, where mean() raises
    # on non-numeric columns instead of silently dropping them
    g.mean(numeric_only=True).combine_first(
        g.agg(lambda x: x.value_counts().index[0])
    )[df.columns]

                relative_humidity wind_direction
    dates
    2017-07-18                 90              W
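For anyone who wants to try this without installing cytoolz, here is a minimal self-contained sketch of the same idea as Option 1, using a plain dict comprehension in place of `merge`. The helper names `most_frequent` and `daily_summary` are made up for this illustration and are not part of pandas. The sample frame is rebuilt from the data in the question, and returning `None` for an empty day is an assumption about how gaps should be handled.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {
        "wind_direction": ["W", "N", "W", "S", "W"],
        "relative_humidity": [88, 88, 90, 91, 93],
    },
    index=pd.to_datetime(
        [
            "2017-07-18 19:00:00",
            "2017-07-18 20:00:00",
            "2017-07-18 21:00:00",
            "2017-07-18 22:00:00",
            "2017-07-18 23:00:00",
        ]
    ),
)
df.index.name = "dates"


def most_frequent(s):
    """Hypothetical helper: most common value in s, or None if s is empty."""
    counts = s.value_counts()  # sorted by count, descending
    return counts.index[0] if len(counts) else None


def daily_summary(frame):
    """Hypothetical helper: daily mean for numeric columns, mode otherwise."""
    numeric_cols = set(frame.select_dtypes(include="number").columns)
    # One aggregation per column, chosen by dtype.
    spec = {
        col: ("mean" if col in numeric_cols else most_frequent)
        for col in frame.columns
    }
    return frame.resample("D").agg(spec)


result = daily_summary(df)
print(result)
```

If two values tie for most frequent within a day, `value_counts` still returns only one of them first, so the choice between tied values should not be relied upon.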