Pandas DataFrame: insert/fill missing rows from previous dates

Updated: 2024-10-21 18:41:58
Problem description

I have a DataFrame consisting of dates, some other columns and a numerical value. Some combinations of values in the "other columns" can be missing on a given date, and I want to populate them from previous dates.

For example, say the DataFrame looks like the one below. On 2016-01-01 we have data for (LN, A), (LN, B), (NY, A) and (NY, B) in columns (location, band).

```
         date location band  value
0  2016-01-01       LN    A   10.0
1  2016-01-01       LN    B    5.0
2  2016-01-01       NY    A    9.0
3  2016-01-01       NY    B    6.0
4  2016-01-02       LN    A   11.0
5  2016-01-02       NY    B    7.0
6  2016-01-03       NY    A   10.0
```
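
To reproduce the example, the sample frame can be built as follows. Keeping `date` as ISO-format strings is an assumption made for simplicity; the question does not say whether the column holds strings or datetimes.

```python
import pandas as pd

# Sample data from the question; dates kept as ISO strings (an assumption).
df = pd.DataFrame({
    'date': ['2016-01-01'] * 4 + ['2016-01-02'] * 2 + ['2016-01-03'],
    'location': ['LN', 'LN', 'NY', 'NY', 'LN', 'NY', 'NY'],
    'band': ['A', 'B', 'A', 'B', 'A', 'B', 'A'],
    'value': [10.0, 5.0, 9.0, 6.0, 11.0, 7.0, 10.0],
})
print(df)
```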

Then notice that on 2016-01-02 we only have (LN, A) and (NY, B); (LN, B) and (NY, A) are missing. Likewise, on 2016-01-03 only (NY, A) is available, and the other three combinations are missing.

What I want to do is populate the missing combinations on each date from its predecessor. For 2016-01-02, for example, I would like to add two more rows "rolled over" from 2016-01-01: (LN, B, 5.0) and (NY, A, 9.0) for columns (location, band, value). The same goes for 2016-01-03, so that the whole thing looks like this:

```
          date location band  value
0   2016-01-01       LN    A   10.0
1   2016-01-01       LN    B    5.0
2   2016-01-01       NY    A    9.0
3   2016-01-01       NY    B    6.0
4   2016-01-02       LN    A   11.0
5   2016-01-02       NY    B    7.0
6   2016-01-03       NY    A   10.0
7   2016-01-02       LN    B    5.0
8   2016-01-02       NY    A    9.0
9   2016-01-03       LN    A   11.0
10  2016-01-03       LN    B    5.0
11  2016-01-03       NY    B    7.0
```

Note that rows 7-11 are populated from rows 1, 2, 4, 7 and 5, respectively. The order is not really important, as I can always sort afterwards as long as all the data I need is present.
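
As a reference point for checking any vectorized answer, the roll-over rule described above can be written as a plain loop: walk the dates in order, remember the latest value of every (location, band) pair, and emit a row for each pair that is absent on the current date. This is a sketch under assumptions: `roll_forward_missing` is a hypothetical name, and dates are stored as ISO-format strings so that sorting them is chronological.

```python
import pandas as pd

def roll_forward_missing(df):
    """Hypothetical helper: for each date, add a row for every (location, band)
    pair seen on an earlier date but absent on this one, copying its latest value."""
    last_value = {}  # (location, band) -> most recent known value
    new_rows = []
    for date in sorted(df['date'].unique()):
        day = df[df['date'] == date]
        present = set(zip(day['location'], day['band']))
        # Roll over every pair known from earlier dates that is missing today.
        for (loc, band), val in last_value.items():
            if (loc, band) not in present:
                new_rows.append({'date': date, 'location': loc,
                                 'band': band, 'value': val})
        # Today's observations become the latest known values.
        for loc, band, val in zip(day['location'], day['band'], day['value']):
            last_value[(loc, band)] = val
    if not new_rows:
        return df.copy()
    return pd.concat([df, pd.DataFrame(new_rows)], ignore_index=True)

df = pd.DataFrame({
    'date': ['2016-01-01'] * 4 + ['2016-01-02'] * 2 + ['2016-01-03'],
    'location': ['LN', 'LN', 'NY', 'NY', 'LN', 'NY', 'NY'],
    'band': ['A', 'B', 'A', 'B', 'A', 'B', 'A'],
    'value': [10.0, 5.0, 9.0, 6.0, 11.0, 7.0, 10.0],
})
result = roll_forward_missing(df)
print(result)
```

Because a pair's last known value is kept until a newer observation replaces it, (LN, B, 5.0) reaches 2016-01-03 as well, matching row 10 of the desired output.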

Can anyone help? Thanks a lot!

Recommended answer

You can use an unstack/stack approach to insert all missing combinations as NaN rows, followed by a forward fill:

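```python
# Use unstack/stack to add the missing (location, band) combinations as NaN rows.
df = df.set_index(['date', 'location', 'band']) \
       .unstack(level=['location', 'band']) \
       .stack(level=['location', 'band'], dropna=False)
# On pandas >= 2.1, .stack(level=[...], future_stack=True) is the newer
# equivalent; it does not drop NaN rows and must be called without dropna.

# Forward fill NaN values within ['location', 'band'] groups.
df = df.groupby(level=['location', 'band']).ffill().reset_index()
```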
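
To see why a forward fill is enough once the missing combinations exist, it may help to look at the data in wide form. The sketch below swaps in `DataFrame.pivot` to build that wide table (one row per date, one column per (location, band) pair); it is illustrative only, and the string dates are an assumption.

```python
import pandas as pd

df = pd.DataFrame({
    'date': ['2016-01-01'] * 4 + ['2016-01-02'] * 2 + ['2016-01-03'],
    'location': ['LN', 'LN', 'NY', 'NY', 'LN', 'NY', 'NY'],
    'band': ['A', 'B', 'A', 'B', 'A', 'B', 'A'],
    'value': [10.0, 5.0, 9.0, 6.0, 11.0, 7.0, 10.0],
})

# One row per date, one column per (location, band) pair;
# missing combinations show up as NaN cells.
wide = df.pivot(index='date', columns=['location', 'band'], values='value')
print(wide)

# Forward filling down each column copies a pair's last known value
# onto later dates.
filled = wide.ffill()
print(filled)
```

Forward filling down each column of the wide table is the same operation as the `groupby(level=['location', 'band']).ffill()` step applied to the long table.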

Or you can directly build a MultiIndex containing all combinations:

```python
import pandas as pd

# Build the full MultiIndex, set the partial MultiIndex, and reindex.
# unique() keeps first-appearance order, so this assumes the rows are
# already in chronological order (sort df by date first if not).
levels = ['date', 'location', 'band']
full_idx = pd.MultiIndex.from_product([df[col].unique() for col in levels], names=levels)
df = df.set_index(levels).reindex(full_idx)

# Forward fill NaN values within ['location', 'band'] groups.
df = df.groupby(level=['location', 'band']).ffill().reset_index()
```
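
If the input rows are not guaranteed to be in chronological order, one option is to sort each level's unique values before building the product index, since the forward fill follows index order. Below is a minimal sketch wrapping the MultiIndex approach in a function; `fill_from_previous_dates` is a hypothetical name and the string dates are an assumption.

```python
import pandas as pd

def fill_from_previous_dates(df, levels=('date', 'location', 'band')):
    """Hypothetical helper: reindex onto every (date, location, band)
    combination, then forward fill within each (location, band) group."""
    levels = list(levels)
    # Sorting each level's values makes dates run chronologically even if
    # the input rows are shuffled; the forward fill relies on that order.
    full_idx = pd.MultiIndex.from_product(
        [sorted(df[col].unique()) for col in levels], names=levels)
    out = df.set_index(levels).reindex(full_idx)
    return out.groupby(level=levels[1:]).ffill().reset_index()

df = pd.DataFrame({
    'date': ['2016-01-01'] * 4 + ['2016-01-02'] * 2 + ['2016-01-03'],
    'location': ['LN', 'LN', 'NY', 'NY', 'LN', 'NY', 'NY'],
    'band': ['A', 'B', 'A', 'B', 'A', 'B', 'A'],
    'value': [10.0, 5.0, 9.0, 6.0, 11.0, 7.0, 10.0],
})
# Shuffle the rows to show the result does not depend on input order.
shuffled = df.sample(frac=1, random_state=0)
result = fill_from_previous_dates(shuffled)
print(result)
```

A combination that first appears on some later date stays NaN on the earlier dates, since there is no previous value to copy.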

Resulting output of either method:

```
          date location band  value
0   2016-01-01       LN    A   10.0
1   2016-01-01       LN    B    5.0
2   2016-01-01       NY    A    9.0
3   2016-01-01       NY    B    6.0
4   2016-01-02       LN    A   11.0
5   2016-01-02       LN    B    5.0
6   2016-01-02       NY    A    9.0
7   2016-01-02       NY    B    7.0
8   2016-01-03       LN    A   11.0
9   2016-01-03       LN    B    5.0
10  2016-01-03       NY    A   10.0
11  2016-01-03       NY    B    7.0
```

Published: 2023-10-28 19:26:37
Article link: https://www.elefans.com/category/jswz/34/1537565.html