Python Pandas: Select Unique from df based on Rule

Updated: 2024-10-26 22:27:38

I have a pandas dataframe (originally generated from a sql query) that looks like:

index  AccountId  ItemID  EntryDate
1      100        1000    1/1/2016
2      100        1000    1/2/2016
3      100        1000    1/3/2016
4      101        1234    9/15/2016
5      101        1234    9/16/2016
etc....

I'd like to get this whittled down to a unique list, returning only the entry with the earliest date available, something like this:

index  AccountId  ItemID  EntryDate
1      100        1000    1/1/2016
4      101        1234    9/15/2016
etc....

Any pointers or direction for someone fairly new to pandas? The unique function doesn't appear to handle this kind of rule, and looping through the rows to work out which ones to drop seems like a lot of trouble for a simple task. Is there a function I'm missing that does this?

Best answer

Let's use groupby, idxmin, and .loc:

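# df2 is assumed to be the question's frame with 'index' set as the index
# and EntryDate already parsed as datetimes (e.g. via pd.to_datetime).
df_out = df2.loc[df2.groupby('AccountId')['EntryDate'].idxmin()]
print(df_out)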

Output:

       AccountId  ItemID  EntryDate
index
1            100    1000 2016-01-01
4            101    1234 2016-09-15
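
For anyone who wants to reproduce this end to end, here is a minimal sketch. It rebuilds the sample data from the question and assumes that EntryDate arrives as month/day/year strings, which the answer does not state, so the parsing step and the variable names `earliest` and `alt` are illustrative only. The `sort_values` plus `drop_duplicates` variant is a different technique from the one in the answer, shown only for comparison.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame(
    {
        "index": [1, 2, 3, 4, 5],
        "AccountId": [100, 100, 100, 101, 101],
        "ItemID": [1000, 1000, 1000, 1234, 1234],
        "EntryDate": ["1/1/2016", "1/2/2016", "1/3/2016", "9/15/2016", "9/16/2016"],
    }
)

# Assumption: the dates are month/day/year strings. Parse them so that the
# minimum is chronological rather than a text comparison.
df["EntryDate"] = pd.to_datetime(df["EntryDate"], format="%m/%d/%Y")
df2 = df.set_index("index")

# idxmin() gives, for each AccountId, the index label of its earliest
# EntryDate; .loc then selects exactly those rows.
earliest = df2.loc[df2.groupby("AccountId")["EntryDate"].idxmin()]

# Alternative technique: sort by date, then keep the first row per AccountId.
alt = df2.sort_values("EntryDate").drop_duplicates(subset="AccountId", keep="first")

print(earliest)
```

If uniqueness should be per account and item rather than per account alone, grouping by `['AccountId', 'ItemID']` instead of `'AccountId'` would be the natural adjustment.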

Published: 2023-07-29 05:08:00
Link: https://www.elefans.com/category/jswz/34/1312621.html