I have a pandas DataFrame (originally generated from a SQL query) that looks like this:

index  AccountId  ItemID  EntryDate
1      100        1000    1/1/2016
2      100        1000    1/2/2016
3      100        1000    1/3/2016
4      101        1234    9/15/2016
5      101        1234    9/16/2016
etc.

I'd like to whittle this down to a unique list, returning only the entry with the earliest available date for each account, something like this:

index  AccountId  ItemID  EntryDate
1      100        1000    1/1/2016
4      101        1234    9/15/2016
etc.

Any pointers or direction for a fairly new pandas dev? The unique function doesn't appear to handle these kinds of rules, and looping through the rows to work out which ones to drop seems like a lot of trouble for a simple task. Is there a function I'm missing that does this?
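To reproduce the problem, the sample table can be rebuilt as a DataFrame. This is a minimal sketch that assumes EntryDate comes back from the SQL query as M/D/YYYY strings; the name df and the rows are just the question's sample typed in by hand, not the real query result. Parsing the dates up front matters because "earliest" should be chronological, not alphabetical.

```python
import pandas as pd

# Sample rows copied by hand from the question (assumed shape of the SQL result);
# EntryDate arrives as "M/D/YYYY" strings.
df = pd.DataFrame(
    {
        "AccountId": [100, 100, 100, 101, 101],
        "ItemID": [1000, 1000, 1000, 1234, 1234],
        "EntryDate": ["1/1/2016", "1/2/2016", "1/3/2016", "9/15/2016", "9/16/2016"],
    },
    index=pd.Index([1, 2, 3, 4, 5], name="index"),
)

# Convert the strings to real datetimes so comparisons are chronological.
# As plain text, "10/1/2016" would sort before "9/15/2016" because '1' < '9'.
df["EntryDate"] = pd.to_datetime(df["EntryDate"], format="%m/%d/%Y")

print(df.dtypes)
```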
Best answer
Let's use groupby, idxmin, and .loc:
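df_out = df2.loc[df2.groupby('AccountId')['EntryDate'].idxmin()]
print(df_out)

This assumes df2 is the question's DataFrame with EntryDate already converted to datetime (for example with pd.to_datetime). For each AccountId, idxmin returns the index label of the row with the smallest EntryDate, and .loc then selects exactly those rows.

Output: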
       AccountId  ItemID  EntryDate
index
1            100    1000 2016-01-01
4            101    1234 2016-09-15
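A self-contained sketch of the groupby/idxmin approach follows, assuming the question's sample rows and M/D/YYYY date strings (df2 mirrors the answer's variable name; the data are typed in by hand). It also shows one alternative that answers "is there a function I'm missing": sort by date, then drop_duplicates with keep="first", which should select the same rows here without idxmin.

```python
import pandas as pd

# Rebuild the question's sample data (hand-typed; df2 mirrors the answer's name).
df2 = pd.DataFrame(
    {
        "AccountId": [100, 100, 100, 101, 101],
        "ItemID": [1000, 1000, 1000, 1234, 1234],
        "EntryDate": ["1/1/2016", "1/2/2016", "1/3/2016", "9/15/2016", "9/16/2016"],
    },
    index=pd.Index([1, 2, 3, 4, 5], name="index"),
)
# idxmin compares values, so EntryDate must be a real datetime column first.
df2["EntryDate"] = pd.to_datetime(df2["EntryDate"], format="%m/%d/%Y")

# Approach from the answer: per AccountId, find the index label of the
# earliest EntryDate, then select those rows with .loc.
df_out = df2.loc[df2.groupby("AccountId")["EntryDate"].idxmin()]

# Alternative without idxmin: sort chronologically, keep the first row seen
# for each AccountId, then restore the original row order.
df_alt = (
    df2.sort_values("EntryDate")
    .drop_duplicates(subset="AccountId", keep="first")
    .sort_index()
)

print(df_out)
```

If an account can hold several ItemIDs and the goal is the earliest entry per (AccountId, ItemID) pair, passing both column names to groupby (or to subset in drop_duplicates) should work the same way; that reading of the requirement is an assumption, since the sample has only one item per account.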