Consider the following DataFrame:

          A      B  E
    0   bar    one  1
    1   bar  three  1
    2  flux    six  1
    3  flux  three  2
    4   foo   five  2
    5   foo    one  1
    6   foo    two  1
    7   foo    two  2

For each value of A, I would like to find the number of unique values in the other columns.
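For anyone who wants to reproduce the problem, here is a minimal sketch that rebuilds the frame above. The name `df` matches the snippets in the question; the default integer index is assumed from the printed output.

```python
import pandas as pd

# Rebuild the example frame from the question (default 0..7 integer index).
df = pd.DataFrame({
    "A": ["bar", "bar", "flux", "flux", "foo", "foo", "foo", "foo"],
    "B": ["one", "three", "six", "three", "five", "one", "two", "two"],
    "E": [1, 1, 1, 2, 2, 1, 1, 2],
})
print(df)
```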
I thought the following would do it:

    df.groupby('A').apply(lambda x: x.nunique())

but I get an error:

    AttributeError: 'DataFrame' object has no attribute 'nunique'
I also tried:

    df.groupby('A').nunique()

but that fails too:

    AttributeError: 'DataFrameGroupBy' object has no attribute 'nunique'
Finally I tried:

    df.groupby('A').apply(lambda x: x.apply(lambda y: y.nunique()))

which returns:

          A  B  E
    A
    bar   1  2  1
    flux  1  2  2
    foo   1  3  2

and seems to be correct. Strangely though, it also returns the column A in the result. Why?
Recommended answer
In the pandas version you're using, the DataFrame object doesn't have nunique; only Series does. You have to pick out which column you want to apply nunique() to. You can do this with simple attribute access (the dot operator):
    df.groupby('A').apply(lambda x: x.B.nunique())

will print:

    A
    bar     2
    flux    2
    foo     3

and doing

    df.groupby('A').apply(lambda x: x.E.nunique())

will print:

    A
    bar     1
    flux    2
    foo     2

Alternatively, you can do this with one function call using:
    df.groupby('A').aggregate({'B': lambda x: x.nunique(), 'E': lambda x: x.nunique()})

will print:

          B  E
    A
    bar   2  1
    flux  2  2
    foo   3  2

To answer your question about why your nested lambda also prints the A column: when you do a groupby/apply operation, you are iterating through three DataFrame objects, one per value of A (bar, flux, foo). Each one is a sub-DataFrame of the original and still contains all three columns. Applying an operation to it applies the operation to each Series, so nunique() runs on three Series per DataFrame: A, B and E.
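To see those sub-DataFrames directly, you can iterate over the groupby object yourself. This is only an illustrative sketch; `group_columns` is a name introduced here, and the frame is rebuilt so the snippet runs on its own.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["bar", "bar", "flux", "flux", "foo", "foo", "foo", "foo"],
    "B": ["one", "three", "six", "three", "five", "one", "two", "two"],
    "E": [1, 1, 1, 2, 2, 1, 1, 2],
})

# Iterating a groupby yields (group key, sub-DataFrame) pairs.
# Each sub-DataFrame keeps the grouping column A next to B and E.
group_columns = {}
for name, group in df.groupby("A"):
    group_columns[name] = list(group.columns)
    print(name, group.shape, list(group.columns))
```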
The first Series evaluated on each DataFrame is the A Series. Since you grouped on A, each sub-DataFrame has exactly one unique value in its A Series. That explains why you end up with an A result column of all 1s.
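The reasoning above can be checked by hand, and one possible way to keep A out of the result is to select only B and E before aggregating. This is a sketch rather than the only approach; `counts`, `a_counts` and `result` are names introduced here, and it relies only on Series.nunique().

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["bar", "bar", "flux", "flux", "foo", "foo", "foo", "foo"],
    "B": ["one", "three", "six", "three", "five", "one", "two", "two"],
    "E": [1, 1, 1, 2, 2, 1, 1, 2],
})

# Unique counts per column for each group, computed one Series at a time.
counts = {
    name: {col: group[col].nunique() for col in group.columns}
    for name, group in df.groupby("A")
}

# A is constant inside each group, so its count is always 1.
a_counts = [c["A"] for c in counts.values()]

# Selecting B and E first means A only appears as the index of the result.
result = df.groupby("A")[["B", "E"]].agg(lambda s: s.nunique())
print(result)
```

On more recent pandas releases, `df.groupby('A').nunique()` is also available as a built-in, so the AttributeError from the question may simply not occur there.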