What is the most efficient way to count occurrences in pandas?

Problem Description

I have a large (about 12M rows) dataframe df with say:

df.columns = ['word','documents','frequency']

So the following ran in a timely fashion:

word_grouping = df[['word','frequency']].groupby('word')
MaxFrequency_perWord = word_grouping[['frequency']].max().reset_index()
MaxFrequency_perWord.columns = ['word','MaxFrequency']
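For reference, the pipeline above runs as expected on a small frame (toy data, not the asker's 12M rows):

```python
import pandas as pd

df = pd.DataFrame({'word': ['a', 'b', 'a'],
                   'documents': [1, 2, 3],
                   'frequency': [5, 2, 7]})

# Group by word, take the max frequency per word, flatten back to columns
word_grouping = df[['word', 'frequency']].groupby('word')
MaxFrequency_perWord = word_grouping[['frequency']].max().reset_index()
MaxFrequency_perWord.columns = ['word', 'MaxFrequency']
# 'a' -> 7, 'b' -> 2
```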

However, this is taking an unexpectedly long time to run:

Occurrences_of_Words = word_grouping[['word']].count().reset_index()

What am I doing wrong here? Is there a better way to count occurrences in a large dataframe?

df.word.describe()

ran pretty well, so I really did not expect this Occurrences_of_Words dataframe to take very long to build.

PS: If the answer is obvious and you feel the need to penalize me for asking this question, please include the answer as well. Thank you.

Recommended Answer

I think df['word'].value_counts() should serve. By skipping the groupby machinery, you'll save some time. I'm not sure why count should be much slower than max. Both take some time to avoid missing values. (Compare with size.)
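The count/size distinction mentioned above can be seen on a small frame (toy data, not the asker's): count skips NaN while size does not, which is extra work count has to do.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'word': ['a', 'a', 'b'],
                   'frequency': [1.0, np.nan, 3.0]})
g = df.groupby('word')

# count() excludes missing values, size() counts every row in the group
per_word_count = g['frequency'].count()  # a -> 1, b -> 1
per_word_size = g.size()                 # a -> 2, b -> 1
```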

In any case, value_counts has been specifically optimized to handle object dtype, like your words, so I doubt you'll do much better than that.
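A minimal sketch of the suggested approach (toy data; column names match the question):

```python
import pandas as pd

df = pd.DataFrame({'word': ['cat', 'dog', 'cat', 'bird', 'cat'],
                   'documents': [1, 2, 3, 4, 5],
                   'frequency': [2, 1, 4, 1, 3]})

# Single pass over the column, no groupby machinery
occurrences = df['word'].value_counts()  # cat -> 3, dog -> 1, bird -> 1

# Equivalent (typically slower on object columns) groupby formulation
occurrences_gb = df.groupby('word').size()
```

Both return a Series mapping each word to its row count; value_counts additionally sorts by count descending.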
