Shuffle DataFrame rows

Updated: 2024-10-28 04:29:11
Problem description

I have the following DataFrame:

          Col1  Col2  Col3  Type
    0        1     2     3     1
    1        4     5     6     1
    ...
    20       7     8     9     2
    21      10    11    12     2
    ...
    45      13    14    15     3
    46      16    17    18     3
    ...

The DataFrame is read from a csv file. All rows which have Type 1 are on top, followed by the rows with Type 2, followed by the rows with Type 3, etc.

I would like to shuffle the order of the DataFrame's rows, so that all Types are mixed. A possible result could be:

          Col1  Col2  Col3  Type
    0        7     8     9     2
    1       13    14    15     3
    ...
    20       1     2     3     1
    21      10    11    12     2
    ...
    45       4     5     6     1
    46      16    17    18     3
    ...

How can I achieve this?
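If you want to experiment with shuffling, you can reproduce the grouped layout from the question with a small in-memory frame. This is a minimal sketch. CSV_TEXT is hypothetical stand-in data (six rows instead of the real file), and it is read with pd.read_csv through io.StringIO rather than from a file path.

```python
import io

import pandas as pd

# Hypothetical CSV contents mimicking the question: rows are grouped by Type
# (all Type 1 rows first, then Type 2, then Type 3).
CSV_TEXT = """Col1,Col2,Col3,Type
1,2,3,1
4,5,6,1
7,8,9,2
10,11,12,2
13,14,15,3
16,17,18,3
"""

# Read the grouped data, as the question does from a CSV file.
df = pd.read_csv(io.StringIO(CSV_TEXT))

# The Type column is sorted, i.e. each Type forms one contiguous block of rows.
print(df["Type"].tolist())  # [1, 1, 2, 2, 3, 3]
print(df["Type"].is_monotonic_increasing)  # True
```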

Recommended answer

The idiomatic way to do this with Pandas is to use the .sample method of your dataframe to sample all rows without replacement:

df.sample(frac=1)

The frac keyword argument specifies the fraction of rows to return in the random sample, so frac=1 means return all rows (in random order).
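To check what frac=1 does, the sketch below uses a small hypothetical frame shaped like the question's. It confirms that the result contains exactly the original rows, with their original index labels, only in a different order. The random_state argument is an addition that the answer does not use. It seeds the shuffle so the output is reproducible.

```python
import pandas as pd

# Small hypothetical frame whose rows are grouped by Type, like the one in the question.
df = pd.DataFrame(
    {
        "Col1": [1, 4, 7, 10, 13, 16],
        "Col2": [2, 5, 8, 11, 14, 17],
        "Col3": [3, 6, 9, 12, 15, 18],
        "Type": [1, 1, 2, 2, 3, 3],
    }
)

# frac=1 requests 100% of the rows, sampled without replacement (the default),
# so the result is a permutation of the original rows.
# random_state fixes the seed so the same shuffle is produced on every run.
shuffled = df.sample(frac=1, random_state=42)

# Same rows and same index labels; only the order differs.
assert len(shuffled) == len(df)
assert shuffled.sort_index().equals(df)

print(shuffled)
```

Leaving out random_state gives a different order on each call, which is usually what you want when shuffling before training or splitting data.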

Note: If you want to shuffle your DataFrame "in place" (that is, replace df with its shuffled version) and reset the index, you could do, e.g.:

df = df.sample(frac=1).reset_index(drop=True)

Here, specifying drop=True prevents .reset_index from creating a column containing the old index entries.
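Here is a small sketch of what drop=True changes, using a hypothetical three-row frame:

```python
import pandas as pd

# Hypothetical small frame; random_state only makes the shuffle reproducible.
df = pd.DataFrame({"Col1": [1, 4, 7], "Type": [1, 1, 2]})
shuffled = df.sample(frac=1, random_state=0)

# Without drop=True, the old (shuffled) index labels are inserted
# as a new column named "index".
kept = shuffled.reset_index()
print(kept.columns.tolist())  # ['index', 'Col1', 'Type']

# With drop=True, the old labels are discarded and a fresh 0..n-1 index is used.
dropped = shuffled.reset_index(drop=True)
print(dropped.columns.tolist())  # ['Col1', 'Type']
print(dropped.index.tolist())  # [0, 1, 2]
```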

Follow-up note: The above operation is not literally in-place. The name df is rebound to a new object (id(df_old) is not the same as id(df_new)), and .sample copies the data. It does not, however, leave you holding two copies. Once df refers to the shuffled frame, nothing references the original any more, so its memory is released. A simple line-by-line memory profiler shows that the net memory usage barely changes on the shuffle line. The short-lived peak while both copies exist is not visible at this granularity:

    $ python3 -m memory_profiler .\test.py
    Filename: .\test.py

    Line #    Mem usage    Increment   Line Contents
    ================================================
         5     68.5 MiB     68.5 MiB   @profile
         6                             def shuffle():
         7    847.8 MiB    779.3 MiB       df = pd.DataFrame(np.random.randn(100, 1000000))
         8    847.9 MiB      0.1 MiB       df = df.sample(frac=1).reset_index(drop=True)
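The sketch below checks the point about object identity directly. It uses a small hypothetical frame in place of the 100 x 1,000,000 one profiled above. The shuffle returns a new DataFrame object (so its id differs), the original is left unmodified, and both frames hold the same rows.

```python
import numpy as np
import pandas as pd

# Hypothetical small frame standing in for the large one profiled above.
df_old = pd.DataFrame(np.arange(12).reshape(4, 3), columns=["Col1", "Col2", "Col3"])
original_values = df_old.to_numpy().copy()

# sample() and reset_index() return a new DataFrame; df_old itself is not modified.
df_new = df_old.sample(frac=1, random_state=1).reset_index(drop=True)

# Two different Python objects, so their ids differ.
print(id(df_old) == id(df_new))  # False

# The original frame still holds its data in the original order.
print(np.array_equal(df_old.to_numpy(), original_values))  # True

# Col1 is unique and increasing in df_old, so sorting by it undoes the shuffle.
restored = df_new.sort_values("Col1").reset_index(drop=True)
print(restored.equals(df_old))  # True
```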

Published: 2023-10-28 08:13:12