Pandas how to Factorize in Unusual Text Order

Updated: 2024-06-14 16:57:18

I have a DataFrame with a column 'cat100' whose values look like this:

'A' 'B' ... 'Y' 'Z' 'AA' 'AB' ...

I would like to factorize the column with pd.factorize so that 'AA' comes after 'B', 'C', ..., 'Z'.

I've tried something like:

    import pandas as pd

    df = pd.DataFrame(['A', 'B', 'AA'])
    df[0] = pd.factorize(df[0], sort=True)[0]

But this assigns 'A' to 0, 'B' to 2, and 'AA' to 1. I want 'AA' to be assigned 2 and 'B' to be assigned 1.

I've searched for ways to do this and haven't found anything. Is there a way to do this?
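A minimal reproduction of the behaviour described above, assuming a recent pandas version: with sort=True, pd.factorize orders the unique values by plain string comparison, so 'AA' lands between 'A' and 'B'. The last line shows the order actually wanted, which compares length first and the string second.

```python
import pandas as pd

values = pd.Series(['A', 'B', 'AA'])

# sort=True orders the uniques lexicographically (ordinary string comparison)
codes, uniques = pd.factorize(values, sort=True)
print(codes.tolist())   # [0, 2, 1]
print(list(uniques))    # ['A', 'AA', 'B']

# The desired order: shorter strings first, then alphabetical within a length
print(sorted(values, key=lambda s: (len(s), s)))  # ['A', 'B', 'AA']
```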

Best answer

Consider a DataFrame with a string column:

    import pandas as pd

    df = pd.DataFrame(dict(col=['A', 'B', 'AA', 'C', 'BB', 'AAA', 'BC', 'AB', 'AA']))
    print(df)

Custom Function:

(i) Take the unique entries of the column under consideration.
(ii) Group them by string length, sort each group lexicographically, and stack the groups horizontally in order of increasing length.
(iii) Factorize the result.
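If the labels consist only of the uppercase letters A to Z, the order that steps (ii) and (iii) produce is the same as spreadsheet column order (A, ..., Z, AA, AB, ...), which can also be computed per label as a bijective base-26 number. The helper below is hypothetical and assumes that alphabet; it returns absolute positions rather than the dense 0..n-1 codes that factorize produces, so it is best used as a sort key.

```python
def column_label_to_index(label: str) -> int:
    """Map 'A' -> 0, ..., 'Z' -> 25, 'AA' -> 26, 'AB' -> 27, ...

    Assumes the label consists only of the uppercase ASCII letters A-Z.
    """
    index = 0
    for ch in label:
        # Each letter is a digit from 1 to 26 in bijective base 26
        index = index * 26 + (ord(ch) - ord('A') + 1)
    return index - 1


labels = ['A', 'B', 'AA', 'C', 'BB', 'AAA', 'BC', 'AB']
print([column_label_to_index(s) for s in ['A', 'Z', 'AA', 'AB']])  # [0, 25, 26, 27]

# As a sort key it gives length first, then alphabetical order
print(sorted(labels, key=column_label_to_index))
# ['A', 'B', 'C', 'AA', 'AB', 'BB', 'BC', 'AAA']
```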

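    import numpy as np
    import pandas as pd

    def complex_factorize(df, col):
        # (i) unique entries of the column
        ser = pd.Series(df[col].unique())
        # sort the values inside one length group lexicographically
        func = lambda x: sorted(x.values.ravel())
        # (ii) group by string length (groups come out in ascending length) and concatenate
        arr = np.hstack(ser.groupby(ser.str.len()).apply(func).values)
        # (iii) arr is already unique and ordered, so its codes are simply 0..n-1
        return pd.factorize(arr)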
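The same ordering can be sketched without the groupby by sorting the unique values with a (len(s), s) key, which compares length first and the string second, and then mapping each value to its position with Series.map. This is a swapped-in variant rather than the answer's code, and length_first_factorize is a hypothetical name.

```python
import pandas as pd


def length_first_factorize(series: pd.Series):
    """Return (codes, uniques), with uniques ordered by length, then alphabetically."""
    # (i) unique entries, (ii) ordered by (length, string)
    uniques = sorted(series.unique(), key=lambda s: (len(s), s))
    # (iii) replace each value by the position of its unique entry
    position = {value: i for i, value in enumerate(uniques)}
    codes = series.map(position).to_numpy()
    return codes, uniques


df = pd.DataFrame(dict(col=['A', 'B', 'AA', 'C', 'BB', 'AAA', 'BC', 'AB', 'AA']))
codes, uniques = length_first_factorize(df['col'])
print(uniques)         # ['A', 'B', 'C', 'AA', 'AB', 'BB', 'BC', 'AAA']
print(codes.tolist())  # [0, 1, 3, 2, 5, 7, 6, 4, 3]
```

Returning the codes directly avoids the separate replace step; the mapping dictionary plays the role of the (labels, uniques) pair.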

Take the codes and the unique values returned by factorize and pass them to DataFrame.replace to build the mapping:

    val, ser = complex_factorize(df, 'col')
    df.replace(ser, val)
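An alternative to DataFrame.replace, sketched here rather than taken from the answer, is to pass the custom order as the categories of an ordered pd.Categorical; its .codes attribute then gives each value's position in that order. The (len(s), s) sort key below reproduces the order that complex_factorize builds for this example frame.

```python
import pandas as pd

df = pd.DataFrame(dict(col=['A', 'B', 'AA', 'C', 'BB', 'AAA', 'BC', 'AB', 'AA']))

# Custom order: by length, then alphabetically within each length
order = sorted(df['col'].unique(), key=lambda s: (len(s), s))

# .codes are positions in `order`; a value missing from `order` would get -1
cat = pd.Categorical(df['col'], categories=order, ordered=True)
df['code'] = cat.codes
print(df['code'].tolist())  # [0, 1, 3, 2, 5, 7, 6, 4, 3]
```

Keeping the Categorical (instead of only its codes) also makes comparisons such as cat < 'AA' follow the custom order, because ordered=True.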

Published: 2023-04-12 20:56:00
Source: https://www.elefans.com/category/dzcp/61489c73517b41dd148cc720ee4b723d.html
