按常用名称划分数据框行对(Dividing pairs of data frame rows by common names)

编程入门行业动态更新时间:2024-10-14 08:26:56

我有一个数据框，看起来像这样：

1 2 a_value 2 8 a_ref 4 2 b_value 6 10 b_ref 3 15 c_value 7 3

请注意，某些索引是name_value和name_ref对，而其他索引则不是

我想找到这些对，并为每一对在我的新数据name_value获得四行： name_value ， name_ref ， name_ref/name_value ， name_value/name_ref所以我的输出数据name_value/name_ref如下所示：

1 2 a_value 2.0 8.000 a_ref 4.0 2.000 a_value/a_ref 0.5 4.000 a_ref/a_value 2.0 0.250 b_value 6.0 10.000 b_ref 3.0 15.000 b_value/b_ref 2.0 0.666 b_ref/b_value 0.5 1.500

我现在通过遍历索引来寻找那些以value结尾的索引，然后试图找到匹配的ref ，但是知道大熊猫，似乎应该有一个更简单的方法，或许以某种方式使用groupby。那么..在那里？

I have a dataframe which looks like that:

1 2 a_value 2 8 a_ref 4 2 b_value 6 10 b_ref 3 15 c_value 7 3

note that some indices are pairs of name_value and name_ref and others are not

I want to find those pairs, and for each pair get four rows in my new dataframe: name_value, name_ref, name_ref/name_value, name_value/name_ref so my output dataframe looks like this:

1 2 a_value 2.0 8.000 a_ref 4.0 2.000 a_value/a_ref 0.5 4.000 a_ref/a_value 2.0 0.250 b_value 6.0 10.000 b_ref 3.0 15.000 b_value/b_ref 2.0 0.666 b_ref/b_value 0.5 1.500

I currently do it by iterating over the indices looking for ones that end with value and then trying to find the matching ref, but knowing pandas, it seems that there should be an easier way, maybe using groupby somehow. So.. is there?

最满意答案

这可能不是最优雅的解决方案，但它的工作原理。首先，让我们找到常用的键：

import numpy as np keys = np.intersect1d(df.index.str.extract("(.+)_value").dropna(), df.index.str.extract("(.+)_ref").dropna()) #array(['a', 'b'], dtype=object)

接下来，选择匹配的参考和值：

refs = df.loc[keys + "_ref"] values = df.loc[keys +"_value"]

制作每个数据帧的副本并将它们的键指定为索引：

values1 = values.copy() values1.index = keys refs1 = refs.copy() refs1.index = keys

再次执行除法和更新索引：

ratios = values1 / refs1 ratios.index += "_value" + "/" + ratios.index + "_ref" ratios1 = refs1 / values1 ratios1.index += "_ref" + "/" + ratios1.index + "_value"

把所有东西放在一起并进

pd.concat([refs, values, ratios, ratios1]).sort_index() # 1 2 #a_ref 4.0 2.000000 #a_ref/a_value 2.0 0.250000 #a_value 2.0 8.000000 #a_value/a_ref 0.5 4.000000 #b_ref 3.0 15.000000 #b_ref/b_value 0.5 1.500000 #b_value 6.0 10.000000 #b_value/b_ref 2.0 0.666667

This may not be the most elegant solution, but it works. First, lets find the common keys:

import numpy as np keys = np.intersect1d(df.index.str.extract("(.+)_value").dropna(), df.index.str.extract("(.+)_ref").dropna()) #array(['a', 'b'], dtype=object)

Next, select the matching refs and values:

refs = df.loc[keys + "_ref"] values = df.loc[keys +"_value"]

Make a copy of each dataframe and assign them the keys as indexes:

values1 = values.copy() values1.index = keys refs1 = refs.copy() refs1.index = keys

Perform the division and update the indexes once again:

ratios = values1 / refs1 ratios.index += "_value" + "/" + ratios.index + "_ref" ratios1 = refs1 / values1 ratios1.index += "_ref" + "/" + ratios1.index + "_value"

Put everything together and sort:

更多推荐

本文发布于:2023-08-06 21:47:00，感谢您对本站的认可！

本文链接:https://www.elefans.com/category/jswz/34/1456566.html