Calculating a confusion matrix across different columns of a pandas DataFrame?

Updated: 2024-10-25 20:20:24

Problem description

I have a DataFrame with 3000 rows and 3 columns, as follows:

            col1  col2  col3
    ID1        1     0     1
    Id2        1     1     0
    Id3        0     1     1
    Id4        2     1     0
    Id5        2     2     3
    …         ..    ..    ..
    Id3000     3     1     0

In this DataFrame, each cell records the outcome of one prediction: 0 means TP, 1 means FP, 2 means TN, and 3 means FN. I want to calculate the accuracy of each column, something like this:

    Accuracy result:
    col1  col2  col3
    0.67  0.68  0.79

Is there a very efficient way to calculate important metrics such as accuracy or F-measure?

Recommended answer

Here is one approach:

    import io

    import pandas as pd

    data = """
    id   col1 col2 col3
    ID1  1    0    1
    Id2  1    1    0
    Id3  0    1    1
    Id4  2    1    0
    Id5  2    2    3
    """
    # Build a sample DataFrame for testing
    df = pd.read_csv(io.StringIO(data), sep=r'\s+')
    print(df)

    accuracy = {}  # final results, keyed by column name
    # Select every column whose name begins with 'col'
    select_cols = [col for col in df.columns if col.startswith('col')]

    for col in select_cols:
        df1 = df.groupby(col).size()  # number of rows per outcome code
        t = [0, 0, 0, 0]  # [TP, FP, TN, FN]: 0 = TP, 1 = FP, 2 = TN, 3 = FN
        for v in df1.index:
            t[v] = df1[v]
        # Accuracy = (TP + TN) / (TP + TN + FP + FN)
        accuracy[col] = (t[0] + t[2]) / sum(t)

    df_acc = pd.DataFrame.from_dict(accuracy, orient='index').T
    print('Accuracy:')
    print(df_acc)

Output:

    Accuracy:
       col1  col2  col3
    0   0.6   0.4   0.4

Or another solution (better, I think): replace the two for loops with:

    for col in select_cols:
        # Count the TP (0) and TN (2) rows, then divide by the number of rows
        accuracy[col] = ((df[col] == 0).sum() + (df[col] == 2).sum()) / df[col].count()

    df_acc = pd.DataFrame.from_dict(accuracy, orient='index').T.reset_index(drop=True)
    print('Accuracy')
    print(df_acc)
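The question also asks about F-measure, which neither loop above computes. A minimal loop-free sketch follows, assuming the same encoding (0 = TP, 1 = FP, 2 = TN, 3 = FN) and the five sample rows used above. The names `counts` and `metrics` are illustrative, and the F1 formula is the standard definition rather than something taken from the original answer.

```python
import pandas as pd

# Sample rows from the answer above; codes: 0 = TP, 1 = FP, 2 = TN, 3 = FN
df = pd.DataFrame(
    {
        "col1": [1, 1, 0, 2, 2],
        "col2": [0, 1, 1, 1, 2],
        "col3": [1, 0, 1, 0, 3],
    },
    index=["ID1", "Id2", "Id3", "Id4", "Id5"],
)
cols = [c for c in df.columns if c.startswith("col")]

# One row per prediction column, one column per outcome type.
# (df[cols] == code).sum() counts the cells equal to `code` in each column.
counts = pd.DataFrame(
    {name: (df[cols] == code).sum() for code, name in enumerate(["TP", "FP", "TN", "FN"])}
)

# Accuracy = (TP + TN) / (TP + FP + TN + FN)
accuracy = (counts["TP"] + counts["TN"]) / counts.sum(axis=1)

# Precision and recall; a zero denominator yields NaN or inf rather than an error
precision = counts["TP"] / (counts["TP"] + counts["FP"])
recall = counts["TP"] / (counts["TP"] + counts["FN"])

# F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall
f1 = 2 * counts["TP"] / (2 * counts["TP"] + counts["FP"] + counts["FN"])

metrics = pd.DataFrame(
    {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
).T
print(metrics)
```

On the sample rows this gives the same accuracies as the loops above (0.6, 0.4, 0.4). Because every step is a whole-column operation, it should scale to the 3000-row frame without any Python-level loop over rows.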