Python and Machine Learning 5

Updated: 2024-10-06 16:26:20

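1. Accuracy, precision, and recall
Explanation: Suppose there are 60 positive samples and 40 negative samples, and the task is to find all the positive samples. The system retrieves 50 samples, of which only 40 are actually positive. Then TP (positives predicted as positive) is 40, FN (missed positives: positives predicted as negative) is 20, FP (false alarms: negatives predicted as positive) is 10, and TN (negatives predicted as negative) is 30. This gives: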
[1] Accuracy = $\frac{TP + TN}{TP + FN + FP + TN}$ = (40 + 30)/100 = 70%
[2] Precision = $\frac{TP}{TP + FP}$ = 40/50 = 80%
[3] Recall = $\frac{TP}{TP + FN}$ = 40/60 = 2/3
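As a quick check of the arithmetic above, here is a minimal pure-Python sketch that computes the three metrics from the counts in the example (TP = 40, FN = 20, FP = 10, TN = 30). The function names are illustrative only; for label arrays, sklearn.metrics provides accuracy_score, precision_score, and recall_score.

```python
# Confusion-matrix counts from the worked example above
TP, FN, FP, TN = 40, 20, 10, 30


def accuracy(tp, fn, fp, tn):
    """Fraction of all samples that are classified correctly."""
    return (tp + tn) / (tp + fn + fp + tn)


def precision(tp, fp):
    """Fraction of predicted positives that are actually positive."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Fraction of actual positives that the system found."""
    return tp / (tp + fn)


print(accuracy(TP, FN, FP, TN))  # 0.7
print(precision(TP, FP))         # 0.8
print(recall(TP, FN))            # 0.6666666666666666
```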

2. Positive and negative in medicine
Explanation:
[1] Positive: the disease is present.
[2] Negative: normal (no disease).

3. The $F_1$ score
Explanation: $F_1$ is the harmonic mean of precision $P$ and recall $R$: $F_1 = \frac{2PR}{P + R} = \frac{2TP}{2TP + FP + FN}$
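Both forms of the formula give the same value. The sketch below reuses the counts from section 1 purely for illustration and computes $F_1$ both ways in plain Python; the helper names are hypothetical, and sklearn.metrics.f1_score is the library function that works on label arrays.

```python
def f1_from_counts(tp, fp, fn):
    """F1 computed directly from confusion-matrix counts."""
    return 2 * tp / (2 * tp + fp + fn)


def f1_from_precision_recall(p, r):
    """F1 as the harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


tp, fp, fn = 40, 10, 20          # counts from the example in section 1
p, r = tp / (tp + fp), tp / (tp + fn)

print(f1_from_counts(tp, fp, fn))      # 80 / 110, about 0.727
print(f1_from_precision_recall(p, r))  # same value up to floating-point rounding
```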

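4. Precision and recall in information retrieval
Explanation: In information retrieval, precision and recall (called 查准率 and 查全率 in Chinese) are defined in terms of retrieved items:
[1] Precision (查准率) = relevant items retrieved / total items retrieved
[2] Recall (查全率) = relevant items retrieved / total relevant items in the system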
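In retrieval terms, both ratios can be computed from two sets: the items the system returned and the items that are actually relevant. A minimal sketch, with hypothetical document IDs and a hypothetical helper name:

```python
def retrieval_precision_recall(retrieved, relevant):
    """Return (precision, recall) for a collection of retrieved items."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant items that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical document IDs, for illustration only
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d2", "d4", "d5"]
print(retrieval_precision_recall(retrieved, relevant))  # (0.5, 0.6666666666666666)
```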

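5. TPR, FPR (FAR), and FRR
Explanation:
[1] TPR [True Positive Rate]: TP/(TP+FN), i.e., recall.
[2] FAR [False Acceptance Rate] or FPR [False Positive Rate]: FP/(FP+TN), the false acceptance (false alarm) rate: the fraction of all negative samples that are classified as positive.
[3] FRR [False Rejection Rate]: FN/(FN+TP), the false rejection rate: the fraction of all positive samples that are classified as negative. It equals 1 - recall.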
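Applied to the counts from section 1 (reused here only as an example), the three rates look like this in plain Python; the final assertion checks that FRR and TPR are complementary.

```python
import math

# Counts from the example in section 1
TP, FN, FP, TN = 40, 20, 10, 30

tpr = TP / (TP + FN)  # true positive rate, i.e. recall
fpr = FP / (FP + TN)  # false positive rate (FAR): negatives accepted as positive
frr = FN / (FN + TP)  # false rejection rate: positives rejected as negative

print(tpr, fpr, frr)
# FRR = 1 - TPR; compare with a tolerance because of floating-point rounding
assert math.isclose(frr, 1 - tpr)
```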

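6. ROC and AUC
Explanation:
[1] In an ROC [Receiver Operating Characteristic] curve, each point has FPR as its x-coordinate and TPR as its y-coordinate; the points are obtained by sweeping the decision threshold applied to the classifier's scores.
[2] AUC [Area Under Curve] is defined as the area under the ROC curve. An AUC of 1 means a perfect ranking of positives above negatives, and 0.5 is no better than random guessing.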
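To make the threshold sweep concrete, the following pure-Python sketch builds the ROC points for a toy set of labels and scores (made up for illustration) and integrates them with the trapezoidal rule. It assumes label 1 is the positive class and treats a sample as positive when its score is greater than or equal to the threshold. sklearn.metrics.roc_curve and roc_auc_score are the library equivalents; they add details (such as dropping redundant thresholds) that this sketch omits.

```python
def roc_points(y_true, y_score):
    """Return (fpr, tpr) lists, using every distinct score as a threshold.

    A sample is predicted positive when its score is >= the threshold;
    thresholds are visited from the highest score down, starting at (0, 0).
    """
    pos = sum(1 for y in y_true if y == 1)
    neg = len(y_true) - pos
    fpr, tpr = [0.0], [0.0]
    for thr in sorted(set(y_score), reverse=True):
        tp = sum(1 for y, s in zip(y_true, y_score) if s >= thr and y == 1)
        fp = sum(1 for y, s in zip(y_true, y_score) if s >= thr and y != 1)
        fpr.append(fp / neg)
        tpr.append(tp / pos)
    return fpr, tpr


def trapezoid_auc(x, y):
    """Area under the piecewise-linear curve through the points (x[i], y[i])."""
    return sum((x[i] - x[i - 1]) * (y[i] + y[i - 1]) / 2 for i in range(1, len(x)))


# Toy labels (1 = positive) and classifier scores, for illustration only
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
fpr, tpr = roc_points(y_true, y_score)
print(fpr)                      # [0.0, 0.0, 0.5, 0.5, 1.0]
print(tpr)                      # [0.0, 0.5, 0.5, 1.0, 1.0]
print(trapezoid_auc(fpr, tpr))  # 0.75
```

Each distinct score contributes one point to the curve; for this input, sklearn.metrics.roc_auc_score should also report 0.75.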

7. MAE and MSE
Explanation:
[1] MAE [Mean Absolute Error]: $\mathrm{MAE}(y, \hat{y}) = \frac{1}{N}\sum_{i=1}^{N} \left| y_i - \hat{y}_i \right|$.
[2] MSE [Mean Squared Error]: $\mathrm{MSE}(y, \hat{y}) = \frac{1}{N}\sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2$.
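Both formulas translate directly into Python. A minimal sketch on toy values (sklearn.metrics.mean_absolute_error and mean_squared_error are the library versions):

```python
def mae(y_true, y_pred):
    """Mean absolute error: average of |y_i - y_hat_i|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error: average of (y_i - y_hat_i) ** 2."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy targets and predictions, for illustration only
y_true = [3, -0.5, 2, 7]
y_pred = [2.5, 0.0, 2, 8]
print(mae(y_true, y_pred))  # 0.5
print(mse(y_true, y_pred))  # 0.375
```

Squaring makes MSE more sensitive to large errors: here the single error of size 1 contributes 1 of the 1.5 total squared error, but only 1 of the 2.0 total absolute error.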

8. Classification metrics in sklearn
Explanation:
[1] metrics.accuracy_score(y_true, y_pred[, …]): Accuracy classification score.
[2] metrics.auc(x, y): Compute Area Under the Curve [AUC] using the trapezoidal rule. Older releases also took a reorder argument, which has since been removed.
[3] metrics.average_precision_score(y_true, y_score): Compute average precision [AP] from prediction scores.
[4] metrics.classification_report(y_true, y_pred): Build a text report showing the main classification metrics.
[5] metrics.confusion_matrix(y_true, y_pred[, …]): Compute confusion matrix to evaluate the accuracy of a classification.
[6] metrics.f1_score(y_true, y_pred[, labels, …]): Compute the F1 score, also known as balanced F-score or F-measure.
[7] metrics.precision_score(y_true, y_pred[, …]): Compute the precision.
[8] metrics.recall_score(y_true, y_pred[, …]): Compute the recall.
[9] metrics.roc_auc_score(y_true, y_score[, …]): Compute Area Under the Curve [AUC] from prediction scores.
[10] metrics.roc_curve(y_true, y_score[, …]): Compute Receiver operating characteristic [ROC].

9. classification_report
Explanation:

from sklearn import metrics
y_true = [0, 0, 1, 1, 0, 1]
y_pred = [0, 1, 1, 1, 1, 1]
classify_report = metrics.classification_report(y_true, y_pred)
print(classify_report)

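Output:

             precision    recall  f1-score   support

          0       1.00      0.33      0.50         3
          1       0.60      1.00      0.75         3

avg / total       0.80      0.67      0.62         6

Newer scikit-learn versions replace the avg / total row with accuracy, macro avg, and weighted avg rows; the weighted avg values match the avg / total row shown here.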
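To see where these numbers come from, here is a pure-Python sketch (the function name per_class_report is hypothetical) that recomputes each class's precision, recall, F1, and support, plus the support-weighted averages reported in the avg / total row. The fallback to 0.0 when a denominator is zero is a simplification of sklearn's zero_division handling.

```python
from collections import Counter


def per_class_report(y_true, y_pred):
    """Per-class (precision, recall, f1, support) plus support-weighted averages."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)
    rows = {}
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        predicted = sum(1 for p in y_pred if p == c)
        prec = tp / predicted if predicted else 0.0
        rec = tp / support[c] if support[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        rows[c] = (prec, rec, f1, support[c])
    n = len(y_true)
    # Weight each class's metric by its support, as in the "avg / total" row
    weighted = tuple(
        sum(rows[c][k] * rows[c][3] for c in labels) / n for k in range(3)
    )
    return rows, weighted


y_true = [0, 0, 1, 1, 0, 1]
y_pred = [0, 1, 1, 1, 1, 1]
rows, weighted = per_class_report(y_true, y_pred)
for c, (p, r, f, s) in rows.items():
    print(c, f"{p:.3f} {r:.3f} {f:.3f}", s)
print("weighted avg", " ".join(f"{v:.3f}" for v in weighted))
```

The report prints two decimals, so the weighted F1 of 0.625 appears there as 0.62.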

10. Cosine similarity
Explanation:
$\mathrm{sim}(x, y) = \cos\theta = \frac{x \cdot y}{\|x\| \cdot \|y\|}$
Note: cosine similarity ranges from -1 to 1: it is 1 when the vectors point in the same direction, 0 when they are orthogonal, and -1 when they point in opposite directions.
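The formula needs only a dot product and two norms. A minimal pure-Python sketch (the function name is illustrative); raising an error for a zero vector is a design choice made here, since the formula would divide by zero:

```python
import math


def cosine_similarity(x, y):
    """cos(theta) = (x . y) / (||x|| * ||y||) for two equal-length vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_x * norm_y)


print(cosine_similarity([1, 0], [0, 1]))    # orthogonal: 0.0
print(cosine_similarity([1, 2], [2, 4]))    # same direction: close to 1
print(cosine_similarity([1, 2], [-1, -2]))  # opposite direction: close to -1
```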

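11. tf.get_variable(name, shape, initializer)
Explanation: Gets an existing variable with the given name or creates a new one. This is a TensorFlow 1.x API; in TensorFlow 2.x it is available as tf.compat.v1.get_variable.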
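The create-or-return behavior, together with the reuse flag used in the next sections, can be modeled as a dictionary keyed by the full variable name. The class below is a hypothetical, heavily simplified model written only to illustrate the idea; it is not TensorFlow code and ignores shapes, dtypes, initializers, and the AUTO_REUSE mode. Like TensorFlow 1.x, it raises an error when asked to create a name that already exists without reuse, or to reuse a name that was never created.

```python
class ToyVariableStore:
    """Hypothetical model of TF1-style variable sharing (not TensorFlow)."""

    def __init__(self):
        self._variables = {}  # full name (e.g. "v1/a1") -> stored value

    def get_variable(self, name, initial_value=None, reuse=False):
        if reuse:
            # Reuse mode: the variable must already exist
            if name not in self._variables:
                raise ValueError(f"Variable {name} does not exist")
            return self._variables[name]
        # Creation mode: the name must be new
        if name in self._variables:
            raise ValueError(f"Variable {name} already exists")
        self._variables[name] = [initial_value]  # stored as a one-element list
        return self._variables[name]


store = ToyVariableStore()
a1 = store.get_variable("v1/a1", initial_value=1.0)  # created
a2 = store.get_variable("v1/a1", reuse=True)         # the same object is returned
print(a1 is a2)  # True
```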

12. tf.variable_scope()
Explanation: tf.variable_scope() is mainly used together with tf.get_variable() to share variables, as follows:

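# TensorFlow 1.x API (under TensorFlow 2.x, use "import tensorflow.compat.v1 as tf" and call tf.disable_v2_behavior())
import tensorflow as tf

with tf.variable_scope('v1'):
    a1 = tf.get_variable(name='a1', shape=[1], initializer=tf.constant_initializer(1))

# Re-enter the same scope with reuse=True to fetch the existing variable v1/a1
with tf.variable_scope('v1', reuse=True):
    a2 = tf.get_variable('a1')

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(a1.name)
    print(sess.run(a1))
    print(a2.name)
    print(sess.run(a2))

Output: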

v1/a1:0
[ 1.]
v1/a1:0
[ 1.]

Alternatively, call scope.reuse_variables() inside the scope:

import tensorflow as tf

with tf.variable_scope('v1') as scope:
    a1 = tf.get_variable(name='a1', shape=[1], initializer=tf.constant_initializer(1))
    # From here on, get_variable() in this scope returns existing variables
    scope.reuse_variables()
    a2 = tf.get_variable('a1')

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(a1.name)
    print(sess.run(a1))
    print(a2.name)
    print(sess.run(a2))

Output:

v1/a1:0
[ 1.]
v1/a1:0
[ 1.]

13. tf.get_variable_scope()
Explanation: Returns the current variable scope.

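14. tf.get_variable_scope().reuse_variables()
Explanation: Turns on variable reuse for the current scope. Once reuse is on, it cannot be switched back off inside that scope.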

15. Sharing variables within a scope
Explanation:
[1] Method 1

# my_image_filter, image1 and image2 are assumed to be defined elsewhere
with tf.variable_scope("image_filters") as scope:
    result1 = my_image_filter(image1)
    scope.reuse_variables()  # or: tf.get_variable_scope().reuse_variables()
    result2 = my_image_filter(image2)

[2] Method 2

with tf.variable_scope("image_filters1") as scope:
    result1 = my_image_filter(image1)
with tf.variable_scope(scope, reuse=True):
    result2 = my_image_filter(image2)

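16. tf.variable_scope and tf.name_scope
Explanation: tf.get_variable() ignores scopes opened with tf.name_scope(), so the variable below is named foo/v:0, while ordinary ops such as the addition pick up both prefixes and are named foo/bar/add.

import tensorflow as tf
with tf.variable_scope("foo"):
    with tf.name_scope("bar"):
        v = tf.get_variable("v", [1])
        x = 1.0 + v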
assert v.name == "foo/v:0"
assert x.op.name == "foo/bar/add"
print(v.name)
print(x.op.name)

Output:

foo/v:0
foo/bar/add

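17. Three families of feature selection methods, by how the selection is done
Explanation:
[1] Filter: score each feature by its divergence (e.g., variance) or its correlation with the target, then keep features by setting a score threshold or the number of features to select.
[2] Wrapper: guided by an objective function (usually a predictive-performance score), add or remove a few features at a time.
[3] Embedded: first train a machine learning model to obtain a weight coefficient for each feature, then select features in descending order of coefficient. This resembles the filter method, except that feature quality is determined through training.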

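18. Goals of feature selection
Explanation:
[1] Reduce the number of features (dimensionality reduction), which helps the model generalize and reduces overfitting;
[2] Improve understanding of the features and their values.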

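19. Filter methods
Explanation:
[1] Removing low-variance features
[2] Univariate feature selection

Chi-squared test
Pearson correlation coefficient
Mutual information and maximal information coefficient (MIC)
Distance correlation
Model-based feature ranking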
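Two of the simplest filter criteria, dropping low-variance columns and ranking columns by absolute Pearson correlation with the target, can be sketched in plain Python as follows. The data are toy values and the helper names are hypothetical; in scikit-learn the corresponding tools are sklearn.feature_selection.VarianceThreshold and SelectKBest with a scoring function such as f_regression or chi2. Chi-squared, mutual-information, and distance-correlation scores are not shown.

```python
import statistics


def variance_threshold(X, threshold=0.0):
    """Indices of columns whose population variance is greater than threshold."""
    return [j for j, col in enumerate(zip(*X)) if statistics.pvariance(col) > threshold]


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def rank_by_correlation(X, y):
    """Column indices sorted by |Pearson r| with the target, strongest first."""
    scores = [abs(pearson(col, y)) for col in zip(*X)]
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return order, scores


# Toy data: column 0 is constant, column 1 tracks y closely, column 2 only loosely
X = [[0, 1, 2.0], [0, 2, 1.0], [0, 3, 4.0], [0, 4, 3.0]]
y = [1.0, 2.1, 2.9, 4.2]

print(variance_threshold(X))  # [1, 2]
order, scores = rank_by_correlation(X, y)
print(order)                  # [1, 2, 0]
```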

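20. Wrapper methods
Explanation: Recursive feature elimination (RFE): train a model, remove the feature(s) with the smallest weights, and retrain on the remaining features, repeating until the desired number of features is left.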
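A minimal pure-Python sketch of recursive feature elimination, assuming a linear least-squares model without intercept as the estimator and removing one feature per round. The least_squares and rfe helpers and the toy data are illustrative only. scikit-learn's sklearn.feature_selection.RFE follows the same loop but wraps any estimator that exposes coef_ or feature_importances_.

```python
def least_squares(X, y):
    """Solve the normal equations (X^T X) w = X^T y by Gaussian elimination (no intercept)."""
    n = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(n)] for i in range(n)]
    b = [sum(row[i] * t for row, t in zip(X, y)) for i in range(n)]
    for col in range(n):
        # Partial pivoting for numerical stability
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    # Back substitution
    w = [0.0] * n
    for i in range(n - 1, -1, -1):
        w[i] = (b[i] - sum(A[i][j] * w[j] for j in range(i + 1, n))) / A[i][i]
    return w


def rfe(X, y, n_features_to_select):
    """Refit on the remaining features, drop the one with the smallest |weight|, repeat."""
    remaining = list(range(len(X[0])))
    while len(remaining) > n_features_to_select:
        X_sub = [[row[j] for j in remaining] for row in X]
        w = least_squares(X_sub, y)
        weakest = min(range(len(remaining)), key=lambda k: abs(w[k]))
        remaining.pop(weakest)
    return remaining


# Toy data: y = 3*x0 + 1*x2 exactly, so feature 1 is irrelevant
X = [[1, 2, 0], [2, 0, 1], [0, 1, 3], [3, 1, 1], [1, 3, 2]]
y = [3 * r[0] + r[2] for r in X]

print(rfe(X, y, 2))  # [0, 2]
print(rfe(X, y, 1))  # [0]
```

Because features are ranked by raw coefficient magnitude, they should be on comparable scales before this kind of elimination is trusted.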

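21. Embedded methods
Explanation:
[1] Selecting features with SelectFromModel

L1-based feature selection
Randomized sparse models
Tree-based feature selection

[2] Making feature selection part of a pipeline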
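The final step of SelectFromModel, keeping the features whose importance reaches a threshold, can be sketched as below. The importance values are hypothetical stand-ins for what a trained model would provide (coefficients of an L1-penalized model or feature_importances_ of a tree ensemble), and select_from_importances is a hypothetical helper. When no threshold is given, SelectFromModel is documented to use 1e-5 for L1-penalized estimators and the mean importance otherwise; this sketch falls back to the mean.

```python
import statistics


def select_from_importances(importances, threshold=None):
    """Indices of features whose |importance| is at least the threshold.

    With threshold=None, the mean absolute importance is used.
    """
    magnitudes = [abs(v) for v in importances]
    if threshold is None:
        threshold = statistics.fmean(magnitudes)
    return [j for j, m in enumerate(magnitudes) if m >= threshold]


# Hypothetical importances, e.g. feature_importances_ of a tree ensemble
importances = [0.5, 0.05, 0.3, 0.15]
print(select_from_importances(importances))  # [0, 2]

# Hypothetical L1-model coefficients: exact zeros are dropped with a tiny threshold
print(select_from_importances([2.0, 0.0, -1.5, 0.0], threshold=1e-5))  # [0, 2]
```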

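22. Cluster quality metrics
Explanation: When the true class labels are unknown, the silhouette coefficient and the Calinski-Harabasz index are commonly used; for both, higher values indicate better-defined clusters.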
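The silhouette coefficient compares, for each sample, the mean distance a to the other members of its own cluster with the mean distance b to the nearest other cluster, and scores it as (b - a) / max(a, b). A minimal pure-Python sketch with Euclidean distance and toy 1-D data (sklearn.metrics.silhouette_score is the library version, and sklearn.metrics.calinski_harabasz_score computes the other index). It assumes at least two clusters and gives samples in singleton clusters a score of 0.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all samples (requires >= 2 clusters)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        # a: mean distance to the other members of the own cluster (self adds 0)
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(p, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Two well-separated toy clusters on a line
points = [(1,), (2,), (10,), (11,)]
labels = [0, 0, 1, 1]
print(silhouette_score(points, labels))  # about 0.889
```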

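23. Difference between min_samples_split and min_samples_leaf
Explanation:
[1] min_samples_split: the minimum number of samples required to split an internal node. The default is 2.
[2] min_samples_leaf: the minimum number of samples required at a leaf node. The default is 1.
For example, if min_samples_split=5 and an internal node holds 7 samples, the node may be split. Now suppose the split produces two leaves, one with 1 sample and the other with 6. If min_samples_leaf=2, that split is not allowed, because one of the resulting leaves would have fewer than the minimum number of samples required at a leaf.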
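The two checks in that example can be written as a small predicate. This is a simplified, hypothetical helper that judges a single candidate split; scikit-learn's tree builder applies min_samples_leaf while searching over candidate split points, so rejecting the 1/6 split does not by itself prevent the node from being split at a different point.

```python
def split_allowed(n_node, n_left, n_right, min_samples_split=2, min_samples_leaf=1):
    """Check one candidate split against both constraints."""
    if n_left + n_right != n_node:
        raise ValueError("children must partition the node's samples")
    if n_node < min_samples_split:
        return False  # the node is too small to be split at all
    return n_left >= min_samples_leaf and n_right >= min_samples_leaf


# The scenario from the text: 7 samples split into leaves of 1 and 6
print(split_allowed(7, 1, 6, min_samples_split=5, min_samples_leaf=1))  # True
print(split_allowed(7, 1, 6, min_samples_split=5, min_samples_leaf=2))  # False
```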

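24. confusion_matrix
Explanation:
[1] Treating 0 as the negative class

from sklearn.metrics import confusion_matrix
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [0, 0, 1, 0, 1, 0]
# Rows are true labels and columns are predicted labels, both ordered [0, 1]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

[2] Treating 0 as the positive class

from sklearn.metrics import confusion_matrix
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [0, 0, 1, 0, 1, 0]
# Same matrix; with 0 as positive, the cell (true 0, predicted 0) is TP
tp, fn, fp, tn = confusion_matrix(y_true, y_pred).ravel()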
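The unpacking order follows from the matrix layout: rows are true labels and columns are predicted labels, both sorted. A small pure-Python sketch (the helper name confusion_matrix_2x2 is hypothetical) that builds the same 2x2 matrix makes the layout explicit:

```python
def confusion_matrix_2x2(y_true, y_pred, labels=(0, 1)):
    """Rows index the true label, columns the predicted label, in `labels` order."""
    index = {lab: i for i, lab in enumerate(labels)}
    matrix = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


y_true = [1, 0, 1, 1, 0, 1]
y_pred = [0, 0, 1, 0, 1, 0]
matrix = confusion_matrix_2x2(y_true, y_pred)
print(matrix)  # [[1, 1], [3, 1]]

# Flattening row by row gives the same order as .ravel(); here 0 is the negative class
tn, fp, fn, tp = [cell for row in matrix for cell in row]
print(tn, fp, fn, tp)  # 1 1 3 1
```

With 0 as the positive class, the same four cells read as tp, fn, fp, tn = 1, 1, 3, 1.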

Published: 2024-03-14 05:39:44

Link: https://www.elefans.com/category/jswz/34/1735753.html