
In brief, the information density is just the sum (or mean) of the similarities!

When using uncertainty sampling (or other similar strategies), we are unable to take the structure of the data into account. 

This can lead to suboptimal queries. To alleviate this, one option is to use an information density measure to help guide the queries.
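
Concretely, following the definition in the modAL documentation (linked below), the information density of an instance $x$ is the mean of its similarities to every instance in the pool $X$,

$$ I(x) = \frac{1}{|X|} \sum_{x' \in X} \mathrm{sim}(x, x'), $$

where $\mathrm{sim}$ is a similarity derived from a distance metric such as the Euclidean or cosine distance.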


from sklearn.datasets import make_blobs
from modAL.density import information_density
import matplotlib.pyplot as plt
from pathlib import Path
import numpy as np
import pandas as pd

## Get the data: three Gaussian blobs
X, y = make_blobs(n_features=2, n_samples=1000, centers=3, random_state=0, cluster_std=0.7)
## Alternative: load the data from a local CSV file instead
# data_path = Path(r"D:\OCdata")
# name = "FourBolbs"
# path_data = str(data_path.joinpath(name + ".csv"))
# data = np.array(pd.read_csv(path_data, header=None))
# X = data[:,:-1]
# y = data[:, -1]

## Information density of every sample under the Euclidean and cosine metrics
euclidean_density = information_density(X, "euclidean")
cosine_density = information_density(X, "cosine")

## Plot the two density measures side by side
plt.style.use('seaborn-white')  # 'seaborn-v0_8-white' on matplotlib >= 3.6
plt.figure(figsize=(14,7))
plt.subplot(1,2,1)
plt.scatter(x=X[:, 0], y=X[:, 1], c=cosine_density, cmap='viridis', s=50)
plt.title('The cosine information density')
plt.colorbar()
plt.subplot(1,2,2)
plt.scatter(x=X[:, 0], y=X[:, 1], c=euclidean_density, cmap='viridis', s=50)
plt.title('The Euclidean information density')
plt.colorbar()
plt.show()
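
The code above only visualizes the two densities. To actually guide a query with them, one common recipe is to weight an uncertainty score by the information density and pick the instance with the highest product. The snippet below is a minimal sketch of that idea and is not part of the original example: it assumes a RandomForestClassifier fitted on an arbitrary 20-sample "labelled" seed, and uses classifier_uncertainty from modAL.uncertainty; names such as n_initial and query_idx are only illustrative.

from sklearn.ensemble import RandomForestClassifier
from modAL.uncertainty import classifier_uncertainty

## Pretend only the first 20 samples are labelled (assumption for this sketch)
n_initial = 20
clf = RandomForestClassifier(random_state=0)
clf.fit(X[:n_initial], y[:n_initial])

## Classifier uncertainty of every sample, weighted by its cosine information density
uncertainty = classifier_uncertainty(clf, X)
density_weighted_utility = uncertainty * cosine_density

## Query the instance with the highest density-weighted utility
query_idx = np.argmax(density_weighted_utility)
print("Next query:", query_idx, X[query_idx])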

Reference: https://modal-python.readthedocs.io/en/latest/content/query_strategies/information_density.html

Computing the various distances and similarities boils down to the snippet below (wrapped in a function here so it runs on its own):

from sklearn.metrics.pairwise import pairwise_distances

def information_density(X, metric='euclidean'):
    ## Similarity is 1 / (1 + distance); the information density of each sample
    ## is its mean similarity to all samples in X
    similarity_mtx = 1 / (1 + pairwise_distances(X, X, metric=metric))
    return similarity_mtx.mean(axis=1)
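
As a quick sanity check (not part of the original post), this hand-rolled function can be compared against modAL's implementation on the blobs data; if modAL uses the same 1/(1 + distance) similarity, the two results should match:

## Compare the hand-rolled version against modAL's implementation
from modAL.density import information_density as modal_information_density

manual = information_density(X, metric='cosine')    # hand-rolled version above
reference = modal_information_density(X, 'cosine')  # modAL's implementation
print(np.allclose(manual, reference))               # expected True if the formulas match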

