The documentation for sklearn.cluster.AgglomerativeClustering mentions that, when varying the number of clusters and using caching, it may be advantageous to compute the full tree.
This seems to imply that it is possible to first compute the full tree, and then quickly update the number of desired clusters as necessary, without recomputing the tree (with caching).
However, this procedure for changing the number of clusters does not seem to be documented. I would like to do this, but am unsure how to proceed.
Update: To clarify, the fit method does not take number of clusters as an input: scikit-learn/stable/modules/generated/sklearn.cluster.AgglomerativeClustering.html#sklearn.cluster.AgglomerativeClustering.fit
Answer: You set a caching directory with the parameter memory='mycachedir', and if you also set compute_full_tree=True, then rerunning fit with different values of n_clusters will reuse the cached tree rather than recomputing it each time. An example of how to do this with sklearn's grid-search API:
from sklearn.cluster import AgglomerativeClustering
from sklearn.model_selection import GridSearchCV  # sklearn.grid_search was removed in later versions

ac = AgglomerativeClustering(memory='mycachedir', compute_full_tree=True)
classifier = GridSearchCV(ac, {'n_clusters': range(2, 6)},
                          scoring='adjusted_rand_score', n_jobs=-1, verbose=2)
classifier.fit(X, y)
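If you do not want to go through the grid-search API, the same caching behavior can be used directly: fit once to build and cache the full tree, then change n_clusters with set_params and refit, and the cached tree is reused. A minimal sketch, assuming synthetic data from make_blobs and a temporary cache directory (both are illustrative choices, not part of the original answer):

```python
# Sketch: reuse the cached merge tree while varying n_clusters.
import shutil
import tempfile
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=200, centers=4, random_state=0)
cachedir = tempfile.mkdtemp()  # any writable path works as the cache

ac = AgglomerativeClustering(n_clusters=2, memory=cachedir,
                             compute_full_tree=True)
ac.fit(X)  # builds the full tree and caches it via joblib

for k in range(2, 6):
    ac.set_params(n_clusters=k)
    ac.fit(X)  # the tree is read from the cache, not rebuilt
    print(k, len(set(ac.labels_)))

shutil.rmtree(cachedir)  # clean up the cache directory
```

Only the final cut of the dendrogram into k clusters is recomputed on each refit; the expensive linkage computation happens once.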