I have two RandomForestClassifier models, and I would like to combine them into one meta model. They were both trained using similar, but different, data. How can I do this?
rf1 # this is my first fitted RandomForestClassifier object, with 250 trees
rf2 # this is my second fitted RandomForestClassifier object, also with 250 trees
I want to create big_rf, with all the trees combined into one 500-tree model.
Recommended answer
I believe this is possible by modifying the estimators_ and n_estimators attributes on the RandomForestClassifier object. Each tree in the forest is stored as a DecisionTreeClassifier object, and the list of these trees is stored in the estimators_ attribute. To keep the object consistent, it also makes sense to update n_estimators so that it matches the new number of trees.
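Applied to the rf1 and rf2 from the question, the idea might look like the sketch below. The helper name merge_forests is hypothetical, and the two forests here are fitted on alternating halves of the iris data purely as stand-ins for "similar, but different" training sets. The checks on classes_ and n_features_in_ are an added precaution, on the assumption that the trees can only be mixed when both forests saw the same labels and the same number of features.

```python
import copy

import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier


def merge_forests(rf_a, rf_b):
    """Return a new forest holding the trees of both inputs (hypothetical helper)."""
    # Each tree predicts class indices, so both forests must share the same labels.
    if not np.array_equal(rf_a.classes_, rf_b.classes_):
        raise ValueError("forests were fitted on different class labels")
    # The trees also expect the same number of input features.
    if rf_a.n_features_in_ != rf_b.n_features_in_:
        raise ValueError("forests were fitted on different numbers of features")
    merged = copy.deepcopy(rf_a)  # leave rf_a and rf_b untouched
    merged.estimators_ = list(rf_a.estimators_) + list(rf_b.estimators_)
    merged.n_estimators = len(merged.estimators_)
    return merged


# Stand-in data: two forests fitted on different halves of the iris samples.
X, y = load_iris(return_X_y=True)
rf1 = RandomForestClassifier(n_estimators=250, random_state=0).fit(X[::2], y[::2])
rf2 = RandomForestClassifier(n_estimators=250, random_state=1).fit(X[1::2], y[1::2])

big_rf = merge_forests(rf1, rf2)
print(len(big_rf.estimators_), big_rf.n_estimators)
```

Copying first means rf1 and rf2 stay usable on their own afterwards, at the cost of some extra memory.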
The advantage of this method is that you could build a bunch of small forests in parallel across multiple machines and combine them.
Here's an example using the iris data set:
from functools import reduce  # reduce is no longer a builtin in Python 3

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
# sklearn.cross_validation was removed in scikit-learn 0.20
from sklearn.model_selection import train_test_split


def generate_rf(X_train, y_train, X_test, y_test):
    # fit one small forest and report its accuracy on the held-out data
    rf = RandomForestClassifier(n_estimators=5, min_samples_leaf=3)
    rf.fit(X_train, y_train)
    print("rf score", rf.score(X_test, y_test))
    return rf


def combine_rfs(rf_a, rf_b):
    # append rf_b's trees to rf_a in place and keep n_estimators in sync
    rf_a.estimators_ += rf_b.estimators_
    rf_a.n_estimators = len(rf_a.estimators_)
    return rf_a


iris = load_iris()
X, y = iris.data[:, [0, 1, 2]], iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.33)

# in the line below, we create 10 random forest classifier models
rfs = [generate_rf(X_train, y_train, X_test, y_test) for i in range(10)]

# in this step below, we combine the list of random forest models into one giant model
# (note that rfs[0] is modified in place and becomes the combined model)
rf_combined = reduce(combine_rfs, rfs)

# the combined model scores better than *most* of the component models
print("rf combined score", rf_combined.score(X_test, y_test))
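To see what the merged model actually computes, the sketch below merges two forests of different sizes in the same way as combine_rfs and compares the result with a tree-count-weighted average of the two original forests' class probabilities. The forest sizes (5 and 15 trees), the seeds, and the variable names are arbitrary choices for illustration. The check relies on RandomForestClassifier.predict_proba being the mean of the per-tree probabilities.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
small = RandomForestClassifier(n_estimators=5, random_state=0).fit(X, y)
large = RandomForestClassifier(n_estimators=15, random_state=1).fit(X, y)

# Record each forest's class probabilities before merging.
p_small = small.predict_proba(X)
p_large = large.predict_proba(X)

# Merge in place, as combine_rfs does: `small` now holds all 20 trees.
small.estimators_ += large.estimators_
small.n_estimators = len(small.estimators_)

# Every tree gets equal weight, so the larger forest contributes 15/20 of the vote.
expected = (5 * p_small + 15 * p_large) / 20
matches = np.allclose(small.predict_proba(X), expected)
print("merged probabilities equal the weighted average:", matches)
```

One consequence is that a forest with more trees has proportionally more influence on the merged prediction, which is worth keeping in mind when the component forests were trained on data of differing quality.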