I have a GridSearchCV object that I created with

    grid_search = GridSearchCV(pred_home_pipeline, param_grid)

I would like to save the entire grid-search object so I can explore the model-tuning results later. I do not want to save only best_estimator_. But after dumping and reloading, the reloaded and original grid_search objects differ in some way that I cannot track down.
    import pickle

    # save to disk
    with open(filepath, 'wb') as handle:
        pickle.dump(grid_search, handle, protocol=pickle.HIGHEST_PROTOCOL)

    # reload
    with open(filepath, 'rb') as handle:
        grid_reloaded = pickle.load(handle)

    # test object is unchanged after dump/reload
    print(grid_search == grid_reloaded)

    False
Weird. Looking at the outputs of print(grid_search) and print(grid_reloaded) they certainly look the same.
And they create the exact same set of 525 predicted values for data I held out entirely from the grid-search process:
    grid_search_preds = grid_search.predict(X_test)
    grid_reloaded_preds = grid_reloaded.predict(X_test)
    (grid_search_preds == grid_reloaded_preds).all()

    True
...Even though the best_estimator_ attributes are not technically the same:
    grid_search.best_estimator_ == grid_reloaded.best_estimator_

    False
...although the best_estimator_ attributes also look the same when comparing print(grid_search.best_estimator_) and print(grid_reloaded.best_estimator_).
What's going on here? Is it safe to save the gridsearchcv object for inspection later?
Best answer
That's because the comparison is returning whether or not the objects are the same object.
To see why, follow the class hierarchy; you'll see that none of these classes overrides __eq__ (or __cmp__):

    GridSearchCV
    BaseSearchCV
    BaseEstimator

Thus the "==" comparison falls back to the default identity comparison (in effect, a check of the objects' memory locations), so your reloaded instance and your current instance can never compare equal. It only tests whether they are the same object.
See more here.
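The fallback described above can be reproduced without scikit-learn. The following is a minimal sketch, not the library's code: PlainEstimator is a hypothetical stand-in class that, like the classes in the hierarchy above, defines no __eq__. After a pickle round trip the two instances compare unequal with "==", even though their stored attributes are identical.

```python
import pickle


class PlainEstimator:
    """Hypothetical stand-in for an estimator that defines no __eq__."""

    def __init__(self, C=1.0):
        self.C = C


original = PlainEstimator(C=10.0)

# Round-trip through pickle, as with the grid_search object in the question.
reloaded = pickle.loads(pickle.dumps(original))

# The round trip builds a new object, so identity differs.
same_object = original is reloaded

# With no __eq__ override, "==" falls back to the identity check.
equal_by_default = original == reloaded

# Comparing the stored attributes shows that the state itself survived.
same_state = vars(original) == vars(reloaded)

print(same_object, equal_by_default, same_state)
```

By the same reasoning, a False result from grid_search == grid_reloaded says nothing about whether the fitted state survived the round trip. Comparing the attributes you care about (or the predictions, as the question already does) is the more informative check.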