Sklearn GridSearchCV object changed after pickle dump/load

Updated: 2024-10-28 02:22:37

I have a GridSearchCV object that I created with:

grid_search = GridSearchCV(pred_home_pipeline, param_grid)

I would like to save the entire grid-search object so I can explore the model-tuning results later. I do not want to just save the best_estimator_. But after dumping and reloading, the reloaded and original grid_search objects are different in some way which I cannot track down.

import pickle

# save to disk
with open(filepath, 'wb') as handle:
    pickle.dump(grid_search, handle, protocol=pickle.HIGHEST_PROTOCOL)

# reload
with open(filepath, 'rb') as handle:
    grid_reloaded = pickle.load(handle)

# test object is unchanged after dump/reload
print(grid_search == grid_reloaded)

False

Weird. Looking at the outputs of print(grid_search) and print(grid_reloaded) they certainly look the same.

And they create the exact same set of 525 predicted values for data I held out entirely from the grid-search process:

grid_search_preds = grid_search.predict(X_test)
grid_reloaded_preds = grid_reloaded.predict(X_test)
(grid_search_preds == grid_reloaded_preds).all()

True

...Even though the best_estimator_ attributes are not technically the same:

grid_search.best_estimator_ == grid_reloaded.best_estimator_

False

...although the best_estimator_ attributes also certainly look the same when comparing print(grid_search.best_estimator_) and print(grid_reloaded.best_estimator_).

What's going on here? Is it safe to save the GridSearchCV object for inspection later?

Best answer

That's because the comparison returns whether or not the two names refer to the same object.

To see why, follow the class hierarchy. None of these classes overrides __eq__ (or __cmp__, which only exists in Python 2):

GridSearchCV -> BaseSearchCV -> BaseEstimator

Thus the "==" comparison falls back to an object identity comparison (in CPython, effectively a memory location comparison), for which your reloaded instance and your current instance can never be equal. It only checks whether they are the same object, not whether their contents match.
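The fallback described above can be reproduced without scikit-learn. The following is a minimal sketch: the classes Plain and WithEq and the helper roundtrip are hypothetical names made up for illustration, standing in for an estimator that does not define __eq__ and one that does.

```python
import pickle


class Plain:
    """Hypothetical stand-in for an estimator: it defines no __eq__."""

    def __init__(self, params):
        self.params = params


class WithEq(Plain):
    """Same data, but equality is defined by content instead of identity."""

    def __eq__(self, other):
        return isinstance(other, WithEq) and self.params == other.params


def roundtrip(obj):
    """Serialize and deserialize obj in memory, like dump/load on disk."""
    return pickle.loads(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL))


original = Plain({"C": 1.0, "kernel": "rbf"})
reloaded = roundtrip(original)

# No __eq__ defined, so == falls back to identity: a new object is never equal.
identity_equal = original == reloaded

# Comparing the stored values directly shows the contents survived the round trip.
content_equal = original.params == reloaded.params

# With a content-based __eq__, the reloaded copy compares equal.
original_eq = WithEq({"C": 1.0})
eq_equal = original_eq == roundtrip(original_eq)

print(identity_equal, content_equal, eq_equal)  # False True True
```

The same idea should carry over to a fitted grid search: rather than comparing the objects with ==, compare the attributes you care about, such as best_params_ and best_score_. cv_results_ is a dict whose values are mostly NumPy arrays, so its entries need an element-wise comparison rather than a single == on the whole dict.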

Published: 2023-07-24 13:22:00
Article link: https://www.elefans.com/category/jswz/34/1246284.html