I'm working on a sklearn homework assignment and I don't understand why one should standardize and normalize the test data with the training mean and standard deviation. How can I implement this in Python? Here is my implementation for the train data:
import sklearn.datasets
from sklearn import preprocessing
from sklearn.model_selection import train_test_split

digits = sklearn.datasets.load_digits()
X = digits.data
Y = digits.target
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3, train_size=0.7)
std_scale = preprocessing.StandardScaler().fit(X_train)
X_train_std = std_scale.transform(X_train)
# X_test_std = ??

For the train set I think this is correct, but what about the test set?
Best answer
Why?
Because your classifier/regressor will be trained on those standardized values. You don't want your trained classifier to make predictions on data that was scaled with different statistics.
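To make the pitfall concrete, here is a small sketch (not from the original post; the toy data and variable names are my own) comparing test data scaled with the training statistics against test data scaled with its own statistics. Even when both sets come from the same distribution, their sample means and standard deviations differ, so the two scalings disagree:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy data: train and test drawn from the same distribution,
# but their sample statistics differ slightly.
rng = np.random.RandomState(0)
X_train = rng.normal(loc=5.0, scale=2.0, size=(100, 1))
X_test = rng.normal(loc=5.0, scale=2.0, size=(20, 1))

scaler = StandardScaler().fit(X_train)

# Correct: scale the test data with the *training* mean/std.
X_test_good = scaler.transform(X_test)

# Wrong: scale the test data with its own mean/std -- the classifier
# would then see inputs on a different scale than it was trained on.
X_test_bad = StandardScaler().fit_transform(X_test)

# The two results differ, which is exactly the mismatch to avoid.
print(np.allclose(X_test_good, X_test_bad))
```

Since the training and test samples never have identical statistics in practice, refitting the scaler on the test set silently shifts and rescales the inputs relative to what the model learned.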
How:
std_scale = preprocessing.StandardScaler().fit(X_train)
X_train_std = std_scale.transform(X_train)
X_test_std = std_scale.transform(X_test)

Fit once, then transform whatever you need to transform. That's the advantage of the class-based StandardScaler (which you had already chosen) over scale, which does not keep the statistics obtained during fit and therefore cannot apply the same transformation at a later time.
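A minimal sketch of what the fitted object stores: after fit(), StandardScaler keeps the training statistics in its mean_ and scale_ attributes, and transform() simply applies (X - mean_) / scale_ to any array, including the test set. (The random_state below is my own choice for reproducibility, not from the original post.)

```python
import numpy as np
from sklearn import preprocessing
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

digits = load_digits()
X_train, X_test = train_test_split(digits.data, test_size=0.3, random_state=0)

std_scale = preprocessing.StandardScaler().fit(X_train)
X_test_std = std_scale.transform(X_test)

# transform() reuses the statistics captured during fit():
manual = (X_test - std_scale.mean_) / std_scale.scale_
print(np.allclose(X_test_std, manual))
```

This is why one fit call is enough: the scaler is a container for the training mean and standard deviation, and every later transform call, on train or test data, reuses exactly those numbers.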