Why should we normalize data for deep learning in Keras?

Problem Description

I was testing some network architectures in Keras for classifying the MNIST dataset, and I implemented one similar to LeNet.

I have seen that the examples I found on the internet include a data normalization step. For example:

X_train /= 255

I ran a test without this normalization and saw that the performance (accuracy) of the network decreased (keeping the same number of epochs). Why did this happen?

If I increase the number of epochs, can the accuracy reach the same level as the model trained with normalization?

So, does normalization affect the accuracy, or only the training speed?

The complete source code of my training script is below:

from keras.models import Sequential
from keras.layers.convolutional import Conv2D
from keras.layers.convolutional import MaxPooling2D
from keras.layers.core import Activation
from keras.layers.core import Flatten
from keras.layers.core import Dense
from keras.datasets import mnist
from keras.utils import np_utils
from keras.optimizers import SGD, RMSprop, Adam
import numpy as np
import matplotlib.pyplot as plt
from keras import backend as k

def build(input_shape, classes):
    # LeNet-like convolutional network
    model = Sequential()
    model.add(Conv2D(20, kernel_size=5, padding="same", activation='relu', input_shape=input_shape))
    model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
    model.add(Conv2D(50, kernel_size=5, padding="same", activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
    model.add(Flatten())
    model.add(Dense(500))
    model.add(Activation("relu"))
    model.add(Dense(classes))
    model.add(Activation("softmax"))
    return model

NB_EPOCH = 4            # number of epochs
BATCH_SIZE = 128        # size of the batch
VERBOSE = 1             # set the training phase as verbose
OPTIMIZER = Adam()      # optimizer
VALIDATION_SPLIT = 0.2  # fraction of the training data used for validation
IMG_ROWS, IMG_COLS = 28, 28            # input image dimensions
NB_CLASSES = 10                        # number of outputs = number of digits
INPUT_SHAPE = (1, IMG_ROWS, IMG_COLS)  # shape of the input

# load and preprocess MNIST
(X_train, y_train), (X_test, y_test) = mnist.load_data()
k.set_image_dim_ordering("th")
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255
X_train = X_train[:, np.newaxis, :, :]
X_test = X_test[:, np.newaxis, :, :]
print(X_train.shape[0], 'train samples')
print(X_test.shape[0], 'test samples')
y_train = np_utils.to_categorical(y_train, NB_CLASSES)
y_test = np_utils.to_categorical(y_test, NB_CLASSES)

# build, compile, train, and evaluate
model = build(input_shape=INPUT_SHAPE, classes=NB_CLASSES)
model.compile(loss="categorical_crossentropy", optimizer=OPTIMIZER, metrics=["accuracy"])
history = model.fit(X_train, y_train, batch_size=BATCH_SIZE, epochs=NB_EPOCH,
                    verbose=VERBOSE, validation_split=VALIDATION_SPLIT)
model.save("model2")
score = model.evaluate(X_test, y_test, verbose=VERBOSE)
print('Test accuracy:', score[1])

Answer

Normalization is a generic concept, not limited to deep learning or to Keras.

Why normalize?

Let me take a simple logistic regression example, which makes normalization easy to understand and explain. Assume we are trying to predict whether a customer should be given a loan. Among the many available independent variables, let's consider only Age and Income. Let the equation be of the form:

Y = weight_1 * (Age) + weight_2 * (Income) + some_constant

Just for the sake of explanation, let Age usually be in the range [0, 120], and let us assume Income is in the range [10000, 100000]. The scales of Age and Income are very different. If you use them as-is, the weights weight_1 and weight_2 may end up biased: weight_2 might give Income more importance as a feature than weight_1 gives to Age. To bring them to a common scale, we can normalize them. For example, we can bring all ages into the range [0, 1] and all incomes into the range [0, 1]. Now we can say that Age and Income are given equal importance as features.
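To make the Age/Income example concrete, here is a minimal sketch of min-max scaling with NumPy. The sample values are invented purely for illustration; the point is only that both columns end up on the same [0, 1] scale:

import numpy as np

# Hypothetical raw rows: column 0 is Age (roughly [0, 120]), column 1 is Income (roughly [10000, 100000]).
X = np.array([[25, 30000],
              [60, 85000],
              [41, 52000]], dtype=np.float32)

# Min-max normalization per column: (x - min) / (max - min) maps each feature into [0, 1].
X_min = X.min(axis=0)
X_max = X.max(axis=0)
X_scaled = (X - X_min) / (X_max - X_min)

print(X_scaled)  # Age and Income now live on the same [0, 1] scale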

Does normalization always improve accuracy?

Apparently not. Normalization does not necessarily always increase accuracy. It may or may not; you never really know until you implement it. It also depends on at which stage of training you apply normalization, whether you apply normalization after every activation, and so on.
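As an illustration of normalization applied inside the network (after activations) rather than only on the inputs, here is a minimal sketch using Keras's BatchNormalization layer. This is not part of the original script, just one way such a normalization stage can look:

from keras.models import Sequential
from keras.layers import Dense, Activation, BatchNormalization

# A small dense network that re-normalizes activations between layers.
model = Sequential()
model.add(Dense(64, input_shape=(784,)))
model.add(Activation("relu"))
model.add(BatchNormalization())  # normalizes the previous layer's activations batch by batch
model.add(Dense(10))
model.add(Activation("softmax"))
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])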

Because normalization narrows the feature values down to a particular range, it is easier to perform computations over that smaller range of values. So, the model usually trains a bit faster.
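A minimal sketch of what the X_train /= 255 step from the question does to the value range, reusing the same mnist loader as the training script above:

from keras.datasets import mnist

(X_train, _), _ = mnist.load_data()
X_train = X_train.astype('float32')
print(X_train.min(), X_train.max())  # 0.0 255.0 -> raw pixel intensities

X_train /= 255  # the normalization step from the question
print(X_train.min(), X_train.max())  # 0.0 1.0 -> a much narrower, unit-scale range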

Regarding the number of epochs, accuracy usually increases with the number of epochs, provided that your model does not start over-fitting.
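If you do raise the number of epochs, one common way to guard against over-fitting is early stopping. This is an assumption on my part, not something the original script uses; the sketch below reuses the variables (model, X_train, BATCH_SIZE, etc.) from the training script in the question:

from keras.callbacks import EarlyStopping

# Stop training once validation loss has not improved for 3 consecutive epochs.
early_stop = EarlyStopping(monitor='val_loss', patience=3)

history = model.fit(X_train, y_train,
                    batch_size=BATCH_SIZE,
                    epochs=50,  # allow many epochs; the callback decides when to stop
                    verbose=VERBOSE,
                    validation_split=VALIDATION_SPLIT,
                    callbacks=[early_stop])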

Here is a good explanation of normalization/standardization and the related terms.
