K-fold cross-validation with Keras

Problem Description

It seems that k-fold cross-validation in convolutional networks is not taken very seriously because of the huge running time of the neural network. I have a small data set and I am interested in doing k-fold cross-validation using the example given here. Is it possible? Thanks.

Recommended Answer

If you are using images with data generators, here's one way to do 10-fold cross-validation with Keras and scikit-learn. The strategy is to copy the files to training, validation, and test subfolders according to each fold.
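
Concretely, the layout that copy_images() (shown below) rebuilds for every fold looks roughly like this; the class names are placeholders, and flow_from_directory later relies on exactly this one-subfolder-per-class structure:

{path to your data directory}/
    training/
        class_a/   <- .jpg files for this fold's training split
        class_b/
    validation/
        class_a/
        class_b/
    test/
        class_a/
        class_b/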

import numpy as np
import os
import pandas as pd
import shutil
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# used to copy files according to each fold
def copy_images(df, directory):
    destination_directory = "{path to your data directory}/" + directory
    print("copying {} files to {}...".format(directory, destination_directory))

    # remove all files from previous fold
    if os.path.exists(destination_directory):
        shutil.rmtree(destination_directory)

    # create folder for files from this fold
    if not os.path.exists(destination_directory):
        os.makedirs(destination_directory)

    # create subfolders for each class
    for c in set(list(df['class'])):
        if not os.path.exists(destination_directory + '/' + c):
            os.makedirs(destination_directory + '/' + c)

    # copy files for this fold from a directory holding all the files
    for i, row in df.iterrows():
        try:
            # this is the path to all of your images kept together in a separate folder
            path_from = "{path to all of your images}"
            path_from = path_from + "{}.jpg"
            path_to = "{}/{}".format(destination_directory, row['class'])

            # move from folder keeping all files to training, test, or validation folder (the "directory" argument)
            shutil.copy(path_from.format(row['filename']), path_to)
        except Exception as e:
            print("Error when copying {}: {}".format(row['filename'], str(e)))

# dataframe containing the filenames of the images (e.g., GUID filenames) and the classes
df = pd.read_csv('{path to your data}.csv')
df_y = df['class']
df_x = df
del df_x['class']

skf = StratifiedKFold(n_splits=10)

total_actual = []
total_predicted = []
total_val_accuracy = []
total_val_loss = []
total_test_accuracy = []

for i, (train_index, test_index) in enumerate(skf.split(df_x, df_y)):
    x_train, x_test = df_x.iloc[train_index], df_x.iloc[test_index]
    y_train, y_test = df_y.iloc[train_index], df_y.iloc[test_index]

    train = pd.concat([x_train, y_train], axis=1)
    test = pd.concat([x_test, y_test], axis=1)

    # take 20% of the training data from this fold for validation during training
    validation = train.sample(frac=0.2)

    # make sure validation data does not include training data
    train = train[~train['filename'].isin(list(validation['filename']))]

    # copy the images according to the fold
    copy_images(train, 'training')
    copy_images(validation, 'validation')
    copy_images(test, 'test')

    print('**** Running fold ' + str(i))

    # here you call a function to create and train your model, returning validation accuracy and validation loss
    val_accuracy, val_loss = create_train_model()

    # append validation accuracy and loss for average calculation later on
    total_val_accuracy.append(val_accuracy)
    total_val_loss.append(val_loss)

    # here you will call a predict() method that will predict the images on the "test" subfolder
    # this function returns the actual classes and the predicted classes in the same order
    actual, predicted = predict()

    # append accuracy from the predictions on the test data
    total_test_accuracy.append(accuracy_score(actual, predicted))

    # append all of the actual and predicted classes for your final evaluation
    total_actual = total_actual + actual
    total_predicted = total_predicted + predicted

    # this is optional, but you can also see the performance on each fold as the process goes on
    print(classification_report(total_actual, total_predicted))
    print(confusion_matrix(total_actual, total_predicted))

print(classification_report(total_actual, total_predicted))
print(confusion_matrix(total_actual, total_predicted))

print("Validation accuracy on each fold:")
print(total_val_accuracy)
print("Mean validation accuracy: {}%".format(np.mean(total_val_accuracy) * 100))

print("Validation loss on each fold:")
print(total_val_loss)
print("Mean validation loss: {}".format(np.mean(total_val_loss)))

print("Test accuracy on each fold:")
print(total_test_accuracy)
print("Mean test accuracy: {}%".format(np.mean(total_test_accuracy) * 100))
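
The loop above calls two helper functions that are left for you to write. As a rough sketch only (not part of the original answer), create_train_model() could train a small CNN on the training and validation subfolders that copy_images() just populated; the architecture, image size, number of classes, epoch count, and the model_fold.h5 filename below are all assumptions:

# Hypothetical sketch of create_train_model(); everything here is an assumption,
# not the original author's model.
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
from keras.preprocessing.image import ImageDataGenerator

img_width, img_height = 150, 150   # assumed image size
num_classes = 3                    # assumed number of classes
data_dir = "{path to your data directory}"

def create_train_model():
    train_generator = ImageDataGenerator(rescale=1. / 255).flow_from_directory(
        data_dir + '/training',
        target_size=(img_width, img_height),
        batch_size=32,
        class_mode='categorical')
    validation_generator = ImageDataGenerator(rescale=1. / 255).flow_from_directory(
        data_dir + '/validation',
        target_size=(img_width, img_height),
        batch_size=32,
        class_mode='categorical')

    # tiny CNN used only to illustrate the interface
    model = Sequential([
        Conv2D(32, (3, 3), activation='relu', input_shape=(img_width, img_height, 3)),
        MaxPooling2D((2, 2)),
        Flatten(),
        Dense(64, activation='relu'),
        Dense(num_classes, activation='softmax')])
    model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

    # fit_generator is the older Keras API; newer versions accept generators in fit()
    history = model.fit_generator(
        train_generator,
        steps_per_epoch=max(1, train_generator.samples // 32),
        epochs=10,
        validation_data=validation_generator,
        validation_steps=max(1, validation_generator.samples // 32))

    model.save('model_fold.h5')  # assumed filename, so predict() can reload the model

    # return last-epoch validation accuracy and loss, as the fold loop expects
    # (the key is 'val_accuracy' in newer Keras versions)
    return history.history['val_acc'][-1], history.history['val_loss'][-1]

Any model and training routine will do here, as long as the function returns one validation accuracy and one validation loss per fold.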

In your predict() function, if you are using a data generator, the only way I could find to keep the predictions in the same order when testing was to use a batch_size of 1:

from keras.preprocessing.image import ImageDataGenerator

generator = ImageDataGenerator().flow_from_directory(
    '{path to your data directory}/test',
    target_size=(img_width, img_height),
    batch_size=1,
    color_mode='rgb',
    # categorical for a multiclass problem
    class_mode='categorical',
    # this will also ensure the same order
    shuffle=False)

With this code, I was able to do 10-fold cross-validation using data generators (so I did not have to keep all files in memory). Copying the files for every fold can be a lot of work if you have millions of images, and batch_size = 1 could become a bottleneck if your test set is large, but for my project this worked well.
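
For completeness, here is one hypothetical shape a predict() function could take around the generator above (not from the original answer); it reloads the model saved per fold under the assumed name model_fold.h5 and relies on shuffle = False so that the true classes and the predictions stay in the same order:

# Hypothetical sketch of predict(); model filename, paths, and image size are assumptions.
import numpy as np
from keras.models import load_model
from keras.preprocessing.image import ImageDataGenerator

def predict():
    model = load_model('model_fold.h5')     # model saved by create_train_model()
    img_width, img_height = 150, 150        # must match the size used for training

    generator = ImageDataGenerator(rescale=1. / 255).flow_from_directory(
        '{path to your data directory}/test',
        target_size=(img_width, img_height),
        batch_size=1,
        color_mode='rgb',
        class_mode='categorical',
        shuffle=False)                      # keeps files and predictions in the same order

    # predict one image at a time, in directory order
    probabilities = model.predict_generator(generator, steps=generator.samples)

    # map class indices back to class names
    index_to_class = {v: k for k, v in generator.class_indices.items()}
    predicted = [index_to_class[i] for i in np.argmax(probabilities, axis=1)]
    actual = [index_to_class[i] for i in generator.classes]

    return actual, predicted

Because the generator is not shuffled, generator.classes lists the true class of each file in the same order in which the predictions are produced, which is exactly what the fold loop expects from predict().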
