How to split a dataset into K folds without loading the whole dataset at once?

Updated: 2024-10-09 05:21:38
Question

I can't load my whole dataset at once, so I use tf.keras.preprocessing.image_dataset_from_directory() to load batches of images during training. This works well when I split the dataset into 2 subsets (train and validation). However, I'd like to divide the dataset into K folds for cross-validation (5 folds would be nice).

How can I make K folds without loading the whole dataset? Do I have to give up using tf.keras.preprocessing.image_dataset_from_directory()?

Recommended answer

I would personally recommend switching to tf.data.Dataset.

Not only is it more efficient, but it also gives you more flexibility in what you can implement.

Say, for example, that you have images (an array of image file paths, image_paths) and labels (the matching array of labels).
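If the files are laid out the way image_dataset_from_directory() expects (one sub-folder per class), the two arrays could be collected without reading any pixel data. The following is only a sketch under that assumption; the helper name list_image_paths and the extension list are made up for illustration and are not part of TensorFlow or the original answer.

```python
from pathlib import Path

import numpy as np

# Assumed set of extensions; adjust to whatever the dataset actually contains.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".gif"}


def list_image_paths(root):
    """Return (paths, labels, class_names) for a root/<class_name>/<file> tree.

    Hypothetical helper: only file names are collected, no image is opened.
    """
    root = Path(root)
    # One class per sub-directory, sorted so class indices are reproducible.
    class_names = sorted(p.name for p in root.iterdir() if p.is_dir())
    paths, labels = [], []
    for class_index, class_name in enumerate(class_names):
        for file in sorted((root / class_name).rglob("*")):
            if file.is_file() and file.suffix.lower() in IMAGE_EXTENSIONS:
                paths.append(str(file))
                labels.append(class_index)
    return np.array(paths), np.array(labels, dtype=np.int64), class_names
```

It might be used as images, labels, class_names = list_image_paths("data/train"), where "data/train" is a placeholder path. Returning NumPy arrays matters because the fold indices are later applied as index arrays.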

That way, you could create a pipeline like this:

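import numpy as np
from sklearn.model_selection import KFold

# images holds file paths and labels the matching labels; only these small
# arrays are kept in memory. NumPy arrays are needed for index-array selection.
images, labels = np.asarray(images), np.asarray(labels)

training_data = []
validation_data = []
kf = KFold(n_splits=5, shuffle=True, random_state=42)
for train_index, val_index in kf.split(images, labels):
    X_train, X_val = images[train_index], images[val_index]
    y_train, y_val = labels[train_index], labels[val_index]
    training_data.append([X_train, y_train])
    validation_data.append([X_val, y_val])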
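As an aside, the split above works purely on indices, which is why nothing has to be loaded. To make that concrete, here is a hand-rolled sketch of the same idea using only NumPy. It is not scikit-learn's implementation and will not reproduce the exact folds that KFold(random_state=42) produces; the function name kfold_indices and the file names are made up for illustration.

```python
import numpy as np


def kfold_indices(n_samples, n_splits=5, seed=42):
    """Yield (train_index, val_index) pairs that partition range(n_samples).

    Illustrative stand-in for a shuffled K-fold split; each sample lands in
    exactly one validation fold.
    """
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_samples)
    # array_split tolerates n_samples not being divisible by n_splits.
    folds = np.array_split(shuffled, n_splits)
    for i in range(n_splits):
        val_index = folds[i]
        train_index = np.concatenate(
            [folds[j] for j in range(n_splits) if j != i]
        )
        yield train_index, val_index


# Made-up file names standing in for real image paths.
paths = np.array([f"img_{i}.jpg" for i in range(10)])
for train_index, val_index in kfold_indices(len(paths), n_splits=5):
    # With 10 samples and 5 folds, each line prints "8 2".
    print(len(train_index), len(val_index))
```

Because this is a generator, each fold's index arrays are produced only when needed, rather than all folds being stored up front.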

Then you could build a dataset for each fold and train on it, something like:

import tensorflow as tf

# mapping_function, batch_size, epochs and model are assumed to be defined elsewhere;
# mapping_function should load and decode one image from its file path.
for (x_train, y_train), (x_valid, y_valid) in zip(training_data, validation_data):
    # Only paths and labels are sliced here; images are read lazily inside map().
    train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
    train_dataset = train_dataset.map(mapping_function, num_parallel_calls=tf.data.experimental.AUTOTUNE)
    train_dataset = train_dataset.batch(batch_size)
    train_dataset = train_dataset.prefetch(buffer_size=tf.data.experimental.AUTOTUNE)

    validation_dataset = tf.data.Dataset.from_tensor_slices((x_valid, y_valid))
    validation_dataset = validation_dataset.map(mapping_function, num_parallel_calls=tf.data.experimental.AUTOTUNE)
    validation_dataset = validation_dataset.batch(batch_size)
    validation_dataset = validation_dataset.prefetch(buffer_size=tf.data.experimental.AUTOTUNE)

    # Re-create the model before each fold so that weights do not carry over between folds.
    model.fit(train_dataset,
              validation_data=validation_dataset,
              epochs=epochs,
              verbose=2)
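The answer leaves mapping_function undefined. One possible shape for it is sketched below, assuming the files are JPEGs and that a fixed input size with pixel values scaled to [0, 1] is wanted; the 224x224 size and the scaling are illustrative choices, not something stated in the original answer.

```python
import tensorflow as tf

# Assumed target size; pick whatever the model actually expects.
IMG_SIZE = (224, 224)


def mapping_function(path, label):
    """Load one image from disk and return (image, label).

    Called lazily by Dataset.map(), so an image is read only when its
    batch is needed.
    """
    raw = tf.io.read_file(path)
    # Assumes JPEG files; channels=3 forces an RGB result.
    image = tf.io.decode_jpeg(raw, channels=3)
    # resize returns float32 values in the original 0..255 range.
    image = tf.image.resize(image, IMG_SIZE)
    image = image / 255.0
    return image, label
```

Since the function receives only a path and a label, the datasets built from each fold hold file names in memory and touch the disk one batch at a time.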

Published: 2023-05-01 01:15:41
Article link: https://www.elefans.com/category/jswz/34/1402478.html