Problem description
I'm trying to classify CIFAR-10 images on a Google Colab TPU, following the official tutorial.
However, I got the following error:
UnimplementedError: 6 root error(s) found.
Without the TPU, I don't see any error. Could someone share some advice?
My code is attached below.
from tensorflow.keras.models import Model
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.applications.vgg16 import VGG16
import tensorflow as tf
import numpy as np
import os
import tensorflow_datasets as tfds
# preparing TPU
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu='')
tf.config.experimental_connect_to_cluster(resolver)
# This is the TPU initialization code that has to be at the beginning.
tf.tpu.experimental.initialize_tpu_system(resolver)
print("All devices: ", tf.config.list_logical_devices('TPU'))
strategy = tf.distribute.TPUStrategy(resolver)
# download cifar10 data
ds_test, ds_train = tfds.load('cifar10', split=['test', 'train'], )
# Preprocess the images
def resize_with_crop(ip):
    image = ip['image']
    label = ip['label']
    image = tf.expand_dims(image,0)
    label = tf.one_hot(label,10)
    label = tf.expand_dims(label,0)
    return (image, label)
ds_train_ = ds_train.map(resize_with_crop)
ds_test_ = ds_test.map(resize_with_crop)
with strategy.scope():
    model = VGG16(input_shape = (32, 32, 3), weights=None, classes=10)
    model.compile(optimizer='adam', loss = 'categorical_crossentropy', metrics= ['accuracy'])
history = model.fit(ds_train_,
                    batch_size = 32,
                    steps_per_epoch = 64,
                    epochs = 1000,
                    validation_data = ds_test_,
                    shuffle = True,)
The error I got is as follows.
---------------------------------------------------------------------------
UnimplementedError Traceback (most recent call last)
<ipython-input-2-588bff080f0b> in <module>()
25 epochs = 1000,
26 validation_data = ds_test_,
---> 27 shuffle = True,)
28
29 '''
13 frames
/usr/local/lib/python3.7/dist-packages/keras/engine/training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_batch_size, validation_freq, max_queue_size, workers, use_multiprocessing)
1187 logs = tmp_logs # No error, now safe to assign to logs.
1188 end_step = step + data_handler.step_increment
-> 1189 callbacks.on_train_batch_end(end_step, logs)
1190 if self.stop_training:
1191 break
/usr/local/lib/python3.7/dist-packages/keras/callbacks.py in on_train_batch_end(self, batch, logs)
433 """
434 if self._should_call_train_batch_hooks:
--> 435 self._call_batch_hook(ModeKeys.TRAIN, 'end', batch, logs=logs)
436
437 def on_test_batch_begin(self, batch, logs=None):
/usr/local/lib/python3.7/dist-packages/keras/callbacks.py in _call_batch_hook(self, mode, hook, batch, logs)
293 self._call_batch_begin_hook(mode, batch, logs)
294 elif hook == 'end':
--> 295 self._call_batch_end_hook(mode, batch, logs)
296 else:
297 raise ValueError('Unrecognized hook: {}'.format(hook))
/usr/local/lib/python3.7/dist-packages/keras/callbacks.py in _call_batch_end_hook(self, mode, batch, logs)
313 self._batch_times.append(batch_time)
314
--> 315 self._call_batch_hook_helper(hook_name, batch, logs)
316
317 if len(self._batch_times) >= self._num_batches_for_timing_check:
/usr/local/lib/python3.7/dist-packages/keras/callbacks.py in _call_batch_hook_helper(self, hook_name, batch, logs)
351 for callback in self.callbacks:
352 hook = getattr(callback, hook_name)
--> 353 hook(batch, logs)
354
355 if self._check_timing:
/usr/local/lib/python3.7/dist-packages/keras/callbacks.py in on_train_batch_end(self, batch, logs)
1026
1027 def on_train_batch_end(self, batch, logs=None):
-> 1028 self._batch_update_progbar(batch, logs)
1029
1030 def on_test_batch_end(self, batch, logs=None):
/usr/local/lib/python3.7/dist-packages/keras/callbacks.py in _batch_update_progbar(self, batch, logs)
1098 if self.verbose == 1:
1099 # Only block async when verbose = 1.
-> 1100 logs = tf_utils.sync_to_numpy_or_python_type(logs)
1101 self.progbar.update(self.seen, list(logs.items()), finalize=False)
1102
/usr/local/lib/python3.7/dist-packages/keras/utils/tf_utils.py in sync_to_numpy_or_python_type(tensors)
514 return t # Don't turn ragged or sparse tensors to NumPy.
515
--> 516 return tf.nest.map_structure(_to_single_numpy_or_python_type, tensors)
517
518
/usr/local/lib/python3.7/dist-packages/tensorflow/python/util/nest.py in map_structure(func, *structure, **kwargs)
867
868 return pack_sequence_as(
--> 869 structure[0], [func(*x) for x in entries],
870 expand_composites=expand_composites)
871
/usr/local/lib/python3.7/dist-packages/tensorflow/python/util/nest.py in <listcomp>(.0)
867
868 return pack_sequence_as(
--> 869 structure[0], [func(*x) for x in entries],
870 expand_composites=expand_composites)
871
/usr/local/lib/python3.7/dist-packages/keras/utils/tf_utils.py in _to_single_numpy_or_python_type(t)
510 def _to_single_numpy_or_python_type(t):
511 if isinstance(t, tf.Tensor):
--> 512 x = t.numpy()
513 return x.item() if np.ndim(x) == 0 else x
514 return t # Don't turn ragged or sparse tensors to NumPy.
/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/ops.py in numpy(self)
1092 """
1093 # TODO(slebedev): Consider avoiding a copy for non-CPU or remote tensors.
-> 1094 maybe_arr = self._numpy() # pylint: disable=protected-access
1095 return maybe_arr.copy() if isinstance(maybe_arr, np.ndarray) else maybe_arr
1096
/usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/ops.py in _numpy(self)
1060 return self._numpy_internal()
1061 except core._NotOkStatusException as e: # pylint: disable=protected-access
-> 1062 six.raise_from(core._status_to_exception(e.code, e.message), None) # pylint: disable=protected-access
1063
1064 @property
/usr/local/lib/python3.7/dist-packages/six.py in raise_from(value, from_value)
UnimplementedError: 6 root error(s) found.
(0) Unimplemented: {{function_node __inference_train_function_127397}} File system scheme '[local]' not implemented (file: '/root/tensorflow_datasets/cifar10/3.0.2/cifar10-train.tfrecord-00000-of-00001')
[[{{node MultiDeviceIteratorGetNextFromShard}}]]
[[RemoteCall]]
[[IteratorGetNext_2]]
(1) Unimplemented: {{function_node __inference_train_function_127397}} File system scheme '[local]' not implemented (file: '/root/tensorflow_datasets/cifar10/3.0.2/cifar10-train.tfrecord-00000-of-00001')
[[{{node MultiDeviceIteratorGetNextFromShard}}]]
[[RemoteCall]]
[[IteratorGetNext_6]]
(2) Unimplemented: {{function_node __inference_train_function_127397}} File system scheme '[local]' not implemented (file: '/root/tensorflow_datasets/cifar10/3.0.2/cifar10-train.tfrecord-00000-of-00001')
[[{{node MultiDeviceIteratorGetNextFromShard}}]]
[[RemoteCall]]
[[IteratorGetNext_3]]
[[cluster_train_function/_execute_6_0/_187]]
(3) Unimplemented: {{function_node __inference_train_function_127397}} File system scheme '[local]' not implemented (file: '/root/tensorflow_datasets/cifar10/3.0.2/cifar10-train.tfrecord-00000-of-00001')
[[{{node MultiDeviceIteratorGetNextFromShard}}]]
[[RemoteCall]]
[[IteratorGetNext_3]]
[[tpu_compile_succeeded_assert/_17093395999373799140/_5/_159]]
(4) Unimplemented: {{function_node __inference_train_function_127397}} File system scheme '[local]' not implemented (file: '/root/tensorflow_datasets/cifar10/3.0.2/cifar10-train.tfrecord-00000-of-00001')
[[{{node MultiDeviceIteratorGetNextFromShard}}]]
[[RemoteCall]]
[[IteratorGetNext_3]]
[[tpu_compile_succeeded_assert/_17093395999373799140/_5/_111]]
(5) Unimplemented: {{function_node __inference_train_function_127397}} File system scheme '[local]' not implemented (file: '/root/tensorflow_datasets/cifar10/3.0.2/cifar10-train.tfrecord-00000-of-00001')
[[{{node MultiDeviceIteratorGetNextFromShard}}]]
[[RemoteCall]]
[[IteratorGetNext_3]]
0 successful operations.
3 derived errors ignored.
Recommended answer
If you look at the error, it says: File system scheme '[local]' not implemented.
tfds doesn't host all of its datasets; some are downloaded from the original source to your local machine, which the TPU can't access.
Cloud TPUs can only access data in GCS, because only the GCS file system is registered. See https://cloud.google.com/tpu/docs/troubleshooting#cannot_use_local_filesystem for more details.
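As a rough illustration of that constraint, a small pre-flight check can reject a local path before it is passed as `data_dir`. This is only a sketch: the helper names `is_gcs_path` and `require_gcs_data_dir` are hypothetical and not part of TensorFlow or tfds, and the check only inspects the path string. It does not verify that the bucket exists or that your account can read it.

```python
from urllib.parse import urlparse


def is_gcs_path(path: str) -> bool:
    """Return True if `path` uses the gs:// scheme and names a bucket.

    Hypothetical helper: it parses the string only and never contacts GCS.
    """
    parsed = urlparse(path)
    return parsed.scheme == "gs" and bool(parsed.netloc)


def require_gcs_data_dir(data_dir: str) -> str:
    """Return `data_dir` unchanged if it is a gs:// path, else raise ValueError.

    Intended to be called before handing `data_dir` to a dataset loader
    when the job runs on a Cloud TPU that cannot read local files.
    """
    if not is_gcs_path(data_dir):
        raise ValueError(
            f"{data_dir!r} is not a gs:// path; a Cloud TPU that only has "
            "the GCS file system registered will not be able to read it."
        )
    return data_dir
```

For example, `require_gcs_data_dir("gs://YOUR_BUCKET_NAME")` returns the path unchanged, while the local path from the traceback, `/root/tensorflow_datasets/cifar10/3.0.2`, raises `ValueError`.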
You can have tfds download the data to your GCS bucket:
# Authenticate your account to access GCS.
from google.colab import auth
auth.authenticate_user()
...
# download cifar10 data to a gs bucket.
ds_test, ds_train = tfds.load('cifar10', split=['test', 'train'], try_gcs=True, data_dir="gs://YOUR_BUCKET_NAME")
Note that the recently introduced TPU VMs can access local files. You can create TPU VMs in GCP, but not yet in Colab or Kaggle.