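I am currently building a basic image classification algorithm in TensorFlow. The code follows the tutorial at https://www.tensorflow.org/tutorials/images/classification almost exactly, except that I use my own data.
At the moment I have the following setup for generating the datasets: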
# Set up information on the data
batch_size = 32
img_height = 100
img_width = 100

# Generate training dataset
train_ds = tf.keras.utils.image_dataset_from_directory(
    Directory,
    validation_split=0.8,
    subset="training",
    seed=123,
    image_size=(img_height, img_width),
    batch_size=batch_size)

# Generate val dataset
val_ds = tf.keras.utils.image_dataset_from_directory(
    Directory,
    validation_split=0.2,
    subset="validation",
    seed=123,
    image_size=(img_height, img_width),
    batch_size=batch_size)
But in the terminal output, after running on our cluster, I see the following:
2022-09-30 09:49:26.936639: W tensorflow/core/kernels/data/cache_dataset_ops.cc:856] The calling iterator did not fully read the dataset being cached. In order to avoid unexpected truncation of the dataset, the partially cached contents of the dataset will be discarded. This can happen if you have an input pipeline similar to `dataset.cache().take(k).repeat()`. You should use `dataset.take(k).cache().repeat()` instead.
2022-09-30 09:49:26.956813: W tensorflow/core/kernels/data/cache_dataset_ops.cc:856] The calling iterator did not fully read the dataset being cached. In order to avoid unexpected truncation of the dataset, the partially cached contents of the dataset will be discarded. This can happen if you have an input pipeline similar to `dataset.cache().take(k).repeat()`. You should use `dataset.take(k).cache().repeat()` instead.
Found 2080581 files belonging to 2 classes.
Using 416117 files for training.
Found 2080581 files belonging to 2 classes.
Using 416116 files for validation.
I don't have much experience with TensorFlow and can't really figure out how to fix this error. Could anyone point me in the right direction?

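You are keeping only 20% of the data for training (2080581 * 20% ≈ 416117), because validation_split is set to 0.8 (80%) in the training call. Use the same validation_split=0.2 in both calls, so that 80% of the files go to training and 20% to validation: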
# Generate training dataset
train_ds = tf.keras.utils.image_dataset_from_directory(
    Directory,
    validation_split=0.2,
    subset="training",
    seed=123,
    image_size=(img_height, img_width),
    batch_size=batch_size)

# Generate val dataset
val_ds = tf.keras.utils.image_dataset_from_directory(
    Directory,
    validation_split=0.2,
    subset="validation",
    seed=123,
    image_size=(img_height, img_width),
    batch_size=batch_size)
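
The file counts in the log can be checked with a short calculation that does not need TensorFlow. The sketch below assumes the validation count is the split fraction of the total, rounded down, with the training subset receiving the remaining files; that assumption reproduces the numbers in the log. The helper name `split_sizes` is hypothetical and is not part of the Keras API.

```python
def split_sizes(num_files: int, validation_split: float) -> tuple[int, int]:
    """Return (training_count, validation_count) for a split fraction."""
    # Assumption: the validation count is the fraction rounded down,
    # and the training subset receives all remaining files.
    num_val = int(validation_split * num_files)
    return num_files - num_val, num_val


total = 2080581  # file count reported in the log

# Original training call: validation_split=0.8, subset="training"
original_train = split_sizes(total, 0.8)[0]

# Original validation call: validation_split=0.2, subset="validation"
original_val = split_sizes(total, 0.2)[1]

# Corrected setup: validation_split=0.2 in both calls
fixed_train, fixed_val = split_sizes(total, 0.2)

print("original:", original_train, "training,", original_val, "validation")
print("fixed:   ", fixed_train, "training,", fixed_val, "validation")
```

With the original arguments this gives 416117 training files and 416116 validation files, matching the log, which means roughly 60% of the images were never used. With validation_split=0.2 in both calls, the two subsets together cover every file.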