Notes from building the scheduler

1. How is the ResNet code in tensorflow/models distributed?

It uses the high-level Estimator API.

The distribution strategy is configured through distributed_strategy in utils/misc.

2. How can the code in https://github.com/geifmany/cifar-vgg be made distributed?

Refer to the TensorFlow tutorial:

Create a distribution strategy, and put both model construction and model compilation inside its scope.
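A minimal sketch of that step, assuming MirroredStrategy and a tiny stand-in network rather than the real VGG model. The helper name build_compiled_model and the layer sizes are made up for illustration; only the strategy and scope calls reflect the tutorial's pattern.

```python
def build_compiled_model(input_shape=(32, 32, 3), num_classes=10):
    """Build and compile a small Keras model under MirroredStrategy."""
    # Imported inside the function so that merely defining it does not
    # initialise TensorFlow or touch the GPUs.
    import tensorflow as tf

    strategy = tf.distribute.MirroredStrategy()

    # Both construction and compile() must happen inside the scope so the
    # variables are created as mirrored variables on every replica.
    with strategy.scope():
        model = tf.keras.Sequential([
            # Stand-in layers (assumption); the real VGG stack goes here.
            tf.keras.layers.Conv2D(32, 3, activation="relu",
                                   input_shape=input_shape),
            tf.keras.layers.Flatten(),
            tf.keras.layers.Dense(num_classes, activation="softmax"),
        ])
        model.compile(
            optimizer=tf.keras.optimizers.SGD(learning_rate=0.01),
            loss="categorical_crossentropy",
            metrics=["accuracy"],
        )
    return strategy, model
```

Calling build_compiled_model() returns the strategy together with the compiled model, which can then be trained with model.fit.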

However, VGG uses data augmentation, which conflicts with distribution!

The Keras tutorial trains with the fit_generator method. With a distributed model, this raises the following error:

`fit_generator` is not supported for models compiled with tf.distribute.strategy.

Our TensorFlow version is 1.14.


ImageDataGenerator tutorial code

If we use the ‘manual’ example from the official tutorial instead, training becomes weird:

With a single GPU, this is the standard output:

The output above is normal (though different from what fit_generator produces). Below is the distributed version using MirroredStrategy, which is abnormal:

The distributed version gets stuck in the first epoch, and the loss stays high for a long time.
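For reference, the ‘manual’ approach has roughly the following shape. This is a sketch, not the tutorial's exact code: manual_fit is a hypothetical helper, and batch_iterator stands for whatever endless batch source is used, such as the iterator returned by ImageDataGenerator.flow.

```python
def manual_fit(model, batch_iterator, steps_per_epoch, epochs):
    """Train by hand: pull batches from an endless iterator, fit on each one."""
    for _ in range(epochs):
        # The iterator never stops on its own, so each epoch is bounded
        # explicitly by a fixed number of steps.
        for _ in range(steps_per_epoch):
            x_batch, y_batch = next(batch_iterator)
            model.fit(x_batch, y_batch, verbose=0)
```

Here steps_per_epoch would typically be len(x_train) // batch_size.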

Solution:

This issue suggests wrapping the generator with tf.data.Dataset.from_generator, so that the model can be trained on a tf.data.Dataset instead of through fit_generator.
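A minimal sketch of that idea, under some assumptions: a simple NumPy random horizontal flip stands in for the ImageDataGenerator augmentation, images are in NHWC layout, and the names augmented_batches and make_dataset are hypothetical. Only full batches are yielded, so the declared output shapes stay fixed.

```python
import numpy as np


def augmented_batches(x, y, batch_size, seed=0):
    """Yield (images, labels) batches forever, with random horizontal flips."""
    rng = np.random.RandomState(seed)
    n = len(x)
    while True:
        order = rng.permutation(n)  # reshuffle on every pass over the data
        for start in range(0, n - batch_size + 1, batch_size):
            batch = order[start:start + batch_size]
            images = x[batch].copy()
            flip = rng.rand(len(batch)) < 0.5
            # Reverse the width axis of the images selected for flipping.
            images[flip] = images[flip][:, :, ::-1, :]
            yield images.astype(np.float32), y[batch].astype(np.float32)


def make_dataset(x, y, batch_size):
    """Wrap the generator in a tf.data.Dataset that model.fit accepts."""
    # Imported here so the generator above needs only NumPy.
    import tensorflow as tf

    return tf.data.Dataset.from_generator(
        lambda: augmented_batches(x, y, batch_size),
        output_types=(tf.float32, tf.float32),
        output_shapes=(
            tf.TensorShape([batch_size] + list(x.shape[1:])),
            tf.TensorShape([batch_size] + list(y.shape[1:])),
        ),
    )
```

Because the generator is endless, the dataset should be passed to model.fit together with steps_per_epoch, for example steps_per_epoch=len(x_train) // batch_size.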
