Notes from building the scheduler
1. How is the ResNet code in tensorflow/models distributed?
It uses the high-level Estimator API.
The distribution strategy is set up in utils/misc (distributed_strategy).
2. How can the code in https://github.com/geifmany/cifar-vgg be made distributed?
Following the TensorFlow tutorial:
Create a distribution strategy and put both model construction and model compilation inside its scope.
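A minimal sketch of that step, assuming TensorFlow 1.14 and tf.distribute.MirroredStrategy. The tiny network is only a stand-in for the VGG model, and build_distributed_model and global_batch_size are hypothetical helper names, not part of the cifar-vgg repo. TensorFlow is imported inside the function so the batch-size helper can be used on its own.

```python
def global_batch_size(per_replica_batch, num_replicas):
    """Batch size to feed the model: each replica gets an equal slice."""
    return per_replica_batch * num_replicas


def build_distributed_model(input_shape=(32, 32, 3), num_classes=10):
    """Build and compile a small Keras model inside a MirroredStrategy scope."""
    import tensorflow as tf  # assumed: TF 1.14, as in these notes

    # Mirrors the variables on every visible GPU.
    strategy = tf.distribute.MirroredStrategy()

    # Both construction and compile happen inside the scope.
    with strategy.scope():
        model = tf.keras.Sequential([
            tf.keras.layers.Conv2D(32, 3, activation="relu",
                                   input_shape=input_shape),
            tf.keras.layers.Flatten(),
            tf.keras.layers.Dense(num_classes, activation="softmax"),
        ])
        model.compile(
            optimizer=tf.keras.optimizers.SGD(0.1),
            loss="categorical_crossentropy",
            metrics=["accuracy"],
        )
    return strategy, model
```

The batch size given to the input pipeline would then be global_batch_size(per_replica_batch, strategy.num_replicas_in_sync).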
However, VGG uses data augmentation, which conflicts with distribution.
Following the Keras tutorial and calling fit_generator produces this error:
`fit_generator` is not supported for models compiled with tf.distribute.strategy.
Our TensorFlow version is 1.14.

ImageDataGenerator tutorial code
If we use the ‘manual’ example from the official tutorial, training behaves weirdly.
With a single GPU, this is the standard output:

The output above is normal (though different from the fit_generator output). Below is the distributed version using MirroredStrategy, which behaves abnormally:

The distributed version gets stuck in the first epoch, and the loss stays high for a long time.
Solution:
This issue suggests using tf.data.Dataset.from_generator to deal with the generator.
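A rough sketch of that suggestion, assuming TensorFlow 1.14, NumPy arrays x_train and y_train with one-hot labels, and illustrative augmentation settings (not necessarily the ones cifar-vgg uses). make_augmented_dataset and steps_per_epoch are hypothetical helper names. Whether this fully removes the abnormal behaviour above was not verified here.

```python
import math


def steps_per_epoch(num_samples, batch_size):
    """Number of batches needed to see every sample once."""
    return math.ceil(num_samples / batch_size)


def make_augmented_dataset(x_train, y_train, batch_size):
    """Wrap an ImageDataGenerator in a tf.data.Dataset via from_generator."""
    import tensorflow as tf  # assumed: TF 1.14, as in these notes

    # Illustrative augmentation settings only.
    datagen = tf.keras.preprocessing.image.ImageDataGenerator(
        rotation_range=15,
        width_shift_range=0.1,
        height_shift_range=0.1,
        horizontal_flip=True,
    )

    def gen():
        # flow() loops forever, so the resulting dataset is infinite.
        for x_batch, y_batch in datagen.flow(x_train, y_train,
                                             batch_size=batch_size):
            yield x_batch.astype("float32"), y_batch.astype("float32")

    return tf.data.Dataset.from_generator(
        gen,
        output_types=(tf.float32, tf.float32),
        output_shapes=(
            tf.TensorShape([None] + list(x_train.shape[1:])),
            tf.TensorShape([None] + list(y_train.shape[1:])),
        ),
    )
```

The dataset can then be passed to model.fit in place of the generator, with steps_per_epoch=steps_per_epoch(len(x_train), batch_size), since the generator never ends by itself.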