Notes from constructing the scheduler

1. How is the ResNet code in tensorflow/models distributed?

It uses the high-level Estimator API.

See distributed_strategy in utils/misc.

2. How do we make our own code distributed?

Refer to the TensorFlow tutorial:

Create a distribution strategy, then do both the model construction and the model compile inside its scope.
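As a rough illustration of the step above, here is a minimal sketch of building and compiling a Keras model inside a strategy scope. The tiny network, the `build_compiled_model` helper, and the 32x32x3 input shape are placeholders chosen for illustration, not the VGG model from these notes.

```python
import tensorflow as tf


def build_compiled_model(strategy, num_classes=10):
    """Hypothetical helper: build and compile a small model under `strategy`."""
    # Both construction and compile happen inside the scope, so the
    # variables are created as mirrored variables on every replica.
    with strategy.scope():
        model = tf.keras.Sequential([
            tf.keras.layers.Conv2D(16, 3, activation="relu",
                                   input_shape=(32, 32, 3)),
            tf.keras.layers.GlobalAveragePooling2D(),
            tf.keras.layers.Dense(num_classes, activation="softmax"),
        ])
        model.compile(optimizer="sgd",
                      loss="sparse_categorical_crossentropy",
                      metrics=["accuracy"])
    return model


# MirroredStrategy uses all visible GPUs; with none it falls back to one
# replica on the CPU.
strategy = tf.distribute.MirroredStrategy()
model = build_compiled_model(strategy)
print("replicas in sync:", strategy.num_replicas_in_sync)
```

Training code (`model.fit`) is then called as usual, outside the scope.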


VGG uses data augmentation, which conflicts with the distribution strategy!

Following the Keras tutorial, if we use the fit_generator method, we hit this error:

`fit_generator` is not supported for models compiled with tf.distribute.Strategy.

Our TensorFlow version is 1.14.

ImageDataGenerator tutorial code

If we use the ‘manual’ example from the official tutorial, the training behaves weirdly:

With a single GPU, this is the standard output:

The output above is normal (though different from what fit_generator gives). Below is the distributed version using the mirrored strategy; it is abnormal:

The distributed version gets stuck in the first epoch, and the loss stays high for a long time.


This issue suggests using to deal with the generator.
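The name of the suggested API did not survive in these notes. Purely as an assumption, and not necessarily what the issue proposes, one common way to feed a Python generator to a model compiled under a distribution strategy is to wrap it with `tf.data.Dataset.from_generator`. In the sketch below, `augmented_batches` and `make_dataset` are hypothetical helpers, and the random horizontal flip stands in for whatever augmentation the real generator performs.

```python
import numpy as np
import tensorflow as tf


def augmented_batches(images, labels, batch_size, seed=0):
    """Hypothetical stand-in for an augmenting generator (yields forever)."""
    rng = np.random.RandomState(seed)
    while True:
        idx = rng.randint(0, len(images), size=batch_size)
        batch = images[idx].copy()
        # Randomly flip about half of the images along the width axis.
        flip = rng.rand(batch_size) < 0.5
        batch[flip] = batch[flip][:, :, ::-1, :]
        yield batch.astype(np.float32), labels[idx].astype(np.int64)


def make_dataset(images, labels, batch_size):
    """Wrap the Python generator in a tf.data.Dataset of fixed-size batches."""
    h, w, c = images.shape[1:]
    # output_types/output_shapes are the TF 1.x-style arguments; newer
    # releases also accept output_signature instead.
    return tf.data.Dataset.from_generator(
        lambda: augmented_batches(images, labels, batch_size),
        output_types=(tf.float32, tf.int64),
        output_shapes=((batch_size, h, w, c), (batch_size,)),
    )


# Random data in place of a real training set.
images = np.random.rand(64, 32, 32, 3).astype(np.float32)
labels = np.random.randint(0, 10, size=64)
dataset = make_dataset(images, labels, batch_size=16)
```

Because the generator never terminates, a call such as `model.fit(dataset, epochs=..., steps_per_epoch=...)` needs an explicit `steps_per_epoch`.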
