Notes from building the scheduler

    1. How is the ResNet code in tensorflow/models distributed?

    It uses the high-level Estimator API.

    The distribution strategy (distributed_strategy) is set up in utils/misc.

    2. How can the code in https://github.com/geifmany/cifar-vgg be made distributed?

    Following the TensorFlow tutorial:

    Create a distribution strategy and place both model construction and model compilation inside its scope.
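    A minimal sketch of that pattern, assuming TensorFlow 1.14 with tf.keras. The helper name build_and_compile_distributed, the build_model_fn argument, and the optimizer settings are placeholders for illustration and are not taken from the cifar-vgg repository.

```python
def build_and_compile_distributed(build_model_fn, learning_rate=0.1):
    """Build and compile a Keras model inside a MirroredStrategy scope.

    build_model_fn is a hypothetical zero-argument callable that returns
    an uncompiled tf.keras model (for example, the VGG network).
    """
    # Imported lazily so that defining this helper does not require
    # TensorFlow; calling it assumes TensorFlow 1.14 is installed.
    import tensorflow as tf

    # MirroredStrategy replicates the model on every visible GPU.
    strategy = tf.distribute.MirroredStrategy()

    # Variables must be created inside the scope, so both the model
    # construction and the compile call go here.
    with strategy.scope():
        model = build_model_fn()
        model.compile(
            # Illustrative optimizer settings, not the repository's.
            optimizer=tf.keras.optimizers.SGD(
                learning_rate=learning_rate, momentum=0.9),
            loss="categorical_crossentropy",
            metrics=["accuracy"],
        )
    return model, strategy
```

    The returned strategy exposes num_replicas_in_sync, which is handy for choosing a global batch size that divides evenly across GPUs.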

    However, VGG uses data augmentation, which conflicts with distributed training!

    If we call the fit_generator method as in the Keras tutorial, we hit this error:

    `fit_generator` is not supported for models compiled with tf.distribute.strategy.

    Our TensorFlow version is 1.14.


    ImageDataGenerator tutorial code
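    For context, the 'manual' loop in the Keras ImageDataGenerator documentation iterates over datagen.flow(...) and calls model.fit on each batch, breaking out by hand because the generator never ends. The sketch below restates that idea as a function. The name train_manually and its parameters are made up here, and itertools.islice replaces the manual break.

```python
import itertools


def train_manually(model, batch_iterator, steps_per_epoch, epochs):
    """Feed batches from an endless iterator to model.fit one at a time.

    batch_iterator is expected to yield (x_batch, y_batch) pairs forever,
    as ImageDataGenerator.flow does, so each epoch is cut off after
    steps_per_epoch batches.
    """
    steps_run = 0
    for _ in range(epochs):
        # islice takes exactly steps_per_epoch batches and then stops,
        # leaving the iterator ready for the next epoch.
        for x_batch, y_batch in itertools.islice(batch_iterator, steps_per_epoch):
            model.fit(x_batch, y_batch, verbose=0)
            steps_run += 1
    return steps_run
```

    With real data, batch_iterator would be something like datagen.flow(x_train, y_train, batch_size=32) and steps_per_epoch would be len(x_train) // 32.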

    If we use the 'manual' example from the official tutorial instead, training behaves strangely.

    With a single GPU, this is the standard output:

    The output above is normal (though different from what fit_generator prints). Below is the distributed version using MirroredStrategy, which is abnormal:

    The distributed version gets stuck in the first epoch, and the loss stays high for a long time.

    Solution:

    This issue suggests wrapping the generator with tf.data.Dataset.from_generator.
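    One way this could look is sketched below, assuming TensorFlow 1.14. The helper names make_sample_generator and make_dataset are hypothetical, and the sketch has not been checked against the exact setup described in these notes. It yields one augmented sample at a time and lets tf.data do the batching, so every batch has a fixed size that can be split across replicas.

```python
import random


def make_sample_generator(images, labels, augment):
    """Return a callable that yields (augmented image, label) pairs.

    augment is any per-image function; with Keras this could be
    ImageDataGenerator.random_transform. Each call to the returned
    callable makes one shuffled pass over the data.
    """
    def generator():
        order = list(range(len(images)))
        random.shuffle(order)
        for i in order:
            yield augment(images[i]), labels[i]
    return generator


def make_dataset(generator, image_shape, num_classes, global_batch_size):
    """Wrap the generator in a tf.data.Dataset that model.fit accepts."""
    # Imported lazily so the generator helper above works without
    # TensorFlow; calling this function assumes TensorFlow 1.14.
    import tensorflow as tf

    dataset = tf.data.Dataset.from_generator(
        generator,
        output_types=(tf.float32, tf.float32),
        output_shapes=(tf.TensorShape(image_shape),
                       tf.TensorShape([num_classes])),
    )
    # repeat() calls the generator again for each pass, which reshuffles
    # and re-augments; drop_remainder keeps the batch size static.
    return (dataset.repeat()
            .batch(global_batch_size, drop_remainder=True)
            .prefetch(1))
```

    For CIFAR-10 this might be used as make_dataset(make_sample_generator(x_train, y_train, datagen.random_transform), (32, 32, 3), 10, 128), with the result passed to model.fit together with steps_per_epoch, since the dataset repeats indefinitely.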
