Learning Transferable Architectures for Scalable Image Recognition (CVPR 2018)

  • Paper title: Learning Transferable Architectures for Scalable Image Recognition
  • First author: Barret Zoph (Google Brain)



  • Abstract
    • The paper presents a method to learn model architectures directly on the dataset of interest.
    • Because direct search on a large dataset is expensive, the authors propose to search for an architectural building block on a small dataset and then transfer the block to a larger dataset.
    • They design a new search space, the “NASNet search space,” which enables this transferability.
    • In their experiments, they search for the best convolutional layer (or “cell”) on the CIFAR-10 dataset and then apply this cell to the ImageNet dataset by stacking together more copies of it, each with its own parameters, to design a convolutional architecture, which they name the “NASNet architecture”.
    • We also introduce a new regularization technique called ScheduledDropPath that significantly improves generalization in the NASNet models.
    • Although the cell is not searched for directly on ImageNet, a NASNet constructed from the best cell achieves, among the published works, state-of-the-art accuracy of 82.7% top-1 and 96.2% top-5 on ImageNet. Our model is 1.2% better in top-1 accuracy than the best human-invented architectures while having 9 billion fewer FLOPS – a reduction of 28% in computational demand from the previous state-of-the-art model.
    • When evaluated at different levels of computational cost, accuracies of NASNets exceed those of the state-of-the-art human-designed models. For instance, a small version of NASNet also achieves 74% top-1 accuracy, which is 3.1% better than equivalently-sized, state-of-the-art models for mobile platforms. 
    • Finally, the image features learned from image classification are generically useful and can be transferred to other computer vision problems.
    • On the task of object detection, the learned features by NASNet used with the Faster-RCNN framework surpass state-of-the-art by 4.0% achieving 43.1% mAP on the COCO dataset.
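The ScheduledDropPath regularizer mentioned above extends DropPath: during training, each path in a cell is dropped entirely with some probability, and that probability is increased linearly over the course of training. Below is a minimal pure-Python sketch of this idea, assuming a simple linear schedule and inverted-dropout-style rescaling of surviving paths; the function names and parameters are illustrative, not taken from the paper's released code.

```python
import random

def scheduled_drop_prob(step, total_steps, final_drop_prob):
    # Linearly ramp the drop probability from 0 up to final_drop_prob
    # over the course of training (the "Scheduled" part of ScheduledDropPath).
    return final_drop_prob * step / total_steps

def drop_path(path_output, drop_prob, training=True, rng=random):
    # Zero out an entire path with probability drop_prob during training.
    # Surviving paths are scaled by 1 / (1 - drop_prob) so the expected
    # activation magnitude stays unchanged at inference time.
    if not training or drop_prob == 0.0:
        return list(path_output)
    if rng.random() < drop_prob:
        return [0.0] * len(path_output)
    keep = 1.0 - drop_prob
    return [x / keep for x in path_output]
```

For example, at the midpoint of training with a final drop probability of 0.3, each path would be dropped with probability 0.15; at evaluation time (`training=False`), outputs pass through unchanged.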
