Neural architecture search with reinforcement learning (arXiv 2017)


  • Neural architecture search with reinforcement learning
  • Lead author: Barret Zoph (Google Brain)

The structure and connectivity of a neural network can typically be specified by a variable-length string. It is therefore possible to use a recurrent network – the controller – to generate such a string.

The paper presents a simple method of using a recurrent network to generate convolutional architectures.

The recurrent network can be trained with a policy gradient method to maximize the expected accuracy of the sampled architectures.





    • GENERATE MODEL DESCRIPTIONS WITH A CONTROLLER RECURRENT NEURAL NETWORK
      • We use a controller to generate architectural hyperparameters of neural networks. To be flexible, the controller is implemented as a recurrent neural network. Suppose we would like to predict feedforward neural networks with only convolutional layers; the controller can then generate their hyperparameters as a sequence of tokens (e.g., filter height, filter width, stride height, stride width, and number of filters for each layer). A minimal controller sketch appears after this section.
      • In our experiments, the process of generating an architecture stops if the number of layers exceeds a certain value.
      • Once the controller RNN (with parameters \theta_c) finishes generating an architecture, a neural network with this architecture is built and trained.
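Below is a minimal PyTorch sketch of such a token-by-token controller, written here for illustration (the paper does not publish this code). The hyperparameter vocabularies, hidden size, and fixed layer count are assumptions; each sampled token is embedded and fed back as the next RNN input, as the paper describes.

```python
import torch
import torch.nn as nn

# Illustrative per-layer hyperparameter vocabularies (assumed values).
CHOICES = {
    "filter_height": [1, 3, 5, 7],
    "filter_width":  [1, 3, 5, 7],
    "num_filters":   [24, 36, 48, 64],
}

class Controller(nn.Module):
    """Autoregressive RNN that emits one token per hyperparameter per layer."""
    def __init__(self, hidden=64):
        super().__init__()
        self.hidden = hidden
        self.cell = nn.LSTMCell(hidden, hidden)
        self.embed = nn.ModuleDict(
            {k: nn.Embedding(len(v), hidden) for k, v in CHOICES.items()})
        self.head = nn.ModuleDict(
            {k: nn.Linear(hidden, len(v)) for k, v in CHOICES.items()})

    def sample(self, num_layers):
        h = torch.zeros(1, self.hidden)
        c = torch.zeros(1, self.hidden)
        x = torch.zeros(1, self.hidden)   # start input; a learned start token in practice
        tokens, log_probs = [], []
        for _ in range(num_layers):       # generation stops at a fixed layer count
            for name, values in CHOICES.items():
                h, c = self.cell(x, (h, c))
                dist = torch.distributions.Categorical(logits=self.head[name](h))
                t = dist.sample()                       # one architectural token
                log_probs.append(dist.log_prob(t))
                tokens.append((name, values[t.item()]))
                x = self.embed[name](t)                 # feed the sampled token back in
        return tokens, torch.stack(log_probs).sum()
```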





        • TRAINING WITH REINFORCE
          • The list of tokens that the controller predicts can be viewed as a list of actions a_{1:T} to design an architecture for a child network.
          • At convergence, this child network will achieve an accuracy R on a held-out dataset. We can use this accuracy R as the reward signal and use reinforcement learning to train the controller. More concretely, to find the optimal architecture, we ask our controller to maximize its expected reward:

            J(\theta_c) = E_{P(a_{1:T};\, \theta_c)}[R]

          • Since the reward signal R is non-differentiable, we need to use a policy gradient method to iteratively update the controller parameters \theta_c; the REINFORCE rule gives:

            \nabla_{\theta_c} J(\theta_c) = \sum_{t=1}^{T} E_{P(a_{1:T};\, \theta_c)} \left[ \nabla_{\theta_c} \log P(a_t \mid a_{(t-1):1}; \theta_c) \, R \right]

          • An empirical approximation of the above quantity is:

            \frac{1}{m} \sum_{k=1}^{m} \sum_{t=1}^{T} \nabla_{\theta_c} \log P(a_t \mid a_{(t-1):1}; \theta_c) \, R_k

          • where m is the number of different architectures that the controller samples in one batch and T is the number of hyperparameters the controller has to predict to design a neural network architecture.
          • The validation accuracy that the k-th neural network architecture achieves after being trained on a training dataset is R_k.
          • The estimate above is unbiased, but it has very high variance. To reduce the variance, we subtract a baseline function b, where b is an exponential moving average of the accuracies of previous architectures. (As long as b does not depend on the current action, the formula below is still an unbiased estimate; a code sketch of this update follows.)

            \frac{1}{m} \sum_{k=1}^{m} \sum_{t=1}^{T} \nabla_{\theta_c} \log P(a_t \mid a_{(t-1):1}; \theta_c) \, (R_k - b)

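Here is a minimal REINFORCE-with-baseline training loop for the Controller sketched earlier, again our own illustration rather than the authors' code. The learning rate, batch size m, EMA decay, and the dummy train_and_evaluate() (which stands in for training a child network and returning its validation accuracy) are all assumptions.

```python
import random
import torch

def train_and_evaluate(tokens):
    # Hypothetical stand-in: build and train the child network described by
    # `tokens`, then return its validation accuracy. Random here so the
    # sketch runs end to end.
    return random.random()

controller = Controller()                 # from the sketch above
opt = torch.optim.Adam(controller.parameters(), lr=3.5e-4)  # assumed optimizer/lr
baseline, decay = 0.0, 0.95               # EMA of previous architectures' accuracies
m = 8                                     # architectures sampled per batch (assumed)

for step in range(1000):
    opt.zero_grad()
    loss = 0.0
    for _ in range(m):
        tokens, sum_log_prob = controller.sample(num_layers=6)
        R = train_and_evaluate(tokens)
        loss = loss - sum_log_prob * (R - baseline)   # -(R_k - b) * sum_t log P(a_t)
        baseline = decay * baseline + (1 - decay) * R
    (loss / m).backward()                 # the empirical gradient estimate above
    opt.step()
```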
          • ACCELERATE TRAINING WITH PARALLELISM AND ASYNCHRONOUS UPDATES
            • As training a child network can take hours, we use distributed training and asynchronous parameter updates in order to speed up the learning process of the controller. A parameter server stores the shared controller parameters for K controller replicas; each replica samples m different child architectures that are trained in parallel. (A toy sketch of this pattern follows below.)
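As a rough illustration of asynchronous updates (not the authors' infrastructure), here is a self-contained toy using Python multiprocessing: replica processes pull the latest weights, compute a gradient (random noise standing in for the REINFORCE gradient), and push it to a parameter-server loop that applies updates as they arrive, with no synchronization barrier. All counts and the learning rate are made up.

```python
import multiprocessing as mp
import random

K, ROUNDS = 3, 5        # controller replicas and async rounds per replica (assumed)

def replica(param_q, grad_q):
    for _ in range(ROUNDS):
        params = param_q.get()                            # pull latest weights
        fake_grad = [random.gauss(0, 1) for _ in params]  # stand-in for REINFORCE grad
        grad_q.put(fake_grad)                             # push without waiting for peers

if __name__ == "__main__":
    param_q, grad_q = mp.Queue(), mp.Queue()
    params = [0.0] * 4                                    # toy controller weights
    for _ in range(K):                                    # seed one copy per replica
        param_q.put(params)
    workers = [mp.Process(target=replica, args=(param_q, grad_q)) for _ in range(K)]
    for w in workers:
        w.start()
    for _ in range(K * ROUNDS):                           # parameter-server loop
        grad = grad_q.get()                               # apply each update on arrival
        params = [p - 0.01 * g for p, g in zip(params, grad)]
        param_q.put(params)                               # publish refreshed weights
    for w in workers:
        w.join()
    print("final params:", params)
```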