Neural architecture search with reinforcement learning (ICLR 2017)
- Lead author: Barret Zoph (Google Brain)
The structure and connectivity of a neural network can typically be specified by a variable-length string. It is therefore possible to use a recurrent network, the controller, to generate such a string.
A simple method of using a recurrent network to generate convolutional architectures.
The recurrent network can be trained with a policy gradient method to maximize the expected accuracy of the sampled architectures.
- GENERATE MODEL DESCRIPTIONS WITH A CONTROLLER RECURRENT NEURAL NETWORK
- We use a controller to generate architectural hyperparameters of neural networks. To be flexible, the controller is implemented as a recurrent neural network. Suppose we would like to predict feedforward neural networks with only convolutional layers; the controller can then generate their hyperparameters as a sequence of tokens, as in the sketch below.
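A minimal PyTorch sketch of such a controller, assuming one softmax head per token and 3 layers of 4 hyperparameters each. The class, layer sizes, and feedback scheme are illustrative, not the paper's exact implementation:

```python
import torch
import torch.nn as nn

class Controller(nn.Module):
    """Sketch of the controller RNN: at each step it emits a softmax over
    the choices for one hyperparameter, and the sampled choice is embedded
    and fed back as the next input. Sizes here are illustrative."""

    def __init__(self, num_choices, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(max(num_choices), hidden)
        self.cell = nn.LSTMCell(hidden, hidden)
        self.heads = nn.ModuleList(nn.Linear(hidden, n) for n in num_choices)

    def sample(self):
        h = torch.zeros(1, self.cell.hidden_size)
        c = torch.zeros_like(h)
        x = torch.zeros(1, self.embed.embedding_dim)
        tokens, log_probs = [], []
        for head in self.heads:
            h, c = self.cell(x, (h, c))
            dist = torch.distributions.Categorical(logits=head(h))
            a = dist.sample()                  # pick one choice for this token
            tokens.append(a.item())
            log_probs.append(dist.log_prob(a))
            x = self.embed(a)                  # feed the choice back in
        return tokens, torch.stack(log_probs).sum()

# 3 layers x (filter height, filter width, stride, number of filters)
controller = Controller(num_choices=[4, 4, 3, 4] * 3)
tokens, log_prob = controller.sample()
```

Each sampled index selects one value for one hyperparameter; the summed log-probability is kept for the policy-gradient update described below.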
- In our experiments, the process of generating an architecture stops if the number of layers exceeds a certain value.
- Once the controller RNN (with parameters θc) finishes generating an architecture, a neural network with this architecture is built and trained.
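The choice lists below follow the paper's CIFAR-10 search space (in the setup where strides are predicted); the `CHOICES` dict and `decode()` helper are hypothetical glue for turning the flat token list back into per-layer settings before the child network is built:

```python
# Choice lists per the paper's CIFAR-10 experiments: filter height/width
# in [1, 3, 5, 7], stride in [1, 2, 3], filter count in [24, 36, 48, 64].
CHOICES = {
    "filter_height": [1, 3, 5, 7],
    "filter_width":  [1, 3, 5, 7],
    "stride":        [1, 2, 3],
    "num_filters":   [24, 36, 48, 64],
}

def decode(tokens, num_layers=3):
    """Map the controller's flat token indices to per-layer conv specs."""
    it = iter(tokens)
    return [{name: values[next(it)] for name, values in CHOICES.items()}
            for _ in range(num_layers)]

print(decode(tokens))  # `tokens` as sampled by the controller sketch above
```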
- TRAINING WITH REINFORCE
- The list of tokens that the controller predicts can be viewed as a list of actions a_{1:T} to design an architecture for a child network.
- At convergence, this child network will achieve an accuracy R on a held-out dataset. We can use this accuracy R as the reward signal and use reinforcement learning to train the controller. More concretely, to find the optimal architecture, we ask our controller to maximize its expected reward:
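In the paper's notation:

$$J(\theta_c) = E_{P(a_{1:T};\,\theta_c)}[R]$$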
- Since the reward signal R is non-differentiable, we need to use a policy gradient method to iteratively update the controller parameters θc:
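The REINFORCE policy gradient (Williams, 1992) used in the paper:

$$\nabla_{\theta_c} J(\theta_c) = \sum_{t=1}^{T} E_{P(a_{1:T};\,\theta_c)}\left[\nabla_{\theta_c} \log P(a_t \mid a_{(t-1):1};\,\theta_c)\, R\right]$$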
- An empirical approximation of the above quantity is:
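$$\frac{1}{m}\sum_{k=1}^{m}\sum_{t=1}^{T} \nabla_{\theta_c} \log P(a_t \mid a_{(t-1):1};\,\theta_c)\, R_k$$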
- where m is the number of different architectures that the controller samples in one batch and T is the number of hyperparameters the controller has to predict to design a neural network architecture.
- R_k is the validation accuracy that the k-th neural network architecture achieves after being trained on a training dataset.
- The estimate above is unbiased but has high variance. To reduce the variance, a baseline function b is applied; b is an exponential moving average of the accuracies of the previous architectures. (As long as b does not depend on the current action, the equation below is still an unbiased estimate.)
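$$\frac{1}{m}\sum_{k=1}^{m}\sum_{t=1}^{T} \nabla_{\theta_c} \log P(a_t \mid a_{(t-1):1};\,\theta_c)\,(R_k - b)$$

A minimal sketch of one controller update with an EMA baseline, continuing the controller sketch above. `train_child_and_eval`, the batch size m = 8, the learning rate, and the decay are all illustrative stand-ins, not values from the paper:

```python
import torch

def train_child_and_eval(tokens):
    """Stand-in for building, training, and evaluating a child network;
    the real reward is held-out validation accuracy after convergence."""
    torch.manual_seed(sum(tokens))
    return torch.rand(()).item()

opt = torch.optim.Adam(controller.parameters(), lr=1e-3)
baseline, decay = 0.0, 0.95  # EMA over the accuracies of past architectures

for step in range(10):
    samples = [controller.sample() for _ in range(8)]     # m = 8 architectures
    rewards = [train_child_and_eval(t) for t, _ in samples]
    # REINFORCE with baseline: ascend (R_k - b) * grad log P(a_{1:T})
    loss = -sum(lp * (R - baseline)
                for (_, lp), R in zip(samples, rewards)) / len(samples)
    opt.zero_grad()
    loss.backward()
    opt.step()
    batch_mean = sum(rewards) / len(rewards)
    baseline = decay * baseline + (1 - decay) * batch_mean  # update the EMA
```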
- Accelerate Training with Parallelism and Asynchronous Updates
- As training a child network can take hours, we use distributed training and asynchronous parameter updates in order to speed up the learning process of the controller, as in the sketch below.
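The paper's setup is a parameter-server scheme with S parameter shards serving K controller replicas, each of which samples m child architectures trained in parallel. A toy sketch of the asynchronous part only, with a thread pool standing in for the worker fleet and a random number standing in for child accuracy:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
import random
import time

def train_child(arch_id):
    """Stand-in for hours of child-network training on a worker."""
    time.sleep(random.random() * 0.1)
    return arch_id, random.random()   # fake validation accuracy

# Results are consumed as each worker finishes, so controller updates
# can happen asynchronously instead of waiting for the slowest child.
with ThreadPoolExecutor(max_workers=8) as pool:
    futures = [pool.submit(train_child, i) for i in range(32)]
    for fut in as_completed(futures):
        arch_id, reward = fut.result()
        print(f"child {arch_id}: R = {reward:.3f}")  # update controller here
```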