ENAS-Tensorflow

I will explain the code of Efficient Neural Architecture Search (ENAS), focusing on the micro search case.

Unlike the authors' code, this code works in a Windows 10 environment, and you can use PNG files as datasets.

Also, you can apply data augmentation using the "n_aug_img" flag, which is explained below.

Environment

How to run

First, unpack the attached data as shown below.

[Image 1: unpacked data directory layout]

Next, change the settings below to suit your situation.

<main_controller_child_trainer.py and main_child_trainer.py>

DEFINE_string("output_dir", "./output" , "")
DEFINE_string("train_data_dir", "./data/train", "")
DEFINE_string("val_data_dir", "./data/valid", "")
DEFINE_string("test_data_dir", "./data/test", "")
DEFINE_integer("channel",1, "MNIST: 1, Cifar10: 3")
DEFINE_integer("img_size", 32, "enlarge image size")
DEFINE_integer("n_aug_img",1 , "if 2: num_img: 55000 -> aug_img: 110000, elif 1: False")

It is recommended to set "n_aug_img" to 1 while searching for the child network, and to 2 ~ 4 when training the found child network.

Then you can train the ENAS controller with the following command:

python main_controller_child_trainer.py

After it finishes, you can train the child network with one of the following commands:

Case of MNIST 

python main_child_trainer.py --child_fixed_arc "1 2 1 3 0 1 0 4 1 1 1 1 0 1 0 1 1 0 0 1 0 1 0 4 1 0 2 0 0 3 1 1 0 0 0 0 4 1 1 0"
Case of CIFAR 10

python main_child_trainer.py --child_fixed_arc "1 0 1 1 1 1 0 0 1 1 0 0 0 3 0 3 1 3 1 1 1 1 0 3 0 3 0 3 1 3 0 1 1 3 0 2 0 3 1 0"
Case of Welding Defects

python main_child_trainer.py --child_fixed_arc "1 0 0 1 0 0 1 1 2 2 1 1 1 1 1 2 1 0 0 0 0 0 0 3 2 2 1 0 2 0 2 3 0 3 4 0 1 0 3 2"

The string in the commands above, such as "1 2 1 3 0 1 ...", is the architecture found by main_controller_child_trainer.py.

The first 20 numbers describe the convolution cell, and the remaining 20 describe the reduction cell, as shown in the sketch below.
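For illustration, here is a minimal decoding sketch, assuming the usual ENAS micro-search encoding of 4 integers per node (first input, first op, second input, second op) and 5 nodes per cell; the op names follow the candidate list shown in _enas_layers further below.

OP_NAMES = ["sep_conv_3x3", "sep_conv_5x5", "avg_pool", "max_pool", "identity"]

def decode_arc(arc_str):
    arc = [int(n) for n in arc_str.split()]
    cells = [("convolution cell", arc[:20]), ("reduction cell", arc[20:])]
    for name, cell in cells:
        print(name)
        for i in range(5):  # 5 nodes, 4 integers per node
            x_id, x_op, y_id, y_op = cell[4 * i:4 * i + 4]
            # node i + 2 combines two earlier nodes (0 and 1 are the cell inputs)
            print("  node %d = %s(node %d) + %s(node %d)"
                  % (i + 2, OP_NAMES[x_op], x_id, OP_NAMES[y_op], y_id))

decode_arc("1 2 1 3 0 1 0 4 1 1 1 1 0 1 0 1 1 0 0 1 "
           "0 1 0 4 1 0 2 0 0 3 1 1 0 0 0 0 4 1 1 0")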

Result

1. ENAS cells discovered in the micro search space

After training with <main_controller_child_trainer.py>, we obtained the following child_arc_seq for each dataset and visualized it as shown below.

MNIST

"1 2 1 3 0 1 0 4 1 1 1 1 0 1 0 1 1 0 0 1 0 1 0 4 1 0 2 0 0 3 1 1 0 0 0 0 4 1 1 0"

[Image 2: convolution cell]

[Image 3: reduction cell]

CIFAR 10

"1 0 1 1 1 1 0 0 1 1 0 0 0 3 0 3 1 3 1 1 1 1 0 3 0 3 0 3 1 3 0 1 1 3 0 2 0 3 1 0"

[Image 2: convolution cell]

[Image 3: reduction cell]

Welding Defects

"1 0 0 1 0 0 1 1 2 2 1 1 1 1 1 2 1 0 0 0 0 0 0 3 2 2 1 0 2 0 2 3 0 3 4 0 1 0 3 2"

[Image 2: convolution cell]

[Image 3: reduction cell]

2. Final structure of the child network

MNIST

[Image 4: final child network structure]

CIFAR 10

[Image 4: final child network structure]

Welding Defects

[Image 4: final child network structure]

3. Test Accuracy

MNIST
Test Accuracy: 99.77%

CIFAR 10
Test Accuracy:

Welding Defects
Test Accuracy: 100.00%

4. Graphs

<table align='center'> <tr align='center'> <td> Controller Validation Accuracy(reward) </td> </tr> <tr> <td><img src = 'images/Controller_reward_graph.png' height = '300px'> </tr> </table> <table align='center'> <tr align='center'> <td> ChildNetwork Loss & Test Accuracy for MNIST Dataset</td> </tr> <tr> <td><img src = 'images/MNIST_child_network_graph.png' height = '300px'> </tr> </table> <table align='center'> <tr align='center'> <td> ChildNetwork Loss & Test Accuracy for Welding Defects Dataset </td> </tr> <tr> <td><img src = 'images/Welding_Child_network_graph.png' height = '300px'> </tr> </table>

Explained

1. Controller

First, we will build the sampler as shown in the picture below.

[Image 5: sampler]

Then we build the controller using the sampler's outputs "next_c_1, next_h_1".

[Image 6: controller]

After getting "next_c_5, next_h_5", you must update "Anchors" and "Anchors_w_1" as follows.

[Image 7: anchor update]

2. Controller_Loss

To push the Controller toward better architectures, ENAS trains it with REINFORCE, using a moving average baseline to reduce the variance of the gradient estimate.

<micro_controller.py>

# accumulate the log-probability of each sampled node index
for all index:
    curr_log_prob = tf.nn.sparse_softmax_cross_entropy_with_logits(
        logits=logits, labels=index)
    log_prob += curr_log_prob
    # entropy of the index distribution (gradients are stopped)
    curr_ent = tf.stop_gradient(tf.nn.softmax_cross_entropy_with_logits(
        logits=logits, labels=tf.nn.softmax(logits)))
    entropy += curr_ent

# accumulate the log-probability of each sampled operation
for all op_id:
    curr_log_prob = tf.nn.sparse_softmax_cross_entropy_with_logits(
        logits=logits, labels=op_id)
    log_prob += curr_log_prob
    curr_ent = tf.stop_gradient(tf.nn.softmax_cross_entropy_with_logits(
        logits=logits, labels=tf.nn.softmax(logits)))
    entropy += curr_ent

arc_seq_1, entropy_1, log_prob_1, c, h = self._build_sampler(use_bias=True) # for convolution cell
arc_seq_2, entropy_2, log_prob_2, _, _ = self._build_sampler(prev_c=c, prev_h=h) # for reduction cell 
self.sample_entropy = entropy_1 + entropy_2
self.sample_log_prob = log_prob_1 + log_prob_2    
<micro_controller.py>

    self.valid_acc = (tf.to_float(child_model.valid_shuffle_acc) /
                      tf.to_float(child_model.batch_size))
    self.reward = self.valid_acc  # the reward is the validation accuracy

    if self.entropy_weight is not None:
      # optional entropy bonus to encourage exploration
      self.reward += self.entropy_weight * self.sample_entropy

    self.sample_log_prob = tf.reduce_sum(self.sample_log_prob)
    self.baseline = tf.Variable(0.0, dtype=tf.float32, trainable=False)
    # exponential moving average of the reward:
    # baseline <- baseline - (1 - bl_dec) * (baseline - reward)
    baseline_update = tf.assign_sub(
      self.baseline, (1 - self.bl_dec) * (self.baseline - self.reward))

    with tf.control_dependencies([baseline_update]):
      self.reward = tf.identity(self.reward)

    # REINFORCE loss: sampled log-probability scaled by the advantage
    self.loss = self.sample_log_prob * (self.reward - self.baseline)
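To see what the baseline update does numerically, here is a minimal plain-Python sketch of the same exponential moving average; the reward values are made up for illustration.

bl_dec, baseline = 0.99, 0.0
for reward in [0.50, 0.62, 0.71, 0.70]:  # e.g. successive validation accuracies
    # same update as the tf.assign_sub above
    baseline -= (1 - bl_dec) * (baseline - reward)
    advantage = reward - baseline
    print("reward %.2f  baseline %.4f  advantage %.4f"
          % (reward, baseline, advantage))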

3. Child Network

(1) Schematic of Child Network

[Image 8: schematic of the child network]

(2) _enas_layers

<micro_child.py>

def _enas_layers(self, layer_id, prev_layers, arc, out_filters):
    '''
    prev_layers: the previous two layers, e.g. layers[●, ●]
    ●'s shape = [None, H, W, C]
    arc: "0 1 0 1 0 3 0 0 2 2 0 2 1 0 0 1 1 3 0 1 1 1 0 1 0 1 2 1 0 0 0 0 0 0 1 3 1 1 0 1"
    out = [self._enas_conv(x, curr_cell, prev_cell, 3, out_filters),
           self._enas_conv(x, curr_cell, prev_cell, 5, out_filters),
           avg_pool,
           max_pool,
           x]
    '''

    return output  # calculated from arc, np.shape(output) = [None, H, W, out_filters]
                   # if child_fixed_arc is not None, np.shape(output) = [None, H, W, n*out_filters],
                   # where n is the number of unused nodes in the conv cell or reduction cell
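To make the per-node computation concrete, here is a minimal sketch (not the repo code; apply_op is a hypothetical helper) of how 4 integers of the arc build one node, using the candidate list from the docstring above.

def build_node(cell_outputs, arc, node_idx, out_filters):
    # 4 integers per node: first input, first op, second input, second op
    x_id, x_op, y_id, y_op = arc[4 * node_idx:4 * node_idx + 4]
    x = apply_op(cell_outputs[x_id], x_op, out_filters)  # hypothetical helper;
    y = apply_op(cell_outputs[y_id], y_op, out_filters)  # op ids index the out list above
    return x + y  # the two branches are summed to form the new node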

(3) factorized_reduction

<micro_child.py>

def factorized_reduction(self, x, out_filters, strides=2, is_training=True):
    '''
    x: the output of the previous layer
    out_filters: 2 * (the previous layer's number of channels)
    '''

    stride_spec = self._get_strides(strides)  # [1, 2, 2, 1]
    
    # Skip path 1
    path1 = tf.nn.avg_pool(x, [1, 1, 1, 1], stride_spec, "VALID", data_format=self.data_format)  

    with tf.variable_scope("path1_conv"):
        inp_c = self._get_C(path1)
        w = create_weight("w", [1, 1, inp_c, out_filters // 2])  
        path1 = tf.nn.conv2d(path1, w, [1, 1, 1, 1], "VALID", data_format=self.data_format)  

    # Skip path 2
    # First pad with 0's on the right and bottom, then shift the filter to
    # include those 0's that were added.
    if self.data_format == "NHWC":
        pad_arr = [[0, 0], [0, 1], [0, 1], [0, 0]]
        path2 = tf.pad(x, pad_arr)[:, 1:, 1:, :]
        concat_axis = 3
    else:
        pad_arr = [[0, 0], [0, 0], [0, 1], [0, 1]]
        path2 = tf.pad(x, pad_arr)[:, :, 1:, 1:]
        concat_axis = 1

    path2 = tf.nn.avg_pool(path2, [1, 1, 1, 1], stride_spec, "VALID", data_format=self.data_format)
    with tf.variable_scope("path2_conv"):
        inp_c = self._get_C(path2)
        w = create_weight("w", [1, 1, inp_c, out_filters // 2])
        path2 = tf.nn.conv2d(path2, w, [1, 1, 1, 1], "VALID", data_format=self.data_format)

    # Concat and apply BN
    final_path = tf.concat(values=[path1, path2], axis=concat_axis)
    final_path = batch_norm(final_path, is_training, data_format=self.data_format)

    return final_path
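As a hedged shape check (assuming NHWC and TF 1.x placeholders, run inside the child model): each stride-2 path halves the spatial size and produces out_filters // 2 channels, so the concatenation restores out_filters.

x = tf.placeholder(tf.float32, [None, 32, 32, 64])  # previous layer's output
y = self.factorized_reduction(x, out_filters=128, strides=2)
print(y.shape)  # expected: (?, 16, 16, 128)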

(4) _maybe_calibrate_size

<micro_child.py>

def _maybe_calibrate_size(self, layers, out_filters, is_training): 
    """Makes sure layers[0] and layers[1] have the same shapes."""
    hw = [self._get_HW(layer) for layer in layers]  
    c = [self._get_C(layer) for layer in layers]  

    with tf.variable_scope("calibrate"):
        x = layers[0]  
        if hw[0] != hw[1]:  
            assert hw[0] == 2 * hw[1]  
            with tf.variable_scope("pool_x"):
                x = tf.nn.relu(x)
                x = self.factorized_reduction(x, out_filters, 2, is_training)
        elif c[0] != out_filters:  
            with tf.variable_scope("pool_x"):
                w = create_weight("w", [1, 1, c[0], out_filters])
                x = tf.nn.relu(x)
                x = tf.nn.conv2d(x, w, [1, 1, 1, 1], "SAME", data_format=self.data_format)
                x = batch_norm(x, is_training, data_format=self.data_format)  

        y = layers[1]  
        if c[1] != out_filters:  
            with tf.variable_scope("pool_y"):
                w = create_weight("w", [1, 1, c[1], out_filters])
                y = tf.nn.relu(y)
                y = tf.nn.conv2d(y, w, [1, 1, 1, 1], "SAME", data_format=self.data_format)
                y = batch_norm(y, is_training, data_format=self.data_format)
    return [x, y]
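A hedged usage sketch (the placeholder shapes are illustrative): after a reduction cell, layers[0] is twice the spatial size of layers[1], so calibration reduces it and projects channels with 1x1 convolutions as needed.

prev_prev = tf.placeholder(tf.float32, [None, 32, 32, 64])   # hw = 32
prev = tf.placeholder(tf.float32, [None, 16, 16, 128])       # hw = 16
x, y = self._maybe_calibrate_size([prev_prev, prev],
                                  out_filters=128, is_training=True)
# both x and y now have shape (?, 16, 16, 128)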

(5) Others

You can see more details of the child network in <micro_child.py>.

4. Summary of learning mechanism

<main_controller_child_trainer.py>

1. Train the child network for 1 epoch. (Momentum optimizer)
※ 1 epoch = (total data size / batch size) parameter updates.

2. Train the controller "FLAGS.controller_train_steps x FLAGS.controller_num_aggregate" times. (Adam optimizer)

3. Repeat "1" and "2" as many times as we want (160 epochs).

4. Choose the child network architecture with the highest validation accuracy.

<main_child_trainer.py>

1. Train the child network selected above for as long as we want. (Momentum optimizer, 660 epochs)
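The schedule above can be condensed into the following pseudocode sketch (names like child_train_op and controller_train_op are illustrative, not verbatim repo identifiers):

for epoch in range(160):
    # 1. one epoch of child-network training (Momentum)
    for _ in range(num_train_batches):
        sess.run(child_train_op)

    # 2. controller training (Adam)
    for _ in range(FLAGS.controller_train_steps * FLAGS.controller_num_aggregate):
        sess.run(controller_train_op)

# 3./4. pick the sampled architecture with the best validation accuracy,
# then retrain it from scratch with main_child_trainer.py (660 epochs)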

Augmentation

1. Code

def aug(image, idx):
    # Lambdas defer evaluation, so only the selected augmentation runs;
    # calling the functions directly here would apply all five every time.
    augmentation_dic = {0: lambda: enlarge(image, 1.2),
                        1: lambda: rotation(image),
                        2: lambda: random_bright_contrast(image),
                        3: lambda: gaussian_noise(image),
                        4: lambda: Flip(image)}

    image = augmentation_dic[idx]()
    return image

The functions enlarge, rotation, random_bright_contrast, and Flip are written using cv2.

In the case of MNIST data, flipping is not applied. You can check more details in <data_utils.py>.
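For reference, here is a hypothetical sketch of how such an augmentation can be written with cv2; the actual rotation in <data_utils.py> may differ in its parameters.

import cv2
import numpy as np

def rotation(image, max_angle=15):
    # rotate around the image center by a random angle in [-max_angle, max_angle]
    h, w = image.shape[:2]
    angle = np.random.uniform(-max_angle, max_angle)
    M = cv2.getRotationMatrix2D((w / 2.0, h / 2.0), angle, 1.0)
    return cv2.warpAffine(image, M, (w, h))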

2. Images

MNIST

[Image 9: augmented MNIST images]

CIFAR10

[Image 9: augmented CIFAR10 images]

Welding Defects

<table align='center'> <tr align='center'> <td> Welding OK </td> <td> Welding NG </td> </tr> <tr> <td><img src = 'images/Welding_OK.jpg' height = '250px'> <td><img src = 'images/Welding_NG.jpg' height = '250px'> </tr> </table>

References

Paper: https://arxiv.org/abs/1802.03268

Authors' implementation: https://github.com/melodyguan/enas

Data Pipeline: https://github.com/MINGUKKANG/MNIST-Tensorflow-Code

License

All rights related to this code are reserved to the authors of ENAS:

(Hieu Pham, Melody Y. Guan, Barret Zoph, Quoc V. Le, Jeff Dean)